|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/x86: Change stub page freeing to fix smt=0
On 03.06.2026 16:16, Roger Pau Monné wrote:
> On Wed, Jun 03, 2026 at 10:03:53AM -0400, Jason Andryuk wrote:
>> On 2026-06-02 03:01, Roger Pau Monné wrote:
>>> On Mon, Jun 01, 2026 at 05:07:52PM -0400, Jason Andryuk wrote:
>>>> On 2026-06-01 13:00, Roger Pau Monné wrote:
>>>>> On Tue, May 26, 2026 at 04:31:14PM -0400, Jason Andryuk wrote:
>>>>>> A single stubs page is initialized with 0xcc and re-used, with multiple
>>>>>> CPUs each using a portion of the shared page. In cpu_smpboot_free(),
>>>>>> each stubs area is checked against 0xcc. When all are set to 0xcc, the
>>>>>> page is freed.
>>>>>>
>>>>>> Booting a system with smt=0, CPU0 is initially setup, allocating the
>>>>>> stubs page and initializing to 0xcc. When more CPUs are brought up,
>>>>>> CPU1 is initialized and then immediately brough offline as it is the
>>>>>> sibling of CPU0. Since the page was initially memset with 0xcc,
>>>>>> cpu_smpboot_free() finds all stubs as 0xcc and frees the page.
>>>>>> However, the page is still assigned to CPU0 and continues to be assigned
>>>>>> to other CPUs.
>>>>>>
>>>>>> Meanwhile the page can be reallocated, which can lead to misbehavior.
>>>>>> The particular instance was the stubs page re-used as a page table which
>>>>>> later faulted when the entry was all 0xcc.
>>>>>>
>>>>>> Change to initializing the page as 0xd6/STUB_BUF_FREE, and initializing
>>>>>> individual stubs as 0xcc/STUB_BUF_USED. 0xd6 now indicates unused, and
>>>>>> 0xcc indicates used/assigned. When freeing a CPU, the stub is set to
>>>>>> 0xd6, and the page is freed if all stubs are 0xd6. Initializing with
>>>>>> STUB_BUF_FREE lets cpu_smpboot_free() a page that was only ever
>>>>>> partially used.
>>>>>>
>>>>>> 0xd6/UDB is a 1 byte invalid opcode, which is similar to the existing
>>>>>> use of 0xcc. 0xd6 is used to identify bug frames, but the stub addr
>>>>>> (e.g. 0xffff82d07fffe000) fails the is_active_kernel_text() check. It
>>>>>> should be okay to use here.
>>>>>>
>>>>>> Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
>>>>>> Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
>>>>>> ---
>>>>>> It would be nice to use get_page()/put_page() to let count_info handle
>>>>>> reference counting, but they require an owning domain.
>>>>>>
>>>>>> The listed Fixes introduced the use of 0xcc, but the smt commit may have
>>>>>> made it more problematic.
>>>>>> Fixes: d8f974f1a646 ("x86: command line option to avoid use of secondary
>>>>>> hyper-threads")
>>>>>
>>>>> Speaking with Andrew, we believe it might be easier to simply forego
>>>>> the freeing of the page, possibly something like:
>>>>>
>>>>> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
>>>>> index ff05955bae40..62c6cbf4b561 100644
>>>>> --- a/xen/arch/x86/smpboot.c
>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>> @@ -990,19 +990,12 @@ static void cpu_smpboot_free(unsigned int cpu, bool
>>>>> remove)
>>>>> {
>>>>> mfn_t mfn = _mfn(per_cpu(stubs.mfn, cpu));
>>>>> unsigned char *stub_page = map_domain_page(mfn);
>>>>> - unsigned int i;
>>>>> memset(stub_page + STUB_BUF_CPU_OFFS(cpu), 0xcc,
>>>>> STUB_BUF_SIZE);
>>>>> - for ( i = 0; i < STUBS_PER_PAGE; ++i )
>>>>> - if ( stub_page[i * STUB_BUF_SIZE] != 0xcc )
>>>>> - break;
>>>>> unmap_domain_page(stub_page);
>>>>> destroy_xen_mappings(per_cpu(stubs.addr, cpu) & PAGE_MASK,
>>>>> (per_cpu(stubs.addr, cpu) | ~PAGE_MASK) +
>>>>> 1);
>>>>> per_cpu(stubs.addr, cpu) = 0;
>>>>> - per_cpu(stubs.mfn, cpu) = 0;
>>>>> - if ( i == STUBS_PER_PAGE )
>>>>> - free_domheap_page(mfn_to_page(mfn));
>>>>> }
>>>>> if ( IS_ENABLED(CONFIG_PV32) )
>>>
>>> I think I've made an oversight in the code above: if all 32 CPUs
>>> sharing the same stubs page are offlined, the reference to the stubs
>>> page is possibly lost (if CPUs are not parked) and a new stubs page
>>> would be allocated if any of those CPUs is brought back online, thus
>>> leaking the previous allocation. The simplest way to solve this would
>>> be to introduce an array that indexes the stub pages, and replace the
>>> logic in cpu_smpboot_alloc() that figures out whether stubs.mfn is set
>>> for adjacent CPUs.
>>
>> Right, but I thought Andrew's point was that offlining 32 CPUs is
>> unrealistic, so don't even bother tracking. If CPUs are offlined (and you
>> somehow keep running), you can leak the page.
>
> I thin we should avoid freeing the page and ensure it's always reused,
> rather than possibly leaking it. It's also possible there's a single
> trailing CPU using the last stubs page alone, and offlining and
> onlining it would trigger such a page leak, without requiring a block
> of 32 CPUs going offline.
>
> Entering an ACPI sleep state causes all APs to be offlined (see the
> disable_nonboot_cpus() call in enter_state()), and it would be
> undesirable that putting a system to sleep causes page leaking.
Suspend is handled specially by some CPU notifier handlers, see in
particular common/percpu.c:cpu_percpu_callback(). That in particular
means that per-CPU data survives suspend/resume. Which may be possible
to leverage here to avoid a leak across S3 (and perhaps even across
the pretty similar parking), while still accepting a leak for "real"
CPU offlining. (Which isn't to say that avoiding leaks altogether
wouldn't be the most desirable goal.)
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |