[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/x86: Change stub page freeing to fix smt=0


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 3 Jun 2026 16:27:48 +0200
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=google header.d=suse.com header.i="@suse.com" header.h="Content-Transfer-Encoding:In-Reply-To:Autocrypt:From:Content-Language:References:Cc:To:Subject:User-Agent:MIME-Version:Date:Message-ID"
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Teddy Astie <teddy.astie@xxxxxxxxxx>, Jason Andryuk <jason.andryuk@xxxxxxx>
  • Delivery-date: Wed, 03 Jun 2026 14:27:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03.06.2026 16:16, Roger Pau Monné wrote:
> On Wed, Jun 03, 2026 at 10:03:53AM -0400, Jason Andryuk wrote:
>> On 2026-06-02 03:01, Roger Pau Monné wrote:
>>> On Mon, Jun 01, 2026 at 05:07:52PM -0400, Jason Andryuk wrote:
>>>> On 2026-06-01 13:00, Roger Pau Monné wrote:
>>>>> On Tue, May 26, 2026 at 04:31:14PM -0400, Jason Andryuk wrote:
>>>>>> A single stubs page is initialized with 0xcc and re-used, with multiple
>>>>>> CPUs each using a portion of the shared page.  In cpu_smpboot_free(),
>>>>>> each stubs area is checked against 0xcc.  When all are set to 0xcc, the
>>>>>> page is freed.
>>>>>>
>>>>>> Booting a system with smt=0, CPU0 is initially setup, allocating the
>>>>>> stubs page and initializing to 0xcc.  When more CPUs are brought up,
>>>>>> CPU1 is initialized and then immediately brough offline as it is the
>>>>>> sibling of CPU0.  Since the page was initially memset with 0xcc,
>>>>>> cpu_smpboot_free() finds all stubs as 0xcc and frees the page.
>>>>>> However, the page is still assigned to CPU0 and continues to be assigned
>>>>>> to other CPUs.
>>>>>>
>>>>>> Meanwhile the page can be reallocated, which can lead to misbehavior.
>>>>>> The particular instance was the stubs page re-used as a page table which
>>>>>> later faulted when the entry was all 0xcc.
>>>>>>
>>>>>> Change to initializing the page as 0xd6/STUB_BUF_FREE, and initializing
>>>>>> individual stubs as 0xcc/STUB_BUF_USED.  0xd6 now indicates unused, and
>>>>>> 0xcc indicates used/assigned.  When freeing a CPU, the stub is set to
>>>>>> 0xd6, and the page is freed if all stubs are 0xd6.  Initializing with
>>>>>> STUB_BUF_FREE lets cpu_smpboot_free() a page that was only ever
>>>>>> partially used.
>>>>>>
>>>>>> 0xd6/UDB is a 1 byte invalid opcode, which is similar to the existing
>>>>>> use of 0xcc.  0xd6 is used to identify bug frames, but the stub addr
>>>>>> (e.g. 0xffff82d07fffe000) fails the is_active_kernel_text() check.  It
>>>>>> should be okay to use here.
>>>>>>
>>>>>> Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
>>>>>> Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
>>>>>> ---
>>>>>> It would be nice to use get_page()/put_page() to let count_info handle
>>>>>> reference counting, but they require an owning domain.
>>>>>>
>>>>>> The listed Fixes introduced the use of 0xcc, but the smt commit may have
>>>>>> made it more problematic.
>>>>>> Fixes: d8f974f1a646 ("x86: command line option to avoid use of secondary 
>>>>>> hyper-threads")
>>>>>
>>>>> Speaking with Andrew, we believe it might be easier to simply forego
>>>>> the freeing of the page, possibly something like:
>>>>>
>>>>> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
>>>>> index ff05955bae40..62c6cbf4b561 100644
>>>>> --- a/xen/arch/x86/smpboot.c
>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>> @@ -990,19 +990,12 @@ static void cpu_smpboot_free(unsigned int cpu, bool 
>>>>> remove)
>>>>>        {
>>>>>            mfn_t mfn = _mfn(per_cpu(stubs.mfn, cpu));
>>>>>            unsigned char *stub_page = map_domain_page(mfn);
>>>>> -        unsigned int i;
>>>>>            memset(stub_page + STUB_BUF_CPU_OFFS(cpu), 0xcc, 
>>>>> STUB_BUF_SIZE);
>>>>> -        for ( i = 0; i < STUBS_PER_PAGE; ++i )
>>>>> -            if ( stub_page[i * STUB_BUF_SIZE] != 0xcc )
>>>>> -                break;
>>>>>            unmap_domain_page(stub_page);
>>>>>            destroy_xen_mappings(per_cpu(stubs.addr, cpu) & PAGE_MASK,
>>>>>                                 (per_cpu(stubs.addr, cpu) | ~PAGE_MASK) + 
>>>>> 1);
>>>>>            per_cpu(stubs.addr, cpu) = 0;
>>>>> -        per_cpu(stubs.mfn, cpu) = 0;
>>>>> -        if ( i == STUBS_PER_PAGE )
>>>>> -            free_domheap_page(mfn_to_page(mfn));
>>>>>        }
>>>>>        if ( IS_ENABLED(CONFIG_PV32) )
>>>
>>> I think I've made an oversight in the code above: if all 32 CPUs
>>> sharing the same stubs page are offlined, the reference to the stubs
>>> page is possibly lost (if CPUs are not parked) and a new stubs page
>>> would be allocated if any of those CPUs is brought back online, thus
>>> leaking the previous allocation.  The simplest way to solve this would
>>> be to introduce an array that indexes the stub pages, and replace the
>>> logic in cpu_smpboot_alloc() that figures out whether stubs.mfn is set
>>> for adjacent CPUs.
>>
>> Right, but I thought Andrew's point was that offlining 32 CPUs is
>> unrealistic, so don't even bother tracking.  If CPUs are offlined (and you
>> somehow keep running), you can leak the page.
> 
> I thin we should avoid freeing the page and ensure it's always reused,
> rather than possibly leaking it.  It's also possible there's a single
> trailing CPU using the last stubs page alone, and offlining and
> onlining it would trigger such a page leak, without requiring a block
> of 32 CPUs going offline.
> 
> Entering an ACPI sleep state causes all APs to be offlined (see the
> disable_nonboot_cpus() call in enter_state()), and it would be
> undesirable that putting a system to sleep causes page leaking.

Suspend is handled specially by some CPU notifier handlers, see in
particular common/percpu.c:cpu_percpu_callback(). That in particular
means that per-CPU data survives suspend/resume. Which may be possible
to leverage here to avoid a leak across S3 (and perhaps even across
the pretty similar parking), while still accepting a leak for "real"
CPU offlining. (Which isn't to say that avoiding leaks altogether
wouldn't be the most desirable goal.)

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.