[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86: allow non-BIGMEM configs to boot on >= 16Tb systems


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 18 Dec 2023 15:59:08 +0100
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Mon, 18 Dec 2023 14:59:23 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 18.12.2023 12:13, Roger Pau Monné wrote:
> On Mon, Dec 18, 2023 at 09:26:24AM +0100, Jan Beulich wrote:
>> On 15.12.2023 15:54, Roger Pau Monné wrote:
>>> On Wed, Jun 07, 2023 at 08:17:30AM +0200, Jan Beulich wrote:
>>>> While frame table setup, directmap init, and boot allocator population
>>>> respect all intended bounds, the logic passing memory to the heap
>>>> allocator which wasn't passed to the boot allocator fails to respect
>>>> max_{pdx,pfn}. This then typically triggers the BUG() in
>>>> free_heap_pages() after checking page state, because of hitting a struct
>>>> page_info instance which was set to all ~0.
>>>>
>>>> Of course all the memory above the 16Tb boundary is still going to
>>>> remain unused; using it requires BIGMEM=y. And of course this fix
>>>> similarly ought to help BIGMEM=y configurations on >= 123Tb systems
>>>> (where all the memory beyond that boundary continues to be unused).
>>>>
>>>> Fixes: bac2000063ba ("x86-64: reduce range spanned by 1:1 mapping and 
>>>> frame table indexes")
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>
>>> Acked-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>
>> Thanks.
>>
>>>> --- a/xen/arch/x86/setup.c
>>>> +++ b/xen/arch/x86/setup.c
>>>> @@ -1722,15 +1722,16 @@ void __init noreturn __start_xen(unsigne
>>>>  
>>>>      if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
>>>>      {
>>>> -        unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
>>>> +        unsigned long lo = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
>>>> +        unsigned long hi = pdx_to_pfn(max_pdx - 1) + 1;
>>>
>>> Maybe use max_page to avoid the pdx_to_pfn() call?  (And is also more
>>> in context with the condition on the outside if).
>>
>> You mean
>>
>>         unsigned long hi = min(pdx_to_pfn(max_pdx - 1) + 1, max_page);
>>
>> ? I could switch to that, yes. I wouldn't feel well switching to using
>> just max_page, especially with me having nowhere to (reasonably) test.
> 
> Isn't max_page derived from max_pdx (see setup_max_pdx()), and
> hence we could avoid the pdx_to_pfn() conversion by just using it?
> 
> max_page = pdx_to_pfn(max_pdx - 1) + 1;
> 
> So hi == max_page in your proposed code.
> 
> Maybe there are further restrictions applied to max_pdx that are not
> propagated into max_page, the meaning of all those variables is very
> opaque, and hard to follow in the source code.

Looking more closely, the two appear to be properly in sync once
setup_max_pdx() was called the first time. I guess I was in part
mislead by

            e = (pdx_to_pfn(max_pdx - 1) + 1ULL) << PAGE_SHIFT;

just a few lines past an update to both variables. I'll switch to
max_page here, and I may also make a patch to tidy the line quoted
above.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.