[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/PV32: restore PAE-extended-CR3 logic


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 4 Apr 2023 16:21:07 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=bHyzEfPI/zvXf0RCfyg2mKP5kKbOBRpTIxetoWwctGc=; b=YexxhU6wCTE2CvMpprh6bSX4hQRhZQH7hpTRGKGUtf2Eh+l8J9npDodirU0B05IRp8tkqsT/8pjzvBVoaHlg+taD4o03hIGAwR13COwepumDdJcWbfPcYQ9VlNH/vVCzhWzkMw6ltvO+zC8IZ4UTIndCpTRPo3r+oPlcsarkm7wX4Q6k2fKMxsTgUAmZbT/FFD3VRU9svaM6kBaQ/wE8uSffBRe7ARV2j7wdrY81W0rvFOkYLPXTGDJmV9DzuNYrRtTpqgJs+qXlHY1/sHP+JqTJMJZ6JhpFeBWYlHbutG9fNFZx30Q5jhUxGn7zBghGGPstdF+pddGixRzkp/28Iw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ZLumDTvjYmBBEicEU0W6WDIxirkjIhnXI5vpIqaPT7dMG7NMsPunJMrVgAveEZLRLdEJY9bPG9pSy8GJh+htYDYGiix5D9vpyuO2QHZyjP32AzpeGVX4Kd+FSZ3E6evqikndHZPJUdFZ6UDeGh18JUR/mSUEbLroOVAPImJBUHzZC+RKN/dnZkYuUGHk65tO51P5zEOjGXlg1nM9HvYeN2kPDnb0Igd4Idekbjzelj78c0+GanuGpGpRDaA2QwjnP5FGtZ1CPSLZVVVNHtHYTDqaGSqxyvOEbE2asV6j/mv3+KyMTd8y0DfCXdVJpf8vT+3R7pcwbcP1q4ScxzzQPQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 04 Apr 2023 14:21:35 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 04.04.2023 15:08, Andrew Cooper wrote:
> On 15/02/2023 2:54 pm, Jan Beulich wrote:
>> While the PAE-extended-CR3 VM assist is a 32-bit only concept, it still
>> applies to guests also when run on a 64-bit hypervisor:
> 
> Is this really true?  Even when looking at Xen 4.2, 32bit guests are
> required to pass a full 4k page, not a 32b quad.

The full-page vs 32b-quad aspect is orthogonal. This VM-assist is solely
about where that data structure is, not what size it is.

> Which makes complete sense.  It was a hard requirement of 32bit non-PAE
> guests, so it was a natural restriction to maintain into 32bit PAE guests.
> 
> This is *only* a 32-on-64 issue, because this is the only case a 32bit
> guest could in principle use an L3 placed above the 4G boundary.

Not exactly. 32-bit Xen maintained a 4-entry "shadow" array below 4G
that it would copy (massage) the guest entries into upon CR3 reload
(just look for struct pae_l3_cache in the old sources). So above-4G
page table base was possible there as well.

>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -1520,6 +1520,23 @@ static int promote_l3_table(struct page_
>>      unsigned int   partial_flags = page->partial_flags;
>>      l3_pgentry_t   l3e = l3e_empty();
>>  
>> +    /*
>> +     * PAE pgdirs above 4GB are unacceptable if a 32-bit guest does not
>> +     * understand the weird 'extended cr3' format for dealing with 
>> high-order
>> +     * address bits. We cut some slack for control tools (before vcpu0 is
>> +     * initialised).
>> +     */
>> +    if ( is_pv_32bit_domain(d) &&
>> +         unlikely(!VM_ASSIST(d, pae_extended_cr3)) &&
>> +         mfn_x(l3mfn) >= 0x100000 &&
>> +         d->vcpu[0] && d->vcpu[0]->is_initialised )
>> +    {
>> +        gdprintk(XENLOG_WARNING,
>> +                 "PAE pgd must be below 4GB (%#lx >= 0x100000)",
>> +                 mfn_x(l3mfn));
>> +        return -ERANGE;
>> +    }
> 
> Having dug through source history, I see this is largely the form that
> it used to be.
> 
> But I'm unconvinced by the "cut control tools some slack".  I'm quite
> tired of different bits of Xen taking on unnecessary complexity because
> people are unwilling to fix the problem at the correct layer.

But anything tools do before having created the first vCPU would not
have had any means to engage the VM-assist. I.e. ...

> A toolstack which has non-pae_extended_cr3 guest on its hand will know
> this before any pagetables get allocated.

... this knowledge buys it nothing: It would need to move the table
to below 4G irrespective of knowing that the guest can deal with
bigger addresses, just to get past this check.

> For this check specifically, I'd suggest prohibiting non-32p guests from
> setting pae_extended_cr3 in the first place (I see no limit currently),
> and then simplifying the check to just
> 
> if ( unlikely(!VM_ASSIST(d, pae_extended_cr3)) &&
>      mfn_x(l3mfn) >= PFN_DOWN(GB(4)) )

Dropping the is_pv_32bit_domain() check isn't possible because we can't,
all of the sudden, fail 64-bit guests' requests to enable this VM-
assist (no matter that we know that it is of no use to them). Dropping
the control-tools part of the condition is at least problematic as well,
as per above. Albeit I'll admit I didn't check whether nowadays vCPU 0
is initialized before page tables are built. But I think it's more
sensible the other way around: CR3 setting (in the hypervisor) is less
involved when the page was already validated as an L3 one.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.