[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/PV32: restore PAE-extended-CR3 logic


  • To: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Tue, 4 Apr 2023 14:08:39 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=mlemeQ7aWqQk8Ow5+xbGNCu6nFUcKoMWH3TsirNkFks=; b=dpyCRInhvZO+CBErJZN6GMaiDlpXgiT8qki7n85VCz7I6d6V9wREhAoHqtSQxuV+nVrUZZvfZcgcOQp3qls6tj84v84DmHMNmuoOlll7RFCjsiBqLunAz5EAT1yMgZmT31pWoz0Wxyfhe65LHymW+xrRE3AS9zQ+q7LxfSHIzMhJVN6fv9tFXlkX7gc8mgzC9UnQH61dymiiRUkUNE1qMN24fbNOcjxIZwqllAfTnBaJvL8mo53dz/riZ9GeVvCppLW5nnbLS0nCkQce1gJQHJZSswvffNppAyucBG/0hscQbi8W2Yu9pt3TrDMjFZoDZNRooRMiUBAu6/4/b0aSAg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=CKVqkx0Baz9qaPqvPEZWjAy/JF4w4Qh/1Y/mzk7WParBecgAD8AVorJTCGQ9LSOlwvbH04ABdSCq2Z2Uyce2gv8NsdJUlYEBis1lnYD5psmaHVQMBWcXFcNCh5QJa5OLu6E9jLyBu1XbXlscoudF4WltyEnmlAioQD4P9PZMmvOpVCcBPeKCoaDWNowGnBH/d4PFmAN6nUVTR/SCAbOWWNjVmomyArJ9IR1hDt5YQn5RhZyX3CB/n1pWhU0lHrBLJVlUy6Om6XTHBTZJVXk76awSM8KQ8TBwClwXE9B6DgKFc3PFDhOBkf43ZnvLKfWFPOEKFZYrzUstevmo+4pYbA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Tue, 04 Apr 2023 13:08:57 +0000
  • Ironport-data: A9a23:ZwLyJasDrrBr06p4mPij/K9smOfnVHtfMUV32f8akzHdYApBsoF/q tZmKWmDOK6NMTPxfosib9izoEpS6MWHn4NnHgdr/ixjQygV+JbJXdiXEBz9bniYRiHhoOCLz O1FM4Wdc5pkJpP4jk3wWlQ0hSAkjclkfpKlVKiffHg3HVQ+IMsYoUoLs/YjhYJ1isSODQqIu Nfjy+XSI1bg0DNvWo4uw/vrRChH4bKj6Fv0gnRkPaoQ5AOGyyFMZH4iDfrZw0XQE9E88tGSH 44v/JnhlkvF8hEkDM+Sk7qTWiXmlZaLYGBiIlIPM0STqkAqSh4ai87XB9JFAatjsB2bnsgZ9 Tl4ncfYpTHFnEH7sL91vxFwS0mSNEDdkVPNCSDXXce7lyUqf5ZwqhnH4Y5f0YAwo45K7W9yG fMwAg4mYTmku7qMmpmld7I2hZ0DD+y1M9ZK0p1g5Wmx4fcOZ7nmGvyPz/kImTA6i4ZJAOrUY NcfZXx3dhPcbhZTO1ARTpUjgOOvgXq5eDpdwL6XjfNvvy6Pk0osgP60bou9lt+iHK25mm6xo G7c8nu/KRYdLNGFkhKO8262h/+JliT+MG4XPOTgqqc63ATJmQT/DjUdBFv8nfW4jXSlXuxGL 0BPynow95MboRnDot7VGkfQTGS/lhwWVsdUEuY6wBqQ0aeS6AGcbkAUQzgEZNE4ucseQT0xy kTPj97vHSZosrCeVTSa7Lj8hSy2ETgYKykFfyBscOcey9zqoYV2hBSfSN9mSfexloesR2C2x C2Wpi8jgblVldQMy6iw4VHAhXSru4TNSQk2oA7QWwpJ8z9EWWJsXKTwgXCz0BqKBN/xooWp1 JTcp/Wj0Q==
  • Ironport-hdrordr: A9a23:WhdKqqM6ZHK6M8BcTsWjsMiBIKoaSvp037BN7SxMoH1uHfBw8v rEoB1173HJYVoqOU3I++rwW5VoMEm9yXcd2+B4V9qftWLdyQmVxe9ZnO/f6gylNyri9vNMkY dMGpIOb+EY1GIK6PrH3A==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 15/02/2023 2:54 pm, Jan Beulich wrote:
> While the PAE-extended-CR3 VM assist is a 32-bit only concept, it still
> applies to guests also when run on a 64-bit hypervisor:

Is this really true?  Even when looking at Xen 4.2, 32bit guests are
required to pass a full 4k page, not a 32b quad.

Which makes complete sense.  It was a hard requirement of 32bit non-PAE
guests, so it was a natural restriction to maintain into 32bit PAE guests.

This is *only* a 32-on-64 issue, because this is the only case a 32bit
guest could in principle use an L3 placed above the 4G boundary.

>  The "extended
> CR3" format has to be used there as well, to fit the address in the only
> 32-bit wide register there. As a result it was a mistake that the check
> was never enabled for that case, and was then mistakenly deleted in the
> course of removal of 32-bit-Xen code (218adf199e68 ["x86: We can assume
> CONFIG_PAGING_LEVELS==4"]).
>
> Similarly during Dom0 construction kernel awareness needs to be taken
> into account, and respective code was again mistakenly never enabled for
> 32-bit Dom0 when running on 64-bit Xen (and thus wrongly deleted by
> 5d1181a5ea5e ["xen: Remove x86_32 build target"]).
>
> At the same time restrict enabling of the assist for Dom0 to just the
> 32-bit case. Furthermore there's no need for an atomic update there.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> I was uncertain whether to add a check to the CR3 guest read path,
> raising e.g. #GP(0) when the value read wouldn't fit but also may not
> be converted to "extended" format (overflow is possible there in
> principle because of the control tools "slack" in promote_l3_table()).
>
> In that context I was puzzled to find no check on the CR3 guest write
> path even in 4.2: A guest (bogusly) setting the PCD or PWT bits (or any
> of the low reserved ones) could observe anomalous behavior rather than
> plain failure.
>
> As to a Fixes: tag - it's pretty unclear which of the many original
> 32-on-64 changes to blame. I don't think the two cited commits should
> be referenced there, as they didn't break anything that wasn't already
> broken.
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -1520,6 +1520,23 @@ static int promote_l3_table(struct page_
>      unsigned int   partial_flags = page->partial_flags;
>      l3_pgentry_t   l3e = l3e_empty();
>  
> +    /*
> +     * PAE pgdirs above 4GB are unacceptable if a 32-bit guest does not
> +     * understand the weird 'extended cr3' format for dealing with high-order
> +     * address bits. We cut some slack for control tools (before vcpu0 is
> +     * initialised).
> +     */
> +    if ( is_pv_32bit_domain(d) &&
> +         unlikely(!VM_ASSIST(d, pae_extended_cr3)) &&
> +         mfn_x(l3mfn) >= 0x100000 &&
> +         d->vcpu[0] && d->vcpu[0]->is_initialised )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                 "PAE pgd must be below 4GB (%#lx >= 0x100000)",
> +                 mfn_x(l3mfn));
> +        return -ERANGE;
> +    }

Having dug through source history, I see this is largely the form that
it used to be.

But I'm unconvinced by the "cut control tools some slack".  I'm quite
tired of different bits of Xen taking on unnecessary complexity because
people are unwilling to fix the problem at the correct layer.

A toolstack which has non-pae_extended_cr3 guest on its hand will know
this before any pagetables get allocated.

That said...

I don't recall encountering this in migration v2, and looking at the
logic now, I'm pretty sure it will malfunction with a
non-pae_extended_cr3 guest.  When interpreting the guest cr3 value, we
blindly make the transform on the save and restore side, and I can't
spot anything limiting the L3 tables to below the 4G boundary.

So I'm reasonably sure I accidentally broke such guests in Xen 4.6(?)
and the absence of complaints in the intervening 8(?) years shows how
many are in use in practice.

For this check specifically, I'd suggest prohibiting non-32p guests from
setting pae_extended_cr3 in the first place (I see no limit currently),
and then simplifying the check to just

if ( unlikely(!VM_ASSIST(d, pae_extended_cr3)) &&
     mfn_x(l3mfn) >= PFN_DOWN(GB(4)) )


And I suppose I need to make a non-pae_extended_cr3 XTF test which is
migrate-capable...

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.