
[PATCH] x86/PV32: restore PAE-extended-CR3 logic


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 15 Feb 2023 15:54:11 +0100
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Wed, 15 Feb 2023 14:54:50 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

While the PAE-extended-CR3 VM assist is a 32-bit only concept, it still
applies to 32-bit guests when they run on a 64-bit hypervisor: the
"extended CR3" format has to be used there as well, so that the address
fits in a register which is only 32 bits wide. It was therefore a mistake
that the check was never enabled for that case, and the check was then
mistakenly deleted in the course of removing 32-bit-Xen code
(218adf199e68 ["x86: We can assume CONFIG_PAGING_LEVELS==4"]).

Similarly, during Dom0 construction the kernel's awareness of the format
needs to be taken into account. The respective code was again mistakenly
never enabled for 32-bit Dom0 when running on 64-bit Xen (and thus wrongly
deleted by 5d1181a5ea5e ["xen: Remove x86_32 build target"]).

At the same time restrict enabling of the assist for Dom0 to just the
32-bit case. Furthermore there's no need for an atomic update there.

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
---
I was uncertain whether to add a check to the CR3 guest read path,
raising e.g. #GP(0) when the value read wouldn't fit and also may not
be converted to "extended" format (overflow is possible there in
principle because of the "slack" granted to control tools in
promote_l3_table()).

In that context I was puzzled to find no check on the CR3 guest write
path even in 4.2: A guest (bogusly) setting the PCD or PWT bits (or any
of the low reserved ones) could observe anomalous behavior rather than
plain failure.

As to a Fixes: tag - it's pretty unclear which of the many original
32-on-64 changes to blame. I don't think the two cited commits should
be referenced there, as they didn't break anything that wasn't already
broken.
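
For reference, a minimal sketch of the "extended CR3" packing discussed
above, modelled on the xen_pfn_to_cr3() / xen_cr3_to_pfn() macros in Xen's
public 32-bit x86 header. The function names below and the mfn_fits_cr3()
helper are made up for illustration and are not part of this patch; the
helper only restates the two limits (MFN below 0x100000 for the plain
format, MFN representable in 32 bits for the extended one) and is an
assumption about how such a read-path check might be phrased.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pack a frame number into the 32-bit "extended CR3" layout: frame bits
 * 0-19 go to register bits 12-31, frame bits 20-31 go to register bits
 * 0-11. This amounts to a rotate-left by 12 of a 32-bit value.
 */
static uint32_t pfn_to_ext_cr3(uint32_t pfn)
{
    return (pfn << 12) | (pfn >> 20);
}

/* Inverse of the above: rotate right by 12. */
static uint32_t ext_cr3_to_pfn(uint32_t cr3)
{
    return (cr3 >> 12) | (cr3 << 20);
}

/*
 * Hypothetical helper (not in Xen): can this MFN be handed to a 32-bit
 * guest through CR3 at all?
 */
static bool mfn_fits_cr3(uint64_t mfn, bool extended_cr3)
{
    /* Plain format: the page has to sit below 4GB. */
    if ( !extended_cr3 )
        return mfn < 0x100000;

    /* Extended format: all 32 register bits carry frame number bits. */
    return mfn <= UINT32_MAX;
}
```

Since the low 12 register bits carry address bits in this layout, a guest
value with PWT/PCD (or other low bits) set decodes to a different frame
rather than being rejected, which is the kind of anomaly mentioned above.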

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1520,6 +1520,23 @@ static int promote_l3_table(struct page_
     unsigned int   partial_flags = page->partial_flags;
     l3_pgentry_t   l3e = l3e_empty();
 
+    /*
+     * PAE pgdirs above 4GB are unacceptable if a 32-bit guest does not
+     * understand the weird 'extended cr3' format for dealing with high-order
+     * address bits. We cut some slack for control tools (before vcpu0 is
+     * initialised).
+     */
+    if ( is_pv_32bit_domain(d) &&
+         unlikely(!VM_ASSIST(d, pae_extended_cr3)) &&
+         mfn_x(l3mfn) >= 0x100000 &&
+         d->vcpu[0] && d->vcpu[0]->is_initialised )
+    {
+        gdprintk(XENLOG_WARNING,
                 "PAE pgd must be below 4GB (%#lx >= 0x100000)\n",
+                 mfn_x(l3mfn));
+        return -ERANGE;
+    }
+
     pl3e = map_domain_page(l3mfn);
 
     /*
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -490,12 +490,12 @@ int __init dom0_construct_pv(struct doma
 
     nr_pages = dom0_compute_nr_pages(d, &parms, initrd_len);
 
-    if ( parms.pae == XEN_PAE_EXTCR3 )
-            set_bit(VMASST_TYPE_pae_extended_cr3, &d->vm_assist);
-
 #ifdef CONFIG_PV32
     if ( elf_32bit(&elf) )
     {
+        if ( parms.pae == XEN_PAE_EXTCR3 )
+            __set_bit(VMASST_TYPE_pae_extended_cr3, &d->vm_assist);
+
         if ( !pv_shim && (parms.virt_hv_start_low != UNSET_ADDR) )
         {
             unsigned long value = ROUNDUP(parms.virt_hv_start_low,
@@ -594,7 +594,10 @@ int __init dom0_construct_pv(struct doma
         vphysmap_start = parms.p2m_base;
         vphysmap_end   = vphysmap_start + nr_pages * sizeof(unsigned long);
     }
-    page = alloc_domheap_pages(d, order, MEMF_no_scrub);
+    page = alloc_domheap_pages(d, order,
+                               MEMF_no_scrub |
+                               (VM_ASSIST(d, pae_extended_cr3) ||
+                                !compat ? 0 : MEMF_bits(32)));
     if ( page == NULL )
         panic("Not enough RAM for domain 0 allocation\n");
     alloc_spfn = mfn_x(page_to_mfn(page));
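
The flags expression in the last hunk relies on "||" binding tighter than
"?:", i.e. it reads as (A || !compat) ? 0 : MEMF_bits(32). A small sketch
of the intended decision, using stand-in flag values (the SKETCH_* macros
and both function names are hypothetical and do not reflect Xen's real
encodings): only a 32-bit Dom0 that lacks the assist gets the allocation
restricted to 32-bit addressable memory, presumably so that its initial
page tables end up below 4GB.

```c
#include <stdbool.h>

/* Stand-in values for illustration only, NOT Xen's real definitions. */
#define SKETCH_MEMF_NO_SCRUB  (1u << 0)
#define SKETCH_MEMF_BITS(n)   ((unsigned int)(n) << 24)

/* The decision spelled out with an explicit condition. */
static unsigned int dom0_alloc_flags(bool ext_cr3, bool compat)
{
    unsigned int flags = SKETCH_MEMF_NO_SCRUB;

    /* Restrict the address width only for 32-bit Dom0 without the assist. */
    if ( compat && !ext_cr3 )
        flags |= SKETCH_MEMF_BITS(32);

    return flags;
}

/* The same decision in the shape used by the hunk above. */
static unsigned int dom0_alloc_flags_as_written(bool ext_cr3, bool compat)
{
    return SKETCH_MEMF_NO_SCRUB |
           (ext_cr3 || !compat ? 0 : SKETCH_MEMF_BITS(32));
}
```

Both functions return the same value for all four input combinations.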



 

