[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/CPU/AMD: avoid printing reset reasons on Xen domU



On Fri, Dec 19, 2025, Andrew Cooper wrote:
> On 19/12/2025 4:09 pm, Sean Christopherson wrote:
> > On Fri, Dec 19, 2025, Borislav Petkov wrote:
> >> On December 19, 2025 1:01:31 AM UTC, Ariadne Conill 
> >> <ariadne@ariadne.space> wrote:
> >>> Xen domU cannot access the given MMIO address for security reasons,
> >>> resulting in a failed hypercall in ioremap() due to permissions.
> > Why does that matter though?  Ah, because set_pte() assumes success, and so
> > presumably the failed hypercall goes unnoticed and trying to access the MMIO
> > #PFs due to !PRESENT mapping.
> >
> >>> Fixes: ab8131028710 ("x86/CPU/AMD: Print the reason for the last reset")
> >>> Signed-off-by: Ariadne Conill <ariadne@ariadne.space>
> >>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
> >>> Cc: stable@xxxxxxxxxxxxxxx
> >>> ---
> >>> arch/x86/kernel/cpu/amd.c | 6 ++++++
> >>> 1 file changed, 6 insertions(+)
> >>>
> >>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> >>> index a6f88ca1a6b4..99308fba4d7d 100644
> >>> --- a/arch/x86/kernel/cpu/amd.c
> >>> +++ b/arch/x86/kernel/cpu/amd.c
> >>> @@ -29,6 +29,8 @@
> >>> # include <asm/mmconfig.h>
> >>> #endif
> >>>
> >>> +#include <xen/xen.h>
> >>> +
> >>> #include "cpu.h"
> >>>
> >>> u16 invlpgb_count_max __ro_after_init = 1;
> >>> @@ -1333,6 +1335,10 @@ static __init int print_s5_reset_status_mmio(void)
> >>>   if (!cpu_feature_enabled(X86_FEATURE_ZEN))
> >>>           return 0;
> >>>
> >>> + /* Xen PV domU cannot access hardware directly, so bail for domU case */
> > Heh, Xen on Zen crime.
> >
> >>> + if (cpu_feature_enabled(X86_FEATURE_XENPV) && !xen_initial_domain())
> >>> +         return 0;
> >>> +
> >>>   addr = ioremap(FCH_PM_BASE + FCH_PM_S5_RESET_STATUS, sizeof(value));
> >>>   if (!addr)
> >>>           return 0;
> >> Sean, looka here. The other hypervisor wants other checks.
> >>
> >> Time to whip out the X86_FEATURE_HYPERVISOR check.
> > LOL, Ariadne, be honest, how much did Boris pay you?  :-D
> >
> > Jokes aside, I suppose I'm fine adding a HYPERVISOR check, but at the same 
> > time,
> > how is this not a Xen bug?  Refusing to create a mapping because the VM 
> > doesn't
> > have a device defined at a given GPA is pretty hostile behavior for a 
> > hypervisor.
>
> This is a Xen PV guest.  No SVM/VT-x.
> 
> A PV Guest (by it's contract with Xen) cannot create mappings to host
> physical addresses it doesn't own.

Huh, assuming wiki.xenproject.org[*] can be trusted, TIL Xen PV has no notion
of a virtual motherboard/chipset:

  In a paravirtualized VM, guests run with fully paravirtualized disk and
  network interfaces; interrupts and timers are paravirtualized; there is no
  emulated motherboard or device bus; guests boot directly into the kernel
  in the mode the kernel wishes to run in (32-bit or 64-bit), without needing
  to start in 16-bit mode or go through a BIOS; all privileged instructions
  are replaced with paravirtualized equivalents (hypercalls), and access to
  the page tables was paravirtualized as well.

[*] https://wiki.xenproject.org/wiki/Understanding_the_Virtualization_Spectrum

> Xen rejects the attempt, and Linux is ignoring the failure in this case. 
> This has been ABI for 25 years.

Heh, I suspected as much.

> From a more practical point of view, because guests can read their own
> pagetables, Xen can't swap the requested PTE for safe alternative to
> trap the MMIO access, because that violates Linux's model of what's
> mapped in this position.

That all makes sense, but shouldn't the inability to ioremap() chipset MMIO be
addressed in ioremap()?  E.g. after poking around a bit, commit 6a92b11169a6
("x86/EISA: Don't probe EISA bus for Xen PV guests") fudged around the same
underlying flaw in eisa_bus_probe().  Ha!  Though maybe that's no longer 
necessary
as of 80a4da05642c ("x86/EISA: Use memremap() to probe for the EISA BIOS 
signature")?

Anyways, I'm still opposed to using HYPERVISOR.  E.g. given that Dom0 can 
apparently
access host MMIO just fine, and that it's perfectly reasonable for a KVM-based 
VMM
to emulate the chipset, assuming a guest doesn't have access to some asset is 
simply
wrong.

But rather than play whack-a-mole, can't we deal with the ignore PTE write 
failures
in ioremap()?  E.g. something like this?

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 12c8180ca1ba..b7e2c752af1d 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -29,6 +29,8 @@
 #include <asm/memtype.h>
 #include <asm/setup.h>
 
+#include <xen/xen.h>
+
 #include "physaddr.h"
 
 /*
@@ -301,6 +303,20 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long 
size,
        if (ioremap_page_range(vaddr, vaddr + size, phys_addr, prot))
                goto err_free_area;
 
+       /*
+        * Verify the range was actually mapped when running as a Xen PV DomU
+        * guest.  Xen PV doesn't emulate a virtual chipset/motherboard, and
+        * disallows DomU from mapping host physical addresses that the domain
+        * doesn't own.  Unfortunately, the PTE APIs assume success, and so
+        * Xen's rejection of the mapping is ignored.
+        */
+       if (xen_pv_domain() && !xen_initial_domain()) {
+               int level;
+
+               if (!lookup_address(vaddr, &level))
+                       goto err_free_area;
+       }
+
        ret_addr = (void __iomem *) (vaddr + offset);
        mmiotrace_ioremap(unaligned_phys_addr, unaligned_size, ret_addr);



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.