[PATCH v2 21/23] x86/pv: ERETU error handling
ERETU can fault for guest reasons, and like IRET it needs special
handling to forward the error into the guest.

As this is largely written in C, take the opportunity to better
classify the sources of error, and in particular to not forward errors
that are actually Xen's fault into the guest, opting for a domain
crash instead.

Because ERETU does not enable NMIs if it faults, a corner case exists
if an NMI was taken while in guest context, and the ERETU back out to
guest context then faults.  Recovery must involve an ERETS with the
interrupted context's NMI flag.  See the comments for full details.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>

v2:
 * New
---
 xen/arch/x86/traps.c             | 115 +++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/entry-fred.S |  13 ++++
 2 files changed, 128 insertions(+)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 72df446a6a78..e10b4e771824 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2345,6 +2345,113 @@ void asmlinkage entry_from_pv(struct cpu_user_regs *regs)
     fatal_trap(regs, false);
 }
 
+void nocall eretu_error_dom_crash(void);
+
+/*
+ * Classify an event at the ERETU instruction, and handle if possible.
+ * Returns @true if handled, @false if the event should continue down the
+ * normal handlers.
+ */
+static bool handle_eretu_event(struct cpu_user_regs *regs)
+{
+    unsigned long recover;
+
+    /*
+     * WARNING: The GPRs in gregs overlap with regs.  Only gregs->error_code
+     * and later are legitimate to access.
+     */
+    struct cpu_user_regs *gregs =
+        _p(regs->rsp - offsetof(struct cpu_user_regs, error_code));
+
+    /*
+     * The asynchronous or fatal events (INTR, NMI, #MC, #DF) have been dealt
+     * with, meaning we only have synchronous ones to consider.  Anything
+     * which isn't a hardware exception wants handling normally.
+     */
+    if ( regs->fred_ss.type != X86_ET_HW_EXC )
+        return false;
+
+    /*
+     * Guests are permitted to write non-present GDT/LDT entries.  Therefore
+     * #NP[sel] (%cs) and #SS[sel] (%ss) must be handled as guest errors.  The
+     * only other source of #SS is for a bad %ss-relative memory access in
+     * Xen, and if the stack is that bad, we'll have escalated to #DF.
+     *
+     * #PF can happen from ERETU accessing the GDT/LDT.  Xen may translate
+     * these into #GP for the guest, so must be handled as guest errors.  In
+     * theory we can get #PF for a bad instruction fetch or bad stack access,
+     * but either of these will be fatal and not end up here.
+     */
+    switch ( regs->fred_ss.vector )
+    {
+    case X86_EXC_GP:
+        /*
+         * #GP[0] can occur because of a NULL %cs or %ss (which are a guest
+         * error), but some #GP[0]'s are errors in Xen (ERETU at SL != 0), or
+         * errors of Xen handling guest state (bad metadata).  These magic
+         * numbers came from the FRED Spec; they check that ERETU is trying to
+         * return to Ring 3, and that reserved or inapplicable bits are 0.
+         */
+        if ( regs->error_code == 0 && (gregs->cs & ~3) && (gregs->ss & ~3) &&
+             (regs->fred_cs.sl != 0 ||
+              (gregs->csx & 0xffffffffffff0003UL) != 3 ||
+              (gregs->rflags & 0xffffffffffc2b02aUL) != 2 ||
+              (gregs->ssx & 0xfff80003UL) != 3) )
+        {
+            recover = (unsigned long)eretu_error_dom_crash;
+
+            if ( regs->fred_cs.sl )
+                gprintk(XENLOG_ERR, "ERETU at SL %u\n", regs->fred_cs.sl);
+            else
+                gprintk(XENLOG_ERR, "Bad return state: csx %#lx, rflags %#lx, ssx %#x\n",
+                        gregs->csx, gregs->rflags, (unsigned int)gregs->ssx);
+            break;
+        }
+        fallthrough;
+    case X86_EXC_NP:
+    case X86_EXC_SS:
+    case X86_EXC_PF:
+        recover = (unsigned long)entry_FRED_R3;
+        break;
+
+        /*
+         * Handle everything else normally.  #BP and #DB would be debugging
+         * activities in Xen.  In theory we can get #UD if CR4.FRED gets
+         * cleared, but in practice if that were the case we wouldn't be here
+         * handling the result.
+         */
+    default:
+        return false;
+    }
+
+    this_cpu(last_extable_addr) = regs->rip;
+
+    /*
+     * Everything else is recoverable, one way or another.
+     *
+     * If an NMI was taken in guest context and the ERETU faulted, NMIs will
+     * still be blocked.  Therefore we copy the interrupted frame's NMI status
+     * into our own, and must ERETS as part of recovery.
+     */
+    regs->fred_ss.nmi = gregs->fred_ss.nmi;
+
+    /*
+     * Next, copy the exception information from the current frame back onto
+     * the interrupted frame, preserving the interrupted frame's %cs and %ss.
+     */
+    *cpu_regs_fred_info(regs) = *cpu_regs_fred_info(gregs);
+    gregs->ssx = (regs->ssx & ~0xffff) | gregs->ss;
+    gregs->csx = (regs->csx & ~0xffff) | gregs->cs;
+    gregs->error_code = regs->error_code;
+    gregs->entry_vector = regs->entry_vector;
+
+    fixup_exception_return(regs, recover, 0);
+
+    return true;
+}
+
+void nocall eretu(void);
+
 void asmlinkage entry_from_xen(struct cpu_user_regs *regs)
 {
     struct fred_info *fi = cpu_regs_fred_info(regs);
@@ -2383,6 +2490,14 @@ void asmlinkage entry_from_xen(struct cpu_user_regs *regs)
     if ( regs->eflags & X86_EFLAGS_IF )
         local_irq_enable();
 
+    /*
+     * An event taken at the ERETU instruction may be because of guest state,
+     * and in that case will need special handling.
+     */
+    if ( unlikely(regs->rip == (unsigned long)eretu) &&
+         handle_eretu_event(regs) )
+        return;
+
     switch ( type )
     {
     case X86_ET_HW_EXC:
diff --git a/xen/arch/x86/x86_64/entry-fred.S b/xen/arch/x86/x86_64/entry-fred.S
index 07684f38a078..8b5cafb866e2 100644
--- a/xen/arch/x86/x86_64/entry-fred.S
+++ b/xen/arch/x86/x86_64/entry-fred.S
@@ -27,9 +27,22 @@ END(entry_FRED_R3)
 
 FUNC(eretu_exit_to_guest)
         POP_GPRS
+
+        /*
+         * Exceptions here are handled by redirecting either to
+         * entry_FRED_R3() (for an error to be passed to the guest), or to
+         * eretu_error_dom_crash() (for a Xen error handling guest state).
+         */
+LABEL(eretu, 0)
         eretu
 END(eretu_exit_to_guest)
 
+FUNC(eretu_error_dom_crash)
+        PUSH_AND_CLEAR_GPRS
+        sti
+        call asm_domain_crash_synchronous /* Does not return */
+END(eretu_error_dom_crash)
+
 /* The Ring0 entrypoint is at Ring3 + 0x100. */
 .org entry_FRED_R3 + 0x100, 0xcc
 
-- 
2.39.5
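As an aside on the #GP[0] classification: the RFLAGS magic number appears
to be the complement of the architecturally defined flags a PV guest may
legitimately carry in its return frame (CF, PF, AF, ZF, SF, TF, IF, DF, OF,
NT, RF, AC, VIF, VIP, ID), leaving the reserved bits, IOPL and VM required
to hold their fixed values, with bit 1 reading back as 1.  A minimal
standalone sketch (not part of the patch, using ad-hoc flag macros rather
than Xen's X86_EFLAGS_* constants) which reproduces the constant under
that assumption:

    #include <assert.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Ad-hoc RFLAGS bit definitions, for illustration only. */
    #define F_CF    (UINT64_C(1) <<  0)
    #define F_MBS   (UINT64_C(1) <<  1)   /* reserved, always reads as 1 */
    #define F_PF    (UINT64_C(1) <<  2)
    #define F_AF    (UINT64_C(1) <<  4)
    #define F_ZF    (UINT64_C(1) <<  6)
    #define F_SF    (UINT64_C(1) <<  7)
    #define F_TF    (UINT64_C(1) <<  8)
    #define F_IF    (UINT64_C(1) <<  9)
    #define F_DF    (UINT64_C(1) << 10)
    #define F_OF    (UINT64_C(1) << 11)
    #define F_NT    (UINT64_C(1) << 14)
    #define F_RF    (UINT64_C(1) << 16)
    #define F_AC    (UINT64_C(1) << 18)
    #define F_VIF   (UINT64_C(1) << 19)
    #define F_VIP   (UINT64_C(1) << 20)
    #define F_ID    (UINT64_C(1) << 21)

    int main(void)
    {
        /* Flags assumed legitimate for a guest to have set in its frame. */
        uint64_t guest_flags = F_CF | F_PF | F_AF | F_ZF | F_SF | F_TF |
                               F_IF | F_DF | F_OF | F_NT | F_RF | F_AC |
                               F_VIF | F_VIP | F_ID;

        /*
         * Everything else (reserved bits, IOPL, VM) must hold its fixed
         * value, which is all-zero apart from the always-set bit 1.  This
         * matches the "& 0xffffffffffc2b02aUL) != 2" test in the patch.
         */
        uint64_t mask = ~guest_flags;

        printf("mask = %#" PRIx64 "\n", mask);
        assert(mask == UINT64_C(0xffffffffffc2b02a));

        return 0;
    }

The %cs and %ss masks (0xffffffffffff0003 and 0xfff80003) read similarly
under the same assumption: RPL must be 3 and the remaining covered bits
must be 0, per the "return to Ring 3, reserved or inapplicable bits are 0"
description in the comment.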