|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 19/22] x86/pv: Guest exception handling in FRED mode
On 08/10/2025 1:28 pm, Jan Beulich wrote:
> On 04.10.2025 00:53, Andrew Cooper wrote:
>> Under FRED, entry_from_pv() handles everything. To start with, implement
>> exception handling in the same manner as entry_from_xen(), although we can
>> unconditionally enable interrupts after the async/fatal events.
>>
>> After entry_from_pv() returns, test_all_events() needs to run to perform
>> exception and interrupt injection. Split entry_FRED_R3() into two and
>> introduce eretu_exit_to_guest() as the latter half, coming unilaterally from
>> restore_all_guest().
>>
>> For all of this, there is a slightly complicated relationship with CONFIG_PV.
>> entry_FRED_R3() must exist irrespective of CONFIG_PV, because it's the
>> entrypoint registered with hardware. For simplicity, entry_from_pv() is
>> always called, but it collapses into fatal_trap() in the !PV case.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
Thanks.
>
> Nevertheless ...
>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -2266,9 +2266,82 @@ void asmlinkage check_ist_exit(const struct
>> cpu_user_regs *regs, bool ist_exit)
>>
>> void asmlinkage entry_from_pv(struct cpu_user_regs *regs)
>> {
>> + struct fred_info *fi = cpu_regs_fred_info(regs);
>> + uint8_t type = regs->fred_ss.type;
>> + uint8_t vec = regs->fred_ss.vector;
>> +
>> /* Copy fred_ss.vector into entry_vector as IDT delivery would have
>> done. */
>> - regs->entry_vector = regs->fred_ss.vector;
>> + regs->entry_vector = vec;
>> +
>> + if ( !IS_ENABLED(CONFIG_PV) )
>> + goto fatal;
>> +
>> + /*
>> + * First, handle the asynchronous or fatal events. These are either
>> + * unrelated to the interrupted context, or may not have valid context
>> + * recorded, and all have special rules on how/whether to re-enable
>> IRQs.
>> + */
>> + switch ( type )
>> + {
>> + case X86_ET_EXT_INTR:
>> + return do_IRQ(regs);
>> +
>> + case X86_ET_NMI:
>> + return do_nmi(regs);
>> +
>> + case X86_ET_HW_EXC:
>> + switch ( vec )
>> + {
>> + case X86_EXC_DF: return do_double_fault(regs);
>> + case X86_EXC_MC: return do_machine_check(regs);
>> + }
>> + break;
>> + }
>> +
>> + /*
>> + * With the asynchronous events handled, what remains are the
>> synchronous
>> + * ones. PV guest context always had interrupts enabled.
>> + */
>> + local_irq_enable();
>> +
>> + switch ( type )
>> + {
>> + case X86_ET_HW_EXC:
>> + case X86_ET_PRIV_SW_EXC:
>> + case X86_ET_SW_EXC:
>> + switch ( vec )
>> + {
>> + case X86_EXC_PF: handle_PF(regs, fi->edata); break;
>> + case X86_EXC_GP: do_general_protection(regs); break;
>> + case X86_EXC_UD: do_invalid_op(regs); break;
>> + case X86_EXC_NM: do_device_not_available(regs); break;
>> + case X86_EXC_BP: do_int3(regs); break;
>> + case X86_EXC_DB: handle_DB(regs, fi->edata); break;
>> + case X86_EXC_CP: do_entry_CP(regs); break;
>> +
>> + case X86_EXC_DE:
>> + case X86_EXC_OF:
>> + case X86_EXC_BR:
>> + case X86_EXC_NP:
>> + case X86_EXC_SS:
>> + case X86_EXC_MF:
>> + case X86_EXC_AC:
>> + case X86_EXC_XM:
>> + do_trap(regs);
>> + break;
>>
>> + default:
>> + goto fatal;
>> + }
>> + break;
>> +
>> + default:
>> + goto fatal;
>> + }
>> +
>> + return;
>> +
>> + fatal:
>> fatal_trap(regs, false);
>> }
> ... I'm still somewhat bothered by this almost entirely duplicating the
> other entry function, i.e. I continue to wonder if we wouldn't be better
> off by eliminating that duplication (say by way of an always_inline
> helper with a suitable extra parameter).
They are not sufficiently similar.
By the end of this series alone, they differ by IS_ENABLED(CONFIG_PV),
the condition for enabling local interrupts, the ERETU fixup, and the
SYSCALL/SYSENTER handling.
NMI handling is still an open question (deferred for now, because it
functions, albeit inefficiently) and adds a further difference.
>
>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -63,7 +63,7 @@ UNLIKELY_END(syscall_no_callback)
>> /* Conditionally clear DF */
>> and %esi, UREGS_eflags(%rsp)
>> /* %rbx: struct vcpu */
>> -test_all_events:
>> +LABEL(test_all_events, 0)
>> ASSERT_NOT_IN_ATOMIC
>> cli # tests must not race interrupts
>> /*test_softirqs:*/
>> @@ -152,6 +152,8 @@ END(switch_to_kernel)
>> FUNC_LOCAL(restore_all_guest)
>> ASSERT_INTERRUPTS_DISABLED
>>
>> + ALTERNATIVE "", "jmp eretu_exit_to_guest", X86_FEATURE_XEN_FRED
>> +
>> /* Stash guest SPEC_CTRL value while we can read struct vcpu. */
>> mov VCPU_arch_msrs(%rbx), %rdx
> I also continue to wonder if we wouldn't do a tiny bit better by using
>
> ALTERNATIVE "mov VCPU_arch_msrs(%rbx), %rdx", \
> "jmp eretu_exit_to_guest", \
> X86_FEATURE_XEN_FRED
>
> Or by converting the few jumps to restore_all_guest to alternatives
> (duplicating the ASSERT_INTERRUPTS_DISABLED there).
I'm quite firmly against this.
Sure, we could save a 5 byte nop, but the cost of doing that is merging
two unrelated pieces of logic in a construct explicitly to signal two
related things.
The added complexity to follow the logic is not worth the 5 byte nop saving.
~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |