[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 19/22] x86/pv: Guest exception handling in FRED mode


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Tue, 10 Feb 2026 12:41:07 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=+c3Vo6K1nplnoVPLQhq8r6K31zwiP+dmss/NZ0iDDfc=; b=zO7rvjHFUJFnwiLNCQcql6YIHKLOl389Xo78tLOx5kQ3WVL8QNWPeOgPIs0GILsGpzBSMU14xMxZEVMghwK/PkuKGIgB6OtDU6EwzSUPRwK7E4ix+FVmDZRg7F9ClaHLxip2roEVJS1ZAIGnHN2WjaK3LZTmgjcKRXPcX1I57V/2IFhvMMwjecrVo5eIY2r+OVN0dX1mbTsVSbQ2upLjUYMw9hBqZoyFS2UPZefYPyj3/r6gp+Ulgy6sc0kMRQSdF4fDraGXmU9WXWLJt+GhCcMZMqIj69qXdXEaPj4BhyGRiAl1yEeJ4WFaI1eA6c/o9inWMHeYqT7Ewvt8pnarAw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=vBA3dAod8fysZ2ycGj6uAr3DSCW2N+wc2fMtHFf05mnBnSivb/ynSO7z7kn6q9Xs84gvieI170u3sdESr8M09wiBPtxjJESDV1zX46bWTDJlW053IwrjAXT2ACkKSsL4KEBHmZ9EwUA7uKcq5BXQjGr7rdYrBFEtNPbJL5YrS5tXEYIPXr61X4UDTiMukBJHHal6fBhBWyVX0Kmo92lu9e6NhGlJ4xWGLaGbIQHRdAO0fUFI57JHjZvCCxzr6zBzulN6VgbaZzzeAQSV+0AFRFsR+PS6HjuqlDsbtIS4vDZJNbMkfQjGdAqKTK5QGYSvbKZAVkfwW3QZcxrIIDcJsw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 10 Feb 2026 12:41:32 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 08/10/2025 1:28 pm, Jan Beulich wrote:
> On 04.10.2025 00:53, Andrew Cooper wrote:
>> Under FRED, entry_from_pv() handles everything.  To start with, implement
>> exception handling in the same manner as entry_from_xen(), although we can
>> unconditionally enable interrupts after the async/fatal events.
>>
>> After entry_from_pv() returns, test_all_events() needs to run to perform
>> exception and interrupt injection.  Split entry_FRED_R3() into two and
>> introduce eretu_exit_to_guest() as the latter half, coming unilaterally from
>> restore_all_guest().
>>
>> For all of this, there is a slightly complicated relationship with CONFIG_PV.
>> entry_FRED_R3() must exist irrespective of CONFIG_PV, because it's the
>> entrypoint registered with hardware.  For simplicity, entry_from_pv() is
>> always called, but it collapses into fatal_trap() in the !PV case.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

Thanks.

>
> Nevertheless ...
>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -2266,9 +2266,82 @@ void asmlinkage check_ist_exit(const struct 
>> cpu_user_regs *regs, bool ist_exit)
>>  
>>  void asmlinkage entry_from_pv(struct cpu_user_regs *regs)
>>  {
>> +    struct fred_info *fi = cpu_regs_fred_info(regs);
>> +    uint8_t type = regs->fred_ss.type;
>> +    uint8_t vec = regs->fred_ss.vector;
>> +
>>      /* Copy fred_ss.vector into entry_vector as IDT delivery would have 
>> done. */
>> -    regs->entry_vector = regs->fred_ss.vector;
>> +    regs->entry_vector = vec;
>> +
>> +    if ( !IS_ENABLED(CONFIG_PV) )
>> +        goto fatal;
>> +
>> +    /*
>> +     * First, handle the asynchronous or fatal events.  These are either
>> +     * unrelated to the interrupted context, or may not have valid context
>> +     * recorded, and all have special rules on how/whether to re-enable 
>> IRQs.
>> +     */
>> +    switch ( type )
>> +    {
>> +    case X86_ET_EXT_INTR:
>> +        return do_IRQ(regs);
>> +
>> +    case X86_ET_NMI:
>> +        return do_nmi(regs);
>> +
>> +    case X86_ET_HW_EXC:
>> +        switch ( vec )
>> +        {
>> +        case X86_EXC_DF: return do_double_fault(regs);
>> +        case X86_EXC_MC: return do_machine_check(regs);
>> +        }
>> +        break;
>> +    }
>> +
>> +    /*
>> +     * With the asynchronous events handled, what remains are the 
>> synchronous
>> +     * ones.  PV guest context always had interrupts enabled.
>> +     */
>> +    local_irq_enable();
>> +
>> +    switch ( type )
>> +    {
>> +    case X86_ET_HW_EXC:
>> +    case X86_ET_PRIV_SW_EXC:
>> +    case X86_ET_SW_EXC:
>> +        switch ( vec )
>> +        {
>> +        case X86_EXC_PF:  handle_PF(regs, fi->edata); break;
>> +        case X86_EXC_GP:  do_general_protection(regs); break;
>> +        case X86_EXC_UD:  do_invalid_op(regs); break;
>> +        case X86_EXC_NM:  do_device_not_available(regs); break;
>> +        case X86_EXC_BP:  do_int3(regs); break;
>> +        case X86_EXC_DB:  handle_DB(regs, fi->edata); break;
>> +        case X86_EXC_CP:  do_entry_CP(regs); break;
>> +
>> +        case X86_EXC_DE:
>> +        case X86_EXC_OF:
>> +        case X86_EXC_BR:
>> +        case X86_EXC_NP:
>> +        case X86_EXC_SS:
>> +        case X86_EXC_MF:
>> +        case X86_EXC_AC:
>> +        case X86_EXC_XM:
>> +            do_trap(regs);
>> +            break;
>>  
>> +        default:
>> +            goto fatal;
>> +        }
>> +        break;
>> +
>> +    default:
>> +        goto fatal;
>> +    }
>> +
>> +    return;
>> +
>> + fatal:
>>      fatal_trap(regs, false);
>>  }
> ... I'm still somewhat bothered by this almost entirely duplicating the
> other entry function, i.e. I continue to wonder if we wouldn't be better
> off by eliminating that duplication (say by way of an always_inline
> helper with a suitable extra parameter).

They are not sufficiently similar.

By the end of this series alone, they differ by IS_ENABLED(CONFIG_PV),
the condition for enabling local interrupts, the ERETU fixup, and the
SYSCALL/SYSENTER handling.

NMI handling is still an open question (deferred for now, because it
functions, albeit inefficiently) and adds a further difference.


>
>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -63,7 +63,7 @@ UNLIKELY_END(syscall_no_callback)
>>          /* Conditionally clear DF */
>>          and   %esi, UREGS_eflags(%rsp)
>>  /* %rbx: struct vcpu */
>> -test_all_events:
>> +LABEL(test_all_events, 0)
>>          ASSERT_NOT_IN_ATOMIC
>>          cli                             # tests must not race interrupts
>>  /*test_softirqs:*/
>> @@ -152,6 +152,8 @@ END(switch_to_kernel)
>>  FUNC_LOCAL(restore_all_guest)
>>          ASSERT_INTERRUPTS_DISABLED
>>  
>> +        ALTERNATIVE "", "jmp eretu_exit_to_guest", X86_FEATURE_XEN_FRED
>> +
>>          /* Stash guest SPEC_CTRL value while we can read struct vcpu. */
>>          mov VCPU_arch_msrs(%rbx), %rdx
> I also continue to wonder if we wouldn't do a tiny bit better by using
>
>         ALTERNATIVE "mov VCPU_arch_msrs(%rbx), %rdx", \
>                     "jmp eretu_exit_to_guest", \
>                     X86_FEATURE_XEN_FRED
>
> Or by converting the few jumps to restore_all_guest to alternatives
> (duplicating the ASSERT_INTERRUPTS_DISABLED there).

I'm quite firmly against this.

Sure, we could save a 5 byte nop, but the cost of doing that is merging
two unrelated pieces of logic in a construct explicitly to signal two
related things.

The added complexity to follow the logic is not worth the 5 byte nop saving.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.