|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 21/22] x86/traps: Introduce FRED entrypoints
On 14/08/2025 4:57 pm, Jan Beulich wrote:
> On 08.08.2025 22:23, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/asm_defns.h
>> +++ b/xen/arch/x86/include/asm/asm_defns.h
>> @@ -315,6 +315,71 @@ static always_inline void stac(void)
>> subq $-(UREGS_error_code-UREGS_r15+\adj), %rsp
>> .endm
>>
>> +/*
>> + * Push and clear GPRs
>> + */
>> +.macro PUSH_AND_CLEAR_GPRS
>> + push %rdi
>> + xor %edi, %edi
>> + push %rsi
>> + xor %esi, %esi
>> + push %rdx
>> + xor %edx, %edx
>> + push %rcx
>> + xor %ecx, %ecx
>> + push %rax
>> + xor %eax, %eax
>> + push %r8
>> + xor %r8d, %r8d
>> + push %r9
>> + xor %r9d, %r9d
>> + push %r10
>> + xor %r10d, %r10d
>> + push %r11
>> + xor %r11d, %r11d
>> + push %rbx
>> + xor %ebx, %ebx
>> + push %rbp
>> +#ifdef CONFIG_FRAME_POINTER
>> +/* Indicate special exception stack frame by inverting the frame pointer. */
>> + mov %rsp, %rbp
>> + notq %rbp
>> +#else
>> + xor %ebp, %ebp
>> +#endif
>> + push %r12
>> + xor %r12d, %r12d
>> + push %r13
>> + xor %r13d, %r13d
>> + push %r14
>> + xor %r14d, %r14d
>> + push %r15
>> + xor %r15d, %r15d
>> +.endm
>> +
>> +/*
>> + * POP GPRs from a UREGS_* frame on the stack. Does not modify flags.
>> + *
>> + * @rax: Alternative destination for the %rax value on the stack.
>> + */
>> +.macro POP_GPRS rax=%rax
>> + pop %r15
>> + pop %r14
>> + pop %r13
>> + pop %r12
>> + pop %rbp
>> + pop %rbx
>> + pop %r11
>> + pop %r10
>> + pop %r9
>> + pop %r8
>> + pop \rax
>> + pop %rcx
>> + pop %rdx
>> + pop %rsi
>> + pop %rdi
>> +.endm
> Hmm, yes, differences are apparently large enough to warrant the redundancy
> with SAVE_ALL / RESTORE_ALL.
You may recall this construct from prior attempts I've had to remove
SAVE_ALL/RESTORE_ALL, even with the \rax parameter for SVM. I still
intend to complete that work at some point.
>
>> --- a/xen/arch/x86/include/asm/msr.h
>> +++ b/xen/arch/x86/include/asm/msr.h
>> @@ -202,9 +202,9 @@ static inline unsigned long read_gs_base(void)
>>
>> static inline unsigned long read_gs_shadow(void)
>> {
>> - unsigned long base;
>> + unsigned long base, cr4 = read_cr4();
>>
>> - if ( read_cr4() & X86_CR4_FSGSBASE )
>> + if ( !(cr4 & X86_CR4_FRED) && (cr4 & X86_CR4_FSGSBASE) )
>> {
>> asm volatile ( "swapgs" );
>> base = __rdgsbase();
>> @@ -234,7 +234,9 @@ static inline void write_gs_base(unsigned long base)
>>
>> static inline void write_gs_shadow(unsigned long base)
>> {
>> - if ( read_cr4() & X86_CR4_FSGSBASE )
>> + unsigned long cr4 = read_cr4();
>> +
>> + if ( !(cr4 & X86_CR4_FRED) && (cr4 & X86_CR4_FSGSBASE) )
>> {
>> asm volatile ( "swapgs\n\t"
>> "wrgsbase %0\n\t"
> I don't quite get how these changes fit into this patch.
Without the change, read_registers() suffers #UD because of the SWAPGS.
This recurses until hitting the guard page, then repeats the same on the
#DF stack. And because stacks work nicely under FRED, you eventually
hit #DF's guard page.
Strictly speaking it's only read_gs_shadow() which we need to change to
make exception handling work, but I fixed both at the same time.
That said, I have actually cleaned this codepath up with the MSR work
because the code gen in read_registers() is terrible. Due to
no-strict-aliasing, every store into state-> forces a recalculation of
get_cpu_info(), meaning that read_cr4() cannot be hoisted, and there's a
branch in every helper.
I'm still not sure how best to fit it into this series.
>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -1013,6 +1013,32 @@ void show_execution_state_nmi(const cpumask_t *mask,
>> bool show_all)
>> printk("Non-responding CPUs: {%*pbl}\n",
>> CPUMASK_PR(&show_state_mask));
>> }
>>
>> +static const char *x86_et_name(unsigned int type)
>> +{
>> + static const char *const names[] = {
>> + [X86_ET_EXT_INTR] = "EXT_INTR",
>> + [X86_ET_NMI] = "NMI",
>> + [X86_ET_HW_EXC] = "HW_EXC",
>> + [X86_ET_SW_INT] = "SW_INT",
>> + [X86_ET_PRIV_SW_EXC] = "PRIV_SW_EXEC",
>> + [X86_ET_SW_EXC] = "SW_EXEC",
>> + [X86_ET_OTHER] = "OTHER",
>> + };
>> +
>> + return (type < ARRAY_SIZE(names) && names[type]) ? names[type] : "???";
>> +}
>> +
>> +static const char *x86_et_other_name(unsigned int vec)
> This isn't really a vector, is it?
Well - you are decoding the field name regs->fred_ss.vector.
Also I see I've typo'd EXEC in two of those names.
>
>> +{
>> + static const char *const names[] = {
>> + [0] = "MTF",
>> + [1] = "SYSCALL",
>> + [2] = "SYSENTER",
>> + };
>> +
>> + return (vec < ARRAY_SIZE(names) && names[vec][0]) ? names[vec] : "???";
> Did you mean to check names[ves] for being NULL? Or is this a leftover
> from the array being something like names[][10]?
Oh, bad copy&paste. Will fix.
>
>> --- a/xen/arch/x86/x86_64/Makefile
>> +++ b/xen/arch/x86/x86_64/Makefile
>> @@ -1,6 +1,7 @@
>> obj-$(CONFIG_PV32) += compat/
>>
>> obj-bin-y += entry.o
>> +obj-bin-y += entry-fred.o
> For the ordering here, ...
>
>> --- /dev/null
>> +++ b/xen/arch/x86/x86_64/entry-fred.S
>> @@ -0,0 +1,35 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +
>> + .file "x86_64/entry-fred.S"
>> +
>> +#include <asm/asm_defns.h>
>> +#include <asm/page.h>
>> +
>> + .section .text.entry, "ax", @progbits
>> +
>> + /* The Ring3 entry point is required to be 4k aligned. */
>> +
>> +FUNC(entry_FRED_R3, 4096)
> ... doesn't this 4k-alignment requirement suggest we want to put
> entry-fred.o first?
Perhaps, but that is quite subtle. I did also consider a
.text.entry.page_aligned section, but .text.entry only matters for XPTI
which (as agreed), I'm not intending to implement in FRED mode unless it
proves to be necessary.
Also IIRC there's still a symbol bug where _sentrytext takes priority
over entry_FRED_R3, so the backtrace is effectively wrong.
(These are all bad excuses, but some parts of this series are rather old.)
> Also, might it be more natural to use PAGE_SIZE
> here?
I did debate that, but the spec uses 0xfff, not pages, even if the
pipline surely does have an optimisation for chopping 12 metadata bits
off the bottom of a pointer.
~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |