[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/elf: Improve code generation in elf_core_save_regs()
On 26/02/2025 8:44 am, Jan Beulich wrote: > On 26.02.2025 08:44, Jan Beulich wrote: >> On 25.02.2025 23:45, Andrew Cooper wrote: >>> A CALL with 0 displacement is handled specially, and is why this logic >>> functions even with CET Shadow Stacks active. Nevertheless a rip-relative >>> LEA >>> is the more normal way of doing this in 64bit code. >>> >>> The retrieval of flags modifies the stack pointer so needs to state a >>> dependency on the stack pointer. Despite it's name, ASM_CALL_CONSTRAINT is >>> the way to do this. >>> >>> read_sreg() forces the answer through a register, causing code generation of >>> the form: >>> >>> mov %gs, %eax >>> mov %eax, %eax >>> mov %rax, 0x140(%rsi) >>> >>> Encode the reads directly with a memory operand. This results in a 16bit >>> store instead of an 64bit store, but the backing memory is zeroed. >> Raises the question whether we shouldn't change read_sreg(). At least the >> emulator uses of it would also benefit from storing straight to memory. And >> the remaining uses ought to be optimizable by the compiler, except that I >> don't expect we'd be able to express the zero-extending nature when the >> destination is a register. Or wait, maybe I can make up something (whether >> that's going to be liked is a separate question). > Here you go. > > Jan > > x86: make read_sreg() "bi-modal" > > Permit use sites to control whether to store directly to memory; right > now both elf_core_save_regs() and the insn emulator's put_fpu() > needlessly go through an intermediate GPR. Note that in both cases the > apparent loss of zero-extension isn't a problem: The fields written to > start out zero-initialized anyway. > > No change in generated code for the use sites not being touched. > > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> > --- > Whether to make the change to put_fpu() is up for discussion: In my > build it increases code size slightly, despite the reduction of number > of insns emitted. An alternative (leaving the decision to the compiler) > might be to drop the if() and use "=g" as constraint. > > I was considering to omit the assignment to sel_ on the if() branch, > expecting the compiler to then flag uses of the return value (as > consuming uninitialized data) when a 2nd argument is passed. However, > gcc14 then already flags the "sel_;" at the end of the macro as > consuming uninitialized data. > > --- a/xen/arch/x86/include/asm/regs.h > +++ b/xen/arch/x86/include/asm/regs.h > @@ -16,10 +16,20 @@ > !diff || ((r)->cs != __HYPERVISOR_CS); > \ > }) > > -#define read_sreg(name) ({ \ > - unsigned int __sel; \ > - asm ( "mov %%" STR(name) ",%0" : "=r" (__sel) ); \ > - __sel; \ > +#define read_sreg(name, dst...) ({ \ > + unsigned int sel_; \ > + BUILD_BUG_ON(count_args(dst) > 1); \ > + if ( count_args(dst) ) \ > + { \ > + typeof(LASTARG(&sel_, ## dst)) dst_ = \ > + LASTARG(&sel_, ## dst); \ > + asm ( "mov %%" STR(name) ",%0" : "=m" (*dst_) ); \ > + /* The compiler ought to optimize this out. */ \ > + sel_ = *dst_; \ > + } \ > + else \ > + asm ( "mov %%" STR(name) ",%0" : "=r" (sel_) ); \ > + sel_; \ > }) This doesn't fix the register promotion problem. That can be fixed by unsigned long rather than int, as you did for rdmsr. https://godbolt.org/z/K5hKz7KvM But the fundamental problem is that the sreg instructions with mem16 encodings are weird. They don't even follow normal x86 rules for operand size. By the end of the FRED series (for which this patch was misc cleanup), I've almost removed read_sreg(), and was intending to purge it completely. Even in it's current form, it's not normal C semantics, because it looks to take a variable which isn't a variable. Clever as this trick is, I feel it's a backwards step in terms of legibility, and that plain asm()'s are the lesser evil when it comes to mem16 instructions. ~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |