[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 2/4] x86: record SSP at non-guest entry points
On 28/02/2024 1:52 pm, Jan Beulich wrote: > We will want to use that value for call trace generation, and likely > also to eliminate the somewhat fragile shadow stack searching done in > fixup_exception_return(). For those purposes, guest-only entry points do > not need to record that value. > > To keep the saving code simple, record our own SSP that corresponds to > an exception frame, pointing to the top of the shadow stack counterpart > of what the CPU has saved on the regular stack. Consuming code can then > work its way from there. > > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> > --- > To record the full 64-bit value, some of the unused bits in the %cs slot > could be used. Sadly that slot has saved_upcall_mask in an unhelpful > location, otherwise simply storing low and high 32 bits in those two > separate half-slots would be a pretty obvious choice. As long as > "entry_ssp" is used in non-guest-entry frames only, we could of course > put half of it into a union with saved_upcall_mask ... > > Else may want to put a BUILD_BUG_ON(VADDR_BITS > 48) somewhere, but I'm > afraid I can't really identify a good place for such to live. Perhaps in reinit_bsp_stack() when we enable SHSTK on the BSP? Having it anywhere vaguely relevant is better than not having it. > > Leveraging that the CPU stores zero in the upper bits of the selector > register slots, the save sequence could also be > > shl $16, %rcx > or %rcx, UREGS_entry_ssp-2(%rsp) > > That's shorter and avoids a 16-bit operation, but may be less desirable, > for being a read-modify-write access. I doubt you'll be able to measure a difference between the two options (it's into the active stack, after all), but the two stores is probably marginally better. When shstks are active, we're taking a large hit from the busy token handling. I was concerned by the misaligned access, but it's not misaligned, its it? It's the start of entry_ssp which is misaligned and the -2 brings it back to being properly aligned. > --- a/xen/arch/x86/include/asm/asm_defns.h > +++ b/xen/arch/x86/include/asm/asm_defns.h > @@ -221,7 +221,7 @@ static always_inline void stac(void) > #endif > > #ifdef __ASSEMBLY__ > -.macro SAVE_ALL compat=0 > +.macro SAVE_ALL compat=0 ssp=IS_ENABLED(CONFIG_XEN_SHSTK) I'm not sure this is what you want to do. Because it's only the default, we'll still.... > addq $-(UREGS_error_code-UREGS_r15), %rsp > cld > movq %rdi,UREGS_rdi(%rsp) > @@ -235,6 +235,9 @@ static always_inline void stac(void) > movq %rax,UREGS_rax(%rsp) > xor %eax, %eax > .if !\compat > +.if \ssp > + rdsspq %rcx ... pick this up even in !CONFIG_XEN_SHSTK builds, and ... > +.endif > movq %r8,UREGS_r8(%rsp) > movq %r9,UREGS_r9(%rsp) > movq %r10,UREGS_r10(%rsp) > @@ -264,6 +267,11 @@ static always_inline void stac(void) > xor %r13d, %r13d > xor %r14d, %r14d > xor %r15d, %r15d > +.if \ssp && !\compat > + mov %cx, UREGS_entry_ssp(%rsp) > + shr $16, %rcx > + mov %ecx, UREGS_entry_ssp+2(%rsp) ... store it here. I think you need to use ssp=1 by default, and #ifdef CONFIG_XEN_SHSTK .if ... for these two blocks, so they disappear properly in !SHSTK builds. But for the rest of the behaviour, there are two overlapping things, because you end up getting entry_ssp=0 in SHSTK builds running on hardware without shstk active. And with that in mind, to confirm, the RDSSP block depends on the xor %ecx,%ecx between the two hunks in order to function as intended? > --- a/xen/include/public/arch-x86/xen-x86_64.h > +++ b/xen/include/public/arch-x86/xen-x86_64.h > @@ -183,7 +183,19 @@ struct cpu_user_regs { > uint8_t _pad1[3]; > __DECL_REG_LO16(flags); /* rflags.IF == !saved_upcall_mask */ > __DECL_REG_LO8(sp); > - uint16_t ss, _pad2[3]; > + uint16_t ss; > +#if !defined(__XEN__) > + uint16_t _pad2[3]; > +#elif defined(COMPILE_OFFSETS) > + uint16_t entry_ssp[3]; > +#else > + /* > + * This points _at_ the corresponding shadow stack frame; it is _not_ the > + * outer context's SSP. That, if the outer context has CET-SS enabled, > + * is stored in the top slot of the pointed to shadow stack frame. > + */ > + signed long entry_ssp:48; > +#endif I have to admit that I dislike this. (And only some of that is because it's work I'm going to have to revert in order to make FRED support work...) We desperately need to break our use of this structure to start with, and with that done, we don't need to play games about hiding SSP in a spare 6 bytes in an ABI dubiously made public nearly two decades ago. How hard would it be just: #define cpu_user_regs abi_cpu_user_regs #include <public/xen.h> #undef cpu_user_regs and make a copy of cpu_user_regs that we can really put an SSP field into? It would make this patch more simple, and we could finally get the vm86 block off the stack (which also fixes OoB accesses in the #DF handler - cant remember if I finished the bodge around that or not.) Irrespective of anything else, why do we have COMPILE_OFFSETS getting in here? ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |