Re: [PATCH 4/5] x86/pv: Optimise to the segment context switching paths
On 11/09/2020 10:49, Jan Beulich wrote:
> On 09.09.2020 11:59, Andrew Cooper wrote:
>> Save the segment selectors with explicit asm, rather than with read_sreg().
>> This permits the use of MOV's m16 encoding, which avoids indirecting the
>> selector value through a register.
> Instead of this, how about making read_sreg() look like
>
> #define read_sreg(val, name) ({                                  \
>     if ( sizeof(val) == 2 )                                      \
>         asm volatile ( "mov %%" STR(name) ",%0" : "=m" (val) );  \
>     else                                                         \
>         asm volatile ( "mov %%" STR(name) ",%k0" : "=r" (val) ); \
> })
>
> which then also covers read_registers()? I have a full patch
> ready to send, if you agree.
That will go wrong for

    uint16_t ds;
    read_sreg(ds, ds);

as it will force the variable to be spilled onto the stack.  I don't
think this is a clever move.
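(For illustration, a hand-expanded sketch of what the proposed macro
yields for that case - not compiler output:)

    /*
     * Hypothetical expansion for a 16-bit local.  The "=m" constraint
     * obliges the compiler to give 'ds' a stack slot, so the selector
     * takes a round trip through memory even when it is wanted straight
     * back in a register.
     */
    uint16_t ds;
    asm volatile ( "mov %%ds,%0" : "=m" (ds) );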
Furthermore, it is bad enough that read_sreg() already takes one magic
parameter which doesn't follow normal C rules - renaming to READ_SREG()
would be an improvement - but this is now adding a second which is a
capture by name.
I'm afraid that is a firm no from me.
There is one other place where we read all segment registers at once.
Maybe a static inline save_sregs(struct cpu_user_regs *) hiding the asm
block would do, but I'm not necessarily convinced of this plan either.
At least it would cleanly separate the "I've obviously got a memory
operand" and "I almost certainly want it in a register anyway" logic.
>
>> For {save,load}_segments(), opencode the fs/gs helpers, as the optimiser is
>> unable to rearrange the logic down to a single X86_CR4_FSGSBASE check. This
>> removes several jumps and creates bigger basic blocks.
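(Illustratively, the shape open-coding allows - a sketch assuming the
__rdfsbase()/__rdgsbase() wrappers, not the actual patch hunk:)

    unsigned long fs_base = 0, gs_base = 0, cr4 = read_cr4();

    if ( cr4 & X86_CR4_FSGSBASE )
    {
        /* One branch now covers both base reads. */
        fs_base = __rdfsbase();
        gs_base = __rdgsbase();
    }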
>>
>> In load_segments(), optimise GS base handling substantially. The call to
>> svm_load_segs() already needs gsb/gss the correct way around, so hoist the
>> logic for the later path to use it as well. Swapping the inputs in GPRs is
>> far more efficient than using SWAPGS.
>>
>> Previously, there was optionally one SWAPGS from the user/kernel mode
>> check, two SWAPGS's in write_gs_shadow() and two WRGSBASE's.  Updates to
>> GS (4 or 5 here) in quick succession stall all contemporary pipelines
>> repeatedly.  (Intel Core/Xeon pipelines have segment register renaming[1],
>> so can continue to speculatively execute with one GS update in flight.
>> Other pipelines cannot have two updates in flight concurrently, and must
>> stall dispatch of the second until the first has retired.)
>>
>> Rewrite the logic to have exactly two WRGSBASEs and one SWAPGS, which
>> removes two stalls on all contemporary processors.
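(A sketch of the resulting sequence, assuming CR4.FSGSBASE is enabled and
wrgsbase() wrapping the instruction as in msr.h; the helper name here is
made up:)

    /* Two WRGSBASEs and one SWAPGS.  gsb is the active context's base,
     * gss the inactive one's (swapped earlier if in user mode).
     */
    static void write_gs_bases(unsigned long gsb, unsigned long gss)
    {
        wrgsbase(gss);              /* Inactive base into GS_BASE, ...   */
        asm volatile ( "swapgs" );  /* ... banked into SHADOW_GS_BASE.   */
        wrgsbase(gsb);              /* Active base into GS_BASE.         */
    }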
>>
>> Although modest, the resulting delta is:
>>
>> add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-106 (-106)
>> Function                                     old     new   delta
>> paravirt_ctxt_switch_from                    235     198     -37
>> context_switch                              3582    3513     -69
>>
>> in a common path.
>>
>> [1] https://software.intel.com/security-software-guidance/insights/deep-dive-intel-analysis-speculative-behavior-swapgs-and-segment-registers
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Preferably re-based onto said change, and maybe with another adjustment
> (see below)
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>
>> @@ -1556,18 +1557,24 @@ static void load_segments(struct vcpu *n)
>> : [ok] "+r" (all_segs_okay) \
>> : [_val] "rm" (val) )
>>
>> -#ifdef CONFIG_HVM
>> - if ( cpu_has_svm && !compat && (uregs->fs | uregs->gs) <= 3 )
>> + if ( !compat )
>> {
>> - unsigned long gsb = n->arch.flags & TF_kernel_mode
>> - ? n->arch.pv.gs_base_kernel : n->arch.pv.gs_base_user;
>> - unsigned long gss = n->arch.flags & TF_kernel_mode
>> - ? n->arch.pv.gs_base_user : n->arch.pv.gs_base_kernel;
>> + gsb = n->arch.pv.gs_base_kernel;
>> + gss = n->arch.pv.gs_base_user;
>> +
>> + /*
>> + * Figure out which way around gsb/gss want to be. gsb needs to be
>> + * the active context, and gss needs to be the inactive context.
>> + */
>> + if ( !(n->arch.flags & TF_kernel_mode) )
>> + SWAP(gsb, gss);
>>
>> - fs_gs_done = svm_load_segs(n->arch.pv.ldt_ents, LDT_VIRT_START(n),
>> - n->arch.pv.fs_base, gsb, gss);
>> + if ( IS_ENABLED(CONFIG_HVM) && cpu_has_svm &&
> The change from #ifdef to IS_ENABLED() wants mirroring to the
> prefetching site imo. I wonder though whether the adjustment is a
> good idea: The declaration lives in svm.h, and I would view it as
> quite reasonable for that header to not get included in !HVM builds
> (there may be a lot of disentangling to do to get there, but still).
I'm not overly fussed, but there will absolutely have to be HVM stubs
for normal code. I don't see why we should special case svm_load_segs()
to not have a stub.
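Something like this would do (illustrative only; the parameter list is
inferred from the call site quoted above):

    #ifndef CONFIG_HVM
    static inline bool svm_load_segs(unsigned int ldt_ents,
                                     unsigned long ldt_base,
                                     unsigned long fs_base,
                                     unsigned long gs_base,
                                     unsigned long gs_shadow)
    {
        ASSERT_UNREACHABLE();
        return false;
    }
    #endif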
~Andrew