|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] RFC x86/msr: Use WRMSRNS $imm when available
On 09.08.2025 00:20, Andrew Cooper wrote:
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>
> This is on top of the FRED series for the wrmsrns() cleanup, but otherwise
> unrelated.
>
> The code generation isn't entirely ideal
>
> Function old new delta
> init_fred 255 274 +19
> vmx_set_reg 248 256 +8
> enter_state_helper.cold 1014 1018 +4
> __start_xen 8893 8897 +4
>
> but made worse by the the prior codegen for wrmsrns(MSR_STAR, ...) being mad:
>
> mov $0xc0000081,%ecx
> mov $0xe023e008,%edx
> movabs $0xe023e00800000000,%rax
> cs wrmsr
>
> The two sources of code expansion come from the compiler not being able to
> construct %eax and %edx separately, and not being able propagate constants.
>
> Loading 0 is possibly common enough to warrant another specialisation where we
> can use "a" (0), "d" (0) and forgo the MOV+SHR.
>
> I'm probably overthinking things. The addition will be in the noise in
> practice, and Intel are sure the advantage of MSR_IMM will not be.
It's not entirely clear to me what the overall effects are now with your
02/22 reply on the FRED series. Nevertheless a nit or two here.
> --- a/xen/arch/x86/include/asm/msr.h
> +++ b/xen/arch/x86/include/asm/msr.h
> @@ -38,9 +38,46 @@ static inline void wrmsrl(unsigned int msr, uint64_t val)
> wrmsr(msr, lo, hi);
> }
>
> +/*
> + * Non-serialising WRMSR with a compile-time constant index, when available.
> + * Falls back to plain WRMSRNS, or to a serialising WRMSR.
> + */
> +static always_inline void __wrmsrns_imm(uint32_t msr, uint64_t val)
> +{
> + /*
> + * For best performance, WRMSRNS %r64, $msr is recommended. For
> + * compatibility, we need to fall back to plain WRMSRNS, or to WRMSR.
> + *
> + * The combined ABI is awkward, because WRMSRNS $imm takes a single r64,
> + * whereas WRMSR{,NS} takes a split edx:eax pair.
> + *
> + * Always use WRMSRNS %rax, $imm, because it has the most in common with
> + * the legacy forms. When MSR_IMM isn't available, emit setup logic for
> + * %ecx and %edx too.
> + */
> + alternative_input_2(
> + "mov $%c[msr], %%ecx\n\t"
Simply %[msr] here?
And then, might it make sense to pull out this and ...
> + "mov %%rax, %%rdx\n\t"
> + "shr $32, %%rdx\n\t"
> + ".byte 0x2e; wrmsr",
> +
> + /* WRMSRNS %rax, $msr */
> + ".byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]", X86_FEATURE_MSR_IMM,
> +
> + "mov $%c[msr], %%ecx\n\t"
... this, to ...
> + "mov %%rax, %%rdx\n\t"
> + "shr $32, %%rdx\n\t"
> + ".byte 0x0f,0x01,0xc6", X86_FEATURE_WRMSRNS,
> +
> + [msr] "i" (msr), "a" (val) : "rcx", "rdx");
[msr] "i" (msr), "a" (val), "c" (msr) : "rdx");
allowing the compiler to actually know what's put in %ecx? That'll make
original and 2nd replacement code 10 bytes, better balancing with the 9
bytes of the 1st replacement. And I'd guess that the potentially dead
MOV to %ecx would be hidden in the noise as well.
Then, seeing your use of a CS: prefix on WRMSR, why not also add one to
the 1st replacement, thus not requiring any NOP padding?
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |