|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 1/2] x86: improve MSR_SHADOW_GS accesses
>>> On 01.03.18 at 16:47, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 07/12/17 10:45, Jan Beulich wrote:
>>>>> On 06.12.17 at 20:34, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 06/12/17 16:37, Jan Beulich wrote:
>>>> @@ -196,6 +215,19 @@ static inline void wrgsbase(unsigned lon
>>>> wrmsrl(MSR_GS_BASE, base);
>>>> }
>>>>
>>>> +static inline void wrgsshadow(unsigned long base)
>>>> +{
>>>> + alternative_input("mov %[msr], %%ecx\n\t"
>>>> + "shld $32, %[val], %%rdx\n\t"
>>> This is a vector path instruction and specifically called out to be
>>> avoided in the AMD optimisation guide. On all hardware (according to
>>> Agner's latency mesurements) it alone has a longer latency to execute
>>> that the mov/shl pair you've replaced, and that is before accounting for
>>> mov elimination.
>> For one I doubt the latency of the SHLD will be noticable with
>> the latency of the following WRMSR. And then I think the main
>> goal should be to have optimal performance on modern CPUs.
>> That means size-optimizing original code, to reduce the amount
>> of NOPs needed when the alternative is being installed.
>
> I'm sorry if this comes across as blunt, but you are too focused on the
> wrong details.
>
> I agree that `swap;{rd,wr}gsbase;swap` is better than wrmsr, and we
> should be using it when available.
>
> However, it is simply not the case that squeezing every possible byte
> out of .text makes the best result. Just as with inc and dec, there is
> a very good reason why compilers don't emit shld, and that is because
> the ORM's recommend against their use. In this specific case, a mov/shl
> pair is better than shld on all hardware, even in older pipelines which
> don't do mov elimination.
Well, okay I'll switch away from using SHLD. It'll be a single padding
NOP only anyway, just a slightly larger one. And note that this wasn't
about size optimization, but about NOP padding reduction.
> You are going to need a more convincing argument than currently provided
> as to why the helpers aren't implemented like this:
>
> static inline unsigned long rdgsshadow(void)
> {
> unsigned long base;
>
> if ( cpu_has_fsgsbase )
> asm volatile ( "swapgs\n\t"
> #ifdef HAVE_GAS_FSGSBASE
> "rdgsbase %0\n\t"
> #else
> ".byte 0xf3, 0x48, 0x0f, 0xae, 0xc8\n\t"
> #endif
> "swapgs\n\t"
> :
> #ifdef HAVE_GAS_FSGSBASE
> "=r" (base)
> #else
> "=a" (base)
> #endif
> );
> else
> rdmsrl(MSR_SHADOW_GS_BASE, base);
>
> return base;
> }
>
> static inline void wrgsshadow(unsigned long base)
> {
> if ( cpu_has_fsgsbase )
> asm volatile ( "swapgs\n\t"
> #ifdef HAVE_GAS_FSGSBASE
> "wrgsbase %0\n\t"
> #else
> ".byte 0xf3, 0x48, 0x0f, 0xae, 0xd8\n\t"
> #endif
> "swapgs\n\t"
> ::
> #ifdef HAVE_GAS_FSGSBASE
> "r" (base)
> #else
> "a" (base)
> #endif
> );
> else
> wrmsrl(MSR_SHADOW_GS_BASE, base);
> }
>
> In particular:
> 1) This is consistent with existing {rd,wr}{fs,gs}base helpers.
Which, in due course, I would have meant to convert to alternatives
as well.
> 2) cpu_has_fsgsbase is constant after boot, so the branch will be
> predicted correctly.
If it has an entry in the BTB in the first place.
I can certainly also switch away from using alternatives here, but
considering especially my response to 1) I would do this quite
hesitantly.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |