
Re: [Xen-devel] [PATCH] x86/current: Provide additional information to optimise get_cpu_info()



On 01/09/14 12:24, Jan Beulich wrote:
>>>> On 01.09.14 at 12:58, <andrew.cooper3@xxxxxxxxxx> wrote:
>> Exactly as with c/s d55c5eefe "x86: use compiler visible "add" instead of
>> inline assembly "or" in get_cpu_info()", this is achieved by providing more
>> information to the compiler.
>>
>> With this modification, gcc replaces the older:
>>     mov imm, %reg
>>     and %rsp, %reg
>>
>> with:
>>     mov %rsp, %reg
>>     and imm, %reg
>>
>> which is one byte shorter.
> I'm in no way opposed to the change, but is that really true? Afaict
> it can be 1 byte shorter only when %rax gets selected as the register
> here.

Oh - quite possibly only %rax, but that still makes up the majority of
instances in shorter functions, where %rax was previously chosen as well.
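
A byte count backs this up. The listing below is only a sketch: it assumes
the mask is ~(STACK_SIZE - 1) with a 32 KiB stack (so the immediate is
-0x8000, which needs an imm32), and the array names are made up for
illustration. The encodings are the standard x86-64 ones; the saving comes
from the short "and imm32, %rax" form (REX.W 25 id), which has no ModRM byte.

```c
/* Old sequence with %rax: 7 + 3 = 10 bytes. */
static const unsigned char old_rax[] = {
    0x48, 0xc7, 0xc0, 0x00, 0x80, 0xff, 0xff,   /* mov $-0x8000, %rax */
    0x48, 0x21, 0xe0,                           /* and %rsp, %rax     */
};

/* New sequence with %rax: 3 + 6 = 9 bytes (accumulator short form). */
static const unsigned char new_rax[] = {
    0x48, 0x89, 0xe0,                           /* mov %rsp, %rax     */
    0x48, 0x25, 0x00, 0x80, 0xff, 0xff,         /* and $-0x8000, %rax */
};

/* Old sequence with %rcx: 7 + 3 = 10 bytes. */
static const unsigned char old_rcx[] = {
    0x48, 0xc7, 0xc1, 0x00, 0x80, 0xff, 0xff,   /* mov $-0x8000, %rcx */
    0x48, 0x21, 0xe1,                           /* and %rsp, %rcx     */
};

/* New sequence with %rcx: 3 + 7 = 10 bytes, so nothing is saved. */
static const unsigned char new_rcx[] = {
    0x48, 0x89, 0xe1,                           /* mov %rsp, %rcx     */
    0x48, 0x81, 0xe1, 0x00, 0x80, 0xff, 0xff,   /* and $-0x8000, %rcx */
};
```

So the one-byte saving does appear only when %rax is the chosen register;
with any other register both sequences should come out the same length.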

I also note that the exact position of the lookup gets deferred in some
cases until after an early exit from the function.

>
>>  It also considers all general purpose registers
>> for %reg rather than just the legacy ones (i.e. will now use %r12 etc), which
>> allows for better register scheduling in larger functions.
> Same here - why would with the old code not all registers be
> available for selection by the compiler?

I suspect it has something to do with the choices available from the asm
parameter.  There are no mnemonics to specify the newer registers, which is
a holdover from the 32-bit days.  I suspect there is some implicit limit
to just the legacy GPRs.

Either way, my observation of the change in generated asm is that
before the change, no REX.R registers were used, whereas they are used
afterwards.
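
For reference, the computation itself is just a mask and an add on the stack
pointer. The following is a rough sketch rather than the patch: STACK_SIZE is
assumed to be 32 KiB, the struct layout is a placeholder, and
cpu_info_addr() is a hypothetical helper that takes the stack pointer as an
ordinary argument so the arithmetic is fully visible to the compiler.

```c
#include <stdint.h>

/* Assumed for illustration: a 32 KiB, size-aligned per-CPU stack. */
#define STACK_SIZE ((uintptr_t)1 << 15)

/* Placeholder layout; the real structure has different members. */
struct cpu_info {
    unsigned long placeholder[4];
};

/*
 * Hypothetical helper: round sp down to the base of its stack, step to the
 * top of the stack, then back by one struct cpu_info.  Expressed in plain C,
 * the compiler can pick any register and schedule the AND where it likes.
 */
static inline uintptr_t cpu_info_addr(uintptr_t sp)
{
    uintptr_t base = sp & ~(STACK_SIZE - 1);

    return base + STACK_SIZE - sizeof(struct cpu_info);
}
```

Any stack pointer value within the same 32 KiB stack yields the same address,
which is why the lookup can safely be moved past an early exit.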

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel