
Re: [Xen-devel] [PATCH 4/4] SVM: streamline entry.S code



>>> On 26.08.13 at 23:47, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 23/08/2013 15:04, Jan Beulich wrote:
>> --- a/xen/arch/x86/hvm/svm/entry.S
>> +++ b/xen/arch/x86/hvm/svm/entry.S
>> @@ -32,28 +32,34 @@
>>  #define CLGI   .byte 0x0F,0x01,0xDD
>>  
>>  ENTRY(svm_asm_do_resume)
>> +        GET_CURRENT(%rbx)
>> +.Lsvm_do_resume:
>>          call svm_intr_assist
>>          mov  %rsp,%rdi
>>          call nsvm_vcpu_switch
>>          ASSERT_NOT_IN_ATOMIC
>>  
>> -        GET_CURRENT(%rbx)
>> -        CLGI
>> -
>>          mov  VCPU_processor(%rbx),%eax
>> -        shl  $IRQSTAT_shift,%eax
>>          lea  irq_stat+IRQSTAT_softirq_pending(%rip),%rdx
>> -        cmpl $0,(%rdx,%rax,1)
>> +        xor  %ecx,%ecx
>> +        shl  $IRQSTAT_shift,%eax
>> +        CLGI
>> +        cmp  %ecx,(%rdx,%rax,1)
>>          jne  .Lsvm_process_softirqs
>>  
>> -        testb $0, VCPU_nsvm_hap_enabled(%rbx)
>> -UNLIKELY_START(nz, nsvm_hap)
> 
> Isn't this as much a bugfix as an optimisation?  test $0, <anything>
> should unconditionally set the zero flag, making the UNLIKELY_START()
> block unreachable code.

Indeed, I shall add a note about this to the description.
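To illustrate (just a sketch, not necessarily the exact form the fixed
patch will use):

        testb $0,VCPU_nsvm_hap_enabled(%rbx)  # (value & 0) == 0, so ZF is
                                              # always set
UNLIKELY_START(nz, nsvm_hap)                  # never entered - dead code

versus something along the lines of

        cmp  %cl,VCPU_nsvm_hap_enabled(%rbx)  # %ecx was zeroed above, so this
                                              # really tests the byte
UNLIKELY_START(ne, nsvm_hap)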

>>          call svm_asid_handle_vmrun
>>  
> 
> The next instruction after this hunk is "cmpb $0,tl_init_done(%rip)",
> where %cl could be used, in the same style of changes as above.

Yeah, but the call clobbers %rcx.
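I.e. (condensed, just to illustrate the constraint; register usage per the
standard x86-64 calling convention the C code follows):

        xor  %ecx,%ecx                  # %rcx zeroed here ...
        call svm_asid_handle_vmrun      # ... but %rcx is caller-saved, so the
                                        # C function may clobber it
        cmpb $0,tl_init_done(%rip)      # hence the immediate form has to stay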

>> @@ -72,13 +78,12 @@ UNLIKELY_END(svm_trace)
>>          mov  UREGS_eflags(%rsp),%rax
>>          mov  %rax,VMCB_rflags(%rcx)
> 
> This mov could be moved down as well, which would break the data hazard
> on %rax.  It might however want a comment with it, as that would
> certainly make the "mov UREGS_eflags(%rsp),%rax; mov %rax,VMCB_rflags(%rcx)"
> pair less obvious.

That would be applicable to the whole series of moves here.
However, I don't think there's a dependency issue here on
modern CPUs (sadly neither Intel's nor AMD's optimization docs
go into enough detail about this), largely thanks to register
renaming.
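
For reference, the kind of rearrangement suggested would look roughly like
this, at the cost of a second scratch register (field names and register
choice here are for illustration only):

        mov  UREGS_eflags(%rsp),%rax
        mov  UREGS_rsp(%rsp),%rdx       # independent load scheduled between ...
        mov  %rax,VMCB_rflags(%rcx)     # ... the producer and consumer of %rax
        mov  %rdx,VMCB_rsp(%rcx)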

>> @@ -92,25 +97,26 @@ UNLIKELY_END(svm_trace)
>>  
>>          VMRUN
>>  
>> +        GET_CURRENT(%rax)
>>          push %rdi
>>          push %rsi
>>          push %rdx
>>          push %rcx
>> +        mov  VCPU_svm_vmcb(%rax),%rcx
>>          push %rax
> 
> As regs->rax is overwritten just below, this could be replaced with "sub
> $8,%rsp" which mirrors the add before VMRUN, and would also help to
> break up the queue of pushes in the pipeline.

But it's not clear whether the CPU copes better with a run of successive
pushes or with pushes (each producing and consuming %rsp) interleaved
with an ALU adjustment of %rsp.
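
To make the two variants explicit (a sketch of the alternatives, not a
measured comparison):

        push %rcx
        push %rax          # value is dead - regs->rax gets overwritten from
                           # the VMCB further down

versus

        push %rcx
        sub  $8,%rsp       # merely reserve the slot, mirroring the add
                           # before VMRUN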

> All of those points were just nitpicking, and the changes are mainly
> along a similar line to the VMX.  I still have the same concern about
> jumping back to the top of the function, but I shall investigate that
> and start a new thread.
> 
> Therefore,
> Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Thanks.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

