[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 5/6] xen/arm: smccc: Add wrapper to automatically select the calling convention





On 28/08/18 15:40, Volodymyr Babchuk wrote:
Hi Julien,

On 28.08.18 17:05, Julien Grall wrote:
Hi Volodymyr,

On 27/08/18 17:50, Volodymyr Babchuk wrote:
On 27.08.18 18:29, Julien Grall wrote:
On 27/08/2018 15:15, Volodymyr Babchuk wrote:
On 24.08.18 19:58, Julien Grall wrote:
Signed-off-by: Julien Grall <julien.grall@xxxxxxx>
This is indeed increasing the size of the function, but with a better performance when you perform an SMC with 1.1.

The main goal of SMCCC 1.1 is to limit the number of registers saved when calling an SMC. So if you provide provide a wrapper using a function, then you are better off sticking with SMCCC 1.0.

The idea of this code is to provide a faster call on the presence of SMCCC 1.1 while providing a fallback for other case. The function cpus_have_const_cap is implemented using an ALTERNATIVE (similar to static key) that will make if close to a NOP. I am saying close because this is not quite a static key as we don't have it in Xen. Instead, you replace a memory load by a constant.
Ah, yes, true.

I checked how static keys are working. Seems, they use asm goto feature. Now I think: can we optimize this more? Something like that:

#define arm_smccc_smc(...)
     do {

             asm goto (ALTERNATIVE(nop",
                                   "b %l[smccc_1_1]",
                                   ARM_SMCCC_1_1));

You would need to list the label in GotoLabels (see 6.45.2.7 [1]).
Yes, but as you said, we can use __LINE__ there.

             arm_smccc_1_0_smc(__VA_ARGS__);
             break;
smccc_1_1:
The label will get redefined if you have multiple SMC call in the same function. We could possibly generate label based on the file/line.

             arm_smccc_1_1_smc(__VA_ARGS__);
    } while ( 0 )

This will save use additional conditional branch.
The code with this patch looks like:

mov x0, #0        mov x0, #1
cbz w0, label        cbz w0, label

The mov + conditional branch is likely to be negligible over the cost of switching to EL3. Overall, I am not really convinced that it would be worth the cost of open-coding it. I would prefer if we keep the call to cpus_have_const_cap() and see if it can be optimized.
Okay. I'm fine with this. I just saw one more way to optimize and wanted
to share it with you. Anyways:

Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>

Thank you!

I have looked at cpus_have_const_cap() and haven't found good way to optimize it with the current infrastructure in Xen. Feel free to suggest improvement.

Another thing: maybe it is worth to branch to 1.0 code and leave 1.1 in a straight path of execution? This will save you one more instruction for SMCCC 1.1 call.

I am not sure to understand your suggestion here. Could you expand?

However, why SMCCC 1.1 should be in the straight path of execution?

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.