Re: [Xen-devel] [ARM] Native application design and discussion (I hope)

Hi George,

On 05/10/2017 11:03 AM, George Dunlap wrote:
On 10/05/17 11:00, Julien Grall wrote:

On 05/10/2017 10:56 AM, George Dunlap wrote:
On 09/05/17 19:29, Stefano Stabellini wrote:
On Tue, 9 May 2017, Dario Faggioli wrote:
And it should not be hard to give such code access to the context
the vCPU that was previously running (in x86, given we implement
we call lazy context switch, it's most likely still loaded in the

I agree with Stefano, switching to the idle vCPU is a pretty bad

the idle vCPU is a fake vCPU on ARM to stick with the common code
never leave the hypervisor). In the case of the EL0 app, we want to
change exception level to run the code with lower privilege.

Also IMHO, it should only be used when there is nothing to run and
not be re-purposed for running EL0 app.

It's already purposed for running when there is nothing to do _or_ when
there are tasklets.

I do see your point about privilege level, though. And I agree with
George that it looks very similar to when, in the x86 world, we tried
to put the infra together for switching to Ring3 to run some pieces of
Xen code.

Right, and just to add to it, context switching to the idle vcpu has a
cost, but it doesn't give us any security benefits whatsoever. If Xen is
going to spend time on context switching, it is better to do it in a
way that introduces a security boundary.

"Context switching" to the idle vcpu doesn't actually save or change any
registers, nor does it flush the TLB.  It's more or less just accounting
for the scheduler.  So it has a cost (going through the scheduler) but
not a very large one.

It depends on the architecture. For ARM we don't yet support lazy
context switch. So effectively, the cost to "context switch" to the idle
vCPU will be quite high.

Oh, right.  Sorry, I thought I had seen code implementing lazy context
switch in ARM, but I must have imagined it.  That is indeed a material

Is there a particular reason that lazy context switch is difficult on
ARM?  If not it should be a fairly important bit of low-hanging fruit
from a performance perspective.

I am not entirely sure what you are doing on x86. Let me explain what we do and why a context switch is heavy on ARM.

In the case of ARM, when entering the hypervisor, we only save the bare minimum (all non-banked registers + registers useful for handling guest requests), and leave the rest untouched.

Our save/restore functions are quite big because they involve saving/restoring the state of the interrupt controller, FPU... So we have a fast exit/entry but a slow context switch.

What we currently do is avoid saving/restoring the idle vCPU, because it always stays at the hypervisor exception level. However, we still save all the registers of the previously running vCPU and restore those of the next running vCPU.
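
To make the cost concrete, here is a minimal sketch in C of the scheme described above. It is only a toy model: toy_vcpu, toy_stats and toy_context_switch are names made up for this example, not Xen's actual structures or functions, and the counters merely stand in for the expensive save/restore paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU; this is not Xen's struct vcpu. */
struct toy_vcpu {
    int id;
    bool is_idle;
};

/* Counters standing in for the expensive full save/restore paths. */
struct toy_stats {
    unsigned int full_saves;
    unsigned int full_restores;
};

/*
 * The idle vCPU has no guest state, so it is never saved or restored.
 * Every real vCPU is fully saved when switched out and fully restored
 * when switched in, even if it is the same vCPU on both sides of idle.
 */
static void toy_context_switch(struct toy_stats *st,
                               const struct toy_vcpu *prev,
                               const struct toy_vcpu *next)
{
    assert(st != NULL && prev != NULL && next != NULL);

    if (!prev->is_idle)
        st->full_saves++;      /* interrupt controller, FPU, ... */

    if (!next->is_idle)
        st->full_restores++;
}
```

In this model, a vCPU that goes idle and is then picked again (vCPU -> idle -> same vCPU) still pays for one full save and one full restore.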

This has a big impact on workloads where a vCPU runs and then waits for interrupts (hence the patch from Stefano to limit entering the hypervisor, though it is not enabled by default).

I made the assumption that the idle vCPU only runs when nothing has to be done. But as you mentioned, tasklets can run there too. So running a tasklet on Xen ARM will have a high cost.

A list of optimizations we could do on ARM:
        - Avoiding the restore if the vCPU stays the same before and after the idle vCPU
        - Avoiding save/restore if the vCPU is dedicated to a pCPU
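
A rough sketch in C of how both optimizations could fall out of one check, assuming each pCPU remembers whose state is currently loaded in hardware. The names lazy_vcpu, lazy_pcpu and lazy_context_switch are hypothetical and only for illustration, not existing Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU; this is not Xen's struct vcpu. */
struct lazy_vcpu {
    int id;
    bool is_idle;
};

/* Hypothetical per-pCPU bookkeeping. */
struct lazy_pcpu {
    const struct lazy_vcpu *loaded; /* vCPU whose state is in hardware, or NULL */
    unsigned int full_saves;
    unsigned int full_restores;
};

static void lazy_context_switch(struct lazy_pcpu *cpu,
                                const struct lazy_vcpu *next)
{
    assert(cpu != NULL && next != NULL);

    /* Idle stays in the hypervisor: leave the guest state loaded. */
    if (next->is_idle)
        return;

    /* Same vCPU before and after idle: nothing to save or restore. */
    if (cpu->loaded == next)
        return;

    /* A different vCPU: write back the old state, then load the new one. */
    if (cpu->loaded != NULL)
        cpu->full_saves++;
    cpu->full_restores++;
    cpu->loaded = next;
}
```

With this, vCPU -> idle -> same vCPU costs nothing, and a vCPU dedicated to a pCPU is only ever restored once because cpu->loaded never changes. A real implementation would also have to write the state back before the vCPU is allowed to run on another pCPU, which this sketch does not model.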

Do you have any other optimizations on x86?


Julien Grall

