Re: [Xen-devel] [ARM] Native application design and discussion (I hope)

Hi Julien,

Returning to native apps, I think we can make the context switch even
faster by dropping the p2m code. Imagine that we have already created
stage 1 MMU tables for the native application. Then, to switch to the
app, we need only:

1. Enable TGE bit in HCR
2. Disable VM bit in HCR
3. Save/Program EL1_TTBR and friends
3.5 (optionally) save/restore FPU state
4. Save/Restore general purpose registers + SP + CPSR + PC to jump to
an app in EL0 state.

This can be done in a "real" vcpu or in the idle vcpu context. It makes no difference.

Exception handling in the hypervisor would become tricky because a
native app has no vcpu. The current implementation of entry.S always
saves general purpose registers to a vcpu structure. Basically, we
should teach entry.S and traps.c about native apps.
Am I missing something?
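
To illustrate what "teaching" the trap path about native apps might mean, here is a minimal sketch of picking the register save area on exception entry. All names (`struct regs_sketch`, `struct vcpu_sketch`, `struct native_app_sketch`, `struct pcpu_state`, `trap_save_area()`) are hypothetical and chosen for illustration; the real entry.S and traps.c are structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register frame filled in on exception entry. */
struct regs_sketch {
    uint64_t x[31];
    uint64_t sp, pc, pstate;
};

struct vcpu_sketch       { struct regs_sketch regs; };
struct native_app_sketch { struct regs_sketch regs; };

/* Hypothetical per-pCPU bookkeeping. */
struct pcpu_state {
    struct vcpu_sketch       *current_vcpu;
    struct native_app_sketch *current_app;  /* NULL unless an app is running */
};

/*
 * Choose where the trap path should store the interrupted registers:
 * the app's own frame if an app was running, else the vcpu's frame.
 */
struct regs_sketch *trap_save_area(struct pcpu_state *cpu)
{
    assert(cpu != NULL);

    if (cpu->current_app != NULL)
        return &cpu->current_app->regs;

    assert(cpu->current_vcpu != NULL);
    return &cpu->current_vcpu->regs;
}
```

With something like this, the vcpu's saved state would stay untouched while the app runs, whether the app was entered from a "real" vcpu or from the idle vcpu.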

On 10 May 2017 at 13:48, Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi George,
> On 05/10/2017 11:03 AM, George Dunlap wrote:
>> On 10/05/17 11:00, Julien Grall wrote:
>>> On 05/10/2017 10:56 AM, George Dunlap wrote:
>>>> On 09/05/17 19:29, Stefano Stabellini wrote:
>>>>> On Tue, 9 May 2017, Dario Faggioli wrote:
>>>>>>>> And it should not be hard to give such code access to the context of
>>>>>>>> the vCPU that was previously running (in x86, given we implement what
>>>>>>>> we call lazy context switch, it's most likely still loaded in the
>>>>>>>> pCPU!).
>>>>>>> I agree with Stefano, switching to the idle vCPU is a pretty bad idea.
>>>>>>> The idle vCPU is a fake vCPU on ARM to stick with the common code (we
>>>>>>> never leave the hypervisor). In the case of the EL0 app, we want to
>>>>>>> change exception level to run the code with lower privilege.
>>>>>>> Also IMHO, it should only be used when there is nothing to run and not
>>>>>>> re-purposed for running EL0 app.
>>>>>> It's already purposed for running when there is nothing to do _or_ when
>>>>>> there are tasklets.
>>>>>> I do see your point about privilege level, though. And I agree with
>>>>>> George that it looks very similar to when, in the x86 world, we tried
>>>>>> to put the infra together for switching to Ring3 to run some pieces of
>>>>>> Xen code.
>>>>> Right, and just to add to it, context switching to the idle vcpu has a
>>>>> cost, but it doesn't give us any security benefits whatsoever. If Xen is
>>>>> going to spend time on context switching, it is better to do it in a
>>>>> way that introduces a security boundary.
>>>> "Context switching" to the idle vcpu doesn't actually save or change any
>>>> registers, nor does it flush the TLB.  It's more or less just accounting
>>>> for the scheduler.  So it has a cost (going through the scheduler) but
>>>> not a very large one.
>>> It depends on the architecture. For ARM we don't yet support lazy
>>> context switch. So effectively, the cost to "context switch" to the idle
>>> vCPU will be quite high.
>> Oh, right.  Sorry, I thought I had seen code implementing lazy context
>> switch in ARM, but I must have imagined it.  That is indeed a material
>> consideration.
>> Is there a particular reason that lazy context switch is difficult on
>> ARM?  If not it should be a fairly important bit of low-hanging fruit
>> from a performance perspective.
> I am not entirely sure what you are doing on x86. Let me explain what we do
> and why context switch is heavy on ARM.
> In the case of ARM, when entering the hypervisor, we only save the bare
> minimum (all non-banked registers + registers useful for handling guest
> requests), and leave the rest untouched.
> Our save/restore functions are quite big because they involve saving and
> restoring the state of the interrupt controller, FPU... So we have a fast
> exit/entry but a slow context switch.
> What we currently do is avoid save/restore for the idle vCPU because we
> always stay in the hypervisor exception level. However we still save all
> the registers of the previously running vCPU and restore those of the next
> running vCPU.
> This has a big impact on workloads where a vCPU runs and then waits for
> interrupts (hence the patch from Stefano to limit entering the hypervisor,
> though it is not enabled by default).
> I made the assumption that the idle vCPU only runs when nothing has to be
> done. But as you mentioned, tasklets can run there too. So running a tasklet
> on Xen ARM will have a high cost.
> A list of optimizations we could do on ARM:
>         - Avoiding restore if the vCPU stays the same before and after the
> idle vCPU
>         - Avoiding save/restore if a vCPU is dedicated to a pCPU
> Do you have any other optimizations on x86?
> Cheers,
> --
> Julien Grall
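
For reference, the first optimization in Julien's list (skip the restore when the same vCPU runs before and after the idle vCPU) could look roughly like the sketch below. It is only a model: `struct lazy_vcpu`, `struct lazy_pcpu` and `lazy_switch_to()` are hypothetical names, and the counters stand in for the real, expensive save/restore of interrupt controller and FPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vCPU descriptor. */
struct lazy_vcpu {
    int  id;
    bool is_idle;
};

/* Hypothetical per-pCPU state. */
struct lazy_pcpu {
    struct lazy_vcpu *loaded;  /* vCPU whose state is currently in hardware */
    unsigned int saves;        /* stand-in for full state saves */
    unsigned int restores;     /* stand-in for full state restores */
};

/* Switch to 'next', touching hardware state only when really needed. */
void lazy_switch_to(struct lazy_pcpu *cpu, struct lazy_vcpu *next)
{
    assert(cpu != NULL && next != NULL);

    /* The idle vCPU stays in the hypervisor: leave guest state loaded. */
    if (next->is_idle)
        return;

    /* Same vCPU as before the idle period: nothing to save or restore. */
    if (cpu->loaded == next)
        return;

    if (cpu->loaded != NULL)
        cpu->saves++;          /* save the state of the old vCPU */

    cpu->restores++;           /* restore the state of the new vCPU */
    cpu->loaded = next;
}
```

A vCPU dedicated to a pCPU (the second item in the list) always takes the "same vCPU" path in this model, so it never pays for a save or restore after the first load.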

WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@xxxxxxxxx
