Xen project Mailing List

Re: [Xen-devel] [ARM] Native application design and discussion (I hope)

From: Volodymyr Babchuk <vlad.babchuk@xxxxxxxxx>

Date: Wed, 10 May 2017 20:37:10 +0300

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Andrii Anisov <andrii_anisov@xxxxxxxx>, Dario Faggioli <dario.faggioli@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Xen Devel <xen-devel@xxxxxxxxxxxxx>, Artem Mygaiev <joculator@xxxxxxxxx>

Delivery-date: Wed, 10 May 2017 17:37:44 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Hi Julien,

Returning to native apps, I think we can make the context switch even faster by dropping the p2m code. Imagine that we have already created the stage 1 MMU tables for a native application. Then, to switch to the app, we only need to:

1. Enable the TGE bit in HCR.
2. Disable the VM bit in HCR.
3. Save/program EL1_TTBR and friends.
3.5. (Optionally) save/restore FPU state.
4. Save/restore general purpose registers + SP + CSR + PC to jump to the app in EL0 state.

This can be done in a "real" vCPU or in the idle vCPU context; there is no difference there.

Exception handling in the hypervisor would become tricky because there is no vCPU for a native app. The current implementation of entry.S always saves general purpose registers to a vcpu structure. Basically, we should teach entry.S and traps.c about native apps.

Am I missing something?

On 10 May 2017 at 13:48, Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi George,
>
> On 05/10/2017 11:03 AM, George Dunlap wrote:
>> On 10/05/17 11:00, Julien Grall wrote:
>>> On 05/10/2017 10:56 AM, George Dunlap wrote:
>>>> On 09/05/17 19:29, Stefano Stabellini wrote:
>>>>> On Tue, 9 May 2017, Dario Faggioli wrote:
>>>>>>>> And it should not be hard to give such code access to the context of
>>>>>>>> the vCPU that was previously running (in x86, given we implement what
>>>>>>>> we call lazy context switch, it's most likely still loaded in the
>>>>>>>> pCPU!).
>>>>>>>
>>>>>>> I agree with Stefano, switching to the idle vCPU is a pretty bad idea.
>>>>>>>
>>>>>>> The idle vCPU is a fake vCPU on ARM to stick with the common code (we
>>>>>>> never leave the hypervisor). In the case of the EL0 app, we want to
>>>>>>> change exception level to run the code with lower privilege.
>>>>>>>
>>>>>>> Also IMHO, it should only be used when there is nothing to run and not
>>>>>>> re-purposed for running EL0 apps.
>>>>>>>
>>>>>> It's already purposed for running when there is nothing to do _or_ when
>>>>>> there are tasklets.
>>>>>>
>>>>>> I do see your point about privilege level, though. And I agree with
>>>>>> George that it looks very similar to when, in the x86 world, we tried
>>>>>> to put the infra together for switching to Ring3 to run some pieces of
>>>>>> Xen code.
>>>>>
>>>>> Right, and just to add to it, context switching to the idle vCPU has a
>>>>> cost, but it doesn't give us any security benefits whatsoever. If Xen is
>>>>> going to spend time on context switching, it is better to do it in a
>>>>> way that introduces a security boundary.
>>>>
>>>> "Context switching" to the idle vCPU doesn't actually save or change any
>>>> registers, nor does it flush the TLB. It's more or less just accounting
>>>> for the scheduler. So it has a cost (going through the scheduler) but
>>>> not a very large one.
>>>
>>> It depends on the architecture. For ARM we don't yet support lazy
>>> context switch. So effectively, the cost to "context switch" to the idle
>>> vCPU will be quite high.
>>
>> Oh, right. Sorry, I thought I had seen code implementing lazy context
>> switch on ARM, but I must have imagined it. That is indeed a material
>> consideration.
>>
>> Is there a particular reason that lazy context switch is difficult on
>> ARM? If not, it should be a fairly important bit of low-hanging fruit
>> from a performance perspective.
>
> I am not entirely sure what you are doing on x86. Let me explain what we do
> and why context switch is heavy on ARM.
>
> In the case of ARM, when entering the hypervisor, we only save the bare
> minimum (all non-banked registers + registers useful for handling guest
> requests), and leave the rest untouched.
>
> Our save/restore functions are quite big because they involve saving and
> restoring the state of the interrupt controller, FPU... So we have a fast
> exit/entry but a slow context switch.
>
> What we currently do is avoid saving/restoring the idle vCPU because we
> always stay in the hypervisor exception level. However, we still save all
> the registers of the previously running vCPU and restore those of the next
> running vCPU.
>
> This has a big impact on workloads where a vCPU is running and waiting for
> interrupts (hence Stefano's patch to limit entries into the hypervisor,
> though it is not enabled by default).
>
> I made the assumption that the idle vCPU only runs when nothing has to be
> done. But as you mentioned, tasklets can run there too. So running tasklets
> on Xen on ARM will have a high cost.
>
> Optimizations we could do on ARM include:
> - Avoiding the restore if the vCPU stays the same before and after the
>   idle vCPU
> - Avoiding save/restore if the vCPU is dedicated to a pCPU
>
> Do you have any other optimizations on x86?
>
> Cheers,
>
> --
> Julien Grall

--
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@xxxxxxxxx
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
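The switch sequence Volodymyr proposes (set TGE, clear VM, swap the EL1&0 stage 1 MMU registers, optionally swap FPU state, then swap GPRs/SP/PSR/PC and drop to EL0) can be sketched as a user-space model. This is a hypothetical illustration, not Xen code: real code would write HCR_EL2 with `msr` and finish with `eret`, whereas here the "registers" are plain struct fields. Only the HCR_EL2 bit positions (VM = bit 0, TGE = bit 27) and the EL0t SPSR mode value are architectural; every struct and function name is invented for the sketch, and the mail's "CSR" is assumed to mean the saved program status register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural HCR_EL2 bits (ARMv8-A); macro names are local to this sketch. */
#define HCR_VM  (UINT64_C(1) << 0)   /* enable stage 2 (p2m) translation */
#define HCR_TGE (UINT64_C(1) << 27)  /* trap general exceptions from EL0 to EL2 */

/* AArch64 SPSR.M[4:0] value for EL0 using SP_EL0 ("EL0t"). */
#define SPSR_MODE_MASK UINT64_C(0x1f)
#define SPSR_MODE_EL0T UINT64_C(0x0)

/* Hypothetical model of "EL1_TTBR and friends" (EL1&0 stage 1 MMU state). */
struct mmu_regs { uint64_t ttbr0, ttbr1, tcr, mair; };

/* Hypothetical GPRs + SP + PC + saved PSR (assumed meaning of "CSR"). */
struct gp_regs { uint64_t x[31], sp, pc, spsr; };

/* Hypothetical FPU/SIMD state: 32 x 128-bit registers as 64-bit halves. */
struct fpu_regs { uint64_t v[64]; uint32_t fpcr, fpsr; };

/* Stand-in for the live registers of one physical CPU. */
struct pcpu_regs {
    uint64_t hcr;
    struct mmu_regs mmu;
    struct gp_regs gp;
    struct fpu_regs fpu;
};

/* A native app whose stage 1 tables were already built. */
struct native_app {
    struct mmu_regs mmu;
    struct gp_regs gp;
    struct fpu_regs fpu;
    bool uses_fpu;
};

/* State of whatever ran before the app (a real vCPU or the idle vCPU). */
struct saved_ctx {
    uint64_t hcr;
    struct mmu_regs mmu;
    struct gp_regs gp;
    struct fpu_regs fpu;
};

void native_app_enter(struct pcpu_regs *cpu, const struct native_app *app,
                      struct saved_ctx *save)
{
    save->hcr = cpu->hcr;
    cpu->hcr |= HCR_TGE;              /* 1. enable TGE */
    cpu->hcr &= ~HCR_VM;              /* 2. disable VM: no p2m for the app */

    save->mmu = cpu->mmu;             /* 3. save/program EL1 TTBRs etc. */
    cpu->mmu = app->mmu;

    if (app->uses_fpu) {              /* 3.5 optional FPU switch */
        save->fpu = cpu->fpu;
        cpu->fpu = app->fpu;
    }

    save->gp = cpu->gp;               /* 4. GPRs + SP + PSR + PC */
    cpu->gp = app->gp;
    /* Force the mode field to AArch64 EL0t whatever the app's SPSR says. */
    cpu->gp.spsr = (app->gp.spsr & ~SPSR_MODE_MASK) | SPSR_MODE_EL0T;
    /* Real code would now ERET to cpu->gp.pc at EL0. */
}

void native_app_exit(struct pcpu_regs *cpu, struct native_app *app,
                     const struct saved_ctx *save)
{
    assert(cpu->hcr & HCR_TGE);       /* only valid while an app is loaded */

    app->gp = cpu->gp;                /* keep app state for the next entry */
    cpu->gp = save->gp;

    if (app->uses_fpu) {
        app->fpu = cpu->fpu;
        cpu->fpu = save->fpu;
    }

    cpu->mmu = save->mmu;             /* EL0 cannot change its own TTBRs */
    cpu->hcr = save->hcr;             /* undo steps 1 and 2 */
}
```

Note that `saved_ctx` is exactly the piece Volodymyr points at: because there is no vcpu structure for the app, the trap path needs some other place to put (and later find) the interrupted context.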

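Julien's first optimization (skip the restore when the same vCPU runs before and after the idle vCPU) is essentially the lazy context switch George describes for x86. A minimal sketch, assuming a per-pCPU pointer to the vCPU whose full state is currently loaded: switching to idle does nothing, and switching back to that same vCPU does nothing either. All types and helpers here are hypothetical; `save_state`/`restore_state` only count calls in place of the real interrupt-controller, FPU and system-register work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vCPU: only the idle flag matters for this sketch. */
struct vcpu {
    int id;
    bool is_idle;
};

/* Hypothetical per-pCPU bookkeeping. */
struct pcpu {
    struct vcpu *loaded;        /* non-idle vCPU whose full state is live */
    unsigned saves, restores;   /* counts of the expensive operations */
};

/* Stand-ins for the heavy save and restore of vCPU state. */
static void save_state(struct pcpu *p, struct vcpu *v)
{
    (void)v;
    p->saves++;
}

static void restore_state(struct pcpu *p, struct vcpu *v)
{
    (void)v;
    assert(!v->is_idle);        /* the idle vCPU has no guest state */
    p->restores++;
}

void lazy_context_switch(struct pcpu *p, struct vcpu *next)
{
    if (next->is_idle)
        return;                 /* idle stays in EL2: keep prev state loaded */

    if (p->loaded == next)
        return;                 /* same vCPU after idle: skip the restore */

    if (p->loaded != NULL)
        save_state(p, p->loaded);
    restore_state(p, next);
    p->loaded = next;
}
```

A vCPU dedicated to a pCPU only ever alternates with idle, so after its first restore it never pays for save/restore again; in this model Julien's second bullet falls out of the first. A real implementation would also have to force the deferred save when the loaded vCPU migrates to another pCPU or its registers are read from elsewhere, which the sketch omits.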