Re: [Xen-devel] [ARM] Native application design and discussion (I hope)
On Wed, 10 May 2017, Volodymyr Babchuk wrote:
> Hi Julien,
>
> Returning back to Native apps, I think we can make ctx switch even
> faster by dropping p2m code. Imagine that we already created stage 1
> MMU for native application. Then to switch to the app we need only:
>
> 1. Enable TGE bit in HCR
> 2. Disable VM bit in HCR
> 3. Save/Program EL1_TTBR and friends
> 3.5 (optionally) save/restore FPU state
> 4. Save/Restore general purpose registers + SP + CSR + PC to jump to
> an app in EL0 state.
>
> This can be done in "real" vcpu or in idle vcpu context. No differences there.
>
> Exception handling in hypervisor would become tricky because of vcpu
> absence for native app. Current implementation of entry.S always saves
> general purpose registers to a vcpu structure. Basically, we should
> teach entry.S and traps.c about native apps.
> Am I missing something?

The nicest way to do this is probably to create another saved_context in
arch_vcpu for EL0 apps. That way, changes to traps.c and entry.S will be
almost nothing.

> On 10 May 2017 at 13:48, Julien Grall <julien.grall@xxxxxxx> wrote:
> > Hi George,
> >
> > On 05/10/2017 11:03 AM, George Dunlap wrote:
> >> On 10/05/17 11:00, Julien Grall wrote:
> >>> On 05/10/2017 10:56 AM, George Dunlap wrote:
> >>>> On 09/05/17 19:29, Stefano Stabellini wrote:
> >>>>> On Tue, 9 May 2017, Dario Faggioli wrote:
> >>>>>>>> And it should not be hard to give such code access to the context of
> >>>>>>>> the vCPU that was previously running (in x86, given we implement what
> >>>>>>>> we call lazy context switch, it's most likely still loaded in the
> >>>>>>>> pCPU!).
> >>>>>>>
> >>>>>>> I agree with Stefano, switching to the idle vCPU is a pretty bad idea.
> >>>>>>>
> >>>>>>> the idle vCPU is a fake vCPU on ARM to stick with the common code (we
> >>>>>>> never leave the hypervisor). In the case of the EL0 app, we want to
> >>>>>>> change exception level to run the code with lower privilege.
> >>>>>>>
> >>>>>>> Also IMHO, it should only be used when there is nothing to run and not
> >>>>>>> re-purposed for running EL0 app.
> >>>>>>>
> >>>>>> It's already purposed for running when there is nothing to do _or_ when
> >>>>>> there are tasklets.
> >>>>>>
> >>>>>> I do see your point about privilege level, though. And I agree with
> >>>>>> George that it looks very similar to when, in the x86 world, we tried
> >>>>>> to put the infra together for switching to Ring3 to run some pieces of
> >>>>>> Xen code.
> >>>>>
> >>>>> Right, and just to add to it, context switching to the idle vcpu has a
> >>>>> cost, but it doesn't give us any security benefits whatsoever. If Xen is
> >>>>> going to spend time on context switching, it is better to do it in a
> >>>>> way that introduces a security boundary.
> >>>>
> >>>> "Context switching" to the idle vcpu doesn't actually save or change any
> >>>> registers, nor does it flush the TLB. It's more or less just accounting
> >>>> for the scheduler. So it has a cost (going through the scheduler) but
> >>>> not a very large one.
> >>>
> >>> It depends on the architecture. For ARM we don't yet support lazy
> >>> context switch. So effectively, the cost to "context switch" to the idle
> >>> vCPU will be quite high.
> >>
> >> Oh, right. Sorry, I thought I had seen code implementing lazy context
> >> switch in ARM, but I must have imagined it. That is indeed a material
> >> consideration.
> >>
> >> Is there a particular reason that lazy context switch is difficult on
> >> ARM? If not it should be a fairly important bit of low-hanging fruit
> >> from a performance perspective.
> >
> > I am not entirely sure what you are doing on x86. Let me explain what we do
> > and why context switch is heavy on ARM.
> >
> > In the case of ARM, when entering the hypervisor, we only save the bare
> > minimum (all non-banked registers + registers useful for handling guest
> > requests), and leave the rest untouched.
> >
> > Our save/restore functions are quite big because they involve saving/restoring
> > the state of the interrupt controller, FPU... So we have a fast exit/entry but
> > slow context switch.
> >
> > What we currently do is avoid saving/restoring the idle vCPU because we
> > always stay in the hypervisor exception level. However we still save all
> > the registers of the previous running vCPU and restore the ones of the next
> > running vCPU.
> >
> > This has a big impact on the workload when running vCPU and waiting for
> > interrupts (hence the patch from Stefano to limit entering the hypervisor,
> > though it is not enabled by default).
> >
> > I made the assumption the idle vCPU is only running when nothing has to be
> > done. But as you mentioned tasklets can be done there too. So running tasklets
> > on Xen ARM will have a high cost.
> >
> > A list of optimizations we could do on ARM is:
> > - Avoiding restore if the vCPU stays the same before and after idle
> > vCPU
> > - Avoiding save/restore if vCPU is dedicated to a pCPU
> >
> > Do you have any other optimization on x86?
> >
> > Cheers,
> >
> > --
> > Julien Grall
>
> --
> WBR Volodymyr Babchuk aka lorc [+380976646013]
> mailto: vlad.babchuk@xxxxxxxxx

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
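To make the proposal above a bit more concrete, here is a rough sketch of the switch steps (TGE on, VM off, swap TTBR and general purpose registers) combined with the idea of a second saved context per vCPU for the EL0 app. It is only a model: system registers are simulated with ordinary memory, and every name in it (sketch_context, sketch_vcpu, sketch_pcpu, sketch_enter_el0_app, sketch_leave_el0_app) is made up for illustration and is not Xen's actual arch_vcpu or saved_context layout. Only the two HCR_EL2 bit positions come from the ARMv8-A architecture. FPU state (step 3.5) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2 bit positions as defined by the ARMv8-A architecture. */
#define HCR_VM  (UINT64_C(1) << 0)   /* stage 2 translation enable */
#define HCR_TGE (UINT64_C(1) << 27)  /* trap general exceptions to EL2 */

/* Hypothetical register snapshot; NOT Xen's real saved_context layout. */
struct sketch_context {
    uint64_t x[31];      /* general purpose registers x0..x30 */
    uint64_t sp;
    uint64_t pc;
    uint64_t cpsr;
    uint64_t ttbr0_el1;  /* stage 1 table base; the "friends" are omitted */
};

/* Hypothetical per-vCPU state: one slot for the guest, one for the EL0 app. */
struct sketch_vcpu {
    struct sketch_context guest;
    struct sketch_context el0_app;
    bool in_el0_app;
};

/* Simulated physical CPU: plain memory standing in for system registers. */
struct sketch_pcpu {
    uint64_t hcr;
    struct sketch_context live;
};

/* Switch the simulated pCPU from the guest to the EL0 app. */
void sketch_enter_el0_app(struct sketch_pcpu *cpu, struct sketch_vcpu *v)
{
    assert(!v->in_el0_app);
    v->guest = cpu->live;    /* save guest GPRs, SP, PC, CPSR and TTBR */
    cpu->hcr |= HCR_TGE;     /* step 1: enable TGE */
    cpu->hcr &= ~HCR_VM;     /* step 2: disable stage 2 translation */
    cpu->live = v->el0_app;  /* steps 3 and 4: program TTBR, load app regs */
    v->in_el0_app = true;
}

/* Switch back from the EL0 app to the guest, undoing the steps above. */
void sketch_leave_el0_app(struct sketch_pcpu *cpu, struct sketch_vcpu *v)
{
    assert(v->in_el0_app);
    v->el0_app = cpu->live;  /* keep the app state for its next run */
    cpu->hcr &= ~HCR_TGE;
    cpu->hcr |= HCR_VM;
    cpu->live = v->guest;    /* restore the guest registers and TTBR */
    v->in_el0_app = false;
}
```

With a separate slot for the app, the guest's saved registers are never overwritten while the app runs, which is presumably why the entry and trap paths would need little change under this approach.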