Re: [Xen-devel] [ARM] Native application design and discussion (I hope)
On 26/04/17 22:44, Volodymyr Babchuk wrote:
> Hi Julien,

Hi Volodymyr,

> On 25 April 2017 at 14:43, Julien Grall <julien.grall@xxxxxxx> wrote:
>>>>>>> We will also need another type of application: one which is
>>>>>>> periodically called by Xen itself, not actually servicing any
>>>>>>> domain request. This is needed for a coprocessor sharing
>>>>>>> framework scheduler implementation.
>>>>>> EL0 apps can be a powerful new tool for us to use, but they are
>>>>>> not the solution to everything. This is where I would draw the
>>>>>> line: if the workload needs to be scheduled periodically, then it
>>>>>> is not a good fit for an EL0 app.
>>>>> From my last conversation with Volodymyr I got the feeling that the
>>>>> notions "EL0" and "Xen native application" must be pretty
>>>>> orthogonal. In [1] Volodymyr got no performance gain from changing
>>>>> the domain's exception level from EL1 to EL0. Only when Volodymyr
>>>>> stripped the domain's context abstraction (i.e. dropped the GIC
>>>>> context save/restore) were noticeable results reached.
>>>> Do you have numbers for the parts that take time in the
>>>> save/restore? You mention the GIC and I am a bit surprised you
>>>> don't mention the FPU.
>>> I did it in the other thread. Check out [1]. The biggest speed-up I
>>> got was from removing the vGIC context handling.
>> Oh, yes. Sorry, I forgot this thread. Continuing on that, you said
>> that "Now profiler shows that hypervisor spends time in spinlocks and
>> p2m code." Could you expand here? How would the EL0 app spend time in
>> p2m code?
> I don't quite remember. It was somewhere around the p2m context
> save/restore functions. I'll try to restore that setup and will
> provide more details.
>> Similarly, why do the spinlocks take time? Are they contended?
> The problem is that my profiler does not show the stack, so I can't
> say which spinlock causes this. But the profiler didn't show the CPU
> spending much time in the spinlock wait loop, so it looks like there
> is no contention.
>>>> I would have a look at optimizing the context switch path. Some
>>>> ideas:
>>>>  - There are a lot of unnecessary isb/dsb. The registers used only
>>>>    by the guests will be synchronized by eret.
> I have removed (almost) all of them. No significant changes in
> latency.
>>>>  - The FPU takes time to save/restore; you could make it lazy.
> This also does not take much time.
>>>>  - It might be possible to limit the number of LRs saved/restored
>>>>    depending on the number of LRs used by a domain.
>>> Excuse me, what is an LR in this context?
>> Sorry, I meant the GIC LRs (see the GIC save/restore code). They are
>> used to list the interrupts injected to the guest. They may not all
>> be in use at the time of the context switch.
> As I said, I don't call the GIC save and restore routines, so that
> should not be an issue (if I got that right).

Well, my point was that maybe you can limit the time spent in the GIC
save/restore code rather than skip it completely. For instance, if you
don't save/restore the GIC, you will need to disable the vGIC
(GICH_HCR.En) to avoid interrupt injection while the EL0 app is
running. I don't see this code here.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
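[Editor's note] Julien's closing suggestion — disable virtual interrupt signalling via GICH_HCR.En while the EL0 app runs, if the GIC state is not being saved/restored — could be sketched roughly as below. This is only an illustration, not Xen code: GICH_HCR.En is indeed bit 0 of the GICv2 hypervisor interface control register, but here the register is modeled as a plain variable, and the helper names (`vgic_disable`, `vgic_enable`, `run_el0_app`) are hypothetical.

```c
#include <stdint.h>

/* GICH_HCR.En is bit 0 of the GICv2 hypervisor interface control
 * register. Modeled as a plain variable here; real hypervisor code
 * would perform an MMIO write to the GICH register frame. */
#define GICH_HCR_EN (1u << 0)

static uint32_t gich_hcr = GICH_HCR_EN; /* stand-in for the real register */

/* Hypothetical helpers: turn virtual interrupt signalling off and on. */
static void vgic_disable(void) { gich_hcr &= ~GICH_HCR_EN; }
static void vgic_enable(void)  { gich_hcr |= GICH_HCR_EN; }

/* Run an EL0 app with the vGIC disabled, so interrupts pending in the
 * (unsaved) LRs cannot be delivered while the app is running. */
static void run_el0_app(void (*app)(void))
{
    vgic_disable();
    app();          /* enter the EL0 application */
    vgic_enable();  /* resume normal vGIC operation for the guest */
}
```

The point of the sketch is only the bracketing: the enable bit is cleared before entering the app and set again on the way out, so the LR contents can be left untouched across the app's execution.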