Re: IRQ latency measurements in hypervisor
On Fri, 15 Jan 2021, Julien Grall wrote:
> On 15/01/2021 15:45, Volodymyr Babchuk wrote:
> > Hi Julien,
> >
> > Julien Grall writes:
> >
> > > Hi Volodymyr, Stefano,
> > >
> > > On 14/01/2021 23:33, Stefano Stabellini wrote:
> > > > + Bertrand, Andrew (see comment on alloc_heap_pages())
> > >
> > > Long running hypercalls are usually considered security issues.
> > >
> > > In this case, only the control domain can issue large memory allocations (2GB at a time). A guest would only be able to allocate 2MB at a time, so from the numbers below, it would only take 1ms max.
> > >
> > > So I think we are fine here. Next time you find a large loop, please provide an explanation of why it is not a security issue (e.g. cannot be used by guests) or send an email to the Security Team if in doubt.
> >
> > Sure. In this case I took into account that only the control domain can issue this call, I just didn't state this explicitly. Next time I will.
>
> I am afraid that's not correct. The guest can request to populate a region. This is used for instance in the ballooning case.
>
> The main difference is that a non-privileged guest will not be able to do allocations larger than 2MB.
>
> [...]
>
> > > > This is very interesting too. Did you get any spikes with the period set to 100us? It would be fantastic if there were none.
> > > >
> > > > > 3. Huge latency spike during domain creation. I conducted some additional tests, including use of PV drivers, but this didn't affect the latency in my "real time" domain. But an attempt to create another domain with a relatively large memory size of 2GB led to a huge spike in latency. Debugging led to this call path:
> > > > >
> > > > > XENMEM_populate_physmap -> populate_physmap() -> alloc_domheap_pages() -> alloc_heap_pages() -> huge "for ( i = 0; i < (1 << order); i++ )" loop.
> > >
> > > There are two for loops in alloc_heap_pages() using this syntax. Which one are you referring to?
> >
> > I did some tracing with Lauterbach. It pointed to the first loop, and especially to the flush_page_to_ram() call, if I remember correctly.
>
> Thanks, I am not entirely surprised, because we are cleaning and invalidating the region line by line and across all the CPUs.
>
> If we assume a 128-byte cache line, we will need to issue 32 cache instructions per page. This is going to involve quite a bit of traffic on the system.

I think Julien is most likely right. It would be good to verify this with an experiment. For instance, you could remove the flush_page_to_ram() call for one test and see whether the latency spike goes away.

> One possibility would be to defer the cache flush during domain creation and use the XEN_DOMCTL_cacheflush hypercall to issue the flush afterwards.
>
> Note that XEN_DOMCTL_cacheflush would need some modification to be preemptible. But at least it works on GFNs, which are easier to track.

This looks like a solid suggestion. XEN_DOMCTL_cacheflush is already used by the toolstack in a few places. I am also wondering if we can get away with fewer flush_page_to_ram() calls from alloc_heap_pages() for memory allocations done at boot time, soon after global boot memory scrubbing.
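To make that concrete, here is a rough toolstack-side sketch of the "populate first, flush the whole range once" idea. It is untested and only illustrative: the helper name and its parameters are made up for the example, it assumes a hypervisor-side change so that populate_physmap no longer flushes page by page (which is not the case today), and the xc_domain_cacheflush() wrapper may only be exposed via libxc's internal headers.

#include <stdlib.h>
#include <xenctrl.h>

/*
 * Illustrative sketch only: populate nr_pages 4K pages starting at base_gfn
 * for domid, then issue a single XEN_DOMCTL_cacheflush over the whole GFN
 * range instead of relying on a per-page flush inside alloc_heap_pages().
 */
static int populate_then_flush(xc_interface *xch, uint32_t domid,
                               xen_pfn_t base_gfn, unsigned long nr_pages)
{
    xen_pfn_t *gfns;
    unsigned long i;
    int rc;

    gfns = malloc(nr_pages * sizeof(*gfns));
    if ( !gfns )
        return -1;

    for ( i = 0; i < nr_pages; i++ )
        gfns[i] = base_gfn + i;

    /* 4K extents (order 0), no special memflags. */
    rc = xc_domain_populate_physmap_exact(xch, domid, nr_pages, 0, 0, gfns);
    if ( !rc )
        /* One flush over the whole GFN range (assuming it is made preemptible). */
        rc = xc_domain_cacheflush(xch, domid, base_gfn, nr_pages);

    free(gfns);
    return rc;
}

The point is only that the expensive clean & invalidate would then happen once, over a contiguous GFN range, instead of page by page in the middle of the allocator.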
> > > > > I managed to overcome issue #3 by commenting out all calls to populate_one_size() except the populate_one_size(PFN_4K_SHIFT) one in xg_dom_arm.c. This lengthened domain construction, but my "RT" domain didn't experience such big latency issues. Apparently all other hypercalls which are used during domain creation are either fast or preemptible. No doubt my hack leads to page table inflation and an overall performance drop.
> > > >
> > > > I think we need to follow this up and fix this. Maybe just by adding a hypercall continuation to the loop.
> > >
> > > When I read "hypercall continuation", I read we will return to the guest context so it can process interrupts and potentially switch to another task.
> > >
> > > This means that the guest could issue a second populate_physmap() from the vCPU. Therefore any restart information should be part of the hypercall parameters. So far, I don't see how this would be possible.
> > >
> > > Even if we overcome that part, this can easily be abused by a guest, as the memory is not yet accounted to the domain. Imagine a guest that never requests the continuation of the populate_physmap(). So we would need to block the vCPU until the allocation is finished.
> >
> > Moreover, most of alloc_heap_pages() sits under a spinlock, so the first step would be to split this function into smaller atomic parts.
>
> Do you have any suggestions on how to split it?
>
> > > I think the first step is we need to figure out which part of the allocation is slow (see my question above). From there, we can figure out if there is a way to reduce the impact.
> >
> > I'll do more tracing and will return with more accurate numbers. But as far as I can see, any loop over 262144 pages will take some time...
>
> It really depends on the content of the loop. On any modern processor, you are very likely not going to notice a loop that just updates a flag.
>
> However, you are likely going to see an impact if your loop cleans & invalidates the cache for each page.
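To put a rough number on that last point, here is a back-of-the-envelope calculation using the figures from this thread. The per-operation cost is purely an assumed placeholder; the real figure depends on the interconnect, the cache state and whether the maintenance is broadcast to all CPUs.

#include <stdio.h>

int main(void)
{
    /* Figures from the discussion above; the per-op cost is an assumption. */
    const unsigned long pages     = 262144; /* loop size mentioned above */
    const unsigned long page_size = 4096;
    const unsigned long line_size = 128;    /* cache line size assumed above */
    const double ns_per_op        = 10.0;   /* assumed cost of one clean&invalidate */

    unsigned long ops_per_page = page_size / line_size;   /* 32 */
    unsigned long total_ops    = pages * ops_per_page;    /* 8388608 */

    printf("%lu cache maintenance ops, roughly %.0f ms at %.0f ns each\n",
           total_ops, total_ops * ns_per_op / 1e6, ns_per_op);
    return 0;
}

Even with an optimistic per-operation cost, that is millions of cache maintenance operations executed back to back with no preemption point, which fits the latency spikes reported at the start of the thread.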