
Re: IRQ latency measurements in hypervisor

To: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
From: Julien Grall <julien@xxxxxxx>
Date: Fri, 15 Jan 2021 17:13:57 +0000
Cc: Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Julien Grall <jgrall@xxxxxxxxxx>, Dario Faggioli <dario.faggioli@xxxxxxxx>, "Bertrand.Marquis@xxxxxxx" <Bertrand.Marquis@xxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>
Delivery-date: Fri, 15 Jan 2021 17:14:12 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>



On 15/01/2021 15:45, Volodymyr Babchuk wrote:


Hi Julien,

Julien Grall writes:

Hi Volodymyr, Stefano,

On 14/01/2021 23:33, Stefano Stabellini wrote:

+ Bertrand, Andrew (see comment on alloc_heap_pages())


Long running hypercalls are usually considered security issues.

In this case, only the control domain can issue large memory
allocations (2GB at a time). A guest would only be able to allocate
2MB at a time, so from the numbers below, it would only take 1ms max.

So I think we are fine here. Next time you find a large loop, please
provide an explanation why it is not a security issue (e.g. it cannot
be used by guests) or send an email to the Security Team if in doubt.


Sure. In this case I took into account that only the control domain can
issue this call, I just didn't state this explicitly. Next time I will
do so.

I am afraid that's not correct. The guest can request to populate a
region. This is used for instance in the ballooning case.

The main difference is that a non-privileged guest will not be able to
do an allocation larger than 2MB.


[...]

This is very interesting too. Did you get any spikes with the period
set to 100us? It would be fantastic if there were none.

3. Huge latency spike during domain creation. I conducted some
     additional tests, including use of PV drivers, but this didn't
     affect the latency in my "real time" domain. But an attempt to
     create another domain with a relatively large memory size of 2GB
     led to a huge spike in latency. Debugging led to this call path:

     XENMEM_populate_physmap -> populate_physmap() ->
     alloc_domheap_pages() -> alloc_heap_pages()-> huge
     "for ( i = 0; i < (1 << order); i++ )" loop.


There are two for loops in alloc_heap_pages() using this syntax. Which
one are you referring to?


I did some tracing with Lauterbach. It pointed to the first loop and
especially to the flush_page_to_ram() call, if I remember correctly.

Thanks, I am not entirely surprised because we are cleaning and
invalidating the region line by line and across all the CPUs.

If we assume a 128-byte cache line, we will need to issue 32 cache
instructions per page. This is going to involve quite a bit of traffic
on the system.
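
To make the arithmetic above concrete, here is a small sketch in C. It
assumes 4KB pages and the 128-byte cache line mentioned above; the
function names are made up for illustration and are not Xen code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of per-line cache maintenance operations needed to clean and
 * invalidate one page. Hypothetical helper, not part of Xen.
 */
static uint64_t cache_ops_per_page(uint64_t page_size, uint64_t line_size)
{
    assert(line_size != 0 && page_size % line_size == 0);
    return page_size / line_size;
}

/*
 * Total operations for one allocation of 2^order pages, i.e. what a
 * "for ( i = 0; i < (1 << order); i++ )" loop would issue if every
 * iteration flushes a full page line by line.
 */
static uint64_t cache_ops_for_order(unsigned int order, uint64_t page_size,
                                    uint64_t line_size)
{
    assert(order < 64);
    return (UINT64_C(1) << order) * cache_ops_per_page(page_size, line_size);
}
```

With these assumptions, cache_ops_per_page(4096, 128) gives 32, and an
order-18 allocation (262144 pages) comes to 8388608 operations in a
single non-preemptible loop.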

One possibility would be to defer the cache flush when the domain is
created and use the hypercall XEN_DOMCTL_cacheflush to issue the flush.

Note that XEN_DOMCTL_cacheflush would need some modification to be
preemptible. But at least, it will work on a GFN which is easier to
track.
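
As a rough illustration of why a GFN range is easy to track, here is a
self-contained sketch of a preemptible flush loop whose progress lives
in the request itself. Everything in it (struct flush_req, the
FLUSH_* constants, flush_one_gfn(), the fixed batch size) is
hypothetical and only simulates the idea; it is not the existing
XEN_DOMCTL_cacheflush implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLUSH_DONE    0
#define FLUSH_RESTART 1    /* hypothetical "call again" return code */
#define FLUSH_BATCH   512  /* hypothetical: pages handled per call */

/* Hypothetical request: the remaining range is the restart state. */
struct flush_req {
    uint64_t start_gfn;
    uint64_t nr_gfns;
};

/* Counter standing in for the real work, so the sketch can be checked. */
static uint64_t pages_flushed;

/* Stand-in for cleaning and invalidating one page. */
static void flush_one_gfn(uint64_t gfn)
{
    (void)gfn;
    pages_flushed++;
}

/*
 * Flush at most FLUSH_BATCH pages, then advance the request in place
 * and ask the caller to re-issue it. No state is kept outside *req.
 */
static int flush_range_preemptible(struct flush_req *req)
{
    uint64_t done = 0;

    assert(req != NULL);
    while (req->nr_gfns > 0) {
        if (done == FLUSH_BATCH)
            return FLUSH_RESTART;
        flush_one_gfn(req->start_gfn);
        req->start_gfn++;
        req->nr_gfns--;
        done++;
    }
    return FLUSH_DONE;
}
```

A caller would simply re-issue the same request while it gets
FLUSH_RESTART back. A real implementation would check for pending work
rather than use a fixed batch size.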

I managed to overcome issue #3 by commenting out all calls to
populate_one_size() except populate_one_size(PFN_4K_SHIFT) in
xg_dom_arm.c. This lengthened domain construction, but my "RT" domain
didn't experience such big latency issues. Apparently all other
hypercalls which are used during domain creation are either fast or
preemptible. No doubt my hack leads to page table inflation and an
overall performance drop.

I think we need to follow this up and fix this. Maybe just by adding
a hypercall continuation to the loop.


When I read "hypercall continuation", I read we will return to the
guest context so it can process interrupts and potentially switch to
another task.

This means that the guest could issue a second populate_physmap() from
the vCPU. Therefore any restart information should be part of the
hypercall parameters. So far, I don't see how this would be possible.

Even if we overcome that part, this can be easily abused by a guest as
the memory is not yet accounted to the domain. Imagine a guest that
never requests the continuation of the populate_physmap(). So we would
need to block the vCPU until the allocation is finished.


Moreover, most of alloc_heap_pages() sits under a spinlock, so the
first step would be to split this function into smaller atomic parts.


Do you have any suggestion how to split it?

I think the first step is we need to figure out which part of the
allocation is slow (see my question above). From there, we can figure
out if there is a way to reduce the impact.

I'll do more tracing and will return with more accurate numbers. But
as far as I can see, any loop on 262144 pages will take some time.

It really depends on the content of the loop. On any modern processor,
you are very likely not going to notice a loop that updates just a
flag.

However, you are likely going to see an impact if your loop is going
to clean & invalidate the cache for each page.


Cheers,

--
Julien Grall

