Xen project Mailing List

Re: [Xen-devel] [for-4.9] Re: HVM guest performance regression

From: Juergen Gross <jgross@xxxxxxxx>

Date: Tue, 30 May 2017 12:33:55 +0200

Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 30 May 2017 10:33:59 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 30/05/17 09:24, Jan Beulich wrote:
>>>> On 29.05.17 at 21:05, <jgross@xxxxxxxx> wrote:
>> Creating the domains with
>>
>> xl -vvv create ...
>>
>> showed the numbers of superpages and normal pages allocated for the
>> domain.
>>
>> The following allocation pattern resulted in a slow domain:
>>
>> xc: detail: PHYSICAL MEMORY ALLOCATION:
>> xc: detail: 4KB PAGES: 0x0000000000000600
>> xc: detail: 2MB PAGES: 0x00000000000003f9
>> xc: detail: 1GB PAGES: 0x0000000000000000
>>
>> And this one was fast:
>>
>> xc: detail: PHYSICAL MEMORY ALLOCATION:
>> xc: detail: 4KB PAGES: 0x0000000000000400
>> xc: detail: 2MB PAGES: 0x00000000000003fa
>> xc: detail: 1GB PAGES: 0x0000000000000000
>>
>> I ballooned dom0 down in small steps to be able to create those
>> test cases.
>>
>> I believe the main reason is that some data needed by the benchmark
>> is located near the end of domain memory resulting in a rather high
>> TLB miss rate in case of not all (or nearly all) memory available in
>> form of 2MB pages.
>
> Did you double check this by creating some other (persistent)
> process prior to running your benchmark? I find it rather
> unlikely that you would consistently see space from the top of
> guest RAM allocated to your test, unless it consumes all RAM
> that's available at the time it runs (but then I'd consider it
> quite likely for overhead of using the few smaller pages to be
> mostly hidden in the noise).
>
> Or are you suspecting some crucial kernel structures to live
> there?

Yes, I do. When onlining memory at boot time the kernel is using the new
memory chunk to add the page structures and if needed new kernel page
tables. It is normally allocating that memory from the end of the new
chunk.

>
>>>> What makes the whole problem even more mysterious is that the
>>>> regression was detected first with SLE12 SP3 (guest and dom0, Xen 4.9
>>>> and Linux 4.4) against older systems (guest and dom0). While trying
>>>> to find out whether the guest or the Xen version are the culprit I
>>>> found that the old guest (based on kernel 3.12) showed the mentioned
>>>> performance drop with above commit. The new guest (based on kernel
>>>> 4.4) shows the same bad performance regardless of the Xen version or
>>>> amount of free memory. I haven't found the Linux kernel commit yet
>>>> being responsible for that performance drop.
>>
>> And this might be result of a different memory usage of more recent
>> kernels: I suspect the critical data is now at the very end of the
>> domain's memory. As there are always some pages allocated in 4kB
>> chunks the last pages of the domain will never be part of a 2MB page.
>
> But if the OS allocated large pages internally for relevant data
> structures, those obviously won't come from that necessarily
> 4k-mapped tail range.

Sure? I think the kernel is using 1GB pages if possible for direct
kernel mappings of the physical memory. It doesn't care for the last
page mapping some space not populated.

>
>> Looking at meminit_hvm() in libxc doing the allocation of the memory
>> I realized it is kind of sub-optimal: shouldn't it try to allocate
>> the largest pages first and the smaller pages later?
>
> Indeed this seems sub-optimal, yet the net effect isn't that
> dramatic (at least for sufficiently large guests): There may be up
> to two unnecessarily shattered 1G pages and at most one 2M
> one afaict.

Right. So there might be nearly 1 GB allocated using 2MB pages until the
first GB page is tried. This will rise the probability for a failing GB
allocation quite notably in case of dom0 having been ballooned down for
guest creation.

>> Would it be possible to make memory holes larger sometimes to avoid
>> having to use 4kB pages (with the exception of the first 2MB of the
>> domain, of course)?
>
> Which holes are you thinking about here? The pre-determined
> one is at 0xF0000000 (i.e. is 2M-aligned already), and without
> pass-through devices with large BARs hvmloader won't do any
> relocation of RAM. Granted, when it does, doing so in larger
> than 64k chunks may be advantageous. To have any effect,
> that would require hypervisor side changes though, as
> xenmem_add_to_physmap() acts on individual 4k pages right
> now.

Okay.

>> Maybe it would even make sense to be able to tweak the allocation
>> pattern depending on the guest type: preferring large pages either
>> at the top or at the bottom of the domain's physical address space.
>
> Why would top and bottom be better candidates for using large
> pages than the middle part of address space? Any such heuristic
> would surely need tailoring to the guest OS in order to not
> adversely affect some while helping others.

Right. This would have to be a guest configuration item.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
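
As a quick sanity check on the page counts quoted in the message above, the following sketch adds up the "slow" and "fast" allocation patterns. It is plain arithmetic on the logged hex values, not libxc code; the constant and function names are made up for illustration.

```python
# Page sizes in bytes (illustrative constants, not taken from libxc).
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024
PAGE_1G = 1024 * 1024 * 1024
MIB = 1024 * 1024


def total_bytes(n_4k, n_2m, n_1g):
    """Total memory described by one PHYSICAL MEMORY ALLOCATION report."""
    return n_4k * PAGE_4K + n_2m * PAGE_2M + n_1g * PAGE_1G


# Counts copied from the "xl -vvv create" output quoted in the message.
slow = total_bytes(0x600, 0x3F9, 0x0)
fast = total_bytes(0x400, 0x3FA, 0x0)

print("slow domain:", slow // MIB, "MiB")
print("fast domain:", fast // MIB, "MiB")

# Difference in 4KB page count between the two cases, in bytes.
extra_4k_bytes = (0x600 - 0x400) * PAGE_4K
print("extra 4KB pages cover exactly one 2MB page:", extra_4k_bytes == PAGE_2M)
```

Both patterns describe the same 2040 MiB of guest memory. The slow case has one 2MB superpage fewer and 512 more 4KB pages, so the two domains differ by a single 2MB region that is backed by small pages instead of a superpage.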
