
Re: [Xen-devel] [for-4.9] Re: HVM guest performance regression



On 30/05/17 09:24, Jan Beulich wrote:
>>>> On 29.05.17 at 21:05, <jgross@xxxxxxxx> wrote:
>> Creating the domains with
>>
>> xl -vvv create ...
>>
>> showed the numbers of superpages and normal pages allocated for the
>> domain.
>>
>> The following allocation pattern resulted in a slow domain:
>>
>> xc: detail: PHYSICAL MEMORY ALLOCATION:
>> xc: detail:   4KB PAGES: 0x0000000000000600
>> xc: detail:   2MB PAGES: 0x00000000000003f9
>> xc: detail:   1GB PAGES: 0x0000000000000000
>>
>> And this one was fast:
>>
>> xc: detail: PHYSICAL MEMORY ALLOCATION:
>> xc: detail:   4KB PAGES: 0x0000000000000400
>> xc: detail:   2MB PAGES: 0x00000000000003fa
>> xc: detail:   1GB PAGES: 0x0000000000000000
>>
>> I ballooned dom0 down in small steps to be able to create those
>> test cases.
>>
>> I believe the main reason is that some data needed by the benchmark
>> is located near the end of domain memory, resulting in a rather high
>> TLB miss rate when not all (or nearly all) of the memory is available
>> in the form of 2MB pages.
> 
> Did you double check this by creating some other (persistent)
> process prior to running your benchmark? I find it rather
> unlikely that you would consistently see space from the top of
> guest RAM allocated to your test, unless it consumes all RAM
> that's available at the time it runs (but then I'd consider it
> quite likely for overhead of using the few smaller pages to be
> mostly hidden in the noise).
> 
> Or are you suspecting some crucial kernel structures to live
> there?

Yes, I do. When onlining memory at boot time, the kernel uses the new
memory chunk itself to hold the page structures and, if needed, new
kernel page tables. It normally allocates that memory from the end of
the new chunk.
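
As a cross-check of the two allocation patterns quoted at the top of
this mail, the page counts can be converted to sizes. The following is
only a small Python sketch of that arithmetic; the names PAGE_SIZES and
total_bytes are made up for illustration and are not part of xl or
libxc.

```python
# Page sizes in bytes.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
PAGE_SIZES = {"4KB": 4 * KB, "2MB": 2 * MB, "1GB": GB}


def total_bytes(counts):
    """Sum the memory covered by a {page size name: page count} mapping."""
    return sum(PAGE_SIZES[name] * n for name, n in counts.items())


# Page counts copied from the "xl -vvv create" output quoted above.
slow = {"4KB": 0x600, "2MB": 0x3f9, "1GB": 0x0}
fast = {"4KB": 0x400, "2MB": 0x3fa, "1GB": 0x0}

print(total_bytes(slow) // MB, total_bytes(fast) // MB)  # 2040 2040
print(slow["4KB"] - fast["4KB"])                         # 512
```

Both domains have the same size (2040 MB). The slow one has exactly
one 2MB page less and 512 more 4KB pages, i.e. a single 2MB region
backed by small pages makes the difference.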

> 
>>>> What makes the whole problem even more mysterious is that the
>>>> regression was detected first with SLE12 SP3 (guest and dom0, Xen 4.9
>>>> and Linux 4.4) against older systems (guest and dom0). While trying
>>>> to find out whether the guest or the Xen version is the culprit, I
>>>> found that the old guest (based on kernel 3.12) showed the mentioned
>>>> performance drop with the above commit. The new guest (based on kernel
>>>> 4.4) shows the same bad performance regardless of the Xen version or
>>>> amount of free memory. I haven't yet found the Linux kernel commit
>>>> responsible for that performance drop.
>>
>> And this might be the result of different memory usage in more recent
>> kernels: I suspect the critical data is now at the very end of the
>> domain's memory. As there are always some pages allocated in 4kB
>> chunks, the last pages of the domain will never be part of a 2MB page.
> 
> But if the OS allocated large pages internally for relevant data
> structures, those obviously won't come from that necessarily 4k-
> mapped tail range.

Are you sure? I think the kernel uses 1GB pages where possible for its
direct mapping of physical memory. It doesn't care if the last such
page also maps some space that isn't populated.

> 
>> Looking at meminit_hvm() in libxc doing the allocation of the memory
>> I realized it is kind of sub-optimal: shouldn't it try to allocate
>> the largest pages first and the smaller pages later?
> 
> Indeed this seems sub-optimal, yet the net effect isn't that
> dramatic (at least for sufficiently large guests): There may be up
> to two unnecessarily shattered 1G pages and at most one 2M
> one afaict.

Right. So there might be nearly 1 GB allocated using 2MB pages before
the first GB page is tried. This will raise the probability of a
failing GB allocation quite notably in case dom0 has been ballooned
down for guest creation.
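
To illustrate the "nearly 1 GB in 2MB pages" figure, here is a small
Python sketch of the alignment arithmetic only. It is not the code of
meminit_hvm() in libxc; split_range and the PAGE_* names are
hypothetical, and the sketch assumes a range is covered front to back
with the largest naturally aligned page that still fits.

```python
PAGE_4K = 4 << 10
PAGE_2M = 2 << 20
PAGE_1G = 1 << 30


def split_range(start, end):
    """Cover [start, end) front to back, taking at each address the
    largest naturally aligned page size that still fits."""
    counts = {PAGE_1G: 0, PAGE_2M: 0, PAGE_4K: 0}
    addr = start
    while addr < end:
        for size in (PAGE_1G, PAGE_2M, PAGE_4K):
            if addr % size == 0 and addr + size <= end:
                counts[size] += 1
                addr += size
                break
        else:
            # Neither bound is 4KB aligned, so no page size fits.
            raise ValueError("range is not 4KB aligned")
    return counts


# Assumed example: a range starting right after the first 2MB of the
# domain and ending at 3GB.
counts = split_range(PAGE_2M, 3 * PAGE_1G)
print(counts[PAGE_2M], counts[PAGE_1G])  # 511 2
```

With these assumptions 511 allocations of 2MB each (1022 MB) happen
before the first 1GB aligned address is reached.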

>> Would it be possible to make memory holes larger sometimes to avoid
>> having to use 4kB pages (with the exception of the first 2MB of the
>> domain, of course)?
> 
> Which holes are you thinking about here? The pre-determined
> one is at 0xF0000000 (i.e. is 2M-aligned already), and without
> pass-through devices with large BARs hvmloader won't do any
> relocation of RAM. Granted, when it does, doing so in larger
> than 64k chunks may be advantageous. To have any effect,
> that would require hypervisor side changes though, as
> xenmem_add_to_physmap() acts on individual 4k pages right
> now.

Okay.

>> Maybe it would even make sense to be able to tweak the allocation
>> pattern depending on the guest type: preferring large pages either
>> at the top or at the bottom of the domain's physical address space.
> 
> Why would top and bottom be better candidates for using large
> pages than the middle part of address space? Any such heuristic
> would surely need tailoring to the guest OS in order to not
> adversely affect some while helping others.

Right. This would have to be a guest configuration item.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

