Xen project Mailing List

Re: [Xen-devel] [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map

To: Roger Pau Monne <roger.pau@xxxxxxxxxx>

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Thu, 11 Aug 2016 19:28:03 +0100

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>

Delivery-date: Thu, 11 Aug 2016 18:28:17 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 05/08/16 10:40, Roger Pau Monne wrote:
> On Thu, Aug 04, 2016 at 07:43:39PM +0100, Andrew Cooper wrote:
>> On 02/08/16 10:19, Roger Pau Monne wrote:
>>> On Fri, Jul 29, 2016 at 08:04:12PM +0100, Andrew Cooper wrote:
>>>> On 29/07/16 17:29, Roger Pau Monne wrote:
>>>>> +/* Calculate the biggest usable order given a size in bytes. */
>>>>> +static inline unsigned int get_order(uint64_t size)
>>>>> +{
>>>>> +    unsigned int order;
>>>>> +    uint64_t pg;
>>>>> +
>>>>> +    ASSERT((size & ~PAGE_MASK) == 0);
>>>>> +    pg = PFN_DOWN(size);
>>>>> +    for ( order = 0; pg >= (1 << (order + 1)); order++ );
>>>>> +
>>>>> +    return order;
>>>>> +}
>>>> We already have get_order_from_bytes() and get_order_from_pages(), the
>>>> latter of which looks like it will suit your usecase.
>>> Not really, or at least they don't do the same as get_order. This function
>>> calculates the maximum order you can use so that there are no pages left
>>> over, (ie: if you have a size of 3145728bytes (3MiB), this function will
>>> return order 9 (2MiB), while the other ones will return order 10 (4MiB)). I
>>> don't really understand why other places in code request bigger orders and
>>> then free the leftovers, isn't this also causing memory shattering?
>> Sounds like we want something like get_order_{floor,ceil}() which makes
>> it obvious which way non-power-of-two get rounded.
> Right, that makes sense, will rename the current one to ceil, and add the
> floor variant.
>
>>>>> +        if ( order == 0 && memflags )
>>>>> +        {
>>>>> +            /* Try again without any memflags. */
>>>>> +            memflags = 0;
>>>>> +            order = MAX_ORDER;
>>>>> +            continue;
>>>>> +        }
>>>>> +        if ( order == 0 )
>>>>> +            panic("Unable to allocate memory with order 0!\n");
>>>>> +        order--;
>>>>> +        continue;
>>>>> +    }
>>>> It would be far more efficient to try and allocate only 1G and 2M
>>>> blocks. Most of memory is free at this point, and it would allow the
>>>> use of HAP superpage mappings, which will be a massive performance boost
>>>> if they aren't shattered.
>>> That's what I'm trying to do, but we might have to use pages of lower order
>>> when filling the smaller gaps.
>> As a general principle, we should try not to have any gaps. This also
>> extends to guests using more intelligence when deciding how to mutate
>> its physmap.
> Yes, but in this case we are limited by the original e820 from the host.
> A DomU (without passthrough) will have all its memory contiguously.

Ah yes - that is a legitimate restriction.

>
>>> As an example, these are the stats when
>>> building a domain with 6048M of RAM:
>>>
>>> (XEN) Memory allocation stats:
>>> (XEN) Order 18: 5GB
>>> (XEN) Order 17: 512MB
>>> (XEN) Order 15: 256MB
>>> (XEN) Order 14: 128MB
>>> (XEN) Order 12: 16MB
>>> (XEN) Order 10: 8MB
>>> (XEN) Order  9: 4MB
>>> (XEN) Order  8: 2MB
>>> (XEN) Order  7: 1MB
>>> (XEN) Order  6: 512KB
>>> (XEN) Order  5: 256KB
>>> (XEN) Order  4: 128KB
>>> (XEN) Order  3: 64KB
>>> (XEN) Order  2: 32KB
>>> (XEN) Order  1: 16KB
>>> (XEN) Order  0: 4KB
>>>
>>> IMHO, they are quite good.
>> What are the RAM characteristics of the host? Do you have any idea what
>> the hap superpage characteristics are like after the guest has booted?
> This is the host RAM map:
>
> (XEN) 0000000000000000 - 000000000009c800 (usable)
> (XEN) 000000000009c800 - 00000000000a0000 (reserved)
> (XEN) 00000000000e0000 - 0000000000100000 (reserved)
> (XEN) 0000000000100000 - 00000000ad662000 (usable)
> (XEN) 00000000ad662000 - 00000000adb1f000 (reserved)
> (XEN) 00000000adb1f000 - 00000000b228b000 (usable)
> (XEN) 00000000b228b000 - 00000000b2345000 (reserved)
> (XEN) 00000000b2345000 - 00000000b236a000 (ACPI data)
> (XEN) 00000000b236a000 - 00000000b2c9a000 (ACPI NVS)
> (XEN) 00000000b2c9a000 - 00000000b2fff000 (reserved)
> (XEN) 00000000b2fff000 - 00000000b3000000 (usable)
> (XEN) 00000000b3800000 - 00000000b8000000 (reserved)
> (XEN) 00000000f8000000 - 00000000fc000000 (reserved)
> (XEN) 00000000fec00000 - 00000000fec01000 (reserved)
> (XEN) 00000000fed00000 - 00000000fed04000 (reserved)
> (XEN) 00000000fed1c000 - 00000000fed20000 (reserved)
> (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
> (XEN) 00000000ff000000 - 0000000100000000 (reserved)
> (XEN) 0000000100000000 - 0000000247000000 (usable)
>
> No idea about the HAP superpage characteristics, how can I fetch this
> information? (I know I can dump the guest EPT tables, but that just
> saturates the console).

Not easily, and also not the first time I have run into this problem.
We really should have at least a debug way of identifying this. For
now, you can count how many times ept_split_super_page() gets called at
each level.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
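
For readers following the get_order_{floor,ceil}() discussion in the thread, here is a minimal sketch of the two rounding directions. It is not the code from the patch or from Xen: the names order_floor() and order_ceil() are hypothetical, PAGE_SHIFT is assumed to be 12 (4KiB pages), and both helpers take a page count rather than a size in bytes. The floor variant shifts the count down instead of shifting a plain int 1 up, so it stays well defined for large counts.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4KiB pages */

/*
 * Hypothetical helper: largest order such that (1 << order) pages fit
 * into nr_pages with nothing rounded up. Returns 0 for nr_pages <= 1.
 */
static unsigned int order_floor(uint64_t nr_pages)
{
    unsigned int order = 0;

    /* Cap at 63 so the shift count never reaches the width of the type. */
    while ( order < 63 && (nr_pages >> (order + 1)) != 0 )
        order++;

    return order;
}

/*
 * Hypothetical helper: smallest order such that (1 << order) pages cover
 * nr_pages. Equal to order_floor() for exact powers of two; for counts
 * above 2^63 the result is 64, which no 64-bit page count can express.
 */
static unsigned int order_ceil(uint64_t nr_pages)
{
    unsigned int order = order_floor(nr_pages);

    if ( nr_pages > ((uint64_t)1 << order) )
        order++;

    return order;
}
```

With the 3MiB example from the thread, 3145728 >> PAGE_SHIFT gives 768 pages: order_floor(768) returns 9 (a 2MiB block, leaving 1MiB to be covered by smaller allocations), while order_ceil(768) returns 10 (a 4MiB block, of which 1MiB would have to be freed again).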
