Re: [Xen-devel] [Qemu-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap
On Mon, 1 Dec 2014, Don Slutz wrote:
> On 11/27/14 05:48, Stefano Stabellini wrote:
> > On Wed, 26 Nov 2014, Don Slutz wrote:
> > > On 11/26/14 13:17, Stefano Stabellini wrote:
> > > > On Tue, 25 Nov 2014, Andrew Cooper wrote:
> > > > > On 25/11/14 17:45, Stefano Stabellini wrote:
> > > > > > Increase maxmem before calling xc_domain_populate_physmap_exact to avoid
> > > > > > the risk of running out of guest memory. This way we can also avoid
> > > > > > complex memory calculations in libxl at domain construction time.
> > > > > >
> > > > > > This patch fixes an abort() when assigning more than 4 NICs to a VM.
> > > > > >
> > > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> > > > > >
> > > > > > diff --git a/xen-hvm.c b/xen-hvm.c
> > > > > > index 5c69a8d..38e08c3 100644
> > > > > > --- a/xen-hvm.c
> > > > > > +++ b/xen-hvm.c
> > > > > > @@ -218,6 +218,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
> > > > > >      unsigned long nr_pfn;
> > > > > >      xen_pfn_t *pfn_list;
> > > > > >      int i;
> > > > > > +    xc_dominfo_t info;
> > > > > >
> > > > > >      if (runstate_check(RUN_STATE_INMIGRATE)) {
> > > > > >          /* RAM already populated in Xen */
> > > > > > @@ -240,6 +241,13 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
> > > > > >          pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
> > > > > >      }
> > > > > >
> > > > > > +    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
> > > > >
> > > > > xc_domain_getinfo()'s interface is mad, and provides no guarantee that
> > > > > it returns the information for the domain you requested. It also won't
> > > > > return -1 on error. The correct error handling is:
> > > > >
> > > > > (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
> > > > > (info.domid != xen_domid)
> > > >
> > > > It might be wiser to switch to xc_domain_getinfolist
> > >
> > > Either needs the same tests, since both return a vector of info.
> > Right
> > > >
> > > > > ~Andrew
> > > > >
> > > > > > +        hw_error("xc_domain_getinfo failed");
> > > > > > +    }
> > > > > > +    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
> > > > > > +                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
> > >
> > > There are two big issues and one minor one with this:
> > > 1) You will allocate the videoram again.
> > > 2) You will never use the 1 MB already allocated for option ROMs.
> > >
> > > And the minor one is that you can increase maxmem more than is needed.
> >
> > I don't understand: are you aware that setmaxmem doesn't allocate any
> > memory, just raises the maximum amount of memory allowed for the domain
> > to have?
>
> Yes.
>
> > But you are right that we would raise the limit more than it could be,
> > specifically the videoram would get accounted for twice and we wouldn't
> > need LIBXL_MAXMEM_CONSTANT. I guess we would have to write a patch for
> > that.
> >
> > > Here is a better if:
> > >
> > > -    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
> > > -                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
> > > +    max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
> > > +    free_pages = max_pages - info.nr_pages;
> > > +    need_pages = nr_pfn - free_pages;
> > > +    if ((free_pages < nr_pfn) &&
> > > +        (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
> > > +                             (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
> >
> > That's an interesting idea, but I am not sure if it is safe in all
> > configurations.
> > It could make QEMU work better with older libxl and avoid increasing
> > maxmem more than necessary.
> > On the other hand I guess it could break things when PoD is used, or in
> > general when the user purposely sets maxmem in the VM config file.
>
> Works fine in both claim modes and with PoD used (maxmem > memory). I do
> not know how to test with tmem. I do not see how it would be worse than the
> current code that does not auto-increase. I.e. even without a Xen change, I
> think something like this could be done.
OK, good to know. I am OK with increasing maxmem only if it is strictly
necessary.

> > > My testing shows a free 32 pages that I am not sure where they come
> > > from. But the code above is passing with my 8 NICs of e1000.
> >
> > I think that raising maxmem a bit higher than necessary is not too bad.
> > If we really care about it, we could lower the limit after QEMU's
> > initialization is completed.
>
> Ok. I did find the 32: it is VGA_HOLE_SIZE. So here is what I have, which
> includes a lot of extra printf.

In QEMU I would prefer not to assume that libxl increased maxmem for the
vga hole. I would rather call xc_domain_setmaxmem twice for the vga hole
than tie QEMU to a particular maxmem allocation scheme in libxl.

In libxl I would like to avoid increasing maxmem for anything QEMU will
allocate later, and that includes rom and vga vram. I am not sure how to
make that work with older QEMU versions that don't call xc_domain_setmaxmem
by themselves yet, though. Maybe we could check the specific QEMU version
in libxl and decide based on that. Or we could export a feature flag in
QEMU.
> --- a/xen-hvm.c
> +++ b/xen-hvm.c
> @@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>  #endif
>
>  #define BUFFER_IO_MAX_DELAY  100
> +#define VGA_HOLE_SIZE (0x20)
>
>  typedef struct XenPhysmap {
>      hwaddr start_addr;
> @@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>      xen_pfn_t *pfn_list;
>      int i;
>      xc_dominfo_t info;
> +    unsigned long max_pages, free_pages, real_free;
> +    long need_pages;
> +    uint64_t tot_pages, pod_cache_pages, pod_entries;
> +
> +    trace_xen_ram_alloc(ram_addr, size, mr->name);
>
>      if (runstate_check(RUN_STATE_INMIGRATE)) {
>          /* RAM already populated in Xen */
> @@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>          return;
>      }
>
> -    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
> -            " bytes (%ld Kib) of ram at "RAM_ADDR_FMT
> -            " mr.name=%s\n",
> -            __func__, size, (long)(size>>10), ram_addr, mr->name);
> -
> -    trace_xen_ram_alloc(ram_addr, size);
> -
>      nr_pfn = size >> TARGET_PAGE_BITS;
>      pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
>
> @@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>          pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
>      }
>
> -    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
> +    if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
> +        (info.domid != xen_domid)) {
>          hw_error("xc_domain_getinfo failed");
>      }
> -    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
> -                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
> +    max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
> +    free_pages = max_pages - info.nr_pages;
> +    real_free = free_pages;
> +    if (free_pages > VGA_HOLE_SIZE) {
> +        free_pages -= VGA_HOLE_SIZE;
> +    } else {
> +        free_pages = 0;
> +    }
> +    need_pages = nr_pfn - free_pages;
> +    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
> +            " bytes (%ld KiB) of ram at "RAM_ADDR_FMT
> +            " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n",
> +            __func__, size, (long)(size>>10), ram_addr,
> +            max_pages, free_pages, nr_pfn, need_pages,
> +            mr->name);
> +    if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages,
> +                                 &pod_cache_pages, &pod_entries) >= 0) {
> +        unsigned long populated = tot_pages - pod_cache_pages;
> +        long delta_tot = tot_pages - info.nr_pages;
> +
> +        fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld nop=%ld free=%ld\n",
> +                __func__, populated, (long)tot_pages, delta_tot,
> +                (long)pod_cache_pages, (long)pod_entries,
> +                (long)info.nr_outstanding_pages, real_free);
> +    }

What is the purpose of calling xc_domain_get_pod_target here? It doesn't
look like it is affecting the parameters passed to xc_domain_setmaxmem.

> +    if ((free_pages < nr_pfn) &&
> +        (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
> +                             (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
>          hw_error("xc_domain_setmaxmem failed");
>      }
>      if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0, pfn_list)) {
>
> Note: I already had a trace_xen_ram_alloc; here is the delta in trace-events
> (which has others not yet sent):
>
> diff --git a/trace-events b/trace-events
> index eaeecd5..a58fc11 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -908,7 +908,7 @@ pvscsi_tx_rings_ppn(const char* label, uint64_t ppn) "%s page: %"PRIx64""
>  pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages: %u"
>
>  # xen-all.c
> -xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
> +xen_ram_alloc(unsigned long ram_addr, unsigned long size, const char* mr_name) "requested: %#lx size %#lx mr->name=%s"
>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>  handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p type=%d dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
>  handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p read type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
>
> -Don Slutz
>
> > > -Don Slutz
> > > > > >
> > > > > > +        hw_error("xc_domain_setmaxmem failed");
> > > > > > +    }
> > > > > >      if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0,
> > > > > >                                           0, pfn_list)) {
> > > > > >          hw_error("xen: failed to populate ram at " RAM_ADDR_FMT,
> > > > > >                   ram_addr);
> > > > > >      }
> > > > > >
> > > > > > _______________________________________________
> > > > > > Xen-devel mailing list
> > > > > > Xen-devel@xxxxxxxxxxxxx
> > > > > > http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel