
Re: [Xen-devel] [Qemu-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap



On Mon, 1 Dec 2014, Don Slutz wrote:
> On 11/27/14 05:48, Stefano Stabellini wrote:
> > On Wed, 26 Nov 2014, Don Slutz wrote:
> > > On 11/26/14 13:17, Stefano Stabellini wrote:
> > > > On Tue, 25 Nov 2014, Andrew Cooper wrote:
> > > > > On 25/11/14 17:45, Stefano Stabellini wrote:
> > > > > > Increase maxmem before calling xc_domain_populate_physmap_exact to
> > > > > > avoid
> > > > > > the risk of running out of guest memory. This way we can also avoid
> > > > > > complex memory calculations in libxl at domain construction time.
> > > > > > 
> > > > > > This patch fixes an abort() when assigning more than 4 NICs to a VM.
> > > > > > 
> > > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> > > > > > 
> > > > > > diff --git a/xen-hvm.c b/xen-hvm.c
> > > > > > index 5c69a8d..38e08c3 100644
> > > > > > --- a/xen-hvm.c
> > > > > > +++ b/xen-hvm.c
> > > > > > @@ -218,6 +218,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
> > > > > >      unsigned long nr_pfn;
> > > > > >      xen_pfn_t *pfn_list;
> > > > > >      int i;
> > > > > > +    xc_dominfo_t info;
> > > > > >  
> > > > > >      if (runstate_check(RUN_STATE_INMIGRATE)) {
> > > > > >          /* RAM already populated in Xen */
> > > > > > @@ -240,6 +241,13 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
> > > > > >          pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
> > > > > >      }
> > > > > >  
> > > > > > +    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
> > > > > xc_domain_getinfo()'s interface is mad, and provides no guarantee that
> > > > > it returns the information for the domain you requested.  It also won't
> > > > > return -1 on error.  The correct error handling is:
> > > > > 
> > > > > (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
> > > > >     (info.domid != xen_domid)
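
For concreteness, the full check Andrew describes would look something like
this (a sketch only; both tests are needed because xc_domain_getinfo() fills
in the first existing domain whose domid is greater than or equal to the one
requested):

    xc_dominfo_t info;

    /* Verify that exactly one entry came back and that it really is our
     * domain, since the call may return data for a neighbouring domid. */
    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1 ||
        info.domid != xen_domid) {
        hw_error("xc_domain_getinfo failed");
    }
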
> > > > It might be wiser to switch to xc_domain_getinfolist
> > > Either needs the same tests, since both return a vector of info.
> > Right
> > 
> > 
> > > > > ~Andrew
> > > > > 
> > > > > > +        hw_error("xc_domain_getinfo failed");
> > > > > > +    }
> > > > > > +    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
> > > > > > +                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
> > > There are two big issues and one minor one with this.
> > > 1) You will allocate the videoram again.
> > > 2) You will never use the 1 MB already allocated for option ROMs.
> > > 
> > > And the minor one is that you can increase maxmem more than is needed.
> > I don't understand: are you aware that setmaxmem doesn't allocate any
> > memory, it just raises the maximum amount of memory the domain is allowed
> > to have?
> 
> Yes.
> 
> > But you are right that we would raise the limit more than it should be:
> > specifically, the videoram would get accounted for twice, and we wouldn't
> > need LIBXL_MAXMEM_CONSTANT. I guess we would have to write a patch for
> > that.
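
(To illustrate the distinction being drawn here: the first call below only
changes accounting, the second is what actually allocates. A sketch only,
with new_max_memkb standing in for whatever limit is chosen.)

    /* Raises the accounting limit only; no memory is allocated. */
    xc_domain_setmaxmem(xen_xc, xen_domid, new_max_memkb);

    /* This call actually populates guest RAM, and it is the one that
     * fails if the limit above is left too low. */
    xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0,
                                     pfn_list);
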
> > 
> > 
> > 
> > > Here is a better if:
> > > 
> > > -    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
> > > -                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
> > > +    max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
> > > +    free_pages = max_pages - info.nr_pages;
> > > +    need_pages = nr_pfn - free_pages;
> > > +    if ((free_pages < nr_pfn) &&
> > > +       (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
> > > +                            (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
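
To make the arithmetic above concrete (numbers purely illustrative): with
XC_PAGE_SIZE = 4096 and max_memkb = 1049600, max_pages = 1049600 * 1024 /
4096 = 262400 pages.  If info.nr_pages = 262144, then free_pages = 256, so
an allocation of nr_pfn = 1024 pages (4 MiB) needs need_pages = 1024 - 256 =
768, and maxmem is raised by only 768 * 4096 / 1024 = 3072 KiB rather than
the full 4096 KiB.
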
> > That's an interesting idea, but I am not sure if it is safe in all
> > configurations.
> > 
> > It could make QEMU work better with older libxl and avoid increasing
> > maxmem more than necessary.
> > On the other hand I guess it could break things when PoD is used, or in
> > general when the user purposely sets maxmem in the VM config file.
> > 
> 
> Works fine in both claim modes and with PoD used (maxmem > memory).  I do
> not know how to test with tmem.  I do not see how it would be worse than the
> current code, which does not auto-increase.  I.e. even without a Xen change,
> I think something like this could be done.

OK, good to know. I am OK with increasing maxmem only if it is strictly
necessary.


> 
> > > My testing shows 32 free pages that I am not sure where they come from.
> > > But the code above is passing with my 8 e1000 NICs.
> > I think that raising maxmem a bit higher than necessary is not too bad.
> > If we really care about it, we could lower the limit after QEMU's
> > initialization is completed.
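
Something like the following could do the lowering, perhaps once machine
init has finished (a sketch only; where exactly to hook it and how much
slack to keep are open questions):

    xc_dominfo_t info;

    /* Hypothetical post-init clamp: shrink maxmem back down to what has
     * actually been populated, so over-generous increases made while
     * allocating RAM do not linger for the lifetime of the domain. */
    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) == 1 &&
        info.domid == xen_domid) {
        xc_domain_setmaxmem(xen_xc, xen_domid,
                            info.nr_pages * XC_PAGE_SIZE / 1024);
    }
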
> 
> Ok.  I did find the 32: it is VGA_HOLE_SIZE.  So here is what I have, which
> includes a lot of extra printfs.

In QEMU I would prefer not to assume that libxl increased maxmem for the
vga hole. I would rather call xc_domain_setmaxmem twice for the vga hole
than tie QEMU to a particular maxmem allocation scheme in libxl.
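
In other words, something along these lines (a sketch, borrowing the
VGA_HOLE_SIZE constant from Don's patch below):

    /* Account for the VGA hole explicitly rather than assuming libxl
     * left headroom for it in maxmem. */
    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
                            (VGA_HOLE_SIZE * XC_PAGE_SIZE / 1024)) < 0) {
        hw_error("xc_domain_setmaxmem failed");
    }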

In libxl I would like to avoid increasing maxmem for anything QEMU will
allocate later, including ROMs and VGA vram. I am not sure how to
make that work with older QEMU versions that don't call
xc_domain_setmaxmem by themselves yet though. Maybe we could check the
specific QEMU version in libxl and decide based on that. Or we could
export a feature flag in QEMU.
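
For example (entirely hypothetical names on both sides; neither the flag nor
the helper exists today):

    /* In libxl: only leave maxmem headroom for ROMs and vram when the
     * QEMU in use cannot raise maxmem by itself. */
    if (!libxl__qemu_has_feature(gc, "xen-setmaxmem")) {
        max_memkb += LIBXL_MAXMEM_CONSTANT;
    }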



> --- a/xen-hvm.c
> +++ b/xen-hvm.c
> @@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>  #endif
> 
>  #define BUFFER_IO_MAX_DELAY  100
> +#define VGA_HOLE_SIZE (0x20)
> 
>  typedef struct XenPhysmap {
>      hwaddr start_addr;
> @@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>      xen_pfn_t *pfn_list;
>      int i;
>      xc_dominfo_t info;
> +    unsigned long max_pages, free_pages, real_free;
> +    long need_pages;
> +    uint64_t tot_pages, pod_cache_pages, pod_entries;
> +
> +    trace_xen_ram_alloc(ram_addr, size, mr->name);
> 
>      if (runstate_check(RUN_STATE_INMIGRATE)) {
>          /* RAM already populated in Xen */
> @@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>          return;
>      }
> 
> -    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
> -            " bytes (%ld Kib) of ram at "RAM_ADDR_FMT
> -            " mr.name=%s\n",
> -            __func__, size, (long)(size>>10), ram_addr, mr->name);
> -
> -    trace_xen_ram_alloc(ram_addr, size);
> -
>      nr_pfn = size >> TARGET_PAGE_BITS;
>      pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
> 
> @@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>          pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
>      }
> 
> -    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
> +    if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
> +       (info.domid != xen_domid)) {
>          hw_error("xc_domain_getinfo failed");
>      }
> -    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
> -                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
> +    max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
> +    free_pages = max_pages - info.nr_pages;
> +    real_free = free_pages;
> +    if (free_pages > VGA_HOLE_SIZE) {
> +        free_pages -= VGA_HOLE_SIZE;
> +    } else {
> +        free_pages = 0;
> +    }
> +    need_pages = nr_pfn - free_pages;
> +    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
> +            " bytes (%ld KiB) of ram at "RAM_ADDR_FMT
> +            " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n",
> +            __func__, size, (long)(size>>10), ram_addr,
> +           max_pages, free_pages, nr_pfn, need_pages,
> +            mr->name);
> +    if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages,
> +                                 &pod_cache_pages, &pod_entries) >= 0) {
> +        unsigned long populated = tot_pages - pod_cache_pages;
> +        long delta_tot = tot_pages - info.nr_pages;
> +
> +        fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld nop=%ld free=%ld\n",
> +                __func__, populated, (long)tot_pages, delta_tot,
> +                (long)pod_cache_pages, (long)pod_entries,
> +               (long)info.nr_outstanding_pages, real_free);
> +    }

What is the purpose of calling xc_domain_get_pod_target here? It doesn't
look like it affects the parameters passed to xc_domain_setmaxmem.


> +    if ((free_pages < nr_pfn) &&
> +       (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
> +                            (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
>          hw_error("xc_domain_setmaxmem failed");
>      }
>      if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0,
>                                           pfn_list)) {
> 
> 
> Note: I already had a trace_xen_ram_alloc; here is the delta in trace-events
> (which has others not yet sent):
> 
> 
> 
> diff --git a/trace-events b/trace-events
> index eaeecd5..a58fc11 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -908,7 +908,7 @@ pvscsi_tx_rings_ppn(const char* label, uint64_t ppn) "%s page: %"PRIx64""
>  pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages: %u"
> 
>  # xen-all.c
> -xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
> +xen_ram_alloc(unsigned long ram_addr, unsigned long size, const char* mr_name) "requested: %#lx size %#lx mr->name=%s"
>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>  handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p type=%d dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
>  handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p read type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
> 
> 
>    -Don Slutz
> 
> > 
> > >     -Don Slutz
> > > 
> > > 
> > > > > > +        hw_error("xc_domain_setmaxmem failed");
> > > > > > +    }
> > > > > >      if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0, pfn_list)) {
> > > > > >          hw_error("xen: failed to populate ram at " RAM_ADDR_FMT, ram_addr);
> > > > > >      }
> > > > > > 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

