[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [Qemu-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap



On 12/02/14 07:26, Stefano Stabellini wrote:
On Mon, 1 Dec 2014, Don Slutz wrote:
On 12/01/14 10:37, Stefano Stabellini wrote:
On Mon, 1 Dec 2014, Don Slutz wrote:
On 11/27/14 05:48, Stefano Stabellini wrote:
[...]

Works fine in both claim modes and with PoD used (maxmem > memory).  Do
not know how to test with tmem.  I do not see how it would be worse then
current
code that does not auto increase.  I.E. even without a xen change, I think
something
like this could be done.
OK, good to know. I am OK with increasing maxmem only if it is strictly
necessary.


My testing shows a free 32 pages that I am not sure where they come
from.
But
the code about is passing my 8 nics of e1000.
I think that raising maxmem a bit higher than necessary is not too bad.
If we really care about it, we could lower the limit after QEMU's
initialization is completed.
Ok.  I did find the 32 it is VGA_HOLE_SIZE.  So here is what I have which
includes
a lot of extra printf.
In QEMU I would prefer not to assume that libxl increased maxmem for the
vga hole. I would rather call xc_domain_setmaxmem twice for the vga hole
than tie QEMU to a particular maxmem allocation scheme in libxl.
Ok.  The area we are talking about is 0x000a0000 to 0x000c0000.
It is in libxc (xc_hvm_build_x86), not libxl.   I have no issue with a name
change to
some thing like QEMU_EXTRA_FREE_PAGES.
Sorry, I was thinking about the relocated videoram region, the one
allocated by xen_add_to_physmap in QEMU.

Regarding 0x000a0000-0x000c0000, according to this comment:

     /*
      * Allocate memory for HVM guest, skipping VGA hole 0xA0000-0xC0000.
      *

xc_hvm_build_x86 shouldn't be allocating memory for it.
In fact if I remember correctly 0xa0000-0xc0000 is trapped and emulated.


Clearly I am not writing clear enough statements.

xc_hvm_build_x86 is not allocating memory for it.  libxl__build_pre() sets
maxmem via xc_domain_setmaxmem().  Then xc_hvm_build_x86 allocates
memory skipping the VGA hole 0xA0000-0xC0000. So difference between
maxmem (max_pages, xlinfo->max_memkb) and tot_pages (xlinfo->current_memkb) is videoram + LIBXL_MAXMEM_CONSTANT + 32
(I.E. what VGA_HOLE_SIZE is definded as).


My testing has shows that some of these 32 pages are used outside of QEMU.
I am seeing just 23 free pages using a standalone program to display
the same info after a CentOS 6.3 guest is done booting.

In libxl I would like to avoid increasing mamxem for anything QEMU will
allocate later, that includes rom and vga vram. I am not sure how to
make that work with older QEMU versions that don't call
xc_domain_setmaxmem by themselves yet though. Maybe we could check the
specific QEMU version in libxl and decide based on that. Or we could
export a feature flag in QEMU.
Yes, it would be nice to adjust libxl to not increase maxmem. However since
videoram is included in memory (and maxmem), making the change related
to vram is a bigger issue.

the rom change is much simpler.

Currently I do not know of a way to do different things based on the QEMU
version
and/or features (this includes getting the QEMU version in libxl).

I have been going with:
1) change QEMU 1st.
2) Wait for an upstream version of QEMU with this.
3) change xen to optionally use a feature in the latest QEMU.
Let's try to take this approach for the videoram too

Ok.


--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t
*shared_page, int vcpu)
   #endif

   #define BUFFER_IO_MAX_DELAY  100
+#define VGA_HOLE_SIZE (0x20)

   typedef struct XenPhysmap {
       hwaddr start_addr;
@@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
size,
MemoryRegion *mr)
       xen_pfn_t *pfn_list;
       int i;
       xc_dominfo_t info;
+    unsigned long max_pages, free_pages, real_free;
+    long need_pages;
+    uint64_t tot_pages, pod_cache_pages, pod_entries;
+
+    trace_xen_ram_alloc(ram_addr, size, mr->name);

       if (runstate_check(RUN_STATE_INMIGRATE)) {
           /* RAM already populated in Xen */
@@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
size,
MemoryRegion *mr)
           return;
       }

-    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
-            " bytes (%ld Kib) of ram at "RAM_ADDR_FMT
-            " mr.name=%s\n",
-            __func__, size, (long)(size>>10), ram_addr, mr->name);
-
-    trace_xen_ram_alloc(ram_addr, size);
-
       nr_pfn = size >> TARGET_PAGE_BITS;
       pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);

@@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
size,
MemoryRegion *mr)
           pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
       }

-    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
+    if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
I think it's best to call xc_domain_getinfolist.


Will switch.

+       (info.domid != xen_domid)) {
           hw_error("xc_domain_getinfo failed");
       }
-    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
-                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
+    max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
+    free_pages = max_pages - info.nr_pages;
+    real_free = free_pages;
+    if (free_pages > VGA_HOLE_SIZE) {
+        free_pages -= VGA_HOLE_SIZE;
+    } else {
+        free_pages = 0;
+    }
I don't think we need to subtract VGA_HOLE_SIZE.

If you do not use some slack (like 32 here), xen does report:


(d5) [2014-11-29 17:07:21] Loaded SeaBIOS
(d5) [2014-11-29 17:07:21] Creating MP tables ...
(d5) [2014-11-29 17:07:21] Loading ACPI ...
(XEN) [2014-11-29 17:07:21] page_alloc.c:1568:d5 Over-allocation for domain 5: 1057417 > 1057416 (XEN) [2014-11-29 17:07:21] memory.c:158:d5 Could not allocate order=0 extent: id=5 memflags=0 (0 of 1)
(d5) [2014-11-29 17:07:21] vm86 TSS at 00098c00
(d5) [2014-11-29 17:07:21] BIOS map:


However while QEMU startup ends with 32 "free" pages (maxmem - curmem)
this quickly changes to 23. I.E. there are 7 more places that act like a call
to xc_domain_populate_physmap_exact() but do not log errors if there is
no free pages.  And just to make sure I do not fully understand what is
happening here, after the message above, the domain appears to work
fine and ends up with 8 "free" pages (code I do not know about ends up
releasing populated pages).

So my current thinking is to have QEMU leave a few (9 to 32 (64?)) pages "free".



+    need_pages = nr_pfn - free_pages;
+    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
+            " bytes (%ld KiB) of ram at "RAM_ADDR_FMT
+            " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n",
+            __func__, size, (long)(size>>10), ram_addr,
+           max_pages, free_pages, nr_pfn, need_pages,
+            mr->name);
+    if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages,
+                                 &pod_cache_pages, &pod_entries) >= 0) {
+        unsigned long populated = tot_pages - pod_cache_pages;
+        long delta_tot = tot_pages - info.nr_pages;
+
+        fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld
nop=%ld
free=%ld\n",
+                __func__, populated, (long)tot_pages, delta_tot,
+                (long)pod_cache_pages, (long)pod_entries,
+               (long)info.nr_outstanding_pages, real_free);
+    }
What is the purpose of calling xc_domain_get_pod_target here? It doesn't
look like is affecting the parameters passed to xc_domain_setmaxmem.
This was just to test and see if anything was needed for PoD mode.
I did not see anything.

Did you want me to make a patch to send to the list that has my proposed
changes?
Yep, that's fine.


I do think that what I have would be fine to do since most of the time the
new code does nothing with the current xen (and older versions), until
more then 4 nics are configured for xen.

It would be good to have the change:

[PATCH] libxl_set_memory_target: retain the same maxmem offset on top of the
current target

(When working) in xen before using a QEMU with this change.

Not sure I would push for 2.2 however.
I wouldn't.


Will mark as for 2.3

   -Don Slutz



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.