[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] HVM support for e820_host (Was: Bug: Limitation of <=2GB RAM in domU persists with 4.3.0)



On Thu, 05 Sep 2013 10:41:09 +0100, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
Hmm...

gcc -O2 -fomit-frame-pointer -m64 -fno-strict-aliasing -std=gnu99
-Wall -Wstrict-prototypes -Wdeclaration-after-statement
-Wno-unused-but-set-variable   -DNDEBUG -fno-builtin -fno-common
-Wredundant-decls -iwithprefix include -Werror -Wno-pointer-arith
-pipe -I/root/rpmbuild/BUILD/xen-4.3.0/xen/include
-I/root/rpmbuild/BUILD/xen-4.3.0/xen/include/asm-x86/mach-generic
-I/root/rpmbuild/BUILD/xen-4.3.0/xen/include/asm-x86/mach-default
-msoft-float -fno-stack-protector -fno-exceptions -Wnested-externs
-mno-red-zone -mno-sse -fpic -fno-asynchronous-unwind-tables
-DGCC_HAS_VISIBILITY_ATTRIBUTE -nostdinc -g -D__XEN__ -include
/root/rpmbuild/BUILD/xen-4.3.0/xen/include/xen/config.h -DHAS_ACPI
-DHAS_GDBSX -DHAS_PASSTHROUGH -MMD -MF .debug.o.d -c debug.c -o
debug.o
gcc -O2 -fomit-frame-pointer -m64 -fno-strict-aliasing -std=gnu99
-Wall -Wstrict-prototypes -Wdeclaration-after-statement
-Wno-unused-but-set-variable   -DNDEBUG -fno-builtin -fno-common
-Wredundant-decls -iwithprefix include -Werror -Wno-pointer-arith
-pipe -I/root/rpmbuild/BUILD/xen-4.3.0/xen/include
-I/root/rpmbuild/BUILD/xen-4.3.0/xen/include/asm-x86/mach-generic
-I/root/rpmbuild/BUILD/xen-4.3.0/xen/include/asm-x86/mach-default
-msoft-float -fno-stack-protector -fno-exceptions -Wnested-externs
-mno-red-zone -mno-sse -fpic -fno-asynchronous-unwind-tables
-DGCC_HAS_VISIBILITY_ATTRIBUTE -nostdinc -g -D__XEN__ -include
/root/rpmbuild/BUILD/xen-4.3.0/xen/include/xen/config.h -DHAS_ACPI
-DHAS_GDBSX -DHAS_PASSTHROUGH -MMD -MF .domain.o.d -c domain.c -o
domain.o
domain.c: In function âarch_domain_destroyâ:
domain.c:595: error: âstruct pv_domainâ has no member named âe820â
make[4]: *** [domain.o] Error 1

It would seem you omitted this block from the original patch:

===
@@ -592,8 +592,8 @@ void arch_domain_destroy(struct domain *d)
 {
     if ( is_hvm_domain(d) )
         hvm_domain_destroy(d);
-    else
-        xfree(d->arch.pv_domain.e820);
+
+    xfree(d->arch.e820);

     free_domain_pirqs(d);
     if ( !is_idle_domain(d) )
===

Was that intentional? Does that block look OK to you? Should I re-add it?

Just to clarify - re-adding this block fixes the build issue.
Will test tonight whether it runs. What I really wanted to
know is whether this is the correct way to handle the cleanup
in this case.

Gordan

On Wed, 4 Sep 2013 22:04:42 -0400, Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
On Wed, Sep 04, 2013 at 02:11:06PM +0100, Gordan Bobic wrote:
I have this at the point where it actually builds.
Otherwise completely untested (will do that later today).

Attached are:

1) libxl patch
Modified from the original patch to _not_ implicitly enable
e820_host when PCI devices are passed.

2) Mukesh's hypervisor e820 patch from here:
http://lists.xen.org/archives/html/xen-devel/2013-05/msg01603.html
Modified slightly to attempt to address Jan's comment on the same
thread, and to adjust the diff line pointers to match against
4.3.0 release code.

I think that was the old version. I spotted a bug in it that
was causing a hang. And also the one that explains why libxl
would refuse to setup the E820.

The problem was that in the XENMEM_set_memory_map there was
a check to make sure that the guest launched was not HVM.

Also there was bug in the initial domain creation where
the spinlock was only set for PV and not for HVM.


3) A patch based on Konrad's earlier in this thread, with
a few additions and changes to make it all compile.

Some peer review would be most welcome - this is my first
venture into Xen code, so please do assume that I have
no idea what I'm doing at the moment. :)

I added yet another E820MAX #define, this time to
tools/firmware/hvmloader/e820.h

If there is a better place to #include that via from
e820.c, please point me in the right direction.

I think I saw that #define in tools/libxc/xenctrl.h. But since
the tools/firmware cannot link to the libxc (b/c it is a Minicontained
OS) I believe just having the #define in hvmloader/e820.h is
the right call.

Good first pass. I altered it a bit and got in the HVM guest
the E820 entries printed out. Here is a big giant diff:

diff --git a/tools/firmware/hvmloader/e820.c
b/tools/firmware/hvmloader/e820.c
index 2e05e93..3c80241 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -22,6 +22,9 @@

 #include "config.h"
 #include "util.h"
+#include "hypercall.h"
+#include <xen/memory.h>
+#include <errno.h>

 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
@@ -74,10 +77,20 @@ int build_e820_table(struct e820entry *e820,
                      unsigned int bios_image_base)
 {
     unsigned int nr = 0;
+    struct xen_memory_map op;
+    struct e820entry map[E820MAX];
+    int rc;

     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;

+    set_xen_guest_handle(op.buffer, map);
+
+    rc = hypercall_memory_op ( XENMEM_memory_map, &op);
+    if ( rc != -ENOSYS) { /* It works!? */
+        printf("%s:%d got %d op.nr_entries \n", __func__, __LINE__,
op.nr_entries);
+        dump_e820_table(&map[0], op.nr_entries);
+    }
     /* Lowmem must be at least 512K to keep Windows happy) */
     ASSERT ( lowmem_reserved_base > 512<<10 );

diff --git a/tools/firmware/hvmloader/e820.h
b/tools/firmware/hvmloader/e820.h
index b2ead7f..2fa700d 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -8,6 +8,9 @@
 #define E820_RESERVED     2
 #define E820_ACPI         3
 #define E820_NVS          4
+#define E820_UNUSABLE     5
+
+#define E820MAX         128

 struct e820entry {
     uint64_t addr;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 0c32d0b..d8e2346 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -208,6 +208,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,

     libxl_defbool_setdefault(&b_info->disable_migrate, false);

+    libxl_defbool_setdefault(&b_info->e820_host, false);
+
     switch (b_info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
         if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT)
@@ -280,7 +282,6 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,

         break;
     case LIBXL_DOMAIN_TYPE_PV:
-        libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
         if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT)
             b_info->shadow_memkb = 0;
         if (b_info->u.pv.slack_memkb == LIBXL_MEMKB_DEFAULT)
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 85341a0..fd6389a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -299,6 +299,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("irqs",             Array(uint32, "num_irqs")),
     ("iomem",            Array(libxl_iomem_range, "num_iomem")),
     ("claim_mode",        libxl_defbool),
+    # Use host's E820 for PCI passthrough.
+    ("e820_host",        libxl_defbool),
     ("u", KeyedUnion(None, libxl_domain_type, "type",
[("hvm", Struct(None, [("firmware", string),
                                        ("bios",
libxl_bios_type),
@@ -345,8 +347,6 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                       ("cmdline", string),
                                       ("ramdisk", string),
("features", string, {'const': True}), - # Use host's E820 for PCI passthrough.
-                                      ("e820_host", libxl_defbool),
                                       ])),
                  ("invalid", Struct(None, [])),
], keyvar_init_val = "LIBXL_DOMAIN_TYPE_INVALID")),
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index a78c91d..94515a5 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -216,28 +216,41 @@ static int libxl__e820_alloc(libxl__gc *gc,
uint32_t domid,
     struct e820entry map[E820MAX];
     libxl_domain_build_info *b_info;

- if (d_config == NULL || d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM)
-        return ERROR_INVAL;
-
     b_info = &d_config->b_info;
-    if (!libxl_defbool_val(b_info->u.pv.e820_host))
+    if (!libxl_defbool_val(b_info->e820_host)) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "%s:%d.",__func__,
__LINE__);
         return ERROR_INVAL;
-
+    }
     rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
     if (rc < 0) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "%s:%d.",__func__,
__LINE__);
         errno = rc;
         return ERROR_FAIL;
     }
     nr = rc;
-    rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
-                       (b_info->max_memkb - b_info->target_memkb) +
-                       b_info->u.pv.slack_memkb);
+    LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "%s:%d.nr:%d",__func__,
__LINE__, nr);
+    if (d_config == NULL || d_config->c_info.type ==
LIBXL_DOMAIN_TYPE_HVM) {
+        rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
+ (b_info->max_memkb - b_info->target_memkb));
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
"%s:%d.rc%d",__func__, __LINE__, rc);
+    } else if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV) {
+        rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
+ (b_info->max_memkb - b_info->target_memkb) +
+                           b_info->u.pv.slack_memkb);
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
"%s:%d.rc%d",__func__, __LINE__, rc);
+    }
+
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
"%s:%d.rc%d",__func__, __LINE__, rc);
     if (rc)
         return ERROR_FAIL;

+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "%s:%d.rc%d,
nr:%d",__func__, __LINE__, rc, nr);
+
     rc = xc_domain_set_memory_map(ctx->xch, domid, map, nr);

+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
"%s:%d.rc%d",__func__, __LINE__, rc);
     if (rc < 0) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
"%s:%d.rc%d",__func__, __LINE__, rc);
         errno  = rc;
         return ERROR_FAIL;
     }
@@ -296,8 +309,7 @@ int libxl__arch_domain_create(libxl__gc *gc,
libxl_domain_config *d_config,
         xc_shadow_control(ctx->xch, domid,
XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION, NULL, 0, &shadow, 0, NULL);
     }

-    if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV &&
-            libxl_defbool_val(d_config->b_info.u.pv.e820_host)) {
+    if (libxl_defbool_val(d_config->b_info.e820_host)) {
         ret = libxl__e820_alloc(gc, domid, d_config);
         if (ret) {
             LIBXL__LOG_ERRNO(gc->owner, LIBXL__LOG_ERROR,
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index ed99622..d98ca24 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1291,11 +1291,7 @@ skip_vfb:
     if (!xlu_cfg_get_long (config, "pci_permissive", &l, 0))
         pci_permissive = l;

- /* To be reworked (automatically enabled) once the auto ballooning
-     * after guest starts is done (with PCI devices passed in). */
-    if (c_info->type == LIBXL_DOMAIN_TYPE_PV) {
-        xlu_cfg_get_defbool(config, "e820_host",
&b_info->u.pv.e820_host, 0);
-    }
+ xlu_cfg_get_defbool(config, "e820_host", &b_info->e820_host, 0);

     if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
         d_config->num_pcidevs = 0;
@@ -1314,7 +1310,7 @@ skip_vfb:
                 d_config->num_pcidevs++;
         }
if (d_config->num_pcidevs && c_info->type == LIBXL_DOMAIN_TYPE_PV)
-            libxl_defbool_set(&b_info->u.pv.e820_host, true);
+            libxl_defbool_set(&b_info->e820_host, true);
     }

     switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) {
diff --git a/tools/libxl/xl_sxp.c b/tools/libxl/xl_sxp.c
index a16a025..f34f0ba 100644
--- a/tools/libxl/xl_sxp.c
+++ b/tools/libxl/xl_sxp.c
@@ -87,6 +87,10 @@ void printf_info_sexp(int domid,
libxl_domain_config *d_config)
         }
     }

+    printf("\t(e820_host %s)\n",
+           libxl_defbool_to_string(b_info->e820_host));
+
+
     printf("\t(image\n");
     switch (c_info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
@@ -150,8 +154,6 @@ void printf_info_sexp(int domid,
libxl_domain_config *d_config)
         printf("\t\t\t(kernel %s)\n", b_info->u.pv.kernel);
         printf("\t\t\t(cmdline %s)\n", b_info->u.pv.cmdline);
         printf("\t\t\t(ramdisk %s)\n", b_info->u.pv.ramdisk);
-        printf("\t\t\t(e820_host %s)\n",
-               libxl_defbool_to_string(b_info->u.pv.e820_host));
         printf("\t\t)\n");
         break;
     default:
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 874742c..4796221 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -566,10 +566,9 @@ int arch_domain_create(struct domain *d,
unsigned int domcr_flags)
     {
         /* 64-bit PV guest by default. */
         d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
-
-        spin_lock_init(&d->arch.pv_domain.e820_lock);
     }

+    spin_lock_init(&d->arch.e820_lock);
     /* initialize default tsc behavior in case tools don't */
     tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
     spin_lock_init(&d->arch.vtsc_lock);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 54b1e6a..6c9b58c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3142,10 +3142,10 @@ static long hvm_memory_op(int cmd,
XEN_GUEST_HANDLE_PARAM(void) arg)

     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
+    case XENMEM_memory_map:
     case XENMEM_decrease_reservation:
         rc = do_memory_op(cmd, arg);
current->domain->arch.hvm_domain.qemu_mapcache_invalidate = 1;
@@ -3217,10 +3217,10 @@ static long hvm_memory_op_compat32(int cmd,
XEN_GUEST_HANDLE_PARAM(void) arg)

     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
+    case XENMEM_memory_map:
     case XENMEM_decrease_reservation:
         rc = compat_memory_op(cmd, arg);
current->domain->arch.hvm_domain.qemu_mapcache_invalidate = 1;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e7f0e13..4c3ce9a 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4740,19 +4740,13 @@ long arch_memory_op(int op,
XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }

-        if ( is_hvm_domain(d) )
-        {
-            rcu_unlock_domain(d);
-            return -EPERM;
-        }
-
         e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
         if ( e820 == NULL )
         {
             rcu_unlock_domain(d);
             return -ENOMEM;
         }
-
+
if ( copy_from_guest(e820, fmap.map.buffer, fmap.map.nr_entries) )
         {
             xfree(e820);
@@ -4760,11 +4754,11 @@ long arch_memory_op(int op,
XEN_GUEST_HANDLE_PARAM(void) arg)
             return -EFAULT;
         }

-        spin_lock(&d->arch.pv_domain.e820_lock);
-        xfree(d->arch.pv_domain.e820);
-        d->arch.pv_domain.e820 = e820;
-        d->arch.pv_domain.nr_e820 = fmap.map.nr_entries;
-        spin_unlock(&d->arch.pv_domain.e820_lock);
+        spin_lock(&d->arch.e820_lock);
+        xfree(d->arch.e820);
+        d->arch.e820 = e820;
+        d->arch.nr_e820 = fmap.map.nr_entries;
+        spin_unlock(&d->arch.e820_lock);

         rcu_unlock_domain(d);
         return rc;
@@ -4778,26 +4772,26 @@ long arch_memory_op(int op,
XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&map, arg, 1) )
             return -EFAULT;

-        spin_lock(&d->arch.pv_domain.e820_lock);
+        spin_lock(&d->arch.e820_lock);

         /* Backwards compatibility. */
-        if ( (d->arch.pv_domain.nr_e820 == 0) ||
-             (d->arch.pv_domain.e820 == NULL) )
+        if ( (d->arch.nr_e820 == 0) ||
+             (d->arch.e820 == NULL) )
         {
-            spin_unlock(&d->arch.pv_domain.e820_lock);
+            spin_unlock(&d->arch.e820_lock);
             return -ENOSYS;
         }

- map.nr_entries = min(map.nr_entries, d->arch.pv_domain.nr_e820);
-        if ( copy_to_guest(map.buffer, d->arch.pv_domain.e820,
+        map.nr_entries = min(map.nr_entries, d->arch.nr_e820);
+        if ( copy_to_guest(map.buffer, d->arch.e820,
                            map.nr_entries) ||
              __copy_to_guest(arg, &map, 1) )
         {
-            spin_unlock(&d->arch.pv_domain.e820_lock);
+            spin_unlock(&d->arch.e820_lock);
             return -EFAULT;
         }

-        spin_unlock(&d->arch.pv_domain.e820_lock);
+        spin_unlock(&d->arch.e820_lock);
         return 0;
     }

diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index d79464d..c3f9f8e 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -234,11 +234,6 @@ struct pv_domain

     /* map_domain_page() mapping cache. */
     struct mapcache_domain mapcache;
-
-    /* Pseudophysical e820 map (XENMEM_memory_map).  */
-    spinlock_t e820_lock;
-    struct e820entry *e820;
-    unsigned int nr_e820;
 };

 struct arch_domain
@@ -313,6 +308,11 @@ struct arch_domain
(possibly other cases in the future */
     uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */
     uint64_t vtsc_usercount; /* not used for hvm */
+
+    /* Pseudophysical e820 map (XENMEM_memory_map).  */
+    spinlock_t e820_lock;
+    struct e820entry *e820;
+    unsigned int nr_e820;
 } __cacheline_aligned;

 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.