[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Re: [PATCH 3 of 4] libxl: Add support for passing in the machine's E820 for PCI passthrough in libxl_device_pci_parse_bdf
On Tue, 2011-04-12 at 18:53 +0100, Konrad Rzeszutek Wilk wrote: > On Tue, Apr 12, 2011 at 01:19:43PM +0100, Ian Campbell wrote: > > On Mon, 2011-04-11 at 22:35 +0100, Konrad Rzeszutek Wilk wrote: > > > > > > > > > The code libxl_device_pci_parse_bdf calls the libxl__e820_alloc when > > > it has parsed one BDF. Subsequent calls to libxl_device_pci_parse_bdf > > > will > > > not trigger libxl__e820_alloc being called (unless the first call to > > > libxl__e820_alloc failed). > > > > That sounds like a very odd non-intuitive location for that allocation. > > Why not do it in libxl_domain_create or somewhere like that? > > > > I think the e820 map added to the idl should become a simple boolean > > flag and this should all be taken care of internally based on that. > > Like this? Why no_machine_e820 instead of just machine_e820? You can set a non-zero default in the appropriate libxl_foo_init function if need be. Ian. > # HG changeset patch > # User Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> > # Date 1302617081 14400 > # Node ID 945a34bcc2bd4bb341296f3ce84be6c42f24a4c9 > # Parent 4527044d3d1cebdc393e596072bc02d947a8de51 > libxl: Add support for passing in the machine's E820 for PCI passthrough > > The code that populates E820 is unconditionally triggered by the guest > configuration having "pci=['<BDF>,..']", being a PV guest, and if > b_info->u.pv.no_machine_e820 is not set. > > The code do_domain_create calls the libxl__e820_alloc when > it notices that the guest is PV, has at least one PCI devices, and does not > have the no_machine_e820 flag set.. > > libxl__e820_alloc calls the xc_get_machine_memory_map to retrieve the systems > E820. Then the E820 is sanitized to weed out E820 entries below 16MB, and as > well remove any E820_RAM or E820_UNUSED regions as the guest does not need to > know about them. The guest only needs the E820_ACPI, E820_NVS, E820_RESERVED > to > get an idea of where the PCI I/O space is. Mostly.. The Linux kernel assumes > that any > gap in the E820 is considered PCI I/O space which means that if we pass > in the guest 2GB, and the E820_ACPI, and its friend start at 3GB, the > gap between 2GB and 3GB will be considered as PCI I/O space. To guard against > that we also create an E820_UNUSABLE between the region of 'target_kb' > (called ram_end in the code) up to the first E820_[ACPI,NVS,RESERVED] region. > Lastly, the xc_domain_set_memory_map is called to install the new E820. > > When tested with another PV guest (NetBSD 5.1) the modified E820 gave > it no trouble. The code has also been tested with older "classic" Xen Linux > and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid, > Debian Squeeze, 2.6.37, 2.6.38, 2.6.39). > > Memory that is slack or for balloon (so 'maxmem' in guest configuration) > is put behind the machine E820. Which in most cases is after the 4GB. > > The reason for doing the fetching of the E820 using the hypercall in > the toolstack (instead of the guest doing it) is that when a guest > would do a hypercall to 'XENMEM_machine_memory_map' it would > retrieve an E820 with I/O range caps added in. Meaning that the > region after 4GB up to end of possible memory would be marked as unusable > and the kernel would not have any space to allocate a balloon > region. > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> > > diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl.idl > --- a/tools/libxl/libxl.idl Tue Apr 12 09:59:59 2011 -0400 > +++ b/tools/libxl/libxl.idl Tue Apr 12 10:04:41 2011 -0400 > @@ -117,6 +117,7 @@ libxl_domain_build_info = Struct("domain > ("cmdline", string), > ("ramdisk", libxl_file_reference), > ("features", string, True), > + ("no_machine_e820", bool), > ])), > ])), > ], > diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl_create.c > --- a/tools/libxl/libxl_create.c Tue Apr 12 09:59:59 2011 -0400 > +++ b/tools/libxl/libxl_create.c Tue Apr 12 10:04:41 2011 -0400 > @@ -536,6 +536,14 @@ static int do_domain_create(libxl__gc *g > for (i = 0; i < d_config->num_pcidevs; i++) > libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1); > > + if (!d_config->c_info.hvm && !d_config->b_info.u.pv.no_machine_e820) { > + int rc = 0; > + rc = libxl__e820_alloc(ctx, domid, d_config); > + if (rc) > + LIBXL__LOG(ctx, LIBXL__LOG_WARNING, > + "Failed while collecting E820 with: %d (errno:%d)\n", > + rc, errno); > + } > if ( cb && (d_config->c_info.hvm || d_config->b_info.u.pv.bootloader )) { > if ( (*cb)(ctx, domid, priv) ) > goto error_out; > diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl_internal.h > --- a/tools/libxl/libxl_internal.h Tue Apr 12 09:59:59 2011 -0400 > +++ b/tools/libxl/libxl_internal.h Tue Apr 12 10:04:41 2011 -0400 > @@ -335,4 +335,5 @@ _hidden int libxl__error_set(libxl__gc * > _hidden int libxl__file_reference_map(libxl_file_reference *f); > _hidden int libxl__file_reference_unmap(libxl_file_reference *f); > > +_hidden int libxl__e820_alloc(libxl_ctx *ctx, uint32_t domid, > libxl_domain_config *d_config); > #endif > diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl_pci.c > --- a/tools/libxl/libxl_pci.c Tue Apr 12 09:59:59 2011 -0400 > +++ b/tools/libxl/libxl_pci.c Tue Apr 12 10:04:41 2011 -0400 > @@ -1047,3 +1047,157 @@ int libxl_device_pci_shutdown(libxl_ctx > free(pcidevs); > return 0; > } > + > +static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[], > + uint32_t *nr_entries, > + unsigned long map_limitkb, > + unsigned long balloon_kb) > +{ > + uint64_t delta_kb = 0, start = 0, start_kb = 0, last = 0, ram_end; > + uint32_t i, idx = 0, nr; > + struct e820entry e820[E820MAX]; > + > + if (!src || !map_limitkb || !balloon_kb || !nr_entries) > + return ERROR_INVAL; > + > + nr = *nr_entries; > + if (!nr) > + return ERROR_INVAL; > + > + if (nr > E820MAX) > + return ERROR_NOMEM; > + > + /* Weed out anything under 16MB */ > + for (i = 0; i < nr; i++) { > + if (src[i].addr > 0x100000) > + continue; > + > + src[i].type = 0; > + src[i].size = 0; > + src[i].addr = -1ULL; > + } > + > + /* Find the lowest and highest entry in E820, skipping over > + * undersired entries. */ > + start = -1ULL; > + last = 0; > + for (i = 0; i < nr; i++) { > + if ((src[i].type == E820_RAM) || > + (src[i].type == E820_UNUSABLE) || > + (src[i].type == 0)) > + continue; > + > + start = src[i].addr < start ? src[i].addr : start; > + last = src[i].addr + src[i].size > last ? > + src[i].addr + src[i].size > last : last; > + } > + if (start > 1024) > + start_kb = start >> 10; > + > + /* Add the memory RAM region for the guest */ > + e820[idx].addr = 0; > + e820[idx].size = (uint64_t)map_limitkb << 10; > + e820[idx].type = E820_RAM; > + > + /* .. and trim if neccessary */ > + if (start_kb && map_limitkb > start_kb) { > + delta_kb = map_limitkb - start_kb; > + if (delta_kb) > + e820[idx].size -= (uint64_t)(delta_kb << 10); > + } > + /* Note: We don't touch balloon_kb here. Will add it at the end. */ > + ram_end = e820[idx].addr + e820[idx].size; > + idx ++; > + > + LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "Memory: %ldkB End of RAM: 0x%lx (PFN) > " \ > + "Delta: %ldkB, PCI start: %ldkB (0x%lx PFN), Balloon %ldkB\n", > + map_limitkb, ram_end >> 12, delta_kb, start_kb ,start >> 12, > + balloon_kb); > + > + /* Check if there is a region between ram_end and start. */ > + if (start > ram_end) { > + /* .. and if not present, add it in. This is to guard against > + the Linux guest assuming that the gap between the end of > + RAM region and the start of the E820_[ACPI,NVS,RESERVED] > + is PCI I/O space. Which it certainly is _not_. */ > + e820[idx].type = E820_UNUSABLE; > + e820[idx].addr = ram_end; > + e820[idx].size = start - ram_end; > + idx++; > + } > + /* Almost done: copy them over, ignoring the undesireable ones */ > + for (i = 0; i < nr; i++) { > + if ((src[i].type == E820_RAM) || > + (src[i].type == E820_UNUSABLE) || > + (src[i].type == 0)) > + continue; > + e820[idx].type = src[i].type; > + e820[idx].addr = src[i].addr; > + e820[idx].size = src[i].size; > + idx++; > + } > + > + /* At this point we have the mapped RAM + E820 entries from src. */ > + if (balloon_kb) { > + /* and if we truncated the RAM region, then add it to the end. */ > + e820[idx].type = E820_RAM; > + e820[idx].addr = (uint64_t)(1ULL << 32) > last ? (uint64_t)(1ULL << > 32) : last; > + /* also add the balloon memory to the end. */ > + e820[idx].size = (uint64_t)(delta_kb << 10) + (uint64_t)(balloon_kb > << 10); > + idx++; > + > + } > + nr = idx; > + > + for (i = 0; i < nr; i++) { > + LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, ":%s\t[%lx -> %lx]", > + e820[i].type == E820_RAM ? "RAM " : > + (e820[i].type == E820_RESERVED ? "RSV " : > + e820[i].type == E820_ACPI ? "ACPI" : > + (e820[i].type == E820_NVS ? "NVS " : > + (e820[i].type == E820_UNUSABLE ? "UNU " : "----"))), > + e820[i].addr >> 12, > + (e820[i].addr + e820[i].size) >> 12); > + } > + > + /* Done: copy the sanitized version. */ > + *nr_entries = nr; > + memcpy(src, e820, nr * sizeof(struct e820entry)); > + return 0; > +} > + > + > +int libxl__e820_alloc(libxl_ctx *ctx, uint32_t domid, libxl_domain_config > *d_config) > +{ > + int rc; > + uint32_t nr; > + struct e820entry map[E820MAX]; > + libxl_domain_build_info *b_info; > + > + if (d_config == NULL || d_config->c_info.hvm) > + return ERROR_INVAL; > + > + b_info = &d_config->b_info; > + if (b_info->u.pv.no_machine_e820) > + return ERROR_INVAL; > + > + rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX); > + if (rc < 0) { > + errno = rc; > + return ERROR_FAIL; > + } > + nr = rc; > + rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb, > + (b_info->max_memkb - b_info->target_memkb) + > + b_info->u.pv.slack_memkb); > + if (rc) > + return ERROR_FAIL; > + > + rc = xc_domain_set_memory_map(ctx->xch, domid, map, nr); > + > + if (rc < 0) { > + errno = rc; > + return ERROR_FAIL; > + } > + return 0; > +} > diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/xl_cmdimpl.c > --- a/tools/libxl/xl_cmdimpl.c Tue Apr 12 09:59:59 2011 -0400 > +++ b/tools/libxl/xl_cmdimpl.c Tue Apr 12 10:04:41 2011 -0400 > @@ -371,6 +371,7 @@ static void printf_info(int domid, > printf("\t\t\t(kernel %s)\n", b_info->kernel.path); > printf("\t\t\t(cmdline %s)\n", b_info->u.pv.cmdline); > printf("\t\t\t(ramdisk %s)\n", b_info->u.pv.ramdisk.path); > + printf("\t\t\t(no_machine_e820 %d)\n", b_info->u.pv.no_machine_e820); > printf("\t\t)\n"); > } > printf("\t)\n"); > @@ -1025,7 +1026,9 @@ skip_vfb: > if (!libxl_device_pci_parse_bdf(&ctx, pcidev, buf)) > d_config->num_pcidevs++; > } > - } > + } else > + if (!c_info->hvm) > + b_info->u.pv.no_machine_e820 = 1; > > switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) { > case 0: _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |