Re: [Xen-devel] HVM support for e820_host (Was: Bug: Limitation of <=2GB RAM in domU persists with 4.3.0)
On Tue, Sep 03, 2013 at 09:35:50PM +0100, Gordan Bobic wrote:
> First attempt at a test run predictably failed. I added e820_host=1
> to a VM config and tried starting it:
>
> [root@normandy ~]# xl create /etc/xen/edi
> Parsing config from /etc/xen/edi
> libxl: error: libxl_x86.c:307:libxl__arch_domain_create: Failed
> while collecting E820 with: -3 (errno:-1)
>
> libxl: error: libxl_create.c:901:domcreate_rebuild_done: cannot
> (re-)build domain: -3
> libxl: error: libxl_dm.c:1300:libxl__destroy_device_model: could not
> find device-model's pid for dom 1
> libxl: error: libxl.c:1415:libxl__destroy_domid:
> libxl__destroy_device_model failed for 1
>
> xl-edi.log, qemu-dm-edi.log attached.
> Both actually look identical to previous logs before the patch.
>
> Is this something that is clearly a consequence of the patch being
> incomplete? Or did I break something?

You are missing the hypervisor patch to set the E820 for HVM guests.

http://lists.xen.org/archives/html/xen-devel/2013-05/msg01603.html

And that should make it possible to "stash" the E820 in the hypervisor.

Then after that you will need to implement in the hvmloader.c the
XENMEM_memory_map hypercall to get the E820 and do something with it.

Oh, and something like this probably should do it - not compile tested
in any way:

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1fcaed0..7b38890 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3146,6 +3146,7 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
+    case XENMEM_memory_map:
     case XENMEM_decrease_reservation:
         rc = do_memory_op(cmd, arg);
         current->domain->arch.hvm_domain.qemu_mapcache_invalidate = 1;
@@ -3216,10 +3217,10 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
+    case XENMEM_memory_map:
     case XENMEM_decrease_reservation:
         rc = compat_memory_op(cmd, arg);
         current->domain->arch.hvm_domain.qemu_mapcache_invalidate = 1;
diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..86fb20a 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -68,16 +68,43 @@ void dump_e820_table(struct e820entry *e820, unsigned int nr)
     }
 }
 
+static const char *e820_names(int type)
+{
+    switch (type) {
+    case E820_RAM: return "RAM";
+    case E820_RESERVED: return "Reserved";
+    case E820_ACPI: return "ACPI";
+    case E820_NVS: return "ACPI NVS";
+    case E820_UNUSABLE: return "Unusable";
+    default: break;
+    }
+    return "Unknown";
+}
+
+
 /* Create an E820 table based on memory parameters provided in hvm_info. */
 int build_e820_table(struct e820entry *e820,
                      unsigned int lowmem_reserved_base,
                      unsigned int bios_image_base)
 {
     unsigned int nr = 0;
+    struct xen_memory_map op;
+    struct e820entry map[E820MAX];
+    int rc;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
 
+    op.nr_entries = E820MAX; /* capacity of map[] on input */
+    set_xen_guest_handle(op.buffer, map);
+
+    rc = hypercall_memory_op ( XENMEM_memory_map, &op);
+    if ( rc != -ENOSYS) { /* It works!? */
+        int i;
+        for ( i = 0; i < op.nr_entries; i++ )
+            printf(" %lx -> %lx %s\n", map[i].addr >> 12,
+                   (map[i].addr + map[i].size) >> 12, e820_names(map[i].type));
+    }
     /* Lowmem must be at least 512K to keep Windows happy) */
     ASSERT ( lowmem_reserved_base > 512<<10 );

>
> Gordan
>
> On 09/03/2013 08:47 PM, Gordan Bobic wrote:
> >On 09/03/2013 03:59 PM, Konrad Rzeszutek Wilk wrote:
> >
> >>>>>2) Further, I'm finding myself motivated to write that
> >>>>>auto-set (as opposed to hard coded) vBAR=pBAR patch discussed
> >>>>>briefly a week or so ago (have an init script read the BAR
> >>>>>info from dom0 and put it in xenstore, plus a patch to
> >>>>>make pBAR=vBAR reservations built dynamically rather than
> >>>>>statically, based on this data). Now, I'm quite fluent in C,
> >>>>>but my familiarity with Xen source code is nearly non-existent
> >>>>>(limited to studying an old unsupported patch every now and then
> >>>>>in order to make it apply to a more recent code release).
> >>>>>Can anyone help me out with a high level view WRT where
> >>>>>this would be best plumbed in (which files and the flow of
> >>>>>control between the affected files)?
> >>>>
> >>>>hvmloader probably and the libxl e820 code. What from a
> >>>>high view needs to happen is that:
> >>>>1). Need to relax the check in libxl for e820_host
> >>>>    to also do it for HVM guests. Said code just iterates over the
> >>>>    host E820 and sanitizes it a bit and makes an E820 hypercall to
> >>>>    set it for the guest.
> >[snip]
> >
> >OK, I have attached a preliminary patch against 4.3.0 for the libxl
> >part. It compiles. I haven't tried running it to see if it actually
> >works or does something, but my packages build.
> >
> >Please let me know if I've missed anything. On its own, I don't think
> >this patch will do much (apart from maybe break HVM hosts with
> >e820_host=1 set).
> >
> >>>>2). Figure out whether the E820 hypercall (which sets the E820
> >>>>    layout for a guest) can be run on HVM guests. I think it
> >>>>    could not and Mukesh in his PVH patches posted a patch
> >>>>    to enable that - "..Move e820 fields out of pv_domain struct"
> >
> >Is this already in 4.3.0 or is this an out-of-tree patch? Do you have a
> >link to it handy?
> >
> >>>>3). Hvmloader should do an E820 get machine memory hypercall
> >>>>    to see if there is anything there. If there is - that means
> >>>>    the toolstack has requested a "new" type of E820. Iterate
> >>>>    over the E820 and make it look like that.
> >>>>    You can look in the Linux arch/x86/xen/setup.c to see how
> >>>>    it does that.
> >>>>
> >>>>    The complication there is that hvmloader needs to fit the
> >>>>    ACPI code (the guest type one) and such.
> >>>>    Presumably you can just re-use the existing spaces that
> >>>>    the host has marked as E820_RESERVED or E820_ACPI..
> >>>
> >>>Yup, I get it. Not only that, but it should also ideally (not
> >>>strictly necessary, but it'd be handy) map the IOMEM for devices
> >>>it is passed so that pBAR=vBAR (as opposed to just leaving all
> >>>the host e820 reserved areas well alone - which would work for
> >>>most things).
> >>
> >>Yes. That is an extra complication that could be done in subsequent
> >>patches. But in theory if you have the E820 mirrored from the host the
> >>pBAR=vBAR should be easy enough as the values from the host BARs can
> >>easily fit in the E820 gaps.
> >
> >Agreed. Let's leave the pBAR=vBAR part for a separate patch set. I'll
> >have to figure out a sensible way to query the IOMEM regions for each of
> >the devices passed to the VM and make sure they are in the same hole.
> >
> >>>>    Then the SMBIOS would need to move and the BIOS
> >>>>    might need to be relocated - but I think those are relocatable
> >>>>    in some form.
> >
> >[bit above left for later reference]
> >
> >>>>Well, I am more than happy to help you with this.
> >>>
> >>>Thanks, much appreciated. :)
> >>
> >>Yeeey! Vict^H^H^H^volunteer :-)! <maniacal laughter in the background>
> >>
> >>I am also reachable on IRC (FreeNode mostly) as either darnok or konrad
> >>if that would be more convenient to discuss this.
> >
> >Thanks. I'll keep that in mind. :)
> >
> >Gordan
> >
> >
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@xxxxxxxxxxxxx
> >http://lists.xen.org/xen-devel
> >
>
> domid: 1
> Using file /dev/zvol/ssd/edi in read-write mode
> Watching /local/domain/0/device-model/1/logdirty/cmd
> Watching /local/domain/0/device-model/1/command
> Watching /local/domain/1/cpu
> char device redirected to /dev/pts/3
> qemu_map_cache_init nr_buckets = 10000 size 4194304
> shared page at pfn feffd
> buffered io page at pfn feffb
> Guest uuid = a57e6840-e9f5-4a14-a822-b2cc662c177f
> populating video RAM at ff000000
> mapping video RAM from ff000000
> Register xen platform.
> Done register platform.
> platform_fixed_ioport: changed ro/rw state of ROM memory area. now is rw
> state.
> xs_read(/local/domain/0/device-model/1/xen_extended_power_mgmt): read error
> xs_read(): vncpasswd get error.
> /vm/a57e6840-e9f5-4a14-a822-b2cc662c177f/vncpasswd.
> Log-dirty: no command yet.
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> vcpu-set: watch node error.
> [xenstore_process_vcpu_set_event]: /local/domain/1/cpu has no CPU!
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> xs_read(/local/domain/1/log-throttling): read error
> qemu: ignoring not-understood drive `/local/domain/1/log-throttling'
> medium change watch on `/local/domain/1/log-throttling' - unknown device,
> ignored
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
> dm-command: hot insert pass-through pci dev
> register_real_device: Assigning real physical device 08:00.0 ...
> register_real_device: Disable MSI translation via per device option
> register_real_device: Enable power management
> pt_iomul_init: Error: pt_iomul_init can't open file /dev/xen/pci_iomul: No
> such file or directory: 0x8:0x0.0x0
> pt_register_regions: IO region registered (size=0x02000000
> base_addr=0xf8000000)
> pt_register_regions: IO region registered (size=0x08000000
> base_addr=0xb800000c)
> pt_register_regions: IO region registered (size=0x04000000
> base_addr=0xb400000c)
> pt_register_regions: IO region registered (size=0x00000080
> base_addr=0x0000cf81)
> pt_register_regions: Expansion ROM registered (size=0x00080000
> base_addr=0xfbc00000)
> pci_intx: intx=1
> register_real_device: Real physical device 08:00.0 registered successfuly!
> IRQ type = INTx
> dm-command: hot insert pass-through pci dev
> register_real_device: Assigning real physical device 08:00.1 ...
> register_real_device: Disable MSI translation via per device option
> register_real_device: Enable power management
> pt_iomul_init: Error: pt_iomul_init can't open file /dev/xen/pci_iomul: No
> such file or directory: 0x8:0x0.0x1
> pt_register_regions: IO region registered (size=0x00004000
> base_addr=0xfbcfc000)
> pci_intx: intx=2
> register_real_device: Real physical device 08:00.1 registered successfuly!
> IRQ type = INTx
> dm-command: hot insert pass-through pci dev
> register_real_device: Assigning real physical device 0c:00.0 ...
> register_real_device: Disable MSI translation via per device option
> register_real_device: Enable power management
> pt_iomul_init: Error: pt_iomul_init can't open file /dev/xen/pci_iomul: No
> such file or directory: 0xc:0x0.0x0
> pt_register_regions: IO region registered (size=0x00004000
> base_addr=0xd7efc000)
> pci_intx: intx=1
> register_real_device: Real physical device 0c:00.0 registered successfuly!
> IRQ type = INTx
> dm-command: hot insert pass-through pci dev
> register_real_device: Assigning real physical device 00:1a.1 ...
> register_real_device: Disable MSI translation via per device option
> register_real_device: Enable power management
> pt_iomul_init: Error: pt_iomul_init can't open file /dev/xen/pci_iomul: No
> such file or directory: 0x0:0x1a.0x1
> pt_register_regions: IO region registered (size=0x00000020
> base_addr=0x00008a01)
> pci_intx: intx=2
> register_real_device: Real physical device 00:1a.1 registered successfuly!
> IRQ type = INTx
> pt_iomem_map: e_phys=e0000000 maddr=b8000000 type=8 len=134217728 index=1
> first_map=1
> pt_iomem_map: e_phys=e8000000 maddr=b4000000 type=8 len=67108864 index=3
> first_map=1
> pt_iomem_map: e_phys=ec000000 maddr=f8000000 type=0 len=33554432 index=0
> first_map=1
> vga s->lfb_addr = ef000000 s->lfb_end = ef800000
> pt_iomem_map: e_phys=ef8a0000 maddr=fbcfc000 type=0 len=16384 index=0
> first_map=1
> pt_iomem_map: e_phys=ef8a4000 maddr=d7efc000 type=0 len=16384 index=0
> first_map=1
> pt_ioport_map: e_phys=c100 pio_base=cf80 len=128 index=5 first_map=1
> pt_ioport_map: e_phys=c1e0 pio_base=8a00 len=32 index=4 first_map=1
> platform_fixed_ioport: changed ro/rw state of ROM memory area. now is rw
> state.
> platform_fixed_ioport: changed ro/rw state of ROM memory area. now is ro
> state.
> Unknown PV product 2 loaded in guest
> PV driver build 1
> region type 0 at [ef880000,ef8a0000).
> squash iomem [ef880000, ef8a0000).
> region type 1 at [c180,c1c0).
> vga s->lfb_addr = ef000000 s->lfb_end = ef800000
> pt_iomem_map: e_phys=ffffffff maddr=f8000000 type=0 len=33554432 index=0
> first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=b8000000 type=8 len=134217728 index=1
> first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=b4000000 type=8 len=67108864 index=3
> first_map=0
> pt_ioport_map: e_phys=ffff pio_base=cf80 len=128 index=5 first_map=0
> pt_iomem_map: e_phys=ec000000 maddr=f8000000 type=0 len=33554432 index=0
> first_map=0
> pt_iomem_map: e_phys=e0000000 maddr=b8000000 type=8 len=134217728 index=1
> first_map=0
> pt_iomem_map: e_phys=e8000000 maddr=b4000000 type=8 len=67108864 index=3
> first_map=0
> pt_ioport_map: e_phys=c100 pio_base=cf80 len=128 index=5 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=fbcfc000 type=0 len=16384 index=0
> first_map=0
> pt_pci_write_config: [00:06:0] Warning: Guest attempt to set address to
> unused Base Address Register. [Offset:30h][Length:4]
> pt_iomem_map: e_phys=ef8a0000 maddr=fbcfc000 type=0 len=16384 index=0
> first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=d7efc000 type=0 len=16384 index=0
> first_map=0
> pt_pci_write_config: [00:07:0] Warning: Guest attempt to set address to
> unused Base Address Register. [Offset:30h][Length:4]
> pt_iomem_map: e_phys=ef8a4000 maddr=d7efc000 type=0 len=16384 index=0
> first_map=0
> pt_ioport_map: e_phys=ffff pio_base=8a00 len=32 index=4 first_map=0
> pt_pci_write_config: [00:08:0] Warning: Guest attempt to set address to
> unused Base Address Register. [Offset:30h][Length:4]
> pt_ioport_map: e_phys=c1e0 pio_base=8a00 len=32 index=4 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=f8000000 type=0 len=33554432 index=0
> first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=b8000000 type=8 len=134217728 index=1
> first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=b4000000 type=8 len=67108864 index=3
> first_map=0
> pt_ioport_map: e_phys=ffff pio_base=cf80 len=128 index=5 first_map=0
> pt_iomem_map: e_phys=ec000000 maddr=f8000000 type=0 len=33554432 index=0
> first_map=0
> pt_iomem_map: e_phys=e0000000 maddr=b8000000 type=8 len=134217728 index=1
> first_map=0
> pt_iomem_map: e_phys=e8000000 maddr=b4000000 type=8 len=67108864 index=3
> first_map=0
> pt_ioport_map: e_phys=c100 pio_base=cf80 len=128 index=5 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=fbcfc000 type=0 len=16384 index=0
> first_map=0
> pt_iomem_map: e_phys=ef8a0000 maddr=fbcfc000 type=0 len=16384 index=0
> first_map=0
> pt_ioport_map: e_phys=ffff pio_base=8a00 len=32 index=4 first_map=0
> pt_ioport_map: e_phys=c1e0 pio_base=8a00 len=32 index=4 first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=d7efc000 type=0 len=16384 index=0
> first_map=0
> pt_iomem_map: e_phys=ef8a4000 maddr=d7efc000 type=0 len=16384 index=0
> first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=fbcfc000 type=0 len=16384 index=0
> first_map=0
> pt_iomem_map: e_phys=ffffffff maddr=d7efc000 type=0 len=16384 index=0
> first_map=0
> pt_ioport_map: e_phys=ffff pio_base=8a00 len=32 index=4 first_map=0
> shutdown requested in cpu_handle_ioreq
> Issued domain 1 poweroff
> Waiting for domain edi (domid 1) to die [pid 8363]
> Domain 1 has shut down, reason code 0 0x0
> Action for shutdown reason code 0 is destroy
> Domain 1 needs to be cleaned up: destroying the domain
> libxl: error: libxl_pci.c:990:libxl__device_pci_reset: The kernel doesn't
> support reset from sysfs for PCI device 0000:08:00.0
> libxl: error: libxl_pci.c:990:libxl__device_pci_reset: The kernel doesn't
> support reset from sysfs for PCI device 0000:08:00.1
> Done. Exiting now

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
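
The quoted discussion suggests that once the host E820 is mirrored into the guest, the host BAR values "can easily fit in the E820 gaps". As a rough illustration of that idea only, the following standalone C sketch walks an E820-style table and reports the largest uncovered hole below 4 GiB. It is not hvmloader or libxl code: the `sk_` names, the struct layout, and the requirement that entries are sorted and non-overlapping are all assumptions made for the example, and real code would use the map returned by XENMEM_memory_map instead of a hand-built array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an E820 entry; not the Xen definition. */
struct sk_e820entry {
    uint64_t addr;   /* start of the region */
    uint64_t size;   /* length in bytes */
    uint32_t type;   /* region type; ignored by the gap search */
};

/* Hypothetical result type: a hole [start, end) not covered by any entry. */
struct sk_gap {
    uint64_t start;
    uint64_t end;    /* exclusive */
};

/*
 * Find the largest hole below 4 GiB that no entry covers.
 * Assumption: entries are sorted by address and do not overlap.
 * Returns 1 and fills *out if a hole exists, 0 otherwise.
 */
static int sk_largest_gap_below_4g(const struct sk_e820entry *map,
                                   size_t nr, struct sk_gap *out)
{
    const uint64_t limit = 1ULL << 32;
    uint64_t cursor = 0;   /* end of the last region seen so far */
    uint64_t best = 0;     /* size of the largest hole found so far */
    int found = 0;
    size_t i;

    assert(out != NULL);
    assert(map != NULL || nr == 0);

    for (i = 0; i < nr && cursor < limit; i++) {
        uint64_t start = map[i].addr;
        uint64_t end = map[i].addr + map[i].size;

        if (start > limit)
            start = limit;  /* clamp regions that begin above 4 GiB */

        if (start > cursor && start - cursor > best) {
            best = start - cursor;
            out->start = cursor;
            out->end = start;
            found = 1;
        }
        if (end > cursor)
            cursor = end;
    }

    /* Trailing hole between the last region and the 4 GiB boundary. */
    if (cursor < limit && limit - cursor > best) {
        out->start = cursor;
        out->end = limit;
        found = 1;
    }
    return found;
}
```

A caller would pass the table and its entry count and then check whether each passthrough device's BAR range lies inside the returned hole; handling several holes, unsorted input, or address overflow is left out of this sketch.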