Re: [Xen-devel] HVM support for e820_host (Was: Bug: Limitation of <=2GB RAM in domU persists with 4.3.0)
On Fri, 6 Sep 2013 09:20:50 -0400, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> On Fri, Sep 06, 2013 at 01:23:19PM +0100, Gordan Bobic wrote:
>> On Thu, 05 Sep 2013 19:01:03 -0400, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
>>> Gordan Bobic <gordan@xxxxxxxxxx> wrote:
>>>> On 09/05/2013 11:23 PM, Konrad Rzeszutek Wilk wrote:
>>>>> Gordan Bobic <gordan@xxxxxxxxxx> wrote:
>>>>>> Right, finally got around to trying this with the latest patch.
>>>>>>
>>>>>> With e820_host=0 things work as before:
>>>>>>
>>>>>> (XEN) HVM3: BIOS map:
>>>>>> (XEN) HVM3:  f0000-fffff: Main BIOS
>>>>>> (XEN) HVM3: E820 table:
>>>>>> (XEN) HVM3:  [00]: 00000000:00000000 - 00000000:0009e000: RAM
>>>>>> (XEN) HVM3:  [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
>>>>>> (XEN) HVM3:  HOLE: 00000000:000a0000 - 00000000:000e0000
>>>>>> (XEN) HVM3:  [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
>>>>>> (XEN) HVM3:  [03]: 00000000:00100000 - 00000000:e0000000: RAM
>>>>>> (XEN) HVM3:  HOLE: 00000000:e0000000 - 00000000:fc000000
>>>>>> (XEN) HVM3:  [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
>>>>>> (XEN) HVM3:  [05]: 00000001:00000000 - 00000002:1f800000: RAM
>>>>>>
>>>>>> I seem to be getting two different E820 table dumps with e820_host=1:
>>>>>>
>>>>>> (XEN) HVM1: BIOS map:
>>>>>> (XEN) HVM1:  f0000-fffff: Main BIOS
>>>>>> (XEN) HVM1: build_e820_table:91 got 8 op.nr_entries
>>>>>> (XEN) HVM1: E820 table:
>>>>>> (XEN) HVM1:  [00]: 00000000:00000000 - 00000000:3f790000: RAM
>>>>>> (XEN) HVM1:  [01]: 00000000:3f790000 - 00000000:3f79e000: ACPI
>>>>>> (XEN) HVM1:  [02]: 00000000:3f79e000 - 00000000:3f7d0000: NVS
>>>>>> (XEN) HVM1:  [03]: 00000000:3f7d0000 - 00000000:3f7e0000: RESERVED
>>>>>> (XEN) HVM1:  HOLE: 00000000:3f7e0000 - 00000000:3f7e7000
>>>>>> (XEN) HVM1:  [04]: 00000000:3f7e7000 - 00000000:40000000: RESERVED
>>>>>> (XEN) HVM1:  HOLE: 00000000:40000000 - 00000000:fee00000
>>>>>> (XEN) HVM1:  [05]: 00000000:fee00000 - 00000000:fee01000: RESERVED
>>>>>> (XEN) HVM1:  HOLE: 00000000:fee01000 - 00000000:ffc00000
>>>>>> (XEN) HVM1:  [06]: 00000000:ffc00000 - 00000001:00000000: RESERVED
>>>>>> (XEN) HVM1:  [07]: 00000001:00000000 - 00000001:68870000: RAM
>>>>>> (XEN) HVM1: E820 table:
>>>>>> (XEN) HVM1:  [00]: 00000000:00000000 - 00000000:0009e000: RAM
>>>>>> (XEN) HVM1:  [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
>>>>>> (XEN) HVM1:  HOLE: 00000000:000a0000 - 00000000:000e0000
>>>>>> (XEN) HVM1:  [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
>>>>>> (XEN) HVM1:  [03]: 00000000:00100000 - 00000000:a7800000: RAM
>>>>>> (XEN) HVM1:  HOLE: 00000000:a7800000 - 00000000:fc000000
>>>>>> (XEN) HVM1:  [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
>>>>>> (XEN) HVM1: Invoking ROMBIOS ...
>>>>>>
>>>>>> I cannot quite figure out what is going on here - these tables
>>>>>> can't both be true.
>>>>>
>>>>> Right. The code just prints the E820 that was constructed b/c of
>>>>> the e820_host=1 parameter as the first output. Then the second one
>>>>> is what was constructed originally.
>>>>>
>>>>> The code that would tie in the E820 from the hypercall and alter
>>>>> how hvmloader sets it up is not yet done.
>>>>>
>>>>>> Looking at the IOMEM on the host, the IOMEM begins at 0xa8000000
>>>>>> and goes more or less contiguously up to 0xfec8b000.
>>>>>>
>>>>>> Looking at dmesg on domU, the e820 map more or less matches the
>>>>>> second dump above.
>>>>>
>>>>> Right. That is correct since the patch I sent just outputs stuff.
>>>>> No real changes to the E820 yet.
>>>>
>>>> I thought this did that in hvmloader/e820.c:
>>>>
>>>>     hypercall_memory_op(XENMEM_memory_map, &op);
>>>>
>>>> Gordan
>>>
>>> No. That just gets the E820 that is stashed in the hypervisor for
>>> the guest. The PV guest would use it, but hvmloader does not. This
>>> is what would need to be implemented to allow hvmloader to construct
>>> the E820 on its own.
>>
>> Right. So in hvmloader/e820.c we now have the host-based map in:
>>
>>     struct e820entry map[E820MAX];
>>
>> The rest of the function then goes on to construct the standard HVM
>> e820 map in the passed-in struct e820entry *e820. So all that needs
>> to happen here is, if e820_host is set, fill e820[] by copying map[]
>> up to hvm_info->low_mem_pgend (or hvm_info->high_mem_pgend if it is
>> set).
>
> Right. And then the overflow would be put past 4GB. Or fill in the
> E820_RAM regions with it.
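Yes - to make sure we are thinking of the same thing, this is roughly
what I have in mind. A completely untested sketch: build_host_e820() is
just my name for it, and the clamping/overflow handling is guesswork on
my part:

/*
 * Untested sketch: build the guest E820 straight from the host map
 * fetched via XENMEM_memory_map. Non-RAM entries (and hence the holes
 * between them) are copied verbatim; E820_RAM entries are clamped so
 * the guest is not handed more low memory than hvm_info says it owns.
 */
static unsigned int build_host_e820(struct e820entry *e820,
                                    const struct e820entry *map,
                                    unsigned int nr_map)
{
    uint64_t low_end = (uint64_t)hvm_info->low_mem_pgend << PAGE_SHIFT;
    uint64_t overflow = 0;
    unsigned int i, nr = 0;

    for ( i = 0; i < nr_map; i++ )
    {
        if ( map[i].type != E820_RAM )
        {
            /* Replicate reserved/ACPI/NVS entries (and the gaps). */
            e820[nr++] = map[i];
            continue;
        }
        if ( map[i].addr >= low_end )
        {
            /* RAM wholly above the guest's low-memory limit. */
            overflow += map[i].size;
            continue;
        }
        e820[nr] = map[i];
        if ( map[i].addr + map[i].size > low_end )
        {
            /* Clamp the entry and remember what got cut off. */
            overflow += map[i].addr + map[i].size - low_end;
            e820[nr].size = low_end - map[i].addr;
        }
        nr++;
    }

    if ( overflow )
    {
        /* Put the cut-off RAM past 4GB, as you say. (This doesn't
         * check for host reserved regions above 4GB - a real version
         * would have to.) */
        e820[nr].addr = 1ull << 32;
        e820[nr].size = overflow;
        e820[nr].type = E820_RAM;
        nr++;
    }

    return nr;
}

Copying the non-RAM entries verbatim is what replicates the holes,
which is really the part I care about.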
>> I am guessing that SeaBIOS and other existing stuff might break if
>> the host map is just copied in verbatim, so presumably I need to
>> add/dedupe the non-RAM parts of the maps.
>
> Probably. Or tweak SeaBIOS to use your E820.

I don't think tweaking SeaBIOS to use a different specific map is the
way forward. As I said in the other email, my motivation is to make
something that will work in the general case, not just for the memory
map in my dodgy hardware (I'm sure there are many other poorly designed
bits of hardware out there this might be useful on ;)).

> Also you need to figure out where hvmloader constructs the ACPI and
> SMBIOS tables and make sure they are within the E820_RESERVED regions.

This doesn't appear to have caused any problems - the only problematic
part is trampling over the host's _mapped_ parts of the PCI MMIO hole.
Having domU RAM everywhere else doesn't _appear_ to cause any problems,
hence why I would like to focus my effort on making sure the holes are
mapped while breaking nothing else, if at all possible.

>> Is that right? Nothing else needs to happen?
>
> HA! You are going to hit some bugs probably :-)

Hey, some degree of optimism is required for perseverance. ;)

>> The following questions arise:
>>
>> 1) What to do in case of overlaps? On my specific hardware, the key
>>    difference in the end map will be that the hole at:
>>
>>    (XEN) HVM1: HOLE: 00000000:40000000 - 00000000:fee00000
>>
>>    will end up being created in domU.
>
> The hole is also known as the PCI gap or MMIO region. With e820_host
> in effect you should use the host's layout and its hole placement.
> That will replicate it and make domU's E820 hole look like the host's.

Hmm... Now there's an idea. I _could_ just hard-code the memory hole to
match, just to see if it fixes the problem. I rather expect, however,
that this will just move the problem. Specifically, it is liable to
make the domU MMIO overlap (without matching) the dom0 MMIO and crash
the host quite spectacularly. Unless domU decides to map MMIO from the
bottom up, in which case there is 1664MB of MMIO space between
0x40000000 and 0xa8000000 where MMIO will end up in domU, never
overlapping the host's map, and everything will, by pure chance, work
just fine from there on.
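For that experiment, the only thing I should need to derive from the
host map is where its big hole below 4GB starts. Something along these
lines should do it - again untested, it assumes the hypervisor hands
the map back sorted by address, and host_mmio_hole_start() is my own
helper, not existing code:

/*
 * Untested: find the start of the largest gap below 4GB in the host
 * map, i.e. the host's PCI/MMIO hole. Assumes entries are sorted by
 * address.
 */
static uint64_t host_mmio_hole_start(const struct e820entry *map,
                                     unsigned int nr_map)
{
    uint64_t best_start = 0, best_size = 0;
    unsigned int i;

    for ( i = 0; i + 1 < nr_map; i++ )
    {
        uint64_t gap_start = map[i].addr + map[i].size;
        uint64_t gap_end   = map[i + 1].addr;

        if ( gap_start >= (1ull << 32) )
            break;
        if ( gap_end > gap_start && (gap_end - gap_start) > best_size )
        {
            best_size  = gap_end - gap_start;
            best_start = gap_start;
        }
    }

    return best_start;
}

On my map above, that would pick out 0x40000000, the bottom of the
40000000-fee00000 hole.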
>> 2) Do only the holes need to be pulled from the host, or the entire
>>    map? Would hvmloader/SeaBIOS/whatever know what to do if passed a
>>    map that is different from what they might expect (i.e. different
>>    from what the current hvmloader provides)? Or would this be likely
>>    to cause extensive further breakages?
>
> I think there are some assumptions made about where the hole starts.
> Those would have to be made more dynamic to deal with a different
> E820 layout.

Assumptions made by what?

>> 3) At the moment I am leaning toward just pulling in the holes from
>>    the host e820, mirroring them in domU.
>
> <nods>
>
>> 3.1) Marking them as "reserved" would likely fix the problem that
>>      was my primary motivation for doing this in the first place.
>
> That unfortunately will make them neither gaps nor MMIO regions,
> meaning the kernel will scream: "You have a BAR in an E820 reserved
> region! That is bad!", and won't set up the card. The hole needs to
> be replicated in the guest.

What makes the decision in domU about where to map the PCI devices'
MMIO? SeaBIOS?

>> Having said that - with all of the 1GB-3GB space marked as reserved,
>> I'm not sure where the IOMEM would end up mapped in domU - things
>> might just break. If marking the dom0 hole as a hole in domU without
>> ensuring pBAR=vBAR, the PCI device in domU might get mapped where
>> another device is in dom0, which might cause the same problem.
>
> Right. hvmloader could (I hadn't checked the code) scan the E820 and
> determine that the PCI BARs are within the E820_RESERVED region and
> try to move them to a hole. Since no hole would be found below 4GB,
> it would remap the PCI BAR above 4GB. That - depending on the device -
> could be disastrous for the device. If it is only capable of 32-bit
> DMAs, it will never do anything.

Nvidia cards have a 32-bit 32MB BAR by default, and two 64-bit BARs.

Looking at the different maps, I think I see what is actually
happening. In domU, the hole defaults to starting at e0000000, and
this is also where the BARs get mapped for the GPU in domU. That
implies that mirroring the host's hole at 1GB-4GB would actually be
likely to work (by a fluke), since the BARs would (hopefully) get
mapped at the bottom (plenty of hole before the host's mapping, 1664MB
to be exact), and the rest of the hole would never get touched,
stealthily (or obliviously, depending on how you want to look at it)
avoiding trampling over the host's BARs. OK, I'm convinced - I'll give
this a try and see how I get on. :)

>> At the moment, I think the expedient thing to do is make domU map
>> holes as per dom0 and ignore other non-RAM areas.
>
> <nods>
>
>> This may (by luck) or may not fix my immediate problem (RAM in domU
>> clobbering the host's mapped IOMEM), but at least it would cover the
>> prerequisite hole mapping for the next step, which is vBAR=pBAR.
>
> <nods>
>
>> In light of this, however, depending on the answer to 2) above, it
>> may not be practical for the e820_host option to do what it actually
>> means for HVMs, at least not to the same extent as happens for PV.
>> It would only do a part of it (initial vHOLE=pHOLE, to later be
>> extended to the more specific case of vBAR=pBAR). Does this sound
>> reasonable?
>
> I think it will mean you need to look in the hvmloader directory a
> bit more and find all of the assumptions it makes about memory
> locations. One excellent tool is to do 'git log -p tools/hvmloader',
> as it will tell you what changes have been made to address the memory
> layout construction.

I'll have a dig.

> Yes. I think the plan you outlined is sound. The difficulty is going
> to be cramming the E820 constructed by e820_host into hvmloader and
> making sure that all the other parts of it (SMBIOS, ACPI, BIOS) will
> be more dynamic and use dynamic locations instead of hard-coded
> values. Loads of printks can help with that :-)

This is my main concern - that other things are making assumptions
about where the holes are. At the moment it doesn't look too bad, since
the only areas of conflict between (_my_) host map and the current
hvmloader map are in the RAM and HOLE areas, so coming up with a
generic solution that will work for my use case (and hopefully for most
other people) ought to be fairly simple. Making it actually work in the
edge cases will be harder - but then again, for those cases it doesn't
work at the moment anyway, so erring on the side of pragmatism may be
the correct thing to do here.
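As for the printks: I'll probably start by dumping the final map and
flagging anything that strays into the host's hole. Something like
this (sketch only - check_e820_vs_hole() is my own debug helper, and
hole_start/hole_end would come from the host map):

/*
 * Untested debug helper: print the final guest E820 in the same format
 * as the dumps above and complain about any entry that overlaps the
 * host MMIO hole we are trying to keep clear.
 */
static void check_e820_vs_hole(const struct e820entry *e820,
                               unsigned int nr,
                               uint64_t hole_start, uint64_t hole_end)
{
    unsigned int i;

    for ( i = 0; i < nr; i++ )
    {
        uint64_t end = e820[i].addr + e820[i].size;

        printf(" [%02d]: %08x:%08x - %08x:%08x: type %d\n", i,
               (uint32_t)(e820[i].addr >> 32), (uint32_t)e820[i].addr,
               (uint32_t)(end >> 32), (uint32_t)end, e820[i].type);

        if ( e820[i].addr < hole_end && end > hole_start )
            printf("       ^ overlaps the host MMIO hole!\n");
    }
}

If anything in the final map trips that warning, it should point
straight at whichever hard-coded assumption is still being made.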
> The awesome thing is that it will make hvmloader a lot more flexible.
> And one can extend e820_host to construct a bizarre E820 for testing
> even more absurd memory layouts (say, no RAM below 4GB).
>
> Keep on digging! Thanks for great analysis.

Thanks, I appreciate it. :)

Gordan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel