
Re: [Xen-devel] HVM support for e820_host (Was: Bug: Limitation of <=2GB RAM in domU persists with 4.3.0)



On Fri, 6 Sep 2013 09:09:06 -0400, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
On Thu, Sep 05, 2013 at 11:42:38PM +0100, Gordan Bobic wrote:
On 09/05/2013 11:23 PM, Konrad Rzeszutek Wilk wrote:
>Gordan Bobic <gordan@xxxxxxxxxx> wrote:
>>Right, finally got around to trying this with the latest patch.
>>
>>With e820_host=0 things work as before:
>>
>>(XEN) HVM3: BIOS map:
>>(XEN) HVM3:  f0000-fffff: Main BIOS
>>(XEN) HVM3: E820 table:
>>(XEN) HVM3:  [00]: 00000000:00000000 - 00000000:0009e000: RAM
>>(XEN) HVM3:  [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
>>(XEN) HVM3:  HOLE: 00000000:000a0000 - 00000000:000e0000
>>(XEN) HVM3:  [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
>>(XEN) HVM3:  [03]: 00000000:00100000 - 00000000:e0000000: RAM
>>(XEN) HVM3:  HOLE: 00000000:e0000000 - 00000000:fc000000
>>(XEN) HVM3:  [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
>>(XEN) HVM3:  [05]: 00000001:00000000 - 00000002:1f800000: RAM
>>
>>
>>I seem to be getting two different E820 table dumps with e820_host=1:
>>
>>(XEN) HVM1: BIOS map:
>>(XEN) HVM1:  f0000-fffff: Main BIOS
>>(XEN) HVM1: build_e820_table:91 got 8 op.nr_entries
>>(XEN) HVM1: E820 table:
>>(XEN) HVM1:  [00]: 00000000:00000000 - 00000000:3f790000: RAM
>>(XEN) HVM1:  [01]: 00000000:3f790000 - 00000000:3f79e000: ACPI
>>(XEN) HVM1:  [02]: 00000000:3f79e000 - 00000000:3f7d0000: NVS
>>(XEN) HVM1:  [03]: 00000000:3f7d0000 - 00000000:3f7e0000: RESERVED
>>(XEN) HVM1:  HOLE: 00000000:3f7e0000 - 00000000:3f7e7000
>>(XEN) HVM1:  [04]: 00000000:3f7e7000 - 00000000:40000000: RESERVED
>>(XEN) HVM1:  HOLE: 00000000:40000000 - 00000000:fee00000
>>(XEN) HVM1:  [05]: 00000000:fee00000 - 00000000:fee01000: RESERVED
>>(XEN) HVM1:  HOLE: 00000000:fee01000 - 00000000:ffc00000
>>(XEN) HVM1:  [06]: 00000000:ffc00000 - 00000001:00000000: RESERVED
>>(XEN) HVM1:  [07]: 00000001:00000000 - 00000001:68870000: RAM
>>(XEN) HVM1: E820 table:
>>(XEN) HVM1:  [00]: 00000000:00000000 - 00000000:0009e000: RAM
>>(XEN) HVM1:  [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
>>(XEN) HVM1:  HOLE: 00000000:000a0000 - 00000000:000e0000
>>(XEN) HVM1:  [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
>>(XEN) HVM1:  [03]: 00000000:00100000 - 00000000:a7800000: RAM
>>(XEN) HVM1:  HOLE: 00000000:a7800000 - 00000000:fc000000
>>(XEN) HVM1:  [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
>>(XEN) HVM1: Invoking ROMBIOS ...
>>
>>I cannot quite figure out what is going on here - these tables can't
>>both be true.
>>
>
>Right. The code just prints the E820 that was constructed because of the e820_host=1 parameter as the first output. The second one is what was constructed originally.
>
>The code that would tie in the E820 from the hypercall and then alter how hvmloader sets it up is not yet done.
>
>
>>Looking at the IOMEM on the host, the IOMEM begins at 0xa8000000 and
>>goes more or less contiguously up to 0xfec8b000.
>>
>>Looking at dmesg on domU, the e820 map more or less matches the second
>>dump above.
>
>Right. That is correct since the patch I sent just outputs stuff. No real changes to the E820 yet.

/me *facepalms*

That indeed explains everything. :)

But having had a thorough look through the memory mappings (see my
other long, rambling email), I don't actually see an obvious area
where RAM might overwrite a dom0 IOMEM range - assuming the "HOLE"
part isn't mapped as RAM in domU.

Or to summarize:
dom0 PCI IOMEM actually has mappings from a8000000 onward, and
giving domU up to that much memory works fine. So the memory stomp
must be happening from a8000000 onward. But - the only things above
that address in domU are the HOLE up to fc000000 and RESERVED up to
ffffffff. So no domU memory is getting mapped into the IOMEM range
anyway - which begs the question of what is _actually_ causing the
crash. Stuff I haven't yet found in domU getting mapped into the
a7800000-fc000000 hole overlapping dom0 IOMEM? SeaBIOS doing
something odd in the fc000000-fec8b000 range marked RESERVED in domU?
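
The summary above can be checked mechanically. What follows is a minimal Python sketch, not anything taken from the patch. It uses the second HVM1 E820 dump and the host IOMEM window quoted earlier in this thread (a8000000 to fec8b000). Treating "more or less contiguously" as one single window is a simplifying assumption, and all names are illustrative only.

```python
# Second E820 dump for HVM1 (e820_host=1), as half-open [start, end) ranges.
DOMU_MAP = [
    (0x00000000, 0x0009e000, "RAM"),
    (0x0009e000, 0x000a0000, "RESERVED"),
    (0x000a0000, 0x000e0000, "HOLE"),
    (0x000e0000, 0x00100000, "RESERVED"),
    (0x00100000, 0xa7800000, "RAM"),
    (0xa7800000, 0xfc000000, "HOLE"),
    (0xfc000000, 0x100000000, "RESERVED"),
]

# Assumption: host IOMEM treated as a single contiguous window.
HOST_IOMEM = (0xa8000000, 0xfec8b000)


def overlap(a_start, a_end, b_start, b_end):
    """Return the overlapping [start, end) of two ranges, or None."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return (start, end) if start < end else None


def overlapping_entries(e820, window):
    """List (kind, start, end) for each map entry that intersects window."""
    hits = []
    for start, end, kind in e820:
        common = overlap(start, end, *window)
        if common:
            hits.append((kind, *common))
    return hits


if __name__ == "__main__":
    for kind, start, end in overlapping_entries(DOMU_MAP, HOST_IOMEM):
        print(f"{kind:8s} {start:08x}-{end:08x}")
```

Only the HOLE (a8000000-fc000000) and the RESERVED entry (fc000000-fec8b000) intersect the window; neither RAM entry does, which matches the summary above.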

There were some assumptions about that region, namely that stuff could
be stuck in there (like ACPI tables and SMBIOS, I think).

Perhaps a better question is - are any of the BARs of your card overlapping
with the RESERVED range in the domU?
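
One way to answer the BAR question is to pull the memory regions out of the lspci -vvvv output and compare them against the RESERVED range from the domU E820 table (fc000000 up to 1:00000000). A rough Python sketch follows. The sample lspci lines and the helper names are made up for illustration and are not taken from the attached log.

```python
import re

# RESERVED range from the domU E820 table, half-open [start, end).
RESERVED = (0xfc000000, 0x100000000)

# Matches lines such as:
#   Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) \([^)]*\).*?\[size=(\d+)([KMG]?)\]"
)
UNIT = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Hypothetical sample, NOT from the attached lspci.log.
SAMPLE = """
    Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Region 5: I/O ports at c000 [size=128]
"""


def parse_bars(lspci_text):
    """Yield (region, start, end) for every memory BAR in the lspci text."""
    for m in REGION_RE.finditer(lspci_text):
        start = int(m.group(2), 16)
        size = int(m.group(3)) * UNIT[m.group(4)]
        yield int(m.group(1)), start, start + size


def bars_in_range(lspci_text, lo, hi):
    """Return the memory BARs that intersect [lo, hi)."""
    return [b for b in parse_bars(lspci_text) if b[1] < hi and lo < b[2]]


if __name__ == "__main__":
    for region, start, end in bars_in_range(SAMPLE, *RESERVED):
        print(f"Region {region}: {start:08x}-{end:08x} overlaps RESERVED")
```

I/O port regions and unassigned BARs are skipped on purpose, since only memory BARs can collide with the E820 ranges.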

Or, if you grep through the hvmloader code, are there any addresses
that look to be within that range?

Incidentally, could you send the output of lspci -vvvv from the guest
and from dom0, please?

Attached.

The main point I'm trying to keep in mind here is that this
needs to be generic and useful in different hardware cases,
not just my own. If it were just about my own hardware and use
case I'd have just opted for the approach of the old vBAR-pBAR
patch, hard-coded the holes and been done with it.

Or am I reading this all wrong?

You are on the right track I think. There are some assumptions made
about the RESERVED and HOLE regions that I think are conflicting with what
the card expects. Another way to figure out what is happening is to crank
up the verbosity of the driver in the domU. Specifically there is
a CONFIG_MMIO_TRACE (or something like that) option that will tell you the
physical address the PCI cards are using and what is being written to it.

It could help in identifying _where_ the graphics card is writing to/reading
from, and also the last moment when it wrote something.

That's a part of my problem - my domU with a reproducible crash
is Windows which is a lot less debuggable. :(
I have a Linux domU that I use for figuring out what the domU looks
like from the inside, but I don't have a readily usable test-case
for reproducing the crash there.

Gordan

Attachment: lspci.log
Description: Text document
