
Re: E820 memory allocation issue on Threadripper platforms



Hi all,

I recently found this mailing list thread while searching for information on a related issue involving a conflicting E820 map on a Threadripper platform. For those interested in an additional data point, I am using the ASUS WRX80E-SAGE SE Wifi II motherboard, which presents the following memory map to Xen:

(XEN) EFI RAM map:
(XEN)  [0000000000000000, 0000000000000fff] (reserved)
(XEN)  [0000000000001000, 000000000008ffff] (usable)
(XEN)  [0000000000090000, 0000000000090fff] (reserved)
(XEN)  [0000000000091000, 000000000009ffff] (usable)
(XEN)  [00000000000a0000, 00000000000fffff] (reserved)
(XEN)  [0000000000100000, 0000000003ffffff] (usable)
(XEN)  [0000000004000000, 0000000004020fff] (ACPI NVS)
(XEN)  [0000000004021000, 0000000009df1fff] (usable)
(XEN)  [0000000009df2000, 0000000009ffffff] (reserved)
(XEN)  [000000000a000000, 00000000b5b04fff] (usable)
(XEN)  [00000000b5b05000, 00000000b8cd3fff] (reserved)
(XEN)  [00000000b8cd4000, 00000000b9064fff] (ACPI data)
(XEN)  [00000000b9065000, 00000000b942afff] (ACPI NVS)
(XEN)  [00000000b942b000, 00000000bb1fefff] (reserved)
(XEN)  [00000000bb1ff000, 00000000bbffffff] (usable)
(XEN)  [00000000bc000000, 00000000bfffffff] (reserved)
(XEN)  [00000000c1100000, 00000000c1100fff] (reserved)
(XEN)  [00000000e0000000, 00000000efffffff] (reserved)
(XEN)  [00000000f1280000, 00000000f1280fff] (reserved)
(XEN)  [00000000f2200000, 00000000f22fffff] (reserved)
(XEN)  [00000000f2380000, 00000000f2380fff] (reserved)
(XEN)  [00000000f2400000, 00000000f24fffff] (reserved)
(XEN)  [00000000f3680000, 00000000f3680fff] (reserved)
(XEN)  [00000000fea00000, 00000000feafffff] (reserved)
(XEN)  [00000000fec00000, 00000000fec00fff] (reserved)
(XEN)  [00000000fec10000, 00000000fec10fff] (reserved)
(XEN)  [00000000fed00000, 00000000fed00fff] (reserved)
(XEN)  [00000000fed40000, 00000000fed44fff] (reserved)
(XEN)  [00000000fed80000, 00000000fed8ffff] (reserved)
(XEN)  [00000000fedc2000, 00000000fedcffff] (reserved)
(XEN)  [00000000fedd4000, 00000000fedd5fff] (reserved)
(XEN)  [00000000ff000000, 00000000ffffffff] (reserved)
(XEN)  [0000000100000000, 000000703f0fffff] (usable)
(XEN)  [000000703f100000, 000000703fffffff] (reserved)

And of course the default physical load address of the x86_64 kernel is 16 MiB (CONFIG_PHYSICAL_START=0x1000000), which conflicts with the EfiACPIMemoryNVS region starting at 0x4000000 (64 MiB): on the latest Debian (12.5.0, bookworm) the decompressed kernel is more than 60 MiB, so an image loaded at 16 MiB overflows into that region (see the sketch below). I can confirm that loading the Debian kernel at 2 MiB works as expected. The Debian kernel is also built with CONFIG_RELOCATABLE=y, so it should be capable of being loaded with this new feature in Xen.
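To make the conflict concrete, here is a minimal C sketch of the interval check. The NVS region values are copied from the map above; the ~61 MiB image size is just a rough stand-in for "more than 60 MiB", and the rest is ordinary interval arithmetic:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        /* Default x86_64 physical load address (CONFIG_PHYSICAL_START). */
        uint64_t load_start = 0x1000000;                /* 16 MiB */
        uint64_t load_end = load_start + (61ULL << 20); /* ~61 MiB decompressed */

        /* EfiACPIMemoryNVS region from the map above. */
        uint64_t nvs_start = 0x4000000;                 /* 64 MiB */
        uint64_t nvs_end = 0x4021000;

        /* Half-open interval overlap test. */
        if (load_start < nvs_end && nvs_start < load_end)
            printf("kernel [%#" PRIx64 "-%#" PRIx64 ") overlaps "
                   "NVS [%#" PRIx64 "-%#" PRIx64 ")\n",
                   load_start, load_end, nvs_start, nvs_end);
        return 0;
    }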

I see the fix linked from this ticket was implemented and committed (dfc9fab0) on April 8, 2024, but it appears not to have made its way into the latest Xen release (4.18), even though more recent commits have been cherry-picked into that branch. When is this fix expected to make it into a release?

Branden.

On Jan 17, 2024, at 7:54 AM, Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:

> On Wed, Jan 17, 2024 at 11:40:20AM +0100, Jan Beulich wrote:
>> On 17.01.2024 11:13, Roger Pau Monné wrote:
>>> On Wed, Jan 17, 2024 at 09:46:27AM +0100, Jan Beulich wrote:
>>>> Whereas I assume the native kernel can deal with that as long as
>>>> it's built with CONFIG_RELOCATABLE=y. I don't think we want to
>>>> get into the business of interpreting the kernel's internal
>>>> representation of the relocations needed, so it's not really
>>>> clear to me what we might do in such a case. Perhaps the only way
>>>> is to signal to the kernel that it needs to apply relocations
>>>> itself (which in turn would require the kernel to signal to us
>>>> that it's capable of doing so). Cc-ing Roger in case he has any
>>>> neat idea.

>>> Hm, no, not really.

>>> We could do like multiboot2: the kernel provides us with some
>>> placement data (min/max addresses, alignment), and Xen lets the
>>> kernel deal with relocations itself.
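For what it's worth, I picture that placement data looking roughly like multiboot2's relocatable header tag. A purely illustrative sketch follows; the struct and function names are mine, not from any actual patch:

    #include <stdint.h>

    /* Hypothetical constraints a relocatable kernel could advertise,
     * modeled loosely on multiboot2's relocatable header tag. */
    struct kernel_placement {
        uint64_t min_addr;  /* lowest acceptable physical load address */
        uint64_t max_addr;  /* highest acceptable physical end address */
        uint64_t align;     /* required alignment, a power of two */
    };

    /* Pick the first suitably aligned address inside one usable
     * memory-map region, or 0 if the kernel doesn't fit.  Sketch only. */
    static uint64_t place_kernel(const struct kernel_placement *p,
                                 uint64_t region_start, uint64_t region_end,
                                 uint64_t kernel_size)
    {
        uint64_t start = region_start > p->min_addr ? region_start
                                                    : p->min_addr;

        start = (start + p->align - 1) & ~(p->align - 1);  /* align up */
        if (start + kernel_size > region_end ||
            start + kernel_size > p->max_addr)
            return 0;
        return start;
    }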

>> Requiring the kernel's entry point to take a sufficiently different
>> flow than it does today, I expect.

> Indeed, I would expect that.

>>> Additionally we could support the kernel providing a section with
>>> the relocations and apply them from Xen, but that's likely
>>> complicated at best, as I don't even know which kinds of
>>> relocations we would have to support.

>> If the kernel was properly linked as a PIE, there'd generally be only
>> one kind of relocation (per arch) that ought to need dealing with -
>> for x86-64 that's R_X86_64_RELATIVE iirc. Hence why (I suppose) they
>> don't use ELF relocation structures (which would be wastefully large),
>> but rather a more compact custom representation. Even without building
>> PIE (presumably in part not possible because of how per-CPU data needs
>> dealing with), they get away with handling just very few relocs (and
>> from looking at the reloc processing code I'm getting the impression
>> they mistreat R_X86_64_32 as being the same as R_X86_64_32S, when it
>> isn't; needing to get such quirks right is one more aspect of why I
>> think we should leave relocation handling to the kernel).
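If I understand the PIE case correctly, applying R_X86_64_RELATIVE entries would reduce to something like the sketch below, assuming standard ELF64 RELA entries. This is my illustration, not code from Xen or Linux:

    #include <elf.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Apply R_X86_64_RELATIVE relocations for an image loaded at
     * load_base instead of its link-time base of 0.  Sketch only;
     * no bounds checking. */
    static void apply_relative_relocs(uint64_t load_base,
                                      const Elf64_Rela *rela, size_t count)
    {
        for (size_t i = 0; i < count; i++) {
            if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE)
                continue;   /* a proper PIE should need little else */
            /* Target word gets link-time value (addend) plus load base. */
            *(uint64_t *)(load_base + rela[i].r_offset) =
                load_base + rela[i].r_addend;
        }
    }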

> Would have to look into more detail, but I think leaving any relocs
> for the OS to perform would be my initial approach.

>>> I'm not sure how Linux deals with this in the bare metal case: are
>>> relocations done after decompressing and before jumping into the
>>> entry point?

>> That's how it was last time I looked, yes.
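That matches my loose understanding of the compressed-boot path: after decompression, the stub walks the compact relocation table that the build appends to the image and patches each recorded location by the load delta. Roughly like the following; this is a paraphrase from memory, not a faithful copy of arch/x86/boot/compressed/misc.c:

    #include <stdint.h>

    /* Rough shape of the fixup loop: the build emits a zero-terminated
     * table of 32-bit offsets of locations needing adjustment; at boot,
     * each location is patched by the difference between the actual and
     * the link-time load address.  Details deliberately omitted. */
    static void fixup_relocs(const uint32_t *reloc_table,
                             unsigned long map_base, /* where image landed */
                             long delta)             /* actual - linked */
    {
        for (const uint32_t *r = reloc_table; *r; r++) {
            uint32_t *loc = (uint32_t *)(map_base + *r);
            *loc += delta;
        }
    }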

> I've created a gitlab ticket for it:
>
> https://gitlab.com/xen-project/xen/-/issues/180
>
> So that we don't forget: I don't have time to work on this right now,
> but I think it's important enough that we don't lose track of it.

> For PV it's a bit more unclear how we want to deal with it, as it's
> IMO a specific Linux behavior that makes it fail to boot.

> Roger.




 

