Re: [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Jason Andryuk <jason.andryuk@xxxxxxx>
  • Date: Thu, 7 Mar 2024 11:07:52 -0500
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>
  • Delivery-date: Thu, 07 Mar 2024 16:12:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 2024-03-06 21:09, Stefano Stabellini wrote:
On Wed, 6 Mar 2024, Jason Andryuk wrote:
Xen tries to load a PVH dom0 kernel at the fixed guest physical address
from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
it can be configured.

Unfortunately there exist firmwares that have reserved regions at this
address, so Xen fails to load the dom0 kernel since it's not RAM.

The PVH entry code is not relocatable - it loads from absolute
addresses, which fail when the kernel is loaded at a different address.
With a suitably modified kernel, a relocatable entry point is possible.

Add the XENFEAT_pvh_relocatable flag to let a kernel indicate that it
supports a relocatable entry path.

Change the loading to check for an acceptable load address.  If the
kernel is relocatable, support finding an alternate load address.

Linux cares about its physical alignment.  This can be pulled out of the
bzImage header, but not from the vmlinux ELF file.  If an alignment
can't be found, use 2MB.

Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
---
Put alignment as a new ELF note?  Presence of that note would indicate
relocation support without a new XENFEAT flag.

Default alignment to a multiple of 2MB to make more cases work?  It has
to be a power of two, and a multiple might allow loading a customized
kernel.  A larger alignment would limit the number of possible load
locations.
---
  xen/arch/x86/hvm/dom0_build.c | 109 ++++++++++++++++++++++++++++++++++
  xen/include/public/features.h |   5 ++
  2 files changed, 114 insertions(+)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index bbae8a5645..34d68ee8fb 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -537,6 +537,109 @@ static paddr_t __init find_memory(
      return INVALID_PADDR;
  }
+static bool __init check_load_address(
+    const struct domain *d, const struct elf_binary *elf)
+{
+    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
+    paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size,
+                                 PAGE_SIZE);
+    unsigned int i;
+
+    /*
+     * The memory map is sorted and all RAM regions starts and sizes are
+     * aligned to page boundaries.
+     */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        paddr_t start = d->arch.e820[i].addr;
+        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
+
+        if ( start <= kernel_start &&
+             end >= kernel_end &&
+             d->arch.e820[i].type == E820_RAM )
+            return true;
+    }
+
+    return false;
+}
+
+/*
+ * Find an e820 RAM region that fits the kernel at a suitable alignment.
+ */
+static paddr_t find_kernel_memory(
+    const struct domain *d, struct elf_binary *elf, paddr_t align)
+{
+    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
+    paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size,
+                                 PAGE_SIZE);
+    unsigned int i;
+
+    /*
+     * The memory map is sorted and all RAM regions starts and sizes are
+     * aligned to page boundaries.
+     */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        paddr_t start = d->arch.e820[i].addr;
+        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
+        paddr_t kstart, kend, offset;
+
+        if ( d->arch.e820[i].type != E820_RAM )
+            continue;
+
+        if ( d->arch.e820[i].size < elf->dest_size )
+            continue;
+
+        if ( end < kernel_end )
+            continue;

Why this check? Is it to make sure we look for e820 regions that are
higher in terms of addresses? If so, couldn't we start from
d->arch.nr_e820 and go down instead of starting from 0 and going up?

Yes, I thought we only wanted a higher address. The Linux bzImage entry code uses the LOAD_PHYSICAL_ADDR (CONFIG_PHYSICAL_START/elf->dest_base) as a minimum when extracting vmlinux for a relocatable kernel. I'm not sure if that is strictly required though.

I also thought a smaller adjustment would be better, so starting from the lower e820 entries would find the first acceptable one. But that may not matter.
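
To make the low-to-high search concrete, here is a minimal standalone sketch. It is not the code from the patch: `struct mem_region`, `round_up()` and `find_lowest_fit()` are hypothetical stand-ins for the e820 handling, and it assumes the map is sorted by address and that the alignment is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;
#define INVALID_PADDR (~(paddr_t)0)

/* Hypothetical, simplified stand-in for an e820 entry. */
struct mem_region {
    paddr_t addr;
    paddr_t size;
    int is_ram;
};

/* Round x up to a multiple of align (align must be a power of two). */
static paddr_t round_up(paddr_t x, paddr_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Walk the map from the lowest entry upwards and return the lowest aligned
 * start address, at or above min_start, inside a RAM region that can hold
 * size bytes.  Returns INVALID_PADDR if nothing fits.
 */
static paddr_t find_lowest_fit(const struct mem_region *map, size_t nr,
                               paddr_t min_start, paddr_t size, paddr_t align)
{
    for ( size_t i = 0; i < nr; i++ )
    {
        paddr_t start = map[i].addr;
        paddr_t end = map[i].addr + map[i].size;
        paddr_t kstart;

        if ( !map[i].is_ram )
            continue;

        /* Region ends below the preferred image end: never move lower. */
        if ( end < min_start + size )
            continue;

        kstart = round_up(start > min_start ? start : min_start, align);

        /* Aligning upwards may have pushed the start past a usable spot. */
        if ( kstart <= end && end - kstart >= size )
            return kstart;
    }

    return INVALID_PADDR;
}
```

Because the walk goes from low to high and returns on the first hit, the result is the smallest upward adjustment from the preferred address, which is the behaviour described above.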

The PVH entry point is a 32-bit entry point if I remember right? Do we
need a 32-bit check? If so then it might not be a good idea to start
from arch.nr_e820 and go down.

Yes, the entry point is 32-bit, so you are correct that the range should be limited to below 4GB.
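
A rough sketch of what such a limit check could look like, as a standalone helper. The name `fits_below_4g()` is hypothetical, and the sketch assumes the whole image (not only the entry point) has to end at or below 4GB; that is an assumption for illustration, not something settled in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t;

#define LIMIT_4G ((paddr_t)1 << 32)

/*
 * Hypothetical helper: true if [start, start + size) lies entirely below
 * 4GB.  Written as a subtraction so that start + size cannot overflow.
 */
static bool fits_below_4g(paddr_t start, paddr_t size)
{
    return start < LIMIT_4G && size <= LIMIT_4G - start;
}
```

A candidate address from the e820 search would be rejected when this returns false, which is also why searching downwards from the top of the map could waste time on regions above 4GB.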

Regards,
Jason



 

