[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 07/11] xen/riscv: add Linux kernel loading support





On 5/4/26 4:05 PM, Jan Beulich wrote:
On 28.04.2026 16:33, Oleksii Kurochko wrote:
--- /dev/null
+++ b/xen/arch/riscv/kernel.c
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <xen/bug.h>
+#include <xen/compiler.h>
+#include <xen/errno.h>
+#include <xen/fdt-kernel.h>
+#include <xen/guest_access.h>
+#include <xen/init.h>
+#include <xen/libfdt/libfdt.h>
+#include <xen/mm.h>
+#include <xen/types.h>
+#include <xen/vmap.h>
+
+#include <asm/setup.h>
+
+#define IMAGE64_MAGIC_V2 0x05435352 /* Magic number 2, le, "RSC\x05" */
+
+static void __init place_modules(struct kernel_info *info, paddr_t kernbase,
+                                 paddr_t kernend)
+{
+    const struct boot_module *mod = info->bd.initrd;
+    const struct membanks *banks = kernel_info_get_mem_const(info);
+    const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0,
+                                       KERNEL_LOAD_ADDR_ALIGNMENT);
+    const paddr_t dtb_len = ROUNDUP(fdt_totalsize(info->fdt),
+                                    KERNEL_LOAD_ADDR_ALIGNMENT);


Why would modules need to be this strongly aligned?
No specific reason except to be aligned with similar alignment below, it could be lesser (PAGE_SIZE or even just unsigned long aligned) or even dropped, I think. It was just easier then to calculate aligned addresses. But I don't see any big issue to have such alignments except maybe that it will waste some memory.


+    const paddr_t modsize = initrd_len + dtb_len;
+    unsigned int bi = banks->nr_banks;
+
+    BUG_ON(modsize < initrd_len);

Where's the earlier check that allows this to be BUG_ON()?

There isn't one. I will replace that with:

if ( modsize < initrd_len )
    panic("Module size overflow: initrd + dtb size wraps paddr_t\n");



+    /*
+     * Place modules as high in RAM as possible, scanning banks from
+     * last to first so that the end of the last bank is preferred.
+     */
+    while ( bi-- > 0 )
+    {
+        const struct membank *bank = &banks->bank[bi];
+        const paddr_t bank_end = bank->start + bank->size;
+        paddr_t modbase;
+
+        if ( modsize > bank->size )
+            continue;
+
+        modbase = ROUNDDOWN(bank_end - modsize, KERNEL_LOAD_ADDR_ALIGNMENT);

Same question here.

I used KERNEL_LOAD_ADDR_ALIGNMENT to be sure that big page tables be potentially used in page table.


+        if ( modbase < bank->start )
+            continue;
+
+        /*
+         * If modules would overlap the kernel, try placing them below it.
+         */

With how kernel_image_place() works, and with the heavy alignment applied
above, is this even possible to succeed? Oh, wait, yes - for the not-
position-independent case.

I don't understand what is wrong with putting initrd and dtb to basically any place except where kernel is expected to be placed (as in case of position dependent case I assume it is pretty crucial).


+        if ( (modbase < ROUNDUP(kernend, KERNEL_LOAD_ADDR_ALIGNMENT)) &&
+             (modbase + modsize > kernbase) )
+        {
+            modbase = ROUNDDOWN(kernbase - modsize, 
KERNEL_LOAD_ADDR_ALIGNMENT);

What prevents this subtraction from underflowing?

I will put the following check at the start of the place_modules() function:
if ( kernbase < modsize )
   panic("Underflow could happen between kernbase and modsize\n");



+static paddr_t __init kernel_image_place(struct kernel_info *info)
+{
+    paddr_t load_addr = INVALID_PADDR;
+    uint64_t image_size = info->image.image_size ?: info->image.len;
+    const struct membanks *banks = kernel_info_get_mem_const(info);
+    unsigned int nr_banks = banks->nr_banks;
+    unsigned int bi;
+
+    dprintk(XENLOG_DEBUG, "nr_banks(%u)\n", nr_banks);

Did you mean to drop this before submitting?

Sure, it should be dropped.


+    /*
+     * At the moment, RISC-V's Linux kernel should be always position
+     * independent based on "Per-MMU execution" of boot.rst:
+     *   https://docs.kernel.org/arch/riscv/boot.html#pre-mmu-execution
+     *
+     * But just for the case when RISC-V's Linux kernel isn't position
+     * independent it is needed to take load address from
+     * info->image.start.
+     *
+     * If `start` is zero, the Image is position independent.
+     */
+    if ( likely(!info->image.start) )
+    {
+        for ( bi = 0; bi != nr_banks; bi++ )
+        {
+            const struct membank *bank = &banks->bank[bi];
+            paddr_t bank_start = bank->start;
+            /*
+             * According to boot.rst kernel load address should be properly
+             * aligned:
+             *   https://docs.kernel.org/arch/riscv/boot.html#kernel-location
+             *
+             * As Image in this case is PIC we can ignore
+             * info->image.text_offset.
+             */
+            paddr_t aligned_start = ROUNDUP(bank_start, 
KERNEL_LOAD_ADDR_ALIGNMENT);
+            paddr_t bank_end = bank_start + bank->size;
+            paddr_t bank_size;
+
+            if ( aligned_start > bank_end )
+                continue;
+
+            bank_size = bank_end - aligned_start;
+
+            dprintk(XENLOG_DEBUG, "bank[%u].start=%"PRIpaddr"\n", bi, 
bank->start);

And this one? (I also find it puzzling that ->start would be of (primary) 
interest
here, when ...

+            if ( image_size <= bank_size )

... bank_size is what's relevant.

This one should be dropped too.


+/* Check if the image is a 64-bit Image */
+static int __init kernel_image64_probe(struct kernel_info *info,
+                                       paddr_t addr, paddr_t size)
+{
+    /* https://www.kernel.org/doc/Documentation/riscv/boot-image-header.rst */
+    struct {
+        uint32_t code0;         /* Executable code */
+        uint32_t code1;         /* Executable code */
+        uint64_t text_offset;   /* Image load offset, little endian */
+        uint64_t image_size;    /* Effective Image size, little endian */
+        uint64_t flags;         /* kernel flags, little endian */
+        uint32_t version;       /* Version of this header */
+        uint32_t res1;          /* Reserved */
+        uint64_t res2;          /* Reserved */
+        uint64_t magic;         /* Deprecated: Magic number, little endian, 
"RISCV" */
+        uint32_t magic2;        /* Magic number 2, little endian, "RSC\x05" */
+        uint32_t res3;          /* Reserved for PE COFF offset */
+    } image;
+    uint64_t effective_size;
+
+    if ( size < sizeof(image) )
+        return -EINVAL;
+
+    copy_from_paddr(&image, addr, sizeof(image));
+
+    /* Magic v1 is deprecated and may be removed.  Only use v2 */
+    if ( le32_to_cpu(image.magic2) != IMAGE64_MAGIC_V2 )
+        return -EINVAL;
+
+    effective_size = le64_to_cpu(image.image_size);
+
+    if ( effective_size && size > effective_size )
+        return -EINVAL;

Is the rhs of the && the wrong way round? If effective_size > size,
aren't you in trouble? Question of course is what "effective" really
means. Yet in any event it seems dubious to me that effective_size <
size would really be a problem. IOW this will want commenting upon
if the check is to stay.

Actually ...

+    info->image.kernel_addr = addr;
+    /* Actual size in the binary file */
+    info->image.len = size;
+    /* Total memory the kernel occupies at runtime */
+    info->image.image_size = effective_size;

... this looks to suggest something .bss-like.

Yes, effective_size it is size which included .bss.

size it of LK after decompression of Image.gz and it doesn't include .bss so it should be lesser then effective_size.

I don't think that I am in trouble that effective_size is bigger then size if we allocate enough space in memory effective_size is fine to be bigger.

It is a good question if effective_size < size is a problem. I think it
isn't but could it be really happen?
I think that I am okay to drop that part of if().


--- a/xen/include/xen/fdt-kernel.h
+++ b/xen/include/xen/fdt-kernel.h
@@ -59,8 +59,15 @@ struct kernel_info {
          struct {
              paddr_t kernel_addr;
              paddr_t len;
-#if defined(CONFIG_ARM_64) || defined(CONFIG_RISCV_64)
-            paddr_t text_offset; /* 64-bit Image only */
+#if defined(CONFIG_ARM_64) || defined(CONFIG_RISCV)
+            /*
+             * ARM: 64-bit Image only.
+             * RISC-V: both 32-bit and 64-bit Images.
+             */
+            paddr_t text_offset;
+#endif
+#if defined(CONFIG_RISCV)
+            uint64_t image_size; /* Effective size of Image */

As this (apparently) is for both RV64 and RV32 - can the latter really have
wider than 32-bit image sizes? If not - use size_t or unsigned long here?

Agree, unsigned long should be enough.

Thanks.

~ Oleksii





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.