[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH] device-tree: size first hwdom bank for boot modules


  • To: dmukhin@xxxxxxxx
  • From: Mykola Kvach <xakep.amatop@xxxxxxxxx>
  • Date: Fri, 22 May 2026 22:07:31 +0300
  • Arc-authentication-results: i=1; mx.google.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=U6FoAH0aljJtCakQnRGXKX2kfxj3ossrbBz1P1HqOv4=; fh=9JZxiMpSfuLqiQWy+/Qm7fbwQ+MYsT4GTch2ySqG6FU=; b=kN04v0p4bcXxGgtohsc+NjedHp33yL763aL8NljnP4x7q3mjV5+TZmyVR5sXe96whe xPrv3VZQrtNxp0qCCEHHG4KwucgEWkILDXe1EfwE70RFOLmPH9xfMCkUQoCPfAg4LmQb YLHGCHvi/pd0KtwLwRctirCqa9J4ZbPCphjIWp26IGAJAXT+B9aCl0OTmbQNhtV9ldSH 9vdwTWWEUtK2sD6D7S5OW3JrfSEOiupvJPoU7R7jMMfOgEwM23HBSq9nAUKhQdBI/0JS DA8DMnC1/X36YEyhWuJNalM0tbRDslAJS2pgLoi2IJh5K4u7AdLgdKfaO0Bvsbqkc0jE O9wQ==; darn=lists.xenproject.org
  • Arc-seal: i=1; a=rsa-sha256; t=1779476864; cv=none; d=google.com; s=arc-20240605; b=cheVRrRQ+QupfbxZ6OvPzAt2xs5cGkuyQ6rLCE9P0BUQkU8DGKzRXTUwN0W///uOGb S1BkvrUpVeQXFeriQI1Mwla1sdF3EvhoNfnTc2FpSrpSr8l07PB4eYsJvTYAx7FhHlOX 1UwhrTGI6Y6dbJAONh9XA6TS20x8NlpbTGhrspCGaa26y4jNWYgSiXRTsvH3nwmZWWDF /q27hc5ia31O5VOgluC7DnyL5k5OJuHio/Z+Bp3JY2kZo4DVMslf2DI6ljoO1GgFN6hH 4trYU/v9THJ/NzcfLUdqA4o5p7hCAQQH3yMdbQBb++dHiko2JPF9i5VDdZCt2f6+gCMO 3sMg==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=20251104 header.d=gmail.com header.i="@gmail.com" header.h="Content-Transfer-Encoding:Cc:To:Subject:Message-ID:Date:From:In-Reply-To:References:MIME-Version"
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Mykola Kvach <mykola_kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 22 May 2026 19:08:04 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Denis,

Thank you for the review.

On Fri, May 22, 2026 at 1:10 AM <dmukhin@xxxxxxxx> wrote:
>
> Hi Mykola,
>
> The patch looks good!
>
> I would try to add a CI coverage for QEMU aarch64 tests, since QEMU
> supports multiple RAM banks topology.
>
> What do you think?

Sounds good to me. CI coverage would be useful for this case.

The test would need LLC coloring enabled and a low free region that
passes the old 128MB threshold while still being too small for the
kernel, generated DTB and initrd placement. This should be possible to
model in QEMU with suitable reserved-memory holes in the DT.

>
> Also, few remarks below.
>
> On Sun, May 17, 2026 at 11:57:56PM +0300, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> >
> > With LLC coloring enabled, the hardware domain memory comes from
> > allocate_hwdom_memory(), not from the fixed direct-map banks used when
> > coloring is off.
> >
> > Commit de99f3263555 ("device-tree: Improve hwdom memory allocation for
> > DMA") made that allocator sort free host regions by ascending address so
> > Dom0 gets DMA-capable low memory first. The first bank filter still only
> > required 128MB. That can select a low region which is large enough for
> > the heuristic, but not large enough for place_modules() to put the Dom0
> > kernel, generated DTB and initrd contiguously in bank 0.
> >
> > Ask arch code for any additional first-bank size requirement. On Arm,
> > compute it from the actual Dom0 kernel placement, rounded initrd size and
> > generated DTB size hint. For 64-bit Image kernels, include the text offset
> > from the candidate bank start, because the returned requirement is compared
> > with a bank size measured from that start. The hint covers both the normal
> > Device Tree path and the minimal DTB created for ACPI boot.
> >
> > Check the first-bank threshold against the size which will actually be
> > assigned to Dom0, after capping the host region by the remaining unassigned
> > Dom0 memory. Otherwise a large host region could pass the test but still
> > produce a first guest bank too small for place_modules().
> >
> > Use the typed min()/max() helpers for this normal allocation arithmetic;
> > MIN()/MAX() are intended for preprocessor-style contexts and skip the type
> > checking provided by the lowercase helpers.
> >
> > This keeps the DMA-oriented allocation policy from de99f3263555 while
> > preventing a too-small bank 0 from reaching place_modules().
> >
> > Fixes: de99f3263555 ("device-tree: Improve hwdom memory allocation for DMA")
> > Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> > ---
> > Test/setup notes:
> >
> > The failure was reproduced on a Renesas H3ULCB/R-Car H3 (r8a7795)
> > arm64 board booted through U-Boot/TFTP and using huge initrd.
> >
> > Relevant Xen command line excerpt:
> >   dom0_mem=2048M llc-coloring=on
> >
> > Boot module layout from Xen:
> >   MODULE[2]: 0x0000000084000040-0x000000008e75d92f Ramdisk
> >   MODULE[3]: 0x00000000a0000000-0x00000000a3ffffff Kernel
> >   MODULE[4]: 0x00000000a4000000-0x00000000a400ffff XSM Policy
> >
> > The initrd is about 168MB. With LLC coloring enabled and the low-address
> > allocation policy from de99f3263555, Dom0 can receive a 192MB first bank:
> >   d0 BANK[0] 0x00000048000000-0x00000054000000 (192MB)
> >
> > That bank satisfies the old 128MB minimum but is too small for the
> > rounded Dom0 kernel, generated DTB and initrd placement. The observed
> > failure before this patch was:
> >   Panic on CPU 0:
> >   Not enough memory in the first bank for the kernel+dtb+initrd
> >
> > With this patch, the same boot skips the too-small low region for bank 0
> > and reaches Dom0:
> >   d0 BANK[0] 0x00000057000000-0x00000084000000 (720MB)
> >   d0 BANK[1] 0x0000008e800000-0x000000c0000000 (792MB)
> >   d0 BANK[2] 0x00000500000000-0x00000521800000 (536MB)
> >   d0: extended region 0: 0x48000000->0x54000000
> >   Loading zImage from 0x00000000a0000000 to 0x57000000-0x5b000000
> >   Loading d0 initrd from 0x0000000084000040 to 0x5f200000-0x6995d8f0
> >   Loading d0 DTB to 0x5f000000-0x5f011c80
> >   Linux version 5.10.194-yocto-standard
> > ---
> >  xen/arch/arm/acpi/domain_build.c        |  2 --
> >  xen/arch/arm/domain_build.c             |  8 ++++++
> >  xen/arch/arm/include/asm/domain_build.h |  4 +++
> >  xen/arch/arm/include/asm/kernel.h       |  8 ++++++
> >  xen/arch/arm/kernel.c                   | 35 +++++++++++++++++++++++++
> >  xen/common/device-tree/domain-build.c   | 27 ++++++++++++++-----
> >  xen/include/xen/fdt-kernel.h            |  8 ++++++
> >  7 files changed, 83 insertions(+), 9 deletions(-)
> >
> > diff --git a/xen/arch/arm/acpi/domain_build.c 
> > b/xen/arch/arm/acpi/domain_build.c
> > index 249d899c33..db16f7fa94 100644
> > --- a/xen/arch/arm/acpi/domain_build.c
> > +++ b/xen/arch/arm/acpi/domain_build.c
> > @@ -26,8 +26,6 @@
> >  #undef virt_to_mfn
> >  #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
> >
> > -#define ACPI_DOM0_FDT_MIN_SIZE 4096
> > -
> >  static int __init acpi_iomem_deny_access(struct domain *d)
> >  {
> >      acpi_status status;
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index 1efddc60ef..226e053c68 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -115,6 +115,14 @@ int __init parse_arch_dom0_param(const char *s, const 
> > char *e)
> >                               (IS_ENABLED(CONFIG_STATIC_SHM) ?         \
> >                                (NR_SHMEM_BANKS * (160 + 16)) : 0))
> >
> > +paddr_t __init dom0_get_fdt_size_hint(void)
> > +{
> > +    if ( !acpi_disabled )
> > +        return ACPI_DOM0_FDT_MIN_SIZE;
> > +
> > +    return fdt_totalsize(device_tree_flattened) + DOM0_FDT_EXTRA_SIZE;
>
> I would invert the condition so it is read more straightforward:
>
>     if ( acpi_disabled )
>         return fdt_totalsize(device_tree_flattened) + DOM0_FDT_EXTRA_SIZE;
>
>     return ACPI_DOM0_FDT_MIN_SIZE;

Ack, I will change it.

>
> [..]
> > diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> > index b72585b7fe..3644663e2f 100644
> > --- a/xen/arch/arm/kernel.c
> > +++ b/xen/arch/arm/kernel.c
> > @@ -128,6 +128,41 @@ static paddr_t __init kernel_zimage_place(struct 
> > kernel_info *info)
> >      return load_addr;
> >  }
> >
> > +static paddr_t __init kernel_placement_size(paddr_t load_addr, paddr_t len)
> > +{
> > +    return ROUNDUP(load_addr + len, MB(2)) - load_addr;
> > +}
> > +
> > +paddr_t __init arch_get_min_first_bank_size(struct kernel_info *info,
> > +                                            paddr_t bank_start)
> > +{
> > +    const struct boot_module *mod = info->bd.initrd;
> > +    const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0, MB(2));
> > +    const paddr_t dtb_len = ROUNDUP(dom0_get_fdt_size_hint(), MB(2));
> > +    paddr_t kernsize;
> > +
> > +#ifdef CONFIG_HAS_DOMAIN_TYPE
>
> Perhaps use `IS_ENABLED(CONFIG_HAS_DOMAIN_TYPE)` to reduce ifdefery?
> My impression that IS_ENABLED() is preferred.

I think the #ifdef is needed here. This code is in the common Arm kernel
path and can still be built for 32-bit Arm, while both the `type` field in
struct kernel_info and DOMAIN_64BIT are only available when
CONFIG_HAS_DOMAIN_TYPE is enabled. Also, the `text_offset` field used in
this branch is only present for ARM_64.

IS_ENABLED() would make the condition a compile-time constant, but the
compiler would still need to parse and type-check the expression in
configurations where those fields/symbols do not exist.

>
> > +    if ( (info->type == DOMAIN_64BIT) && (info->image.start == 0) )
> > +    {
> > +        paddr_t load_addr = bank_start + info->image.text_offset;
> > +
> > +        /*
> > +         * The caller compares this value with a size measured from
> > +         * bank_start, so include the text offset before the kernel.
> > +         */
> > +        kernsize = ROUNDUP(load_addr + info->image.len, MB(2)) - 
> > bank_start;
> > +        return kernsize + initrd_len + dtb_len;
> > +    }
> > +#endif
> > +
> > +    if ( info->image.start == 0 )
>
> Here too: invert the condition?
>
>        if ( info->image.start )
>            kernsize = kernel_placement_size(info->image.start, 
> info->image.len);
>        else
>            ...

Sounds good to me. CI coverage would be useful for this case.

The test would need LLC coloring enabled and a low free region that
passes the old 128MB threshold while still being too small for the
kernel, generated DTB and initrd placement. I think this should be
possible to model in QEMU by using a suitable DT memory layout, for
example with reserved-memory holes.

~Mykola



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.