[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH] device-tree: size first hwdom bank for boot modules


  • To: "Orzel, Michal" <michal.orzel@xxxxxxx>
  • From: Mykola Kvach <xakep.amatop@xxxxxxxxx>
  • Date: Mon, 25 May 2026 20:56:44 +0300
  • Arc-authentication-results: i=1; mx.google.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=MNJt8cRHwyIcYi/0+vjGBOgpL03aVZwqgiIyAY5tsYs=; fh=vHQhu8+3w8fIT0NmGZ7Av5NNpAuuuKJ4YoBRPf5+ZME=; b=BdXiPIxhfgsWgcvVIto3ksspknOBwnhXaQb0qxXUv7B+rf3swIkQpOzgsqvCzxJei6 PkZnUK74wWf1IpYtuY4fCCexKu8uMPsNoVJ/RZz1eJKayPmVibunvuQV0egj0RDet/LB 15dJyOd5hHc/DGpaJCq8OW76zsxoHVR2jX4coF7MAugmwPO8vh43IAgvL2447ATBLobu ccQTJOpMtiuQfgf0Ar0+BHhuMdqDajeFLmzHvDPGDK8IX2uCuoPF2hkYGAnqY9INJwpB coHR578P3DCwCE7YPqVzgKQc96Innhi6lmM2YOf4x/DxfHqmxqLemficiE8Gc8pBPyJo mBYw==; darn=lists.xenproject.org
  • Arc-seal: i=1; a=rsa-sha256; t=1779731816; cv=none; d=google.com; s=arc-20240605; b=Q30f4cV3JS/rhIyQ7AJrubquxoxduo56cGIOxj++KQLLF1wA+QmF0Sns788QO3NTZ/ t/d5zoaoKz+kw+rGtj/19ZRcHn+pb9AjHUzSh4Jk5ldzkBg6oomKjk4wUorYYb3SqLgk gLBnhw5pABf73yswrUSoakv8PL+syM+w49s15VvP2Q1xOoT6zGn4SNERXwus1bjbH/ZC QsyfxnGT5V25WmmYkH6udmvyS6V5py+sNSXFqvCZg6cGZ/T/2Cj9T3huuvsMma+JX4iu 2I4sTfDrEoaQ3azfd+6LTWslWzc8V/Tl7oLucHfRfxt3aCfmyMCmo6iYnBi6Dj/o/QlG oTeA==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=20251104 header.d=gmail.com header.i="@gmail.com" header.h="Content-Transfer-Encoding:Cc:To:Subject:Message-ID:Date:From:In-Reply-To:References:MIME-Version"
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Mykola Kvach <mykola_kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 25 May 2026 17:57:08 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Michal,

Thank you for the detailed review.

On Mon, May 25, 2026 at 10:35 AM Orzel, Michal <michal.orzel@xxxxxxx> wrote:
>
>
>
> On 17-May-26 22:57, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> >
> > With LLC coloring enabled, the hardware domain memory comes from
> > allocate_hwdom_memory(), not from the fixed direct-map banks used when
> > coloring is off.
> >
> > Commit de99f3263555 ("device-tree: Improve hwdom memory allocation for
> > DMA") made that allocator sort free host regions by ascending address so
> > Dom0 gets DMA-capable low memory first. The first bank filter still only
> > required 128MB. That can select a low region which is large enough for
> > the heuristic, but not large enough for place_modules() to put the Dom0
> Don't mention dom0 given that this is purely hwdom path.

Ack.

>
> > kernel, generated DTB and initrd contiguously in bank 0.
> >
> > Ask arch code for any additional first-bank size requirement. On Arm,
> > compute it from the actual Dom0 kernel placement, rounded initrd size and
> > generated DTB size hint. For 64-bit Image kernels, include the text offset
> > from the candidate bank start, because the returned requirement is compared
> > with a bank size measured from that start. The hint covers both the normal
> > Device Tree path and the minimal DTB created for ACPI boot.
> >
> > Check the first-bank threshold against the size which will actually be
> > assigned to Dom0, after capping the host region by the remaining unassigned
> > Dom0 memory. Otherwise a large host region could pass the test but still
> > produce a first guest bank too small for place_modules().
> >
> > Use the typed min()/max() helpers for this normal allocation arithmetic;
> > MIN()/MAX() are intended for preprocessor-style contexts and skip the type
> > checking provided by the lowercase helpers.
> >
> > This keeps the DMA-oriented allocation policy from de99f3263555 while
> > preventing a too-small bank 0 from reaching place_modules().
> >
> > Fixes: de99f3263555 ("device-tree: Improve hwdom memory allocation for DMA")
> > Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> > ---
> > Test/setup notes:
> >
> > The failure was reproduced on a Renesas H3ULCB/R-Car H3 (r8a7795)
> > arm64 board booted through U-Boot/TFTP and using huge initrd.
> >
> > Relevant Xen command line excerpt:
> >   dom0_mem=2048M llc-coloring=on
> >
> > Boot module layout from Xen:
> >   MODULE[2]: 0x0000000084000040-0x000000008e75d92f Ramdisk
> >   MODULE[3]: 0x00000000a0000000-0x00000000a3ffffff Kernel
> >   MODULE[4]: 0x00000000a4000000-0x00000000a400ffff XSM Policy
> >
> > The initrd is about 168MB. With LLC coloring enabled and the low-address
> > allocation policy from de99f3263555, Dom0 can receive a 192MB first bank:
> >   d0 BANK[0] 0x00000048000000-0x00000054000000 (192MB)
> >
> > That bank satisfies the old 128MB minimum but is too small for the
> > rounded Dom0 kernel, generated DTB and initrd placement. The observed
> > failure before this patch was:
> >   Panic on CPU 0:
> >   Not enough memory in the first bank for the kernel+dtb+initrd
> >
> > With this patch, the same boot skips the too-small low region for bank 0
> > and reaches Dom0:
> >   d0 BANK[0] 0x00000057000000-0x00000084000000 (720MB)
> >   d0 BANK[1] 0x0000008e800000-0x000000c0000000 (792MB)
> >   d0 BANK[2] 0x00000500000000-0x00000521800000 (536MB)
> >   d0: extended region 0: 0x48000000->0x54000000
> >   Loading zImage from 0x00000000a0000000 to 0x57000000-0x5b000000
> >   Loading d0 initrd from 0x0000000084000040 to 0x5f200000-0x6995d8f0
> >   Loading d0 DTB to 0x5f000000-0x5f011c80
> >   Linux version 5.10.194-yocto-standard
> > ---
> >  xen/arch/arm/acpi/domain_build.c        |  2 --
> >  xen/arch/arm/domain_build.c             |  8 ++++++
> >  xen/arch/arm/include/asm/domain_build.h |  4 +++
> >  xen/arch/arm/include/asm/kernel.h       |  8 ++++++
> >  xen/arch/arm/kernel.c                   | 35 +++++++++++++++++++++++++
> >  xen/common/device-tree/domain-build.c   | 27 ++++++++++++++-----
> >  xen/include/xen/fdt-kernel.h            |  8 ++++++
> >  7 files changed, 83 insertions(+), 9 deletions(-)
> >
> > diff --git a/xen/arch/arm/acpi/domain_build.c 
> > b/xen/arch/arm/acpi/domain_build.c
> > index 249d899c33..db16f7fa94 100644
> > --- a/xen/arch/arm/acpi/domain_build.c
> > +++ b/xen/arch/arm/acpi/domain_build.c
> > @@ -26,8 +26,6 @@
> >  #undef virt_to_mfn
> >  #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
> >
> > -#define ACPI_DOM0_FDT_MIN_SIZE 4096
> > -
> >  static int __init acpi_iomem_deny_access(struct domain *d)
> >  {
> >      acpi_status status;
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index 1efddc60ef..226e053c68 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -115,6 +115,14 @@ int __init parse_arch_dom0_param(const char *s, const 
> > char *e)
> >                               (IS_ENABLED(CONFIG_STATIC_SHM) ?         \
> >                                (NR_SHMEM_BANKS * (160 + 16)) : 0))
> >
> > +paddr_t __init dom0_get_fdt_size_hint(void)
> > +{
> > +    if ( !acpi_disabled )
> > +        return ACPI_DOM0_FDT_MIN_SIZE;
> > +
> > +    return fdt_totalsize(device_tree_flattened) + DOM0_FDT_EXTRA_SIZE;
> > +}
> > +
> >  unsigned int __init dom0_max_vcpus(void)
> >  {
> >      if ( opt_dom0_max_vcpus == 0 )
> > diff --git a/xen/arch/arm/include/asm/domain_build.h 
> > b/xen/arch/arm/include/asm/domain_build.h
> > index df8b361b3d..45687c5d6f 100644
> > --- a/xen/arch/arm/include/asm/domain_build.h
> > +++ b/xen/arch/arm/include/asm/domain_build.h
> > @@ -19,6 +19,10 @@ int prepare_acpi(struct domain *d, struct kernel_info 
> > *kinfo);
> >
> >  int add_ext_regions(unsigned long s_gfn, unsigned long e_gfn, void *data);
> >
> > +#define ACPI_DOM0_FDT_MIN_SIZE 4096
> > +
> > +paddr_t dom0_get_fdt_size_hint(void);
> > +
> >  #if defined(CONFIG_MPU) && defined(CONFIG_ARM_64)
> >  /* Utility function to determine if an Armv8-R processor supports VMSA. */
> >  bool has_v8r_vmsa_support(void);
> > diff --git a/xen/arch/arm/include/asm/kernel.h 
> > b/xen/arch/arm/include/asm/kernel.h
> > index 21f4273fa1..17c5b9bce4 100644
> > --- a/xen/arch/arm/include/asm/kernel.h
> > +++ b/xen/arch/arm/include/asm/kernel.h
> > @@ -8,12 +8,20 @@
> >
> >  #include <asm/domain.h>
> >
> > +#include <xen/types.h>
> > +
> > +struct kernel_info;
> > +
> >  struct arch_kernel_info
> >  {
> >      /* Enable pl011 emulation */
> >      bool vpl011;
> >  };
> >
> > +#define arch_get_min_first_bank_size arch_get_min_first_bank_size
> > +paddr_t arch_get_min_first_bank_size(struct kernel_info *info,
> > +                                     paddr_t bank_start);
> > +
> >  #endif /* #ifdef __ARCH_ARM_KERNEL_H__ */
> >
> >  /*
> > diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> > index b72585b7fe..3644663e2f 100644
> > --- a/xen/arch/arm/kernel.c
> > +++ b/xen/arch/arm/kernel.c
> > @@ -128,6 +128,41 @@ static paddr_t __init kernel_zimage_place(struct 
> > kernel_info *info)
> >      return load_addr;
> >  }
> >
> > +static paddr_t __init kernel_placement_size(paddr_t load_addr, paddr_t len)
> > +{
> > +    return ROUNDUP(load_addr + len, MB(2)) - load_addr;
> Used from one site. place_modules() open-codes the same
> expression; the CONFIG_HAS_DOMAIN_TYPE branch below is
> kernel_placement_size(load_addr, len) + text_offset. Either drop the helper or
> use it consistently.

Ack, I will use the helper consistently.

>
> > +}
> > +
> > +paddr_t __init arch_get_min_first_bank_size(struct kernel_info *info,
> info is RO, so const please.

Ack.

>
> > +                                            paddr_t bank_start)
> > +{
> > +    const struct boot_module *mod = info->bd.initrd;
> Why mod instead of initrd? - choose more meaningful names

Ack.

>
> > +    const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0, MB(2));
> It would be nice to explain why 2MB, at least to say that it mirrors
> place_modules rounding.

Ack, I will add a comment that this mirrors the 2MB rounding used by
place_modules().

>
> > +    const paddr_t dtb_len = ROUNDUP(dom0_get_fdt_size_hint(), MB(2));
> > +    paddr_t kernsize;
> > +
> > +#ifdef CONFIG_HAS_DOMAIN_TYPE
> > +    if ( (info->type == DOMAIN_64BIT) && (info->image.start == 0) )
> > +    {
> > +        paddr_t load_addr = bank_start + info->image.text_offset;
> > +
> > +        /*
> > +         * The caller compares this value with a size measured from
> > +         * bank_start, so include the text offset before the kernel.
> The comment belongs above the load_addr line above.

Ack.

>
> > +         */
> > +        kernsize = ROUNDUP(load_addr + info->image.len, MB(2)) - 
> > bank_start;
> > +        return kernsize + initrd_len + dtb_len;
> Could it be written as:
> info->image.text_offset + kernel_placement_size(load_addr, info->image.len)

Ack, I will rewrite it that way.

>
> > +    }
> > +#endif
> > +
> > +    if ( info->image.start == 0 )
> > +        kernsize = ROUNDUP(info->image.len, MB(2));
> > +    else
> > +        kernsize = kernel_placement_size(info->image.start, 
> > info->image.len);
> This could be written as a single expression, no need for if/else:
> kernsize = kernel_placement_size(info->image.start, info->image.len);

Ack. You are right, kernel_placement_size(0, len) is mathematically
equivalent to ROUNDUP(len, MB(2)). I will simplify this to a single
expression in the next version.

>
> > +
> > +    return kernsize + initrd_len + dtb_len;
> > +}
> > +
> >  static void __init kernel_zimage_load(struct kernel_info *info)
> >  {
> >      paddr_t load_addr = kernel_zimage_place(info);
> > diff --git a/xen/common/device-tree/domain-build.c 
> > b/xen/common/device-tree/domain-build.c
> > index 2a760b007b..d8865db259 100644
> > --- a/xen/common/device-tree/domain-build.c
> > +++ b/xen/common/device-tree/domain-build.c
> > @@ -299,20 +299,33 @@ static bool __init allocate_hwdom_memory(struct 
> > kernel_info *kinfo)
> >
> >      for ( i = 0; (kinfo->unassigned_mem > 0) && (i < nr_banks); i++ )
> >      {
> > -        paddr_t bank_size;
> > +        const paddr_t bank_start = hwdom_free_mem->bank[i].start;
> > +        paddr_t bank_size = hwdom_free_mem->bank[i].size;
> > +
> > +        /*
> > +         * Check the size that would actually be assigned, not just the 
> > size
> > +         * of the host region.
> > +         */
> > +        bank_size = min(bank_size, kinfo->unassigned_mem);
> >
> >          /*
> >           * The first bank must be large enough for place_modules() to
> >           * fit the kernel, DTB and initrd.  Skip small regions to avoid
> >           * ending up with a tiny first bank.
> >           */
> > -        if ( !mem->nr_banks && (hwdom_free_mem->bank[i].size < 
> > min_bank_size) )
> > -            continue;
> > +        if ( !mem->nr_banks )
> > +        {
> > +            paddr_t arch_min_size;
> > +            paddr_t required_first_bank_size;
> > +
> > +            arch_min_size = arch_get_min_first_bank_size(kinfo, 
> > bank_start);
> > +            required_first_bank_size = max(min_bank_size, arch_min_size);
> > +
> > +            if ( bank_size < required_first_bank_size )
> > +                continue;
> > +        }
> >
> > -        bank_size = MIN(hwdom_free_mem->bank[i].size, 
> > kinfo->unassigned_mem);
> > -        if ( !allocate_bank_memory(kinfo,
> > -                                   
> > gaddr_to_gfn(hwdom_free_mem->bank[i].start),
> > -                                   bank_size) )
> > +        if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(bank_start), 
> > bank_size) )
> >          {
> >              xfree(hwdom_free_mem);
> >              return false;
> > diff --git a/xen/include/xen/fdt-kernel.h b/xen/include/xen/fdt-kernel.h
> > index 8cd1670c2c..931b3e1686 100644
> > --- a/xen/include/xen/fdt-kernel.h
> > +++ b/xen/include/xen/fdt-kernel.h
> > @@ -86,6 +86,14 @@ kernel_info_get_mem_const(const struct kernel_info 
> > *kinfo)
> >      return container_of(&kinfo->mem.common, const struct membanks, common);
> >  }
> >
> > +#ifndef arch_get_min_first_bank_size
> > +static inline paddr_t arch_get_min_first_bank_size(struct kernel_info 
> > *info,
> > +                                                   paddr_t bank_start)
> > +{
> > +    return 0;
> > +}
> > +#endif
> > +
> >  #ifndef KERNEL_INFO_SHM_MEM_INIT
> >
> >  #ifdef CONFIG_STATIC_SHM
>
> As for the upstream CI test, we should start with more generic tests that we 
> are
> missing (like regular LLC boot) before thinking of covering more granular
> scenarios. Also, in the past we agreed on first covering the supported 
> features
> before adding tests for unsupported ones.

Ack, that makes sense.

Best regards,
Mykola



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.