[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH] device-tree: size first hwdom bank for boot modules


  • To: Mykola Kvach <xakep.amatop@xxxxxxxxx>
  • From: dmukhin@xxxxxxxx
  • Date: Thu, 21 May 2026 15:10:35 -0700
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 205.220.161.53) smtp.rcpttodomain=lists.xenproject.org smtp.mailfrom=ford.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=ford.com; dkim=pass (signature was verified) header.d=saarlouis.ford.com; dkim=pass (signature was verified) header.d=ford.com; arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=7djpnDKXX78zsY55yf381FaHrzR2Z9YAIdLj82cXheE=; b=jdBrKCaY2+faPSyMIGPk7oXmLzlEpdG30G5HyLqvsBB1uYz6KAiT/oI/c8po48idGcu5tXVQKPLf23ZlXA8Drsfi4J4mZ6qKFFQ+FgcFHHtntHYyAU+VTPhiwTlJsvDTKsi78x3N6XhjY4bPoaJbvNKGljqsKb9p3OweweZcU0N0KYAAN+2Uju9yge6soNbX9An1M0XIu7jRLrWS6gEi8N8+7wN3TL+E3xTVcfv0ugh1ajY8pVCAt1nUdvs0D2iW2PlnaXhOZtDKz1jf/nWPLUshVkDJEtNgQKYoMg8Sl0kfjII57hdiQjJyJHH3g3sg3ZCQ9iH08lbqkTX2GG2z+Q==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=E6eQ4PURLahNmnrbuJopVONKWLGdQTvPArvExhx3KWxdq3PnVipxbfQElIE8y64Kak5wqkevCSbmwNt+HtcCu6E/A/HrgQAxBEtdPyjxovKmj03rRdv6D26Ys4EZOEE5X8spjLCntNbCduUSHZbARDWh/3c0310dfyqrs3SyH1BxzSJ1uU7OxMV8xMcCBK+iG8zBEh8uhmD9H2oRpLnRT34G0kLTvLj1A0OZMwwUTBXYxB4IFQfq1rgcTmjgCBITEZW/SAMLZr0GrEXoh7mihDjIOrd2z+aes3orBb8Ui3Joc6cFOvSwopFrP2BwSgVeWK+9rYKcPaHzuSymFTJN9A==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=ppford header.d=ford.com header.i="@ford.com" header.h="Cc:Content-Type:Date:From:In-Reply-To:Message-ID:MIME-Version:References:Subject:To"; dkim=pass header.s=selector2-azureford-onmicrosoft-com header.d=azureford.onmicrosoft.com header.i="@azureford.onmicrosoft.com" header.h="From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck"; dkim=pass header.s=ppserprodsaar header.d=saarlouis.ford.com header.i="@saarlouis.ford.com" header.h="Cc:Content-Type:Date:From:In-Reply-To:Message-ID:MIME-Version:References:Subject:To"; dkim=pass header.s=ppfserpocford header.d=ford.com header.i="@ford.com" header.h="Cc:Content-Type:Date:From:In-Reply-To:Message-ID:MIME-Version:References:Subject:To"
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Mykola Kvach <mykola_kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 21 May 2026 22:10:58 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Pser-m365-app: SER-APP

Hi Mykola,

The patch looks good!

I would try to add a CI coverage for QEMU aarch64 tests, since QEMU
supports multiple RAM banks topology.

What do you think?

Also, few remarks below.

On Sun, May 17, 2026 at 11:57:56PM +0300, Mykola Kvach wrote:
> From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> 
> With LLC coloring enabled, the hardware domain memory comes from
> allocate_hwdom_memory(), not from the fixed direct-map banks used when
> coloring is off.
> 
> Commit de99f3263555 ("device-tree: Improve hwdom memory allocation for
> DMA") made that allocator sort free host regions by ascending address so
> Dom0 gets DMA-capable low memory first. The first bank filter still only
> required 128MB. That can select a low region which is large enough for
> the heuristic, but not large enough for place_modules() to put the Dom0
> kernel, generated DTB and initrd contiguously in bank 0.
> 
> Ask arch code for any additional first-bank size requirement. On Arm,
> compute it from the actual Dom0 kernel placement, rounded initrd size and
> generated DTB size hint. For 64-bit Image kernels, include the text offset
> from the candidate bank start, because the returned requirement is compared
> with a bank size measured from that start. The hint covers both the normal
> Device Tree path and the minimal DTB created for ACPI boot.
> 
> Check the first-bank threshold against the size which will actually be
> assigned to Dom0, after capping the host region by the remaining unassigned
> Dom0 memory. Otherwise a large host region could pass the test but still
> produce a first guest bank too small for place_modules().
> 
> Use the typed min()/max() helpers for this normal allocation arithmetic;
> MIN()/MAX() are intended for preprocessor-style contexts and skip the type
> checking provided by the lowercase helpers.
> 
> This keeps the DMA-oriented allocation policy from de99f3263555 while
> preventing a too-small bank 0 from reaching place_modules().
> 
> Fixes: de99f3263555 ("device-tree: Improve hwdom memory allocation for DMA")
> Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> ---
> Test/setup notes:
> 
> The failure was reproduced on a Renesas H3ULCB/R-Car H3 (r8a7795)
> arm64 board booted through U-Boot/TFTP and using huge initrd.
> 
> Relevant Xen command line excerpt:
>   dom0_mem=2048M llc-coloring=on
> 
> Boot module layout from Xen:
>   MODULE[2]: 0x0000000084000040-0x000000008e75d92f Ramdisk
>   MODULE[3]: 0x00000000a0000000-0x00000000a3ffffff Kernel
>   MODULE[4]: 0x00000000a4000000-0x00000000a400ffff XSM Policy
> 
> The initrd is about 168MB. With LLC coloring enabled and the low-address
> allocation policy from de99f3263555, Dom0 can receive a 192MB first bank:
>   d0 BANK[0] 0x00000048000000-0x00000054000000 (192MB)
> 
> That bank satisfies the old 128MB minimum but is too small for the
> rounded Dom0 kernel, generated DTB and initrd placement. The observed
> failure before this patch was:
>   Panic on CPU 0:
>   Not enough memory in the first bank for the kernel+dtb+initrd
> 
> With this patch, the same boot skips the too-small low region for bank 0
> and reaches Dom0:
>   d0 BANK[0] 0x00000057000000-0x00000084000000 (720MB)
>   d0 BANK[1] 0x0000008e800000-0x000000c0000000 (792MB)
>   d0 BANK[2] 0x00000500000000-0x00000521800000 (536MB)
>   d0: extended region 0: 0x48000000->0x54000000
>   Loading zImage from 0x00000000a0000000 to 0x57000000-0x5b000000
>   Loading d0 initrd from 0x0000000084000040 to 0x5f200000-0x6995d8f0
>   Loading d0 DTB to 0x5f000000-0x5f011c80
>   Linux version 5.10.194-yocto-standard
> ---
>  xen/arch/arm/acpi/domain_build.c        |  2 --
>  xen/arch/arm/domain_build.c             |  8 ++++++
>  xen/arch/arm/include/asm/domain_build.h |  4 +++
>  xen/arch/arm/include/asm/kernel.h       |  8 ++++++
>  xen/arch/arm/kernel.c                   | 35 +++++++++++++++++++++++++
>  xen/common/device-tree/domain-build.c   | 27 ++++++++++++++-----
>  xen/include/xen/fdt-kernel.h            |  8 ++++++
>  7 files changed, 83 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/arch/arm/acpi/domain_build.c 
> b/xen/arch/arm/acpi/domain_build.c
> index 249d899c33..db16f7fa94 100644
> --- a/xen/arch/arm/acpi/domain_build.c
> +++ b/xen/arch/arm/acpi/domain_build.c
> @@ -26,8 +26,6 @@
>  #undef virt_to_mfn
>  #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
>  
> -#define ACPI_DOM0_FDT_MIN_SIZE 4096
> -
>  static int __init acpi_iomem_deny_access(struct domain *d)
>  {
>      acpi_status status;
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 1efddc60ef..226e053c68 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -115,6 +115,14 @@ int __init parse_arch_dom0_param(const char *s, const 
> char *e)
>                               (IS_ENABLED(CONFIG_STATIC_SHM) ?         \
>                                (NR_SHMEM_BANKS * (160 + 16)) : 0))
>  
> +paddr_t __init dom0_get_fdt_size_hint(void)
> +{
> +    if ( !acpi_disabled )
> +        return ACPI_DOM0_FDT_MIN_SIZE;
> +
> +    return fdt_totalsize(device_tree_flattened) + DOM0_FDT_EXTRA_SIZE;

I would invert the condition so it is read more straightforward:

    if ( acpi_disabled )
        return fdt_totalsize(device_tree_flattened) + DOM0_FDT_EXTRA_SIZE;

    return ACPI_DOM0_FDT_MIN_SIZE;

[..]
> diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> index b72585b7fe..3644663e2f 100644
> --- a/xen/arch/arm/kernel.c
> +++ b/xen/arch/arm/kernel.c
> @@ -128,6 +128,41 @@ static paddr_t __init kernel_zimage_place(struct 
> kernel_info *info)
>      return load_addr;
>  }
>  
> +static paddr_t __init kernel_placement_size(paddr_t load_addr, paddr_t len)
> +{
> +    return ROUNDUP(load_addr + len, MB(2)) - load_addr;
> +}
> +
> +paddr_t __init arch_get_min_first_bank_size(struct kernel_info *info,
> +                                            paddr_t bank_start)
> +{
> +    const struct boot_module *mod = info->bd.initrd;
> +    const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0, MB(2));
> +    const paddr_t dtb_len = ROUNDUP(dom0_get_fdt_size_hint(), MB(2));
> +    paddr_t kernsize;
> +
> +#ifdef CONFIG_HAS_DOMAIN_TYPE

Perhaps use `IS_ENABLED(CONFIG_HAS_DOMAIN_TYPE)` to reduce ifdefery?
My impression that IS_ENABLED() is preferred.

> +    if ( (info->type == DOMAIN_64BIT) && (info->image.start == 0) )
> +    {
> +        paddr_t load_addr = bank_start + info->image.text_offset;
> +
> +        /*
> +         * The caller compares this value with a size measured from
> +         * bank_start, so include the text offset before the kernel.
> +         */
> +        kernsize = ROUNDUP(load_addr + info->image.len, MB(2)) - bank_start;
> +        return kernsize + initrd_len + dtb_len;
> +    }
> +#endif
> +
> +    if ( info->image.start == 0 )

Here too: invert the condition?

       if ( info->image.start )
           kernsize = kernel_placement_size(info->image.start, info->image.len);
       else
           ...

> +        kernsize = ROUNDUP(info->image.len, MB(2));
> +    else
> +        kernsize = kernel_placement_size(info->image.start, info->image.len);
> +
> +    return kernsize + initrd_len + dtb_len;
> +}

--
Denis



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.