
Re: [PATCH] xen/arm: skip holes in physical address space when setting up frametable



Hi Michal,

On 22/04/2026 09:25, Orzel, Michal wrote:


On 22/04/2026 09:01, Julien Grall wrote:
Hi Michal,

On 17/04/2026 10:11, Michal Orzel wrote:
Refactor setup_frametable_mappings() into init_frametable(), modeled
after x86's implementation. Instead of mapping one contiguous frametable
covering ram_start to ram_end (including holes), iterate the
pdx_group_valid bitmap to allocate and map frametable memory only for
valid PDX groups, skipping gaps in the physical address space. At the
moment, the pdx_group_valid bitmap is not taken into account at all.

This reduces memory consumption on systems with sparse RAM layouts by
not allocating frametable entries for non-existent memory regions.

A file-local pdx_to_page() override is needed because the generic macro
in xen/include/xen/pdx.h does not account for ARM's non-zero
frametable_base_pdx.

Can you provide a bit more details? I am a bit concerned that this could
result in subtle bugs in the future if code within mm.c expects the
original behaviour. It would be preferable if the change is either made for
everyone on Arm or the function is renamed to avoid any clash.
The generic pdx_to_page macro does not account for the offset, which is something
I mentioned in the footer and am willing to work on in the future.

Sorry I missed the comment in the footer. But if the function is broken, then why can't we implement pdx_to_page() correctly now? I understand that ...

 As of today,
this macro is *unused* on Arm. It is only used by x86 in some special big-mem
scenario. Using the generic pdx_to_page on Arm would be wrong, so a future
patch doing so would be wrong regardless (the fact that this patch adds a local
redefine does not change anything). Do we need a rename for a local redefine in a
file that only deals with the frametable? Maybe a comment and a TODO would be ok?

... this is not meant to be used by Arm today. But given this is used in the page list, it is definitely not obvious that it is broken.

The alternative is to protect/move pdx_to_page() in x86. But I don't know how much churn this would involve.



[...]

+void __init init_frametable(paddr_t ram_start)
+{
+    unsigned int sidx, nidx, max_idx;
     /*
      * The size of paddr_t should be sufficient for the complete range of
@@ -26,24 +47,34 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
     BUILD_BUG_ON((sizeof(paddr_t) * BITS_PER_BYTE) < PADDR_BITS);
     BUILD_BUG_ON(sizeof(struct page_info) != PAGE_INFO_SIZE);

-    if ( frametable_size > FRAMETABLE_SIZE )
-        panic("The frametable cannot cover the physical region %#"PRIpaddr" - %#"PRIpaddr"\n",
-              ps, pe);
+    max_idx = DIV_ROUND_UP(max_pdx, PDX_GROUP_COUNT);
+    frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ram_start));

-    frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
-    /* Round up to 2M or 32M boundary, as appropriate. */
-    frametable_size = ROUNDUP(frametable_size, mapping_size);
-    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12));
+    /*
+     * pdx_to_page(pdx_s) in init_frametable_chunk must be page-aligned
+     * for map_pages_to_xen(). Aligning to PDX_GROUP_COUNT guarantees this
+     * because PDX_GROUP_COUNT * sizeof(page_info) is always a multiple of
+     * PAGE_SIZE by construction.
+     */
+    frametable_base_pdx = ROUNDDOWN(frametable_base_pdx, PDX_GROUP_COUNT);

-    rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
-                          frametable_size >> PAGE_SHIFT,
-                          PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
-    if ( rc )
-        panic("Unable to setup the frametable mappings.\n");
+    if ( (max_pdx - frametable_base_pdx) > FRAMETABLE_NR )
+        panic("Frametable too small\n");
+
+    for ( sidx = (frametable_base_pdx / PDX_GROUP_COUNT); ; sidx = nidx )
+    {
+        unsigned int eidx;
+
+        eidx = find_next_zero_bit(pdx_group_valid, max_idx, sidx);
+        nidx = find_next_bit(pdx_group_valid, max_idx, eidx);
+
+        if ( nidx >= max_idx )
+            break;
+
+        init_frametable_chunk(sidx * PDX_GROUP_COUNT, eidx * PDX_GROUP_COUNT);

The function will round up the mapping to either a 2MiB or 32MiB
aligned size. This means we could potentially cover the previously mapped
region or the next one. I can't seem to find any code covering this
case. What did I miss?
Hmm, I think I miscalculated something here. Anyway, how about using a 2MB
mapping size all the time? The PDX group size is 2MB,

Looking at the code, it seems to be based on SECOND_SHIFT, which technically depends on the page granularity. Even though Xen supports only 4KiB, we are trying to avoid making such an assumption, or at least to add a BUILD_BUG_ON() (in this case, I would check that PDX_GROUP_COUNT always corresponds to 2MiB, i.e. SECOND_SHIFT).

in-loop chunks are multiples of
2MB, and no round-up is needed - zero overshoot. The last chunk may overshoot by
~2MB, but that does not matter as there is nothing after it to conflict with.
The downside is more TLB pressure.

I am a bit wary of modifying the frametable allocation method because the frametable is used fairly often in Xen. Would it be possible to instead enhance the loop to detect contiguous chunks and decide the mapping size based on the chunk?


Alternatively, we could reduce the mapping size closer to the boundaries (the x86
choice), but that would require a bit more work.

This would technically have the same downside as above.

Cheers,

--
Julien Grall




 

