
Re: [PATCH v3 7/8] pdx: introduce a new compression algorithm based on region offsets


  • To: Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 29 Jul 2025 16:16:15 +0200
  • Cc: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>, Community Manager <community.manager@xxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 29 Jul 2025 14:16:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 24.07.2025 13:04, Roger Pau Monne wrote:
> --- a/xen/common/pdx.c
> +++ b/xen/common/pdx.c
> @@ -24,6 +24,7 @@
>  #include <xen/param.h>
>  #include <xen/pfn.h>
>  #include <xen/sections.h>
> +#include <xen/sort.h>
>  
>  /**
>   * Maximum (non-inclusive) usable pdx. Must be
> @@ -40,6 +41,12 @@ bool __mfn_valid(unsigned long mfn)
>  
>  #ifdef CONFIG_PDX_MASK_COMPRESSION
>      invalid |= mfn & pfn_hole_mask;
> +#elif defined(CONFIG_PDX_OFFSET_COMPRESSION)
> +{
> +    unsigned long base = pfn_bases[PFN_TBL_IDX(mfn)];
> +
> +    invalid |= mfn < base || mfn >= base + pdx_region_size;

Leveraging wrapping, this could be simplified to

    invalid |= mfn - base >= pdx_region_size;

I think. Considering it's a frequently used path, doing so may be worthwhile.
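For reference, the equivalence is easy to sanity-check with a small
standalone program (not Xen code, purely illustrative):

    #include <assert.h>

    int main(void)
    {
        const unsigned long base = 0x100, size = 0x40;
        unsigned long mfn;

        /* mfn < base makes mfn - base wrap to a very large value. */
        for ( mfn = 0; mfn < 0x1000; mfn++ )
            assert( (mfn < base || mfn >= base + size) ==
                    (mfn - base >= size) );

        return 0;
    }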

> @@ -75,6 +82,13 @@ void set_pdx_range(unsigned long smfn, unsigned long emfn)
>  # error "Missing architecture maximum number of RAM ranges"
>  #endif
>  
> +/* Some functions should be __init when not using PDX mask compression. */
> +#ifndef CONFIG_PDX_MASK_COMPRESSION
> +# define __init_or_pdx_mask __init
> +#else
> +# define __init_or_pdx_mask
> +#endif

Considering this is local to the CU, "pdx" in the name isn't very meaningful.
That it relates to a particular compression type is, so how about
__init_or_mask_compr or some such? (If we gain further compression types,
this may get unwieldy anyway.)
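I.e. the quoted construct unchanged apart from the name:

    #ifndef CONFIG_PDX_MASK_COMPRESSION
    # define __init_or_mask_compr __init
    #else
    # define __init_or_mask_compr
    #endif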

> +static void __init cf_check swp_node(void *a, void *b, size_t size)
> +{
> +    SWAP(a, b);

This doesn't look right - you swap a and b, not what they point to.
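Presumably a byte-wise swap of the pointed-to elements is what's wanted,
along these lines (a sketch):

    static void __init cf_check swp_node(void *a, void *b, size_t size)
    {
        unsigned char *p = a, *q = b;

        /* Swap the elements themselves, not the (local) pointer variables. */
        while ( size-- )
        {
            unsigned char t = *p;

            *p++ = *q;
            *q++ = t;
        }
    }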

> +static bool __init pfn_offset_sanitize_ranges(void)
> +{
> +    unsigned int i = 0;
> +
> +    if ( nr_ranges == 1 )
> +    {
> +        ASSERT(PFN_TBL_IDX(ranges[0].base) ==
> +               PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1));

I think this points out a naming issue in patch 2: "base" and "size" look
as if they were an address / byte count, when they're a PFN / page count.
This caught my eye because the values are being passed to something that
actually wants a PFN (and hence they looked wrong at first glance).

> +        return true;
> +    }
> +
> +    while ( i + 1 < nr_ranges )
> +    {
> +        /*
> +         * Ensure ranges [start, end] use the same offset table index.  Should
> +         * be guaranteed by the logic that calculates the pfn shift.
> +         */
> +        if ( PFN_TBL_IDX(ranges[i].base) !=
> +             PFN_TBL_IDX(ranges[i].base + ranges[i].size - 1) ||
> +             PFN_TBL_IDX(ranges[i + 1].base) !=
> +             PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )

It feels a little odd to re-do the 2nd half of the checking here on the next
iteration, when the table wouldn't have changed when ...

> +        {
> +            ASSERT_UNREACHABLE();
> +            return false;
> +        }
> +
> +        if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
> +        {
> +            i++;
> +            continue;

... taking this path. Could I talk you into moving the latter half ...

> +        }

... here? If that's needed at all, as ...

> +        /* Merge ranges with the same table index. */
> +        ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
> +                         ranges[i].base;

... the new range should also fulfill the requirement. Just that the last
such range then wouldn't be checked, unless ...

> +        memmove(&ranges[i + 1], &ranges[i + 2],
> +                (nr_ranges - (i + 2)) * sizeof(ranges[0]));
> +        nr_ranges--;
> +    }

... that checking was put past the loop. That would then also allow removing
the special casing of nr_ranges == 1 at the top of the function: you'd
uniformly check ranges[nr_ranges - 1] here.

> +    return true;
> +}
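To put the suggested restructuring together, a rough (untested) sketch based
on the quoted code:

    static bool __init pfn_offset_sanitize_ranges(void)
    {
        unsigned int i = 0;

        while ( i + 1 < nr_ranges )
        {
            /* Each range has to fit in a single offset table slot. */
            if ( PFN_TBL_IDX(ranges[i].base) !=
                 PFN_TBL_IDX(ranges[i].base + ranges[i].size - 1) )
            {
                ASSERT_UNREACHABLE();
                return false;
            }

            if ( PFN_TBL_IDX(ranges[i].base) !=
                 PFN_TBL_IDX(ranges[i + 1].base) )
            {
                i++;
                continue;
            }

            /* Merge ranges with the same table index; the merged range gets
             * (re-)checked, either on the next iteration or past the loop. */
            ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
                             ranges[i].base;
            memmove(&ranges[i + 1], &ranges[i + 2],
                    (nr_ranges - (i + 2)) * sizeof(ranges[0]));
            nr_ranges--;
        }

        /* Covers the last range, and hence also the nr_ranges == 1 case. */
        if ( PFN_TBL_IDX(ranges[nr_ranges - 1].base) !=
             PFN_TBL_IDX(ranges[nr_ranges - 1].base +
                         ranges[nr_ranges - 1].size - 1) )
        {
            ASSERT_UNREACHABLE();
            return false;
        }

        return true;
    }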
> +
> +bool __init pfn_pdx_compression_setup(paddr_t base)
> +{
> +    unsigned long mask = PFN_DOWN(pdx_init_mask(base)), idx_mask = 0;
> +    unsigned long size = 0;
> +    unsigned int i;
> +
> +    if ( !nr_ranges )
> +    {
> +        printk(XENLOG_DEBUG "PFN compression disabled%s\n",
> +               pdx_compress ? ": no ranges provided" : "");
> +        return false;
> +    }
> +
> +    if ( nr_ranges > ARRAY_SIZE(ranges) )
> +    {
> +        printk(XENLOG_WARNING
> +               "Too many PFN ranges (%u > %zu), not attempting PFN 
> compression\n",
> +               nr_ranges, ARRAY_SIZE(ranges));
> +        return false;
> +    }
> +
> +    /* Sort ranges by start address. */
> +    sort(ranges, nr_ranges, sizeof(*ranges), cmp_node, swp_node);
> +
> +    for ( i = 0; i < nr_ranges; i++ )
> +    {
> +        unsigned long start = ranges[i].base;
> +
> +        /*
> +         * Align range base to MAX_ORDER.  This is required so the PDX offset
> +         * for the bits below MAX_ORDER matches the MFN offset, and pages
> +         * greater than the minimal order can be used to populate the
> +         * directmap.
> +         */
> +        ranges[i].base = ranges[i].base & ~((1UL << MAX_ORDER) - 1);

Nit: Use "start" here?

> +        ranges[i].size = start + ranges[i].size - ranges[i].base;
> +
> +        /*
> +         * Only merge overlapped regions now, leave adjacent regions separated.
> +         * They would be merged later if both use the same index into the
> +         * lookup table.
> +         */
> +        if ( !i || ranges[i].base >= (ranges[i - 1].base + ranges[i - 1].size) )
> +        {
> +            /*
> +             * We might parse the region at position 0 more than once, as for
> +             * index 0 we don't attempt to merge to keep this simple.
> +             */
> +            mask |= pdx_region_mask(ranges[i].base, ranges[i].size);
> +            continue;

To what extent is it relevant here that slot 0 may be looked at more than once?

> +        }
> +
> +        ranges[i - 1].size = ranges[i].base + ranges[i].size -
> +                             ranges[i - 1].base;
> +
> +        if ( i + 1 < nr_ranges )
> +            memmove(&ranges[i], &ranges[i + 1],
> +                    (nr_ranges - (i + 1)) * sizeof(ranges[0]));
> +        else /* last range */
> +            mask |= pdx_region_mask(ranges[i].base, ranges[i].size);
> +        nr_ranges--;
> +        i--;
> +    }
> +
> +    /*
> +     * Populate a mask with the non-equal bits of the different ranges, do this
> +     * to calculate the maximum PFN shift to use as the lookup table index.
> +     */
> +    for ( i = 0; i < nr_ranges; i++ )
> +        for ( unsigned int j = 0; j < nr_ranges; j++ )
> +            idx_mask |= (ranges[i].base & ~mask) ^ (ranges[j].base & ~mask);
> +
> +    if ( !idx_mask )
> +        /* Single region case. */
> +        pfn_index_shift = flsl(mask);
> +    else if ( flsl(idx_mask) - ffsl(idx_mask) < CONFIG_PDX_OFFSET_TBL_ORDER )
> +        /* The changed mask fits in the table index width. */
> +        pfn_index_shift = ffsl(idx_mask) - 1;
> +    else
> +        /* Changed mask is wider than array size, use most significant bits. */
> +        pfn_index_shift = flsl(idx_mask) - CONFIG_PDX_OFFSET_TBL_ORDER;

Perhaps emit a log message here to indicate that a bigger PDX_OFFSET_TBL_ORDER
may yield better results?
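E.g. something along the lines of (exact wording of course up to you):

    printk(XENLOG_INFO
           "PFN compression: CONFIG_PDX_OFFSET_TBL_ORDER too small, a bigger value may compress better\n");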

> +    /*
> +     * Ensure correctness of the calculated values, plus merge ranges if they
> +     * use the same lookup table index.
> +     */
> +    if ( !pfn_offset_sanitize_ranges() )
> +    {
> +        printk(XENLOG_DEBUG "PFN compression is invalid, disabling\n");
> +        pfn_pdx_compression_reset();
> +        return false;
> +    }
> +
> +    /*
> +     * Find the maximum PFN range size after having merged ranges with same
> +     * index.  The rounded up region size will be the base for the PDX region
> +     * size and shift.
> +     */
> +    for ( i = 0; i < nr_ranges; i++ )
> +        size = max(size, ranges[i].size);
> +
> +    /* pdx_init_mask() already takes MAX_ORDER into account. */
> +    mask = PFN_DOWN(pdx_init_mask(size << PAGE_SHIFT));

We're in arch-neutral code here, so I think you need to cast size to paddr_t
before shifting.
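I.e. something like

    mask = PFN_DOWN(pdx_init_mask((paddr_t)size << PAGE_SHIFT));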

Jan
