Re: [PATCH v3 7/8] pdx: introduce a new compression algorithm based on region offsets
On 04.08.2025 15:08, Roger Pau Monné wrote:
> On Tue, Jul 29, 2025 at 04:16:15PM +0200, Jan Beulich wrote:
>> On 24.07.2025 13:04, Roger Pau Monne wrote:
>>> --- a/xen/common/pdx.c
>>> +++ b/xen/common/pdx.c
>>> @@ -24,6 +24,7 @@
>>> #include <xen/param.h>
>>> #include <xen/pfn.h>
>>> #include <xen/sections.h>
>>> +#include <xen/sort.h>
>>>
>>> /**
>>> * Maximum (non-inclusive) usable pdx. Must be
>>> @@ -40,6 +41,12 @@ bool __mfn_valid(unsigned long mfn)
>>>
>>> #ifdef CONFIG_PDX_MASK_COMPRESSION
>>> invalid |= mfn & pfn_hole_mask;
>>> +#elif defined(CONFIG_PDX_OFFSET_COMPRESSION)
>>> +{
>>> + unsigned long base = pfn_bases[PFN_TBL_IDX(mfn)];
>>> +
>>> + invalid |= mfn < base || mfn >= base + pdx_region_size;
>>
>> Leveraging wrapping, this could be simplified to
>>
>> invalid |= mfn - base >= pdx_region_size;
>>
>> I think. Considering it's a frequently used path, doing so may be worthwhile.
>
> I don't think that would work for all cases, take the following
> example:
>
> PFN compression using lookup table shift 18 and region size 0x40000
> range 0 [0000000280000, 00000002bffff] PFN IDX 10 : 0000000280000
>
> If you pass mfn 0 to mfn_valid() with your proposed adjustment, the
> result of the subtraction would be:
>
> 0 - ~0UL == 1
>
> Which wouldn't satisfy the >= condition, and hence pfn 0 would be
> reported as a valid mfn. I think we need to keep both sides of the
> check.
Hmm, right - I keep forgetting that the start of a pfn_bases[x] isn't
necessarily a valid page itself.
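To spell out the failure mode (an illustration only, not Xen code; it assumes
pfn_bases[] entries not covered by any range hold ~0UL, which is what the
"0 - ~0UL" computation above implies):

    #include <stdbool.h>
    #include <stdio.h>

    int main(void)
    {
        /*
         * The range covers PFNs [0x280000, 0x2bffff]; mfn 0 maps to an
         * uncovered lookup table entry.
         */
        const unsigned long pdx_region_size = 0x40000UL;
        unsigned long base = ~0UL;  /* assumed value of an uncovered entry */
        unsigned long mfn = 0;

        /*
         * One-sided wrap check: 0 - ~0UL wraps to 1, and 1 < 0x40000, so
         * mfn 0 would slip through as valid.
         */
        bool one_sided = mfn - base >= pdx_region_size;               /* false */

        /* Two-sided check: mfn < base fires, so mfn 0 is rejected. */
        bool two_sided = mfn < base || mfn >= base + pdx_region_size; /* true */

        printf("one-sided invalid: %d, two-sided invalid: %d\n",
               one_sided, two_sided);
        return 0;
    }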
>>> +static void __init cf_check swp_node(void *a, void *b, size_t size)
>>> +{
>>> + SWAP(a, b);
>>
>> This doesn't look right - you swap a and b, not what they point to.
>>
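Presumably the intention was to swap the pointed-to array elements rather
than the local pointer variables; a minimal sketch of that (assuming the
elements are struct pfn_range, as elsewhere in this patch, and reusing the
SWAP() macro on the dereferenced pointers) might look like:

    static void __init cf_check swp_node(void *a, void *b, size_t size)
    {
        struct pfn_range *l = a, *r = b;

        /* The element size is fixed here, hence 'size' goes unused. */
        SWAP(*l, *r);
    }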
>>> +static bool __init pfn_offset_sanitize_ranges(void)
>>> +{
>>> + unsigned int i = 0;
>>> +
>>> + if ( nr_ranges == 1 )
>>> + {
>>> + ASSERT(PFN_TBL_IDX(ranges[0].base) ==
>>> + PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1));
>>
>> I think this points out a naming issue in patch 2: "base" and "size" look
>> as if these were address / byte count, when they're PFN / page count.
>> Which caught my eye because of the values being passed to something that
>> actually wants a PFN (and hence looked wrong at the first glance).
>
> The struct name is pfn_range; I could rename the fields to base_pfn
> and pages or similar, but my impression was that the struct name was
> enough of a pointer that those are PFN ranges.
Problem being that the struct name isn't anywhere in sight here.
> Do you have any
> alternative proposal about how to name those?
Your suggested new naming looks good to me.
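For reference, the rename proposed above would turn the structure into
something like (a sketch only, field names as suggested):

    struct pfn_range {
        unsigned long base_pfn; /* first PFN of the range */
        unsigned long pages;    /* number of pages covered by the range */
    };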
>>> + return true;
>>> + }
>>> +
>>> + while ( i + 1 < nr_ranges )
>>> + {
>>> + /*
>>> + * Ensure ranges [start, end] use the same offset table index. Should
>>> + * be guaranteed by the logic that calculates the pfn shift.
>>> + */
>>> + if ( PFN_TBL_IDX(ranges[i].base) !=
>>> + PFN_TBL_IDX(ranges[i].base + ranges[i].size - 1) ||
>>> + PFN_TBL_IDX(ranges[i + 1].base) !=
>>> + PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )
>>
>> It feels a little odd to re-do the 2nd half of the checking here on the next
>> iteration, when the table wouldn't have changed when ...
>>
>>> + {
>>> + ASSERT_UNREACHABLE();
>>> + return false;
>>> + }
>>> +
>>> + if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
>>> + {
>>> + i++;
>>> + continue;
>>
>> ... taking this path. Could I talk you into moving the latter half ...
>>
>>> + }
>>
>> ... here? If that's needed at all, as ...
>>
>>> + /* Merge ranges with the same table index. */
>>> + ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
>>> + ranges[i].base;
>>
>> ... the new range should also fulfill the requirement. Just that the last
>> such range then wouldn't be checked, unless ...
>>
>>> + memmove(&ranges[i + 1], &ranges[i + 2],
>>> + (nr_ranges - (i + 2)) * sizeof(ranges[0]));
>>> + nr_ranges--;
>>> + }
>>
>> ... that checking was put past the loop. Which then would allow to remove
>> the special casing of nr_ranges == 1 at the top of the function: You'd
>> uniformly check the ranges[nr_ranges - 1] here.
>
> What about doing it like:
>
> static bool __init pfn_offset_sanitize_ranges(void)
> {
> unsigned int i = 0;
>
> if ( PFN_TBL_IDX(ranges[0].base) !=
> PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1) )
> {
> ASSERT_UNREACHABLE();
> return false;
> }
>
> while ( i + 1 < nr_ranges )
> {
> /*
> * Ensure ranges [start, end] use the same offset table index. Should
> * be guaranteed by the logic that calculates the pfn shift.
> */
> if ( PFN_TBL_IDX(ranges[i + 1].base) !=
> PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )
> {
> ASSERT_UNREACHABLE();
> return false;
> }
>
> if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
> {
> i++;
> continue;
> }
>
> /* Merge ranges with the same table index. */
> ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
> ranges[i].base;
> memmove(&ranges[i + 1], &ranges[i + 2],
> (nr_ranges - (i + 2)) * sizeof(ranges[0]));
> nr_ranges--;
> }
>
> return true;
> }
>
> I've pulled the index 0 check ahead of the loop, which then covers for
> the case where nr_ranges == 1. There's also no duplicate checking of
> the ranges, since the range at i + 1 will always be a non-checked one;
> either because the array has been shifted as a result of a range
> merging, or the index has been advanced.
Looks good, thanks.
>>> + ranges[i].size = start + ranges[i].size - ranges[i].base;
>>> + ranges[i - 1].size = ranges[i].base + ranges[i].size -
>>> + ranges[i - 1].base;
>>> +
>>> + if ( i + 1 < nr_ranges )
>>> + memmove(&ranges[i], &ranges[i + 1],
>>> + (nr_ranges - (i + 1)) * sizeof(ranges[0]));
>>> + else /* last range */
>>> + mask |= pdx_region_mask(ranges[i].base, ranges[i].size);
>>> + nr_ranges--;
>>> + i--;
>>> + }
>>> +
>>> + /*
>>> + * Populate a mask with the non-equal bits of the different ranges, do this
>>> + * to calculate the maximum PFN shift to use as the lookup table index.
>>> + */
>>> + for ( i = 0; i < nr_ranges; i++ )
>>> + for ( unsigned int j = 0; j < nr_ranges; j++ )
>>> + idx_mask |= (ranges[i].base & ~mask) ^ (ranges[j].base & ~mask);
>>> +
>>> + if ( !idx_mask )
>>> + /* Single region case. */
>>> + pfn_index_shift = flsl(mask);
>>> + else if ( flsl(idx_mask) - ffsl(idx_mask) < CONFIG_PDX_OFFSET_TBL_ORDER )
>>> + /* The changed mask fits in the table index width. */
>>> + pfn_index_shift = ffsl(idx_mask) - 1;
>>> + else
>>> + /* Changed mask is wider than array size, use most significant bits. */
>>> + pfn_index_shift = flsl(idx_mask) - CONFIG_PDX_OFFSET_TBL_ORDER;
>>
>> Perhaps emit a log message here to indicate that bigger PDX_OFFSET_TBL_ORDER
>> may yield better results?
>
> What about:
>
> printk(XENLOG_DEBUG
> "PFN compression table index truncated, requires order %u\n",
> flsl(idx_mask) - ffsl(idx_mask) + 1);
Again, fine with me.
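As a worked illustration of the shift calculation (a standalone sketch, not
Xen code; it assumes a 64-bit unsigned long and uses __builtin_clzl() to
model Xen's 1-based flsl()), the single-region example quoted earlier in the
thread indeed ends up with a lookup table shift of 18:

    #include <stdio.h>

    int main(void)
    {
        unsigned long base = 0x280000UL, last = 0x2bffffUL;
        /* For this range the XOR already equals the filled region mask. */
        unsigned long mask = base ^ last;                          /* 0x3ffff */
        /* Single region => idx_mask is 0, so the shift is flsl(mask). */
        unsigned int pfn_index_shift = 64 - __builtin_clzl(mask);

        printf("mask %#lx -> shift %u\n", mask, pfn_index_shift);  /* 18 */
        return 0;
    }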
Jan