Re: [PATCH v3 7/8] pdx: introduce a new compression algorithm based on region offsets
On 04.08.2025 15:08, Roger Pau Monné wrote:
> On Tue, Jul 29, 2025 at 04:16:15PM +0200, Jan Beulich wrote:
>> On 24.07.2025 13:04, Roger Pau Monne wrote:
>>> --- a/xen/common/pdx.c
>>> +++ b/xen/common/pdx.c
>>> @@ -24,6 +24,7 @@
>>> #include <xen/param.h>
>>> #include <xen/pfn.h>
>>> #include <xen/sections.h>
>>> +#include <xen/sort.h>
>>>
>>> /**
>>> * Maximum (non-inclusive) usable pdx. Must be
>>> @@ -40,6 +41,12 @@ bool __mfn_valid(unsigned long mfn)
>>>
>>> #ifdef CONFIG_PDX_MASK_COMPRESSION
>>> invalid |= mfn & pfn_hole_mask;
>>> +#elif defined(CONFIG_PDX_OFFSET_COMPRESSION)
>>> +{
>>> + unsigned long base = pfn_bases[PFN_TBL_IDX(mfn)];
>>> +
>>> + invalid |= mfn < base || mfn >= base + pdx_region_size;
>>
>> Leveraging wrapping, this could be simplified to
>>
>> invalid |= mfn - base >= pdx_region_size;
>>
>> I think. Considering it's a frequently used path, doing so may be worthwhile.
>
> I don't think that would work for all cases, take the following
> example:
>
> PFN compression using lookup table shift 18 and region size 0x40000
> range 0 [0000000280000, 00000002bffff] PFN IDX 10 : 0000000280000
>
> If you pass mfn 0 to mfn_valid() with your proposed adjustment, the
> result of the subtraction would be:
>
> 0 - ~0UL == 1
>
> Which wouldn't satisfy the >= condition, and hence pfn 0 would be
> reported as a valid mfn. I think we need to keep both sides of the
> check.
Hmm, right - I keep forgetting that the start of a pfn_bases[x] isn't
necessarily a valid page itself.
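To spell out the failure mode (an illustration only, not Xen code; it assumes
pfn_bases[] entries not covered by any range hold ~0UL, which is what the
"0 - ~0UL" computation above implies):

    #include <stdbool.h>
    #include <stdio.h>

    int main(void)
    {
        /*
         * The range covers PFNs [0x280000, 0x2bffff]; mfn 0 maps to an
         * uncovered lookup table entry.
         */
        const unsigned long pdx_region_size = 0x40000UL;
        unsigned long base = ~0UL;  /* assumed value of an uncovered entry */
        unsigned long mfn = 0;

        /*
         * One-sided wrap check: 0 - ~0UL wraps to 1, and 1 < 0x40000, so
         * mfn 0 would slip through as valid.
         */
        bool one_sided = mfn - base >= pdx_region_size;               /* false */

        /* Two-sided check: mfn < base fires, so mfn 0 is rejected. */
        bool two_sided = mfn < base || mfn >= base + pdx_region_size; /* true */

        printf("one-sided invalid: %d, two-sided invalid: %d\n",
               one_sided, two_sided);
        return 0;
    }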
>>> +static void __init cf_check swp_node(void *a, void *b, size_t size)
>>> +{
>>> + SWAP(a, b);
>>
>> This doesn't look right - you swap a and b, not what they point to.
>>
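Presumably the intention was to swap the pointed-to array elements rather
than the local pointer variables; a minimal sketch of that (assuming the
elements are struct pfn_range, as elsewhere in this patch, and reusing the
SWAP() macro on the dereferenced pointers) might look like:

    static void __init cf_check swp_node(void *a, void *b, size_t size)
    {
        struct pfn_range *l = a, *r = b;

        /* The element size is fixed here, hence 'size' goes unused. */
        SWAP(*l, *r);
    }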
>>> +static bool __init pfn_offset_sanitize_ranges(void)
>>> +{
>>> + unsigned int i = 0;
>>> +
>>> + if ( nr_ranges == 1 )
>>> + {
>>> + ASSERT(PFN_TBL_IDX(ranges[0].base) ==
>>> + PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1));
>>
>> I think this points out a naming issue in patch 2: "base" and "size" look
>> as if these were address / byte count, when they're PFN / page count.
>> Which caught my eye because of the values being passed to something that
>> actually wants a PFN (and hence looked wrong at the first glance).
>
> The struct name is pfn_range; I could rename the fields to base_pfn
> and pages or similar, but my impression was that the struct name was
> enough of a pointer that those are PFN ranges.
Problem being that the struct name isn't anywhere in sight here.
> Do you have any
> alternative proposal about how to name those?
Your suggested new naming looks good to me.
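For reference, the rename proposed above would turn the structure into
something like (a sketch only, field names as suggested):

    struct pfn_range {
        unsigned long base_pfn; /* first PFN of the range */
        unsigned long pages;    /* number of pages covered by the range */
    };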
>>> + return true;
>>> + }
>>> +
>>> + while ( i + 1 < nr_ranges )
>>> + {
>>> + /*
>>> + * Ensure ranges [start, end] use the same offset table index. Should
>>> + * be guaranteed by the logic that calculates the pfn shift.
>>> + */
>>> + if ( PFN_TBL_IDX(ranges[i].base) !=
>>> + PFN_TBL_IDX(ranges[i].base + ranges[i].size - 1) ||
>>> + PFN_TBL_IDX(ranges[i + 1].base) !=
>>> + PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )
>>
>> It feels a little odd to re-do the 2nd half of the checking here on the next
>> iteration, when the table wouldn't have changed when ...
>>
>>> + {
>>> + ASSERT_UNREACHABLE();
>>> + return false;
>>> + }
>>> +
>>> + if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
>>> + {
>>> + i++;
>>> + continue;
>>
>> ... taking this path. Could I talk you into moving the latter half ...
>>
>>> + }
>>
>> ... here? If that's needed at all, as ...
>>
>>> + /* Merge ranges with the same table index. */
>>> + ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
>>> + ranges[i].base;
>>
>> ... the new range should also fulfill the requirement. Just that the last
>> such range then wouldn't be checked, unless ...
>>
>>> + memmove(&ranges[i + 1], &ranges[i + 2],
>>> + (nr_ranges - (i + 2)) * sizeof(ranges[0]));
>>> + nr_ranges--;
>>> + }
>>
>> ... that checking was put past the loop. Which then would allow to remove
>> the special casing of nr_ranges == 1 at the top of the function: You'd
>> uniformly check the ranges[nr_ranges - 1] here.
>
> What about doing it like:
>
> static bool __init pfn_offset_sanitize_ranges(void)
> {
> unsigned int i = 0;
>
> if ( PFN_TBL_IDX(ranges[0].base) !=
> PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1) )
> {
> ASSERT_UNREACHABLE();
> return false;
> }
>
> while ( i + 1 < nr_ranges )
> {
> /*
> * Ensure ranges [start, end] use the same offset table index. Should
> * be guaranteed by the logic that calculates the pfn shift.
> */
> if ( PFN_TBL_IDX(ranges[i + 1].base) !=
> PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )
> {
> ASSERT_UNREACHABLE();
> return false;
> }
>
> if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
> {
> i++;
> continue;
> }
>
> /* Merge ranges with the same table index. */
> ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
> ranges[i].base;
> memmove(&ranges[i + 1], &ranges[i + 2],
> (nr_ranges - (i + 2)) * sizeof(ranges[0]));
> nr_ranges--;
> }
>
> return true;
> }
>
> I've pulled the index 0 check ahead of the loop, which then covers for
> the case where nr_ranges == 1. There's also no duplicate checking of
> the ranges, since the range at i + 1 will always be a non-checked one;
> either because the array has been shifted as a result of a range
> merging, or the index has been advanced.
Looks good, thanks.
>>> + ranges[i].size = start + ranges[i].size - ranges[i].base;
>>> + ranges[i - 1].size = ranges[i].base + ranges[i].size -
>>> + ranges[i - 1].base;
>>> +
>>> + if ( i + 1 < nr_ranges )
>>> + memmove(&ranges[i], &ranges[i + 1],
>>> + (nr_ranges - (i + 1)) * sizeof(ranges[0]));
>>> + else /* last range */
>>> + mask |= pdx_region_mask(ranges[i].base, ranges[i].size);
>>> + nr_ranges--;
>>> + i--;
>>> + }
>>> +
>>> + /*
>>> + * Populate a mask with the non-equal bits of the different ranges, do this
>>> + * to calculate the maximum PFN shift to use as the lookup table index.
>>> + */
>>> + for ( i = 0; i < nr_ranges; i++ )
>>> + for ( unsigned int j = 0; j < nr_ranges; j++ )
>>> + idx_mask |= (ranges[i].base & ~mask) ^ (ranges[j].base & ~mask);
>>> +
>>> + if ( !idx_mask )
>>> + /* Single region case. */
>>> + pfn_index_shift = flsl(mask);
>>> + else if ( flsl(idx_mask) - ffsl(idx_mask) < CONFIG_PDX_OFFSET_TBL_ORDER )
>>> + /* The changed mask fits in the table index width. */
>>> + pfn_index_shift = ffsl(idx_mask) - 1;
>>> + else
>>> + /* Changed mask is wider than array size, use most significant bits. */
>>> + pfn_index_shift = flsl(idx_mask) - CONFIG_PDX_OFFSET_TBL_ORDER;
>>
>> Perhaps emit a log message here to indicate that bigger PDX_OFFSET_TBL_ORDER
>> may yield better results?
>
> What about:
>
> printk(XENLOG_DEBUG
> "PFN compression table index truncated, requires order %u\n",
> flsl(idx_mask) - ffsl(idx_mask) + 1);
Again, fine with me.
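As a worked illustration of the shift calculation (a standalone sketch, not
Xen code; it assumes a 64-bit unsigned long and uses __builtin_clzl() to
model Xen's 1-based flsl()), the single-region example quoted earlier in the
thread indeed ends up with a lookup table shift of 18:

    #include <stdio.h>

    int main(void)
    {
        unsigned long base = 0x280000UL, last = 0x2bffffUL;
        /* For this range the XOR already equals the filled region mask. */
        unsigned long mask = base ^ last;                          /* 0x3ffff */
        /* Single region => idx_mask is 0, so the shift is flsl(mask). */
        unsigned int pfn_index_shift = 64 - __builtin_clzl(mask);

        printf("mask %#lx -> shift %u\n", mask, pfn_index_shift);  /* 18 */
        return 0;
    }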
Jan