Re: [PATCH v2] x86/bitops: Optimise arch_ffs{,l}() some more on AMD
On 26.08.2025 19:41, Andrew Cooper wrote:
> --- a/xen/common/bitops.c
> +++ b/xen/common/bitops.c
> @@ -97,14 +97,14 @@ static void __init test_for_each_set_bit(void)
> if ( ui != ui_res )
> panic("for_each_set_bit(uint) expected %#x, got %#x\n", ui, ui_res);
>
> - ul = HIDE(1UL << (BITS_PER_LONG - 1) | 1);
> + ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11);
> for_each_set_bit ( i, ul )
> ul_res |= 1UL << i;
>
> if ( ul != ul_res )
> panic("for_each_set_bit(ulong) expected %#lx, got %#lx\n", ul, ul_res);
>
> - ull = HIDE(0x8000000180000001ULL);
> + ull = HIDE(0x8000000180000011ULL);
> for_each_set_bit ( i, ull )
> ull_res |= 1ULL << i;
How do these changes make a difference? Apart from ffs() using TZCNT, ...
> @@ -127,6 +127,79 @@ static void __init test_for_each_set_bit(void)
> panic("for_each_set_bit(break) expected 0x1008, got %#x\n", ui_res);
> }
>
> +/*
> + * A type-generic fls() which picks the appropriate fls{,l,64}() based on its
> + * argument.
> + */
> +#define fls_g(x) \
> + (sizeof(x) <= sizeof(int) ? fls(x) : \
> + sizeof(x) <= sizeof(long) ? flsl(x) : \
> + sizeof(x) <= sizeof(uint64_t) ? fls64(x) : \
> + ({ BUILD_ERROR("fls_g() Bad input type"); 0; }))
> +
> +/*
> + * for_each_set_bit_reverse() - Iterate over all set bits in a scalar value,
> + * from MSB to LSB.
> + *
> + * @iter An iterator name, scoped within the loop only.
> + * @val A scalar value to iterate over.
> + *
> + * A copy of @val is taken internally.
> + */
> +#define for_each_set_bit_reverse(iter, val) \
> + for ( typeof(val) __v = (val); __v; __v = 0 ) \
> + for ( unsigned int (iter); \
> + __v && ((iter) = fls_g(__v) - 1, true); \
> + __clear_bit(iter, &__v) )
> +
> +/*
> + * Xen doesn't have need of for_each_set_bit_reverse() at present, but the
> + * construct does exercise a case of arch_fls*() not covered anywhere else by
> + * these tests.
> + */
> +static void __init test_for_each_set_bit_reverse(void)
> +{
> + unsigned int ui, ui_res = 0, tmp;
> + unsigned long ul, ul_res = 0;
> + uint64_t ull, ull_res = 0;
> +
> + ui = HIDE(0x80008001U);
> + for_each_set_bit_reverse ( i, ui )
> + ui_res |= 1U << i;
> +
> + if ( ui != ui_res )
> + panic("for_each_set_bit_reverse(uint) expected %#x, got %#x\n", ui, ui_res);
> +
> + ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11);
> + for_each_set_bit_reverse ( i, ul )
> + ul_res |= 1UL << i;
> +
> + if ( ul != ul_res )
> + panic("for_each_set_bit_reverse(ulong) expected %#lx, got %#lx\n", ul, ul_res);
> +
> + ull = HIDE(0x8000000180000011ULL);
> + for_each_set_bit_reverse ( i, ull )
> + ull_res |= 1ULL << i;
... even here the need for the extra setting of bit 4 remains unclear to
me: The thing that was missing was the testing of the reverse for-each.
You mention the need for an asymmetric input in the description, but isn't
that covered already by the first test using 0x80008001U?
That said, I certainly don't mind the change; it just isn't quite clear to
me whether I'm missing something crucial.
Jan