[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/bitops: Optimise arch_ffs{,l}() some more on AMD
On 27/08/2025 8:52 am, Jan Beulich wrote: > On 26.08.2025 19:41, Andrew Cooper wrote: >> --- a/xen/common/bitops.c >> +++ b/xen/common/bitops.c >> @@ -97,14 +97,14 @@ static void __init test_for_each_set_bit(void) >> if ( ui != ui_res ) >> panic("for_each_set_bit(uint) expected %#x, got %#x\n", ui, ui_res); >> >> - ul = HIDE(1UL << (BITS_PER_LONG - 1) | 1); >> + ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11); >> for_each_set_bit ( i, ul ) >> ul_res |= 1UL << i; >> >> if ( ul != ul_res ) >> panic("for_each_set_bit(ulong) expected %#lx, got %#lx\n", ul, >> ul_res); >> >> - ull = HIDE(0x8000000180000001ULL); >> + ull = HIDE(0x8000000180000011ULL); >> for_each_set_bit ( i, ull ) >> ull_res |= 1ULL << i; > How do these changes make a difference? Apart from ffs() using TZCNT, ... > >> @@ -127,6 +127,79 @@ static void __init test_for_each_set_bit(void) >> panic("for_each_set_bit(break) expected 0x1008, got %#x\n", ui_res); >> } >> >> +/* >> + * A type-generic fls() which picks the appropriate fls{,l,64}() based on >> it's >> + * argument. >> + */ >> +#define fls_g(x) \ >> + (sizeof(x) <= sizeof(int) ? fls(x) : \ >> + sizeof(x) <= sizeof(long) ? flsl(x) : \ >> + sizeof(x) <= sizeof(uint64_t) ? fls64(x) : \ >> + ({ BUILD_ERROR("fls_g() Bad input type"); 0; })) >> + >> +/* >> + * for_each_set_bit_reverse() - Iterate over all set bits in a scalar value, >> + * from MSB to LSB. >> + * >> + * @iter An iterator name. Scoped is within the loop only. >> + * @val A scalar value to iterate over. >> + * >> + * A copy of @val is taken internally. >> + */ >> +#define for_each_set_bit_reverse(iter, val) \ >> + for ( typeof(val) __v = (val); __v; __v = 0 ) \ >> + for ( unsigned int (iter); \ >> + __v && ((iter) = fls_g(__v) - 1, true); \ >> + __clear_bit(iter, &__v) ) >> + >> +/* >> + * Xen doesn't have need of for_each_set_bit_reverse() at present, but the >> + * construct does exercise a case of arch_fls*() not covered anywhere else >> by >> + * these tests. >> + */ >> +static void __init test_for_each_set_bit_reverse(void) >> +{ >> + unsigned int ui, ui_res = 0, tmp; >> + unsigned long ul, ul_res = 0; >> + uint64_t ull, ull_res = 0; >> + >> + ui = HIDE(0x80008001U); >> + for_each_set_bit_reverse ( i, ui ) >> + ui_res |= 1U << i; >> + >> + if ( ui != ui_res ) >> + panic("for_each_set_bit_reverse(uint) expected %#x, got %#x\n", ui, >> ui_res); >> + >> + ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11); >> + for_each_set_bit_reverse ( i, ul ) >> + ul_res |= 1UL << i; >> + >> + if ( ul != ul_res ) >> + panic("for_each_set_bit_reverse(ulong) expected %#lx, got %#lx\n", >> ul, ul_res); >> + >> + ull = HIDE(0x8000000180000011ULL); >> + for_each_set_bit_reverse ( i, ull ) >> + ull_res |= 1ULL << i; > ... even here the need for the extra setting of bit 4 remains unclear to > me: The thing that was missing was the testing of the reverse for-each. > You mention the need for an asymmetric input in the description, but isn't > that covered already by the first test using 0x80008001U? The first test covers {arch_,}f[fl]s() only. It happens to be safe to count-from-the-wrong-end bugs, but that was by chance. The second test covers {arch_,}f[fs]sl() only. They are unsafe WRT symmetry, and disjoint (coverage wise) from the first test. The third test, while the same as the second test on x86, isn't the same on arm32. Just because one test happens to be safe (symmetry wise) and passes, doesn't make the other variants tested. ~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |