[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/bitops: Optimise arch_ffs{,l}() some more on AMD
On 26.08.2025 19:41, Andrew Cooper wrote: > --- a/xen/common/bitops.c > +++ b/xen/common/bitops.c > @@ -97,14 +97,14 @@ static void __init test_for_each_set_bit(void) > if ( ui != ui_res ) > panic("for_each_set_bit(uint) expected %#x, got %#x\n", ui, ui_res); > > - ul = HIDE(1UL << (BITS_PER_LONG - 1) | 1); > + ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11); > for_each_set_bit ( i, ul ) > ul_res |= 1UL << i; > > if ( ul != ul_res ) > panic("for_each_set_bit(ulong) expected %#lx, got %#lx\n", ul, > ul_res); > > - ull = HIDE(0x8000000180000001ULL); > + ull = HIDE(0x8000000180000011ULL); > for_each_set_bit ( i, ull ) > ull_res |= 1ULL << i; How do these changes make a difference? Apart from ffs() using TZCNT, ... > @@ -127,6 +127,79 @@ static void __init test_for_each_set_bit(void) > panic("for_each_set_bit(break) expected 0x1008, got %#x\n", ui_res); > } > > +/* > + * A type-generic fls() which picks the appropriate fls{,l,64}() based on > it's > + * argument. > + */ > +#define fls_g(x) \ > + (sizeof(x) <= sizeof(int) ? fls(x) : \ > + sizeof(x) <= sizeof(long) ? flsl(x) : \ > + sizeof(x) <= sizeof(uint64_t) ? fls64(x) : \ > + ({ BUILD_ERROR("fls_g() Bad input type"); 0; })) > + > +/* > + * for_each_set_bit_reverse() - Iterate over all set bits in a scalar value, > + * from MSB to LSB. > + * > + * @iter An iterator name. Scoped is within the loop only. > + * @val A scalar value to iterate over. > + * > + * A copy of @val is taken internally. > + */ > +#define for_each_set_bit_reverse(iter, val) \ > + for ( typeof(val) __v = (val); __v; __v = 0 ) \ > + for ( unsigned int (iter); \ > + __v && ((iter) = fls_g(__v) - 1, true); \ > + __clear_bit(iter, &__v) ) > + > +/* > + * Xen doesn't have need of for_each_set_bit_reverse() at present, but the > + * construct does exercise a case of arch_fls*() not covered anywhere else by > + * these tests. > + */ > +static void __init test_for_each_set_bit_reverse(void) > +{ > + unsigned int ui, ui_res = 0, tmp; > + unsigned long ul, ul_res = 0; > + uint64_t ull, ull_res = 0; > + > + ui = HIDE(0x80008001U); > + for_each_set_bit_reverse ( i, ui ) > + ui_res |= 1U << i; > + > + if ( ui != ui_res ) > + panic("for_each_set_bit_reverse(uint) expected %#x, got %#x\n", ui, > ui_res); > + > + ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11); > + for_each_set_bit_reverse ( i, ul ) > + ul_res |= 1UL << i; > + > + if ( ul != ul_res ) > + panic("for_each_set_bit_reverse(ulong) expected %#lx, got %#lx\n", > ul, ul_res); > + > + ull = HIDE(0x8000000180000011ULL); > + for_each_set_bit_reverse ( i, ull ) > + ull_res |= 1ULL << i; ... even here the need for the extra setting of bit 4 remains unclear to me: The thing that was missing was the testing of the reverse for-each. You mention the need for an asymmetric input in the description, but isn't that covered already by the first test using 0x80008001U? That said, I certainly don't mind the change, it just isn't quite clear to me whether I'm missing something crucial. Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |