[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/bitops: Optimise arch_ffs()/etc some more on AMD
On 28.05.2025 00:29, Andrew Cooper wrote: > One aspect of the old ffs()/etc infrastructure was the use of TZCNT/LZCNT > without a CPUID check. Given no obvious justification, I dropped it during > the cleanup. > > It turns out there is a good reason, on all but the most recent AMD CPUs. > > Reinstate the use of "blind" TZCNT/LZCNT when safe to do so. This happens to > be preferentially in loops where a repeated saving of 5-6 uops becomes far > more relevant than a one-off scenario. > > Leave an explanation of why. > > No functional change. > > Fixes: 989e5f08d33e ("x86/bitops: Improve arch_ffs() in the general case") > Fixes: 5ed26fc0768d ("xen/bitops: Implement ffsl() in common logic") > Fixes: 54b10ef6c810 ("xen/bitops: Implement fls()/flsl() in common logic") > Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> In principle Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> but ... > --- a/xen/arch/x86/include/asm/bitops.h > +++ b/xen/arch/x86/include/asm/bitops.h > @@ -399,9 +399,16 @@ static always_inline unsigned int arch_ffs(unsigned int > x) > * > * and the optimiser really can work with the knowledge of x being > * non-zero without knowing it's exact value, in which case we don't > - * need to compensate for BSF's corner cases. Otherwise... > + * need to compensate for BSF's corner cases. > + * > + * That said, we intentionally use TZCNT on capable hardware when the > + * behaviour for the 0 case doesn't matter. On AMD CPUs prior to > + * Zen4, TZCNT is 1-2 uops while BSF is 6-8 with a latency to match. > + * Intel CPUs don't suffer this discrepancy. > + * > + * Otherwise... > */ > - asm ( "bsf %[val], %[res]" > + asm ( "rep bsf %[val], %[res]" ... why would we use REP (again) when gas 2.25 supports LZCNT/TZCNT? Jan > : [res] "=r" (r) > : [val] "rm" (x) ); > } > @@ -428,7 +435,7 @@ static always_inline unsigned int arch_ffsl(unsigned long > x) > > /* See arch_ffs() for safety discussions. */ > if ( __builtin_constant_p(x > 0) && x > 0 ) > - asm ( "bsf %[val], %q[res]" > + asm ( "rep bsf %[val], %q[res]" > : [res] "=r" (r) > : [val] "rm" (x) ); > else > @@ -446,7 +453,7 @@ static always_inline unsigned int arch_fls(unsigned int x) > > /* See arch_ffs() for safety discussions. */ > if ( __builtin_constant_p(x > 0) && x > 0 ) > - asm ( "bsr %[val], %[res]" > + asm ( "rep bsr %[val], %[res]" > : [res] "=r" (r) > : [val] "rm" (x) ); > else > @@ -464,7 +471,7 @@ static always_inline unsigned int arch_flsl(unsigned long > x) > > /* See arch_ffs() for safety discussions. */ > if ( __builtin_constant_p(x > 0) && x > 0 ) > - asm ( "bsr %[val], %q[res]" > + asm ( "rep bsr %[val], %q[res]" > : [res] "=r" (r) > : [val] "rm" (x) ); > else > > base-commit: d965e2ee07c56c341d8896852550914d87ea5374 > prerequisite-patch-id: 8da89000c73c38aab6abfa6622217ea9eff07fbd > prerequisite-patch-id: 74830838bac94ed1e036a8173cf3210a314b35d8 > prerequisite-patch-id: 0654835c28df8d40b6c97006d041c4d31447a9a6 > prerequisite-patch-id: 2d47d646c6a6e0019918c57753d6c01a752c377f > prerequisite-patch-id: d8f8c4221a2d7039bae6f3d38e93fe90b2091d01 > prerequisite-patch-id: e0397c86b545a1d65f2e6b2049c282b926c40c64 > prerequisite-patch-id: ea21abe4540ee229f4f8725ce3f701d9ba4bd4a8
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |