[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [PATCH] x86/bitops: Optimise arch_ffs()/etc some more on AMD
One aspect of the old ffs()/etc infrastructure was the use of TZCNT/LZCNT without a CPUID check. Given no obvious justification, I dropped it during the cleanup. It turns out there is a good reason, on all but the most recent AMD CPUs. Reinstate the use of "blind" TZCNT/LZCNT when safe to do so. This happens to be preferentially in loops where a repeated saving of 5-6 uops becomes far more relevant than a one-off scenario. Leave an explanation of why. No functional change. Fixes: 989e5f08d33e ("x86/bitops: Improve arch_ffs() in the general case") Fixes: 5ed26fc0768d ("xen/bitops: Implement ffsl() in common logic") Fixes: 54b10ef6c810 ("xen/bitops: Implement fls()/flsl() in common logic") Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> --- CC: Jan Beulich <JBeulich@xxxxxxxx> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx> --- xen/arch/x86/include/asm/bitops.h | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h index 87eac7782f10..62e6ca26f5b0 100644 --- a/xen/arch/x86/include/asm/bitops.h +++ b/xen/arch/x86/include/asm/bitops.h @@ -399,9 +399,16 @@ static always_inline unsigned int arch_ffs(unsigned int x) * * and the optimiser really can work with the knowledge of x being * non-zero without knowing it's exact value, in which case we don't - * need to compensate for BSF's corner cases. Otherwise... + * need to compensate for BSF's corner cases. + * + * That said, we intentionally use TZCNT on capable hardware when the + * behaviour for the 0 case doesn't matter. On AMD CPUs prior to + * Zen4, TZCNT is 1-2 uops while BSF is 6-8 with a latency to match. + * Intel CPUs don't suffer this discrepancy. + * + * Otherwise... */ - asm ( "bsf %[val], %[res]" + asm ( "rep bsf %[val], %[res]" : [res] "=r" (r) : [val] "rm" (x) ); } @@ -428,7 +435,7 @@ static always_inline unsigned int arch_ffsl(unsigned long x) /* See arch_ffs() for safety discussions. */ if ( __builtin_constant_p(x > 0) && x > 0 ) - asm ( "bsf %[val], %q[res]" + asm ( "rep bsf %[val], %q[res]" : [res] "=r" (r) : [val] "rm" (x) ); else @@ -446,7 +453,7 @@ static always_inline unsigned int arch_fls(unsigned int x) /* See arch_ffs() for safety discussions. */ if ( __builtin_constant_p(x > 0) && x > 0 ) - asm ( "bsr %[val], %[res]" + asm ( "rep bsr %[val], %[res]" : [res] "=r" (r) : [val] "rm" (x) ); else @@ -464,7 +471,7 @@ static always_inline unsigned int arch_flsl(unsigned long x) /* See arch_ffs() for safety discussions. */ if ( __builtin_constant_p(x > 0) && x > 0 ) - asm ( "bsr %[val], %q[res]" + asm ( "rep bsr %[val], %q[res]" : [res] "=r" (r) : [val] "rm" (x) ); else base-commit: d965e2ee07c56c341d8896852550914d87ea5374 prerequisite-patch-id: 8da89000c73c38aab6abfa6622217ea9eff07fbd prerequisite-patch-id: 74830838bac94ed1e036a8173cf3210a314b35d8 prerequisite-patch-id: 0654835c28df8d40b6c97006d041c4d31447a9a6 prerequisite-patch-id: 2d47d646c6a6e0019918c57753d6c01a752c377f prerequisite-patch-id: d8f8c4221a2d7039bae6f3d38e93fe90b2091d01 prerequisite-patch-id: e0397c86b545a1d65f2e6b2049c282b926c40c64 prerequisite-patch-id: ea21abe4540ee229f4f8725ce3f701d9ba4bd4a8 -- 2.39.5
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |