[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH] x86/bitops: Optimise arch_ffs()/etc some more on AMD



One aspect of the old ffs()/etc infrastructure was the use of TZCNT/LZCNT
without a CPUID check.  Given no obvious justification, I dropped it during
the cleanup.

It turns out there is a good reason, on all but the most recent AMD CPUs.

Reinstate the use of "blind" TZCNT/LZCNT when safe to do so.  This happens to
be preferentially in loops where a repeated saving of 5-6 uops becomes far
more relevant than a one-off scenario.

Leave an explanation of why.

No functional change.

Fixes: 989e5f08d33e ("x86/bitops: Improve arch_ffs() in the general case")
Fixes: 5ed26fc0768d ("xen/bitops: Implement ffsl() in common logic")
Fixes: 54b10ef6c810 ("xen/bitops: Implement fls()/flsl() in common logic")
Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
 xen/arch/x86/include/asm/bitops.h | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/include/asm/bitops.h 
b/xen/arch/x86/include/asm/bitops.h
index 87eac7782f10..62e6ca26f5b0 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -399,9 +399,16 @@ static always_inline unsigned int arch_ffs(unsigned int x)
          *
          * and the optimiser really can work with the knowledge of x being
          * non-zero without knowing it's exact value, in which case we don't
-         * need to compensate for BSF's corner cases.  Otherwise...
+         * need to compensate for BSF's corner cases.
+         *
+         * That said, we intentionally use TZCNT on capable hardware when the
+         * behaviour for the 0 case doesn't matter.  On AMD CPUs prior to
+         * Zen4, TZCNT is 1-2 uops while BSF is 6-8 with a latency to match.
+         * Intel CPUs don't suffer this discrepancy.
+         *
+         * Otherwise...
          */
-        asm ( "bsf %[val], %[res]"
+        asm ( "rep bsf %[val], %[res]"
               : [res] "=r" (r)
               : [val] "rm" (x) );
     }
@@ -428,7 +435,7 @@ static always_inline unsigned int arch_ffsl(unsigned long x)
 
     /* See arch_ffs() for safety discussions. */
     if ( __builtin_constant_p(x > 0) && x > 0 )
-        asm ( "bsf %[val], %q[res]"
+        asm ( "rep bsf %[val], %q[res]"
               : [res] "=r" (r)
               : [val] "rm" (x) );
     else
@@ -446,7 +453,7 @@ static always_inline unsigned int arch_fls(unsigned int x)
 
     /* See arch_ffs() for safety discussions. */
     if ( __builtin_constant_p(x > 0) && x > 0 )
-        asm ( "bsr %[val], %[res]"
+        asm ( "rep bsr %[val], %[res]"
               : [res] "=r" (r)
               : [val] "rm" (x) );
     else
@@ -464,7 +471,7 @@ static always_inline unsigned int arch_flsl(unsigned long x)
 
     /* See arch_ffs() for safety discussions. */
     if ( __builtin_constant_p(x > 0) && x > 0 )
-        asm ( "bsr %[val], %q[res]"
+        asm ( "rep bsr %[val], %q[res]"
               : [res] "=r" (r)
               : [val] "rm" (x) );
     else

base-commit: d965e2ee07c56c341d8896852550914d87ea5374
prerequisite-patch-id: 8da89000c73c38aab6abfa6622217ea9eff07fbd
prerequisite-patch-id: 74830838bac94ed1e036a8173cf3210a314b35d8
prerequisite-patch-id: 0654835c28df8d40b6c97006d041c4d31447a9a6
prerequisite-patch-id: 2d47d646c6a6e0019918c57753d6c01a752c377f
prerequisite-patch-id: d8f8c4221a2d7039bae6f3d38e93fe90b2091d01
prerequisite-patch-id: e0397c86b545a1d65f2e6b2049c282b926c40c64
prerequisite-patch-id: ea21abe4540ee229f4f8725ce3f701d9ba4bd4a8
-- 
2.39.5




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.