
[xen staging] x86/bitops: Improve arch_ffs() in the general case



commit 989e5f08d33e2642d25fd60d7b373d3d8ff39990
Author:     Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
AuthorDate: Thu Mar 14 23:31:11 2024 +0000
Commit:     Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CommitDate: Sat Jun 1 02:28:14 2024 +0100

    x86/bitops: Improve arch_ffs() in the general case
    
    The asm in arch_ffs() is safe but inefficient.
    
    CMOV would be an improvement over a conditional branch, but for 64bit CPUs
    both Intel and AMD have provided enough details about the behaviour for a
    zero input.  It is safe to pre-load the destination register with -1 and
    drop the conditional logic.
    
    However, it is common to find ffs() in a context where the optimiser knows
    that x is non-zero even if the value isn't known precisely.  In this case,
    it's safe to drop the preload of -1 too.
    
    There are only a handful of uses of ffs() in the x86 build, and all of them
    improve as a result of this:
    
      add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-92 (-92)
      Function                                     old     new   delta
      mask_write                                   121     113      -8
      xmem_pool_alloc                             1076    1056     -20
      test_bitops                                  390     358     -32
      pt_update_contig_markers                    1236    1204     -32
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
    Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
    Release-acked-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
---
 xen/arch/x86/include/asm/bitops.h | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 13e3730138..a8e70d7ed7 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -420,12 +420,39 @@ static inline int ffsl(unsigned long x)
 
 static always_inline unsigned int arch_ffs(unsigned int x)
 {
-    int r;
+    unsigned int r;
+
+    if ( __builtin_constant_p(x > 0) && x > 0 )
+    {
+        /*
+         * A common code pattern is:
+         *
+         *     while ( bits )
+         *     {
+         *         bit = ffs(bits);
+         *         ...
+         *
+         * and the optimiser really can work with the knowledge of x being
+         * non-zero without knowing its exact value, in which case we don't
+         * need to compensate for BSF's corner cases.  Otherwise...
+         */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x) );
+    }
+    else
+    {
+        /*
+         * ... the AMD manual states that BSF won't modify the destination
+         * register if x=0.  The Intel manual states that the result is
+         * undefined, but the architects have said that the register is
+         * written back with its old value (zero extended as normal).
+         */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x), "[res]" (-1) );
+    }
 
-    asm ( "bsf %1,%0\n\t"
-          "jnz 1f\n\t"
-          "mov $-1,%0\n"
-          "1:" : "=r" (r) : "rm" (x));
     return r + 1;
 }
 #define arch_ffs arch_ffs
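
For context outside the Xen tree, below is a minimal standalone sketch of
the same technique (a hypothetical my_ffs() built with plain GCC/Clang on
x86-64; it is not Xen code).  The key observation is that
__builtin_constant_p() is applied to the predicate x > 0 rather than to x
itself, so it fires whenever flow analysis proves x non-zero (e.g. inside
a while ( bits ) loop) even though the value of x is unknown; the -1
preload covers every other case:

#include <stdio.h>

static inline __attribute__((always_inline))
unsigned int my_ffs(unsigned int x)
{
    unsigned int r;

    if ( __builtin_constant_p(x > 0) && x > 0 )
    {
        /* The optimiser has proven x != 0, so BSF always writes the
         * destination and no preload is needed. */
        asm ( "bsf %[val], %[res]"
              : [res] "=r" (r)
              : [val] "rm" (x) );
    }
    else
    {
        /* x may be zero.  Preload the destination with -1; on 64bit CPUs
         * BSF leaves it unmodified for a zero input, so r + 1 yields 0. */
        asm ( "bsf %[val], %[res]"
              : [res] "=r" (r)
              : [val] "rm" (x), "[res]" (-1) );
    }

    return r + 1;
}

int main(void)
{
    unsigned int bits = 0x28;   /* bits 3 and 5 set (0-based) */

    /* Inside the loop the compiler knows bits != 0, so after inlining
     * __builtin_constant_p(bits > 0) folds to true and the branch
     * without the preload is selected. */
    while ( bits )
    {
        printf("ffs: %u\n", my_ffs(bits));
        bits &= bits - 1;       /* clear the lowest set bit */
    }

    /* Nothing is known about the value here, so the preloading branch
     * is used and a zero input safely produces 0. */
    printf("my_ffs(0) = %u\n", my_ffs(0));

    return 0;
}

Note that the fallback branch remains correct even when the compiler can't
prove anything (e.g. at -O0): it merely spends one extra instruction on
the preload.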
--
generated by git-patchbot for /home/xen/git/xen.git#staging