[PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Sun, 27 Apr 2025 at 12:17, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> >
> > ffs/fls are commonly found inside loops where x is the loop condition
> > too.  Therefore, using statically_true() to provide a form without the
> > zero compatibility turns out to be a win.
> 
> We already have the version without the zero capability - it's just
> called "__ffs()" and "__fls()", and performance-critical code uses
> those.
> 
> So fls/ffs are the "standard" library functions that have to handle
> zero, and add that stupid "+1" because that interface was designed by
> some Pascal person who doesn't understand that we start counting from
> 0.
> 
> Standards bodies: "companies aren't sending their best people".
> 
> But it's silly that we then spend effort on magic cmov in inline asm
> on those things when it's literally the "don't use this version unless
> you don't actually care about performance" case.
> 
> I don't think it would be wrong to just make the x86-32 code just do
> the check against zero ahead of time - in C.
> 
> And yes, that will generate some extra code - you'll test for zero
> before, and then the caller might also test for a zero result that
> then results in another test for zero that can't actually happen (but
> the compiler doesn't know that). But I suspect that on the whole, it
> is likely to generate better code anyway just because the compiler
> sees that first test and can DTRT.
> 
> UNTESTED patch applied in case somebody wants to play with this. It
> removes 10 lines of silly code, and along with them that 'cmov' use.
> 
> Anybody?

Makes sense - it seems to boot here, but I only did some very light 
testing.

There's a minor text size increase (+180 bytes) on x86-32 defconfig, GCC 14.2.0:

      text       data        bss         dec        hex    filename
  16577728    7598826    1744896    25921450    18b87aa    vmlinux.before
  16577908    7598838    1744896    25921642    18b886a    vmlinux.after

bloat-o-meter output:

  add/remove: 2/1 grow/shrink: 201/189 up/down: 5681/-3486 (2195)

Patch with changelog and your SOB added attached. Does it look good to 
you?

Thanks,

        Ingo

================>
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 28 Apr 2025 08:38:35 +0200
Subject: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C

Drop the complicated and probably questionable BSFL/BSRL + CMOVZL
asm() optimization in variable_ffs() and fls(): performance-critical
code already uses __ffs() and __fls(), whose interfaces map closely
onto the machine instructions. Check for zero ahead of time in C instead.

There's a minor text size increase (+180 bytes) on x86-32 defconfig:

      text       data        bss         dec        hex    filename
  16577728    7598826    1744896    25921450    18b87aa    vmlinux.before
  16577908    7598838    1744896    25921642    18b886a    vmlinux.after

bloat-o-meter output:

  add/remove: 2/1 grow/shrink: 201/189 up/down: 5681/-3486 (2195)

Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
 arch/x86/include/asm/bitops.h | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 100413aff640..6061c87f14ac 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -321,15 +321,10 @@ static __always_inline int variable_ffs(int x)
        asm("bsfl %1,%0"
            : "=r" (r)
            : ASM_INPUT_RM (x), "0" (-1));
-#elif defined(CONFIG_X86_CMOV)
-       asm("bsfl %1,%0\n\t"
-           "cmovzl %2,%0"
-           : "=&r" (r) : "rm" (x), "r" (-1));
 #else
-       asm("bsfl %1,%0\n\t"
-           "jnz 1f\n\t"
-           "movl $-1,%0\n"
-           "1:" : "=r" (r) : "rm" (x));
+       if (!x)
+               return 0;
+       asm("bsfl %1,%0" : "=r" (r) : "rm" (x));
 #endif
        return r + 1;
 }
@@ -378,15 +373,10 @@ static __always_inline int fls(unsigned int x)
        asm("bsrl %1,%0"
            : "=r" (r)
            : ASM_INPUT_RM (x), "0" (-1));
-#elif defined(CONFIG_X86_CMOV)
-       asm("bsrl %1,%0\n\t"
-           "cmovzl %2,%0"
-           : "=&r" (r) : "rm" (x), "rm" (-1));
 #else
-       asm("bsrl %1,%0\n\t"
-           "jnz 1f\n\t"
-           "movl $-1,%0\n"
-           "1:" : "=r" (r) : "rm" (x));
+       if (!x)
+               return 0;
+       asm("bsrl %1,%0" : "=r" (r) : "rm" (x));
 #endif
        return r + 1;
 }
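
For anyone who wants to poke at the semantics outside the kernel, the
interface split discussed in this thread can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
the kernel's code: the sketch_* names are made up, the GCC/Clang
builtins __builtin_ctz()/__builtin_clz() stand in for BSFL/BSRL, and
unsigned int is assumed to be 32 bits wide. The "raw" helpers mirror the
__ffs()/__fls() style (0-based, caller guarantees a nonzero argument),
while the wrappers do the zero check in C and add the "+1" of the
ffs()/fls() interface.

```c
#include <assert.h>

/*
 * Hypothetical userspace stand-ins, NOT the kernel's definitions.
 * Assumes GCC or Clang builtins and a 32-bit unsigned int.
 */

/* 0-based index of the lowest set bit; the caller must pass x != 0. */
static inline unsigned int sketch_raw_ffs(unsigned int x)
{
	assert(x != 0);		/* __builtin_ctz(0) is undefined */
	return (unsigned int)__builtin_ctz(x);
}

/* 0-based index of the highest set bit; the caller must pass x != 0. */
static inline unsigned int sketch_raw_fls(unsigned int x)
{
	assert(x != 0);		/* __builtin_clz(0) is undefined */
	return 31u - (unsigned int)__builtin_clz(x);
}

/* 1-based position of the lowest set bit, or 0 when no bit is set. */
static inline int sketch_ffs(int x)
{
	if (!x)
		return 0;	/* zero case handled ahead of time, in C */
	return (int)sketch_raw_ffs((unsigned int)x) + 1;
}

/* 1-based position of the highest set bit, or 0 when no bit is set. */
static inline int sketch_fls(unsigned int x)
{
	if (!x)
		return 0;	/* zero case handled ahead of time, in C */
	return (int)sketch_raw_fls(x) + 1;
}
```

Because the zero test is ordinary C here, a compiler that already knows
the argument is nonzero (for example, inside a loop conditioned on it)
is free to drop the check, which is the effect the thread is hoping for.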



 

