
Re: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C



* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

> > UNTESTED patch applied in case somebody wants to play with this. It
> > removes 10 lines of silly code, and along with them that 'cmov' use.
> > 
> > Anybody?
> 
> Makes sense - it seems to boot here, but I only did some very light 
> testing.
> 
> There's a minor text size increase on x86-32 defconfig, GCC 14.2.0:
> 
>       text       data        bss         dec        hex    filename
>   16577728    7598826    1744896    25921450    18b87aa    vmlinux.before
>   16577908    7598838    1744896    25921642    18b886a    vmlinux.after
> 
> bloatometer output:
> 
>   add/remove: 2/1 grow/shrink: 201/189 up/down: 5681/-3486 (2195)

And once we remove 486 support, I think we can apply the optimization 
below and simply assume that BS*L doesn't clobber the output register 
in the zero case, right?
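
To make the zero-case reasoning concrete, here is a minimal portable C sketch
of the behaviour the patch relies on. It is only a model, not kernel code: the
names model_bsfl(), model_bsrl(), model_ffs(), model_fls() and
model_selfcheck() are hypothetical, and the "destination is left unchanged
when the source is zero" rule is written out as an explicit assumption rather
than taken from the hardware.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of BSFL under the no-clobber ASSUMPTION:
 * a zero source leaves the destination value untouched.
 */
static inline int32_t model_bsfl(uint32_t src, int32_t dest)
{
	int32_t i = 0;

	if (src == 0)
		return dest;		/* assumed: dest is not clobbered */
	while (!(src & 1u)) {		/* scan upwards from bit 0 */
		src >>= 1;
		i++;
	}
	return i;
}

/* Same assumption for BSRL, scanning downwards from bit 31. */
static inline int32_t model_bsrl(uint32_t src, int32_t dest)
{
	int32_t i = 31;

	if (src == 0)
		return dest;		/* assumed: dest is not clobbered */
	while (!(src & 0x80000000u)) {
		src <<= 1;
		i--;
	}
	return i;
}

/* Preload -1, as the "0" (-1) constraint does, so that x == 0 yields 0. */
static inline int model_ffs(int x)
{
	return model_bsfl((uint32_t)x, -1) + 1;
}

static inline int model_fls(unsigned int x)
{
	return model_bsrl(x, -1) + 1;
}

/* Small usage example: the zero case needs no separate branch. */
static inline void model_selfcheck(void)
{
	assert(model_ffs(0) == 0);
	assert(model_fls(0) == 0);
	assert(model_ffs(8) == 4);
	assert(model_fls(8) == 4);
}
```

With the destination preloaded to -1, the final "r + 1" maps the zero case to
0 without the if (!x) branch or a CMOV, which is the whole point of the
change. Whether every remaining 32-bit CPU actually behaves this way is
exactly the open question above.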

In terms of text size it's a substantial optimization on x86-32 
defconfig (4,160 bytes less text than vanilla):

        text       data        bss         dec        hex  filename
  16,577,728    7598826    1744896    25921450    18b87aa  vmlinux.vanilla      # CMOV+BS*L
  16,577,908    7598838    1744896    25921642    18b886a  vmlinux.linus_patch  # if()+BS*L
  16,573,568    7602922    1744896    25921386    18b876a  vmlinux.noclobber    # BS*L

Thanks,

        Ingo

---
 arch/x86/include/asm/bitops.h | 20 ++------------------
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 6061c87f14ac..e3e94a806656 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -308,24 +308,16 @@ static __always_inline int variable_ffs(int x)
 {
        int r;
 
-#ifdef CONFIG_X86_64
        /*
         * AMD64 says BSFL won't clobber the dest reg if x==0; Intel64 says the
         * dest reg is undefined if x==0, but their CPU architect says its
         * value is written to set it to the same as before, except that the
         * top 32 bits will be cleared.
-        *
-        * We cannot do this on 32 bits because at the very least some
-        * 486 CPUs did not behave this way.
         */
        asm("bsfl %1,%0"
            : "=r" (r)
            : ASM_INPUT_RM (x), "0" (-1));
-#else
-       if (!x)
-               return 0;
-       asm("bsfl %1,%0" : "=r" (r) : "rm" (x));
-#endif
+
        return r + 1;
 }
 
@@ -360,24 +352,16 @@ static __always_inline int fls(unsigned int x)
        if (__builtin_constant_p(x))
                return x ? 32 - __builtin_clz(x) : 0;
 
-#ifdef CONFIG_X86_64
        /*
         * AMD64 says BSRL won't clobber the dest reg if x==0; Intel64 says the
         * dest reg is undefined if x==0, but their CPU architect says its
         * value is written to set it to the same as before, except that the
         * top 32 bits will be cleared.
-        *
-        * We cannot do this on 32 bits because at the very least some
-        * 486 CPUs did not behave this way.
         */
        asm("bsrl %1,%0"
            : "=r" (r)
            : ASM_INPUT_RM (x), "0" (-1));
-#else
-       if (!x)
-               return 0;
-       asm("bsrl %1,%0" : "=r" (r) : "rm" (x));
-#endif
+
        return r + 1;
 }
 




 

