[xen staging] x86/bitops: Improve arch_ffs() in the general case
commit 989e5f08d33e2642d25fd60d7b373d3d8ff39990
Author: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
AuthorDate: Thu Mar 14 23:31:11 2024 +0000
Commit: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CommitDate: Sat Jun 1 02:28:14 2024 +0100
x86/bitops: Improve arch_ffs() in the general case
The asm in arch_ffs() is safe but inefficient.
CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a
zero input.  It is safe to pre-load the destination register with -1 and
drop the conditional logic.
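As a hedged sketch of that preload trick in isolation (ffs_preload() is an
illustrative name, not part of the patch, and it relies on the AMD/Intel
behaviour described above):

    /* Illustrative sketch only: preloading -1 lets BSF's "destination
     * unmodified on zero input" behaviour stand in for the branch. */
    static inline unsigned int ffs_preload(unsigned int x)
    {
        unsigned int r = -1;

        asm ( "bsf %[val], %[res]"
              : [res] "+r" (r)     /* read-write: keeps the -1 when x == 0 */
              : [val] "rm" (x) );

        return r + 1;              /* 0 for x == 0, else lowest set bit + 1 */
    }

e.g. ffs_preload(0) == 0, ffs_preload(1) == 1 and ffs_preload(0x8) == 4,
since BSF writes back the bit index 3.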
However, it is common to find ffs() in a context where the optimiser knows
that x is non-zero even if its value isn't known precisely.  In this case,
it's safe to drop the preload of -1 too.
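A standalone sketch of why that works (demo_ffs() and dump_set_bits() are
made-up names; this is a compact rendition of the idea, not the Xen code):

    #include <stdio.h>

    static inline __attribute__((always_inline))
    unsigned int demo_ffs(unsigned int x)
    {
        unsigned int r;

        if ( __builtin_constant_p(x > 0) && x > 0 )
            /* The optimiser proved x != 0: plain BSF, no compensation. */
            asm ( "bsf %[val], %[res]"
                  : [res] "=r" (r)
                  : [val] "rm" (x) );
        else
            /* x might be 0: preload -1 so the overall result becomes 0. */
            asm ( "bsf %[val], %[res]"
                  : [res] "=r" (r)
                  : [val] "rm" (x), "[res]" (-1) );

        return r + 1;
    }

    static void dump_set_bits(unsigned int bits)
    {
        /* The loop condition proves bits != 0 inside the body, so after
         * inlining "__builtin_constant_p(bits > 0) && bits > 0" can fold
         * to true and the -1 preload drops out of the generated code. */
        while ( bits )
        {
            printf("bit %u\n", demo_ffs(bits) - 1);
            bits &= bits - 1;      /* clear the bit just reported */
        }
    }

    int main(void)
    {
        dump_set_bits(0x1234);     /* prints bits 2, 4, 5, 9 and 12 */
        return 0;
    }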
There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-92 (-92)
Function                                     old     new   delta
mask_write                                   121     113      -8
xmem_pool_alloc                             1076    1056     -20
test_bitops                                  390     358     -32
pt_update_contig_markers                    1236    1204     -32
Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
---
xen/arch/x86/include/asm/bitops.h | 37 ++++++++++++++++++++++++++++++++-----
1 file changed, 32 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 13e3730138..a8e70d7ed7 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -420,12 +420,39 @@ static inline int ffsl(unsigned long x)
 static always_inline unsigned int arch_ffs(unsigned int x)
 {
-    int r;
+    unsigned int r;
+
+    if ( __builtin_constant_p(x > 0) && x > 0 )
+    {
+        /*
+         * A common code pattern is:
+         *
+         *     while ( bits )
+         *     {
+         *         bit = ffs(bits);
+         *         ...
+         *
+         * and the optimiser really can work with the knowledge of x being
+         * non-zero without knowing its exact value, in which case we don't
+         * need to compensate for BSF's corner cases.  Otherwise...
+         */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x) );
+    }
+    else
+    {
+        /*
+         * ... the AMD manual states that BSF won't modify the destination
+         * register if x=0.  The Intel manual states that the result is
+         * undefined, but the architects have said that the register is
+         * written back with its old value (zero extended as normal).
+         */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x), "[res]" (-1) );
+    }
 
-    asm ( "bsf %1,%0\n\t"
-          "jnz 1f\n\t"
-          "mov $-1,%0\n"
-          "1:" : "=r" (r) : "rm" (x));
     return r + 1;
 }
 
 #define arch_ffs arch_ffs
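And a quick corner-case check of the behaviour the new comments rely on
(again using a made-up demo_ffs(), reduced to just the preloading path; a
sketch only, assuming an x86-64 part with the documented BSF behaviour):

    #include <assert.h>

    static unsigned int demo_ffs(unsigned int x)
    {
        unsigned int r;

        /* Preload -1: AMD documents that BSF leaves the destination
         * unmodified for a zero input; Intel's architects say the old
         * value is written back. */
        asm ( "bsf %[val], %[res]"
              : [res] "=r" (r)
              : [val] "rm" (x), "[res]" (-1) );

        return r + 1;
    }

    int main(void)
    {
        assert(demo_ffs(0) == 0);            /* preloaded -1, plus 1 */
        assert(demo_ffs(1) == 1);            /* bit 0 */
        assert(demo_ffs(0x8) == 4);          /* bit 3 */
        assert(demo_ffs(0x80000000u) == 32); /* bit 31 */
        return 0;
    }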
--
generated by git-patchbot for /home/xen/git/xen.git#staging