[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [PATCH] x86/bitops: Account for POPCNT errata on earlier Intel CPUs
Manually break the false dependency for the benefit of cases such as bitmap_weight() which is a reasonable hotpath. Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> --- CC: Jan Beulich <JBeulich@xxxxxxxx> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx> Not sure if this warrants a fixes or not, but: Fixes: 6978602334d9 ("x86/bitops: Use the POPCNT instruction when available") Many examples online suggest a 2x improvement perf improvement on tight loops by breaking this dependency. cpumasks in particular make frequent use of this loop. Still TODO: 1) Put a double CS prefix on the CALL instruction to avoid a trailing 2-byte NOP, but this depends on x86_decode_lite() in order to work. 2) Revert a buggy GAS diagnostic: ./arch/x86/include/asm/bitops.h: Assembler messages: ./arch/x86/include/asm/bitops.h:493: Error: same type of prefix used twice ./arch/x86/include/asm/bitops.h:493: Error: same type of prefix used twice Multiple prefixes are not an error, and are sometimes the best choice available. It turns out that LZCNT/TZCNT have the same input dependent bug, prior to Skylake. There are no instructions in the "cleaned up" part of bitops yet, and I don't expect any to survive cleaning. --- xen/arch/x86/include/asm/bitops.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h index bb9d75646023..87eac7782f10 100644 --- a/xen/arch/x86/include/asm/bitops.h +++ b/xen/arch/x86/include/asm/bitops.h @@ -488,10 +488,16 @@ static always_inline unsigned int arch_hweightl(unsigned long x) * * This limits the POPCNT instruction to using the same ABI as a function * call (input in %rdi, output in %eax) but that's fine. + * + * On Intel CPUs prior to Cannon Lake, the POPCNT instruction has a false + * input dependency on it's destination register (errata HSD146, SKL029 + * amongst others), impacting loops such as bitmap_weight(). Insert an + * XOR to manually break the dependency. */ alternative_io("call arch_generic_hweightl", + "xor %k[res], %k[res]\n\t" "popcnt %[val], %q[res]", X86_FEATURE_POPCNT, - ASM_OUTPUT2([res] "=a" (r) ASM_CALL_CONSTRAINT), + ASM_OUTPUT2([res] "=&a" (r) ASM_CALL_CONSTRAINT), [val] "D" (x)); return r; -- 2.39.5
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |