|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 9/9] x86/bitops: Use the POPCNT instruction when available
On 23.08.2024 01:06, Andrew Cooper wrote:
> A few RFC points.
>
> * I throught we had an x86 general lib-y but I can't find one, hence why it's
> still in xen/lib/ for now.
We indeed have nothing like that yet. The file name should then imo not be
arch-* though, but x86-*. Or you could put it in xen/lib/x86/. It could even
be obj-y rather than lib-y, unless you expect we'll be able to get rid of
all uses, which seems unlikely at least due to bitmap_weight(). Otoh with
my ABI-level series the call site in arch_hweightl() could then be made go
away for v2 and above, at which point it living in lib-y will be preferable.
> * When we up the minimum toolchain to GCC 7 / Clang 5, we can use a
> __attribute__((no_caller_saved_registers)) and can forgo writing this in
> asm.
>
> GCC seems to need extra help, and wants -mgeneral-regs-only too. It has a
> habit of complaining about incompatible instructions even when it's not
> emitting them.
What is this part about? What incompatible instructions, in particular?
> @@ -475,4 +476,24 @@ static always_inline unsigned int arch_flsl(unsigned
> long x)
> }
> #define arch_flsl arch_flsl
>
> +static always_inline unsigned int arch_hweightl(unsigned long x)
> +{
> + unsigned int r;
> +
> + /*
> + * arch_generic_hweightl() is written in ASM in order to preserve all
> + * registers, as the compiler can't see the call.
> + *
> + * This limits the POPCNT instruction to using the same ABI as a function
> + * call (input in %rdi, output in %eax) but that's fine.
> + */
> + alternative_io("call arch_generic_hweightl",
> + "popcnt %[val], %q[res]", X86_FEATURE_POPCNT,
> + ASM_OUTPUT2([res] "=a" (r) ASM_CALL_CONSTRAINT),
> + [val] "D" (x));
If you made [val] an output ("+D") you could avoid preserving the register
in the function. And I'd expect the register wouldn't be re-used often
afterwards, so its clobbering likely won't harm code quality very much.
> --- /dev/null
> +++ b/xen/lib/arch-generic-hweightl.S
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> + .file __FILE__
> +
> +#include <xen/linkage.h>
> +
> +/*
> + * An implementation of generic_hweightl() used on hardware without the
> POPCNT
> + * instruction.
> + *
> + * This function is called from within an ALTERNATIVE in arch_hweightl().
> + * i.e. behind the back of the compiler. Therefore all registers are callee
> + * preserved.
> + *
> + * The ASM is what GCC-12 emits for generic_hweightl() in a release build of
> + * Xen, with spilling of %rdi/%rdx to preserve the callers registers.
> + */
> +FUNC(arch_generic_hweightl)
> + push %rdi
> + push %rdx
> +
> + movabs $0x5555555555555555, %rdx
> + mov %rdi, %rax
> + shr $1, %rax
> + and %rdx, %rax
> + sub %rax, %rdi
> + movabs $0x3333333333333333, %rax
> + mov %rdi, %rdx
> + shr $0x2, %rdi
Could I talk you into omitting the 0x here and ...
> + and %rax, %rdx
> + and %rax, %rdi
> + add %rdi, %rdx
> + mov %rdx, %rax
> + shr $0x4, %rax
... here, and maybe use ...
> + add %rdx, %rax
> + movabs $0xf0f0f0f0f0f0f0f, %rdx
> + and %rdx, %rax
> + movabs $0x101010101010101, %rdx
> + imul %rdx, %rax
> + shr $0x38, %rax
... 56 here (or even BITS_PER_LONG-8)?
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |