[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 11/11] x86/bitops: Use the POPCNT instruction when available
On 29.08.2024 00:03, Andrew Cooper wrote: > It has existed in x86 CPUs since 2008, so we're only 16 years late adding > support. With all the other scafolding in place, implement arch_hweightl() > for x86. > > The only complication is that the call to arch_generic_hweightl() is behind > the compilers back. Address this by writing it in ASM and ensure that it > preserves all registers. > > Copy the code generation from generic_hweightl(). It's not a complicated > algorithm, and is easy to regenerate if needs be, but cover it with the same > unit tests as test_generic_hweightl() just for piece of mind. > > Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > --- > CC: Jan Beulich <JBeulich@xxxxxxxx> > CC: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > v2: > * Fix MISRA 8.2 (parameter name) and 8.5 (single declaration) regressions. > * Rename {arch->x86}-generic-hweightl.{S->c} > * Adjust ASM formating > > The __constructor trick to cause any reference to $foo() to pull in > test_$foo() only works when both are in the same TU. i.e. what I did in > v1 (putting test_arch_generic_hweightl() in the regular generic-hweightl.c) > didn't work. I'm afraid I don't understand this. What exactly didn't work, breaking in which way? Presumably as much as you, I don't really like the global asm() in a C file, when ideally the same could be written with less clutter in an assembly one. > This in turn means that arch_generic_hweightl() needs writing in a global asm > block, and also that we can't use FUNC()/END(). While we could adjust it to > work for GCC/binutils, we can't have CPP macros in Clang-IAS strings. What does Clang different from gcc there? I was hoping that at least their pre- processors would work in (sufficiently) similar ways. > --- /dev/null > +++ b/xen/lib/x86-generic-hweightl.c > @@ -0,0 +1,69 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > + > +#include <xen/bitops.h> > +#include <xen/init.h> > +#include <xen/self-tests.h> > + > +/* > + * An implementation of generic_hweightl() used on hardware without the > POPCNT > + * instruction. > + * > + * This function is called from within an ALTERNATIVE in arch_hweightl(). > + * i.e. behind the back of the compiler. Therefore all registers are callee > + * preserved. > + * > + * The ASM is what GCC-12 emits for generic_hweightl() in a release build of > + * Xen, with spilling of %rdi/%rdx to preserve the callers registers. > + * > + * Note: When we can use __attribute__((no_caller_saved_registers)) > + * unconditionally (GCC 7, Clang 5), we can implement this in plain C. > + */ > +asm ( > + ".type arch_generic_hweightl, STT_FUNC\n\t" > + ".globl arch_generic_hweightl\n\t" > + ".hidden arch_generic_hweightl\n\t" > + ".balign " STR(CONFIG_FUNCTION_ALIGNMENT) ", 0x90\n\t" Maybe better avoid open-coding CODE_FILL, in case we want to change that down the road? Also could I talk you into dropping the \t there? Canonical assembly code wants ... > + "arch_generic_hweightl:\n\t" .. labels unindented. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |