Re: [Xen-devel] [PATCH RFC 3/4] Arm64: further speed-up to hweight{32, 64}()
>>> On 04.06.19 at 18:11, <julien.grall@xxxxxxx> wrote:
> On 5/31/19 10:53 AM, Jan Beulich wrote:
>> According to Linux commit e75bef2a4f ("arm64: Select
>> ARCH_HAS_FAST_MULTIPLIER") this is a further improvement over the
>> variant using only bitwise operations on at least some hardware, and no
>> worse on other.
>>
>> Suggested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> RFC: To be honest I'm not fully convinced this is a win in particular in
>>      the hweight32() case, as there's no actual shift insn which gets
>>      replaced by the multiplication. Even for hweight64() the compiler
>>      could emit better code and avoid the explicit shift by 32 (which it
>>      emits at least for me).
>
> I can see multiplication instruction used in both hweight32() and
> hweight64() with the compiler I am using.

That is for which exact implementation? What I was referring to as
"could emit better code" was the multiplication-free variant, where
the compiler fails to recognize (afaict) another opportunity to fold
a shift into an arithmetic instruction:

        add     x0, x0, x0, lsr #4
        and     x0, x0, #0xf0f0f0f0f0f0f0f
        add     x0, x0, x0, lsr #8
        add     x0, x0, x0, lsr #16
>>>     lsr     x1, x0, #32
>>>     add     w0, w1, w0
        and     w0, w0, #0xff
        ret

Afaict the two marked insns could be replaced by

        add     x0, x0, x0, lsr #32

With there only a sequence of add-s remaining, I'm having
difficulty seeing how the use of mul+lsr would actually help:

        add     x0, x0, x0, lsr #4
        and     x0, x0, #0xf0f0f0f0f0f0f0f
        mov     x1, #0x101010101010101
        mul     x0, x0, x1
        lsr     x0, x0, #56
        ret

But of course I know nothing about throughput and latency
of such add-s with one of their operands shifted first. And
yes, the variant using mul is, comparing with the better
optimized case, still one insn smaller.

> I would expect the compiler could easily replace a multiply by a series
> of shift but it would be more difficult to do the invert.
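For reference, the two instruction sequences compared above correspond
roughly to the C below. This is only a sketch and not the code from the
patch: the names byte_counts(), hweight64_shift(), hweight64_mul() and
hweight64_check() are made up for illustration, and the first two
reduction steps (which precede the instructions quoted above) are assumed
to be the usual bit-twiddling ones.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: reduce a 64-bit word to eight per-byte bit
 * counts (each byte ends up holding a value in the range 0..8).
 */
static uint64_t byte_counts(uint64_t w)
{
    w -= (w >> 1) & 0x5555555555555555ULL;
    w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
    return (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
}

/*
 * Multiplication-free variant: sum the eight bytes with three
 * shift-and-add steps. No byte can overflow, as the total is at most 64.
 */
unsigned int hweight64_shift(uint64_t w)
{
    uint64_t r = byte_counts(w);

    r += r >> 8;
    r += r >> 16;
    r += r >> 32;   /* the step that could be a single shifted add */
    return (unsigned int)(r & 0xff);
}

/*
 * Multiplier variant: multiplying by 0x0101...01 accumulates the sum
 * of all eight bytes in the top byte, which the shift then extracts.
 */
unsigned int hweight64_mul(uint64_t w)
{
    return (unsigned int)((byte_counts(w) * 0x0101010101010101ULL) >> 56);
}

/* Hypothetical self-check: both variants must agree on every input. */
unsigned int hweight64_check(uint64_t w)
{
    unsigned int n = hweight64_shift(w);

    assert(n == hweight64_mul(w));
    return n;
}
```

Both functions share the same prefix, so the comparison comes down to
three dependent shifted adds plus a mask versus a constant load, a
multiply and a shift. Which is cheaper depends on the latency of the
multiplier and of shifted-operand adds on the core in question.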
>
> Also, this has been in Linux for a year now, so I am assuming Linux
> folks are happy with changes (CCing Robin just in case I missed
> anything). Therefore I am happy to give it a go on Xen as well.

In which case - can I take this as an ack, or do you want to first
pursue the discussion?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel