Re: [Xen-devel] [PATCH 4/4] x86: use POPCNT for hweight<N>() when available
>>> On 03.06.19 at 10:13, <JBeulich@xxxxxxxx> wrote:
>>>> On 31.05.19 at 22:43, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 31/05/2019 02:54, Jan Beulich wrote:
>>> This is faster than using the software implementation, and the insn is
>>> available on all half-way recent hardware. Therefore convert
>>> generic_hweight<N>() to out-of-line functions (without affecting Arm)
>>> and use alternatives patching to replace the function calls.
>>>
>>> Suggested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>
>> So, I trust you weren't expecting to just ack this and let it go in?
>>
>> The principle of the patch (use popcnt when available) is an improvement
>> which I'm entirely in agreement with, but everything else is a problem.
>>
>> The long and the short of it is that I'm not going to accept any version
>> of this which isn't the Linux version.
>
> You're kidding. We want to move away from assembly wherever we
> can, and you demand new assembly code?
>
>> From a microarchitectural standpoint, the tradeoff between fractional
>> register scheduling flexibility (which in practice is largely bound
>> anyway by real function calls in surrounding code) and increased icache
>> pressure/coldness (from the redundant function copies) falls largely in
>> favour of the Linux way of doing it, a cold icache line is
>> disproportionally more expensive than requiring the compiler to order
>> its registers differently (especially as all non-obsolete processors
>> these days have zero-cost register renaming internally, for the purpose
>> of superscalar execution).
>
> I'm afraid I'm struggling heavily as to what you're wanting to tell
> me here: Where's the difference (in this regard) between the
> change here and the way how Linux does it? Both emit a CALL
> insn with registers set up suitably for it, and both patch it with a
> POPCNT insn using the registers as demanded by the CALL.
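
For readers following along, the two paths being argued about can be
illustrated with a minimal userspace sketch. This is neither the Xen
patch nor the Linux code: all names below (sw_hweight32, hw_hweight32,
select_hweight, hweight32_selfcheck) are hypothetical, the GCC/Clang
builtin __builtin_popcount() merely stands in for the POPCNT insn, and
the function pointer is only a rough stand-in for alternatives
patching, which rewrites the CALL in place rather than going through
an indirect call.

```c
#include <assert.h>
#include <stdint.h>

/* Software fallback: classic parallel bit-summing population count. */
static unsigned int sw_hweight32(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;                      /* 2-bit sums */
    w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u); /* 4-bit sums */
    w = (w + (w >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
    return (w * 0x01010101u) >> 24;                   /* add the four bytes */
}

/*
 * Assumption: the compiler builtin stands in for the hardware insn.
 * It only becomes a real POPCNT when the build target permits it
 * (e.g. -mpopcnt); otherwise the compiler emits its own fallback.
 */
static unsigned int hw_hweight32(uint32_t w)
{
    return (unsigned int)__builtin_popcount(w);
}

/* Hypothetical dispatch point; defaults to the software version. */
static unsigned int (*hweight32_impl)(uint32_t) = sw_hweight32;

/* Choose the implementation once, e.g. after a CPU feature check. */
static void select_hweight(int have_popcnt)
{
    hweight32_impl = have_popcnt ? hw_hweight32 : sw_hweight32;
}

static unsigned int hweight32(uint32_t w)
{
    return hweight32_impl(w);
}

/* Sanity check: both paths must agree on a few sample values. */
static void hweight32_selfcheck(void)
{
    static const uint32_t samples[] = { 0u, 1u, 0xf0f0u, 0x80000001u, 0xffffffffu };
    unsigned int i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(sw_hweight32(samples[i]) == hw_hweight32(samples[i]));
}
```

In this sketch the caller always goes through hweight32(), so which
registers hold the argument and result is left to the compiler; the
disagreement in the thread is about exactly that register contract at
the patched call site, which a plain C sketch cannot express.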

Having thought about this some more, in an attempt to understand
(a) what you mean and (b) how you want things to be done "your way",
I'm afraid I've become more confused: Your reply reminds me heavily of
the discussion we had on the BMI2 patching series I had done (and have
now dropped). There you complained about me _not_ using fixed
registers, and hence potentially causing frequent accesses to
i-cache-cold lines. While my original plan was to use a similar
approach here, I specifically went the opposite way to avoid similar
complaints of yours, only to find that you now use (apparently) the
same argument again.

As a result I can only conclude that I'm now pretty unclear on what
model you would actually approve of.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel