Re: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C
On April 28, 2025 7:25:17 PM PDT, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>On 29/04/2025 3:00 am, H. Peter Anvin wrote:
>> On April 28, 2025 5:12:13 PM PDT, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> wrote:
>>> On 28/04/2025 10:38 pm, H. Peter Anvin wrote:
>>>> On April 28, 2025 9:14:45 AM PDT, Linus Torvalds
>>>> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>>> On Mon, 28 Apr 2025 at 00:05, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>>>>> And once we remove 486, I think we can do the optimization below to
>>>>>> just assume the output doesn't get clobbered by BS*L in the zero-case,
>>>>>> right?
>>>>> We probably can't, because who knows what "Pentium" CPUs are out there.
>>>>>
>>>>> Or even if Pentium really does get it right. I doubt we have any
>>>>> developers with an original Pentium around.
>>>>>
>>>>> So just leave the "we don't know what the CPU result is for zero"
>>>>> unless we get some kind of official confirmation.
>>>>>
>>>>> Linus
>>>> If anyone knows for sure, it is probably Christian Ludloff. However, there
>>>> was a *huge* tightening of the formal ISA when the i686 was introduced
>>>> (family=6) and I really believe this was part of it.
>>>>
>>>> I also really don't trust that family=5 really means it conforms to
>>>> undocumented P5 behavior, e.g. for Quark.
>>> https://www.sandpile.org/x86/flags.htm
>>>
>>> That's a lot of "can't even characterise the result" in the P5.
>>>
>>> Looking at the P4 column, that is clearly what the latest SDM has
>>> retroactively declared to be architectural.
>>>
>>> ~Andrew
>> Yes, but it wasn't about flags here.
>>
>> Now, question: can we just use __builtin_*() for these? I think gcc should
>> always generate inline code for these on x86.
>
>Yes it does generate inline code. https://godbolt.org/z/M45oo5rqT
>
>GCC does it branchlessly, but cannot optimise based on context.
>
>Clang can optimise based on context, except the 0 case it seems.
>
>Moving to -march=i686 causes both GCC and Clang to switch to CMOV and
>create branchless code, but GCC still can't optimise out the
>CMOV based on context.
>
>~Andrew

Maybe a gcc bug report would be better than trying to hack around this in the kernel?
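A minimal C sketch of what the "__builtin_*()" suggestion could look like, with the zero case handled explicitly in C rather than by relying on what BSF/BSR leave in the destination register. The names sketch_fls() and sketch_ffs() are hypothetical, not the kernel's fls()/variable_ffs(); the sketch assumes GCC or Clang, a 32-bit unsigned int as on x86, and the 1-based convention in which fls(0) and ffs(0) return 0.

```c
#include <limits.h>

/*
 * Hypothetical helpers for illustration only; these are not the
 * kernel's fls()/variable_ffs(). Assumes GCC or Clang.
 */

/*
 * Find last set bit, 1-based: sketch_fls(1) == 1, sketch_fls(0) == 0.
 * __builtin_clz(0) is undefined, so the zero case is tested in C
 * instead of relying on what BSR leaves in its destination register.
 */
static inline int sketch_fls(unsigned int x)
{
	if (x == 0)
		return 0;
	return (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x);
}

/*
 * Find first set bit, 1-based: sketch_ffs(1) == 1, sketch_ffs(0) == 0.
 * __builtin_ctz(0) is likewise undefined; __builtin_ffs() would also
 * work here, since it is defined to return 0 for a zero argument.
 */
static inline int sketch_ffs(unsigned int x)
{
	if (x == 0)
		return 0;
	return __builtin_ctz(x) + 1;
}
```

With the zero test written in plain C, it is left to the compiler to drop the test when it can prove x is nonzero at the call site; the thread reports that GCC does not currently make use of that context.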
Lists.xenproject.org is hosted with RackSpace, monitoring our