
Re: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C



On April 28, 2025 7:25:17 PM PDT, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>On 29/04/2025 3:00 am, H. Peter Anvin wrote:
>> On April 28, 2025 5:12:13 PM PDT, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 28/04/2025 10:38 pm, H. Peter Anvin wrote:
>>>> On April 28, 2025 9:14:45 AM PDT, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>>> On Mon, 28 Apr 2025 at 00:05, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>>>>> And once we remove 486, I think we can do the optimization below to
>>>>>> just assume the output doesn't get clobbered by BS*L in the zero-case,
>>>>>> right?
>>>>> We probably can't, because who knows what "Pentium" CPUs are out there.
>>>>>
>>>>> Or even whether Pentium really does get it right. I doubt we have any
>>>>> developers with an original Pentium around.
>>>>>
>>>>> So just leave the "we don't know what the CPU result is for zero"
>>>>> unless we get some kind of official confirmation.
>>>>>
>>>>>          Linus
>>>> If anyone knows for sure, it is probably Christian Ludloff. However, there 
>>>> was a *huge* tightening of the formal ISA when the i686 was introduced 
>>>> (family=6) and I really believe this was part of it.
>>>>
>>>> I also really don't trust that family=5 really means the CPU conforms to
>>>> undocumented P5 behavior, e.g. for Quark.
>>> https://www.sandpile.org/x86/flags.htm
>>>
>>> That's a lot of "can't even characterise the result" in the P5.
>>>
>>> Looking at the P4 column, that is clearly what the latest SDM has
>>> retroactively declared to be architectural.
>>>
>>> ~Andrew
>> Yes, but it wasn't about flags here. 
>>
>> Now, question: can we just use __builtin_*() for these? I think gcc should 
>> always generate inline code for these on x86.
>
>Yes it does generate inline code.  https://godbolt.org/z/M45oo5rqT
>
>GCC does it branchlessly, but cannot optimise based on context.
>
>Clang can optimise based on context, except, it seems, for the 0 case.
>
>Moving to -march=i686 causes both GCC and Clang to switch to CMOV and
>create branchless code, but GCC still can't optimise out the
>CMOV based on context.
>
>~Andrew
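
A rough sketch of the __builtin_*() route asked about above, assuming GCC or Clang and a 32-bit int. The helper names sketch_ffs(), sketch_fls() and sketch_bitops_selftest() are hypothetical, not the kernel's variable_ffs()/fls(). The zero case is written in plain C instead of relying on what BSF/BSR leave in the destination register. Whether the compiler then emits a branch, a CMOV, or folds the test away depends on target and context, as the Compiler Explorer link above illustrates.

```c
#include <assert.h>
#include <limits.h>

/* This sketch assumes a 32-bit unsigned int, as on i386 and x86-64. */
_Static_assert(sizeof(unsigned int) * CHAR_BIT == 32, "32-bit int assumed");

/*
 * Hypothetical helpers for illustration only.
 * Both return a 1-based bit index, or 0 when no bit is set.
 */
static inline int sketch_ffs(int x)
{
	/* GCC/Clang define __builtin_ffs(0) as 0, so no explicit test is needed. */
	return __builtin_ffs(x);
}

static inline int sketch_fls(unsigned int x)
{
	/*
	 * __builtin_clz(0) is undefined, so the zero case is handled in C
	 * rather than by trusting whatever BSR leaves in its destination.
	 */
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Usage: boundary cases, including the zero case. */
void sketch_bitops_selftest(void)
{
	assert(sketch_ffs(0) == 0);
	assert(sketch_ffs(12) == 3);      /* 0b1100: lowest set bit is bit 2 */
	assert(sketch_ffs(INT_MIN) == 32);
	assert(sketch_fls(0) == 0);
	assert(sketch_fls(12) == 4);      /* 0b1100: highest set bit is bit 3 */
	assert(sketch_fls(0x80000000u) == 32);
}
```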

Maybe a gcc bug report would be better than trying to hack around this in the 
kernel?
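
For comparison, the in-kernel shortcut Ingo suggests near the top of the thread (assume BSF does not clobber its output when the input is zero) would look roughly like the hypothetical sketch below. It is only safe where that behaviour is guaranteed: AMD documents that BSF leaves the destination unchanged for a zero source, while Intel's SDM calls the destination undefined. That gap is why the thread hesitates on pre-i686 parts. The names sketch_ffs_bsf() and sketch_bsf_selftest() are illustrative only.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical sketch: preload the output with -1 and rely on BSF leaving
 * it untouched when x == 0, so that r + 1 == 0 for a zero input.
 */
static inline int sketch_ffs_bsf(int x)
{
#if defined(__x86_64__) || defined(__i386__)
	int r;

	/* "0"(-1) ties an input to operand 0, so r starts out holding -1. */
	__asm__("bsfl %1,%0" : "=r"(r) : "rm"(x), "0"(-1));
	return r + 1;	/* yields 0 only if BSF left r unmodified */
#else
	/* Non-x86 fallback: handle the zero case explicitly in C. */
	return x ? __builtin_ctz((unsigned int)x) + 1 : 0;
#endif
}

/* Usage: the zero case is the one that depends on CPU behaviour. */
void sketch_bsf_selftest(void)
{
	assert(sketch_ffs_bsf(1) == 1);
	assert(sketch_ffs_bsf(12) == 3);
	assert(sketch_ffs_bsf(INT_MIN) == 32);
}
```

On a CPU that does write garbage into the destination for a zero source, the first return path would give a wrong result, which is exactly the case Linus is unwilling to assume away.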



 



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.