[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/bitops: Account for POPCNT errata on earlier Intel CPUs


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 26 Mar 2025 10:20:11 +0100
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 26 Mar 2025 09:20:18 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25.03.2025 19:52, Andrew Cooper wrote:
> Manually break the false dependency for the benefit of cases such as
> bitmap_weight() which is a reasonable hotpath.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> 
> Not sure if this warrants a fixes or not, but:
> 
> Fixes: 6978602334d9 ("x86/bitops: Use the POPCNT instruction when available")
> 
> Many examples online suggest a 2x improvement perf improvement on tight loops
> by breaking this dependency.  cpumasks in particular make frequent use of this
> loop.
> 
> Still TODO:
> 
>  1) Put a double CS prefix on the CALL instruction to avoid a trailing 2-byte
>     NOP, but this depends on x86_decode_lite() in order to work.
> 
>  2) Revert a buggy GAS diagnostic:
> 
>     ./arch/x86/include/asm/bitops.h: Assembler messages:
>     ./arch/x86/include/asm/bitops.h:493: Error: same type of prefix used twice
>     ./arch/x86/include/asm/bitops.h:493: Error: same type of prefix used twice
> 
>     Multiple prefixes are not an error, and are sometimes the best choice
>     available.

Well. Sane insns have every kind of prefix at most once. The assembler
has (effectively) boolean markers for that. Thing is that multiple
different prefixes of the same kind are ambiguous to encode: Which order
should they be emitted in? One programmer may think "first wins" while
another may expect "last wins". It may be possible to special case "same
prefix", by converting the boolean-ness to a count. Yet I expect more
often than not multiple prefixes of the same type are a mistake. Hence
we probably would then still want to warn about that, requiring a new
way to silence that warning.

However, imo it is better to leave the assembler unaltered, and continue
to require people who want such unusual encodings to specify the prefixes
they want as separate kind-of-insns. Such unusual encodings are likely to
also mean certain positioning of those prefixes relative to other possible
ones; there's no prescribed order the assembler would emit prefixes of
different kinds (REX and REX2 always coming last of course) when they're
part of a real insn.

> It turns out that LZCNT/TZCNT have the same input dependent bug, prior to
> Skylake.

These two do, but BSF/BSR don't? Pretty odd.

> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -488,10 +488,16 @@ static always_inline unsigned int 
> arch_hweightl(unsigned long x)
>       *
>       * This limits the POPCNT instruction to using the same ABI as a function
>       * call (input in %rdi, output in %eax) but that's fine.
> +     *
> +     * On Intel CPUs prior to Cannon Lake, the POPCNT instruction has a false
> +     * input dependency on it's destination register (errata HSD146, SKL029
> +     * amongst others), impacting loops such as bitmap_weight().  Insert an
> +     * XOR to manually break the dependency.
>       */

With this being an Intel-only issue, wouldn't we better ...

>      alternative_io("call arch_generic_hweightl",
> +                   "xor %k[res], %k[res]\n\t"

... put this line in #ifdef CONFIG_INTEL then? The extra overhead is small, but
I see no reason not to avoid it if we can. (I realize that's not quite as
straightforward as it reads, for alternative_io() being a macro.)

Jan

>                     "popcnt %[val], %q[res]", X86_FEATURE_POPCNT,
> -                   ASM_OUTPUT2([res] "=a" (r) ASM_CALL_CONSTRAINT),
> +                   ASM_OUTPUT2([res] "=&a" (r) ASM_CALL_CONSTRAINT),
>                     [val] "D" (x));
>  
>      return r;




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.