[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/bitops: Optimise arch_ffs()/etc some more on AMD


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 2 Jun 2025 11:35:50 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 02 Jun 2025 09:36:00 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28.05.2025 00:29, Andrew Cooper wrote:
> One aspect of the old ffs()/etc infrastructure was the use of TZCNT/LZCNT
> without a CPUID check.  Given no obvious justification, I dropped it during
> the cleanup.
> 
> It turns out there is a good reason, on all but the most recent AMD CPUs.
> 
> Reinstate the use of "blind" TZCNT/LZCNT when safe to do so.  This happens to
> be preferentially in loops where a repeated saving of 5-6 uops becomes far
> more relevant than a one-off scenario.
> 
> Leave an explanation of why.
> 
> No functional change.
> 
> Fixes: 989e5f08d33e ("x86/bitops: Improve arch_ffs() in the general case")
> Fixes: 5ed26fc0768d ("xen/bitops: Implement ffsl() in common logic")
> Fixes: 54b10ef6c810 ("xen/bitops: Implement fls()/flsl() in common logic")
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

In principle
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
but ...

> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -399,9 +399,16 @@ static always_inline unsigned int arch_ffs(unsigned int 
> x)
>           *
>           * and the optimiser really can work with the knowledge of x being
>           * non-zero without knowing it's exact value, in which case we don't
> -         * need to compensate for BSF's corner cases.  Otherwise...
> +         * need to compensate for BSF's corner cases.
> +         *
> +         * That said, we intentionally use TZCNT on capable hardware when the
> +         * behaviour for the 0 case doesn't matter.  On AMD CPUs prior to
> +         * Zen4, TZCNT is 1-2 uops while BSF is 6-8 with a latency to match.
> +         * Intel CPUs don't suffer this discrepancy.
> +         *
> +         * Otherwise...
>           */
> -        asm ( "bsf %[val], %[res]"
> +        asm ( "rep bsf %[val], %[res]"

... why would we use REP (again) when gas 2.25 supports LZCNT/TZCNT?

Jan

>                : [res] "=r" (r)
>                : [val] "rm" (x) );
>      }
> @@ -428,7 +435,7 @@ static always_inline unsigned int arch_ffsl(unsigned long 
> x)
>  
>      /* See arch_ffs() for safety discussions. */
>      if ( __builtin_constant_p(x > 0) && x > 0 )
> -        asm ( "bsf %[val], %q[res]"
> +        asm ( "rep bsf %[val], %q[res]"
>                : [res] "=r" (r)
>                : [val] "rm" (x) );
>      else
> @@ -446,7 +453,7 @@ static always_inline unsigned int arch_fls(unsigned int x)
>  
>      /* See arch_ffs() for safety discussions. */
>      if ( __builtin_constant_p(x > 0) && x > 0 )
> -        asm ( "bsr %[val], %[res]"
> +        asm ( "rep bsr %[val], %[res]"
>                : [res] "=r" (r)
>                : [val] "rm" (x) );
>      else
> @@ -464,7 +471,7 @@ static always_inline unsigned int arch_flsl(unsigned long 
> x)
>  
>      /* See arch_ffs() for safety discussions. */
>      if ( __builtin_constant_p(x > 0) && x > 0 )
> -        asm ( "bsr %[val], %q[res]"
> +        asm ( "rep bsr %[val], %q[res]"
>                : [res] "=r" (r)
>                : [val] "rm" (x) );
>      else
> 
> base-commit: d965e2ee07c56c341d8896852550914d87ea5374
> prerequisite-patch-id: 8da89000c73c38aab6abfa6622217ea9eff07fbd
> prerequisite-patch-id: 74830838bac94ed1e036a8173cf3210a314b35d8
> prerequisite-patch-id: 0654835c28df8d40b6c97006d041c4d31447a9a6
> prerequisite-patch-id: 2d47d646c6a6e0019918c57753d6c01a752c377f
> prerequisite-patch-id: d8f8c4221a2d7039bae6f3d38e93fe90b2091d01
> prerequisite-patch-id: e0397c86b545a1d65f2e6b2049c282b926c40c64
> prerequisite-patch-id: ea21abe4540ee229f4f8725ce3f701d9ba4bd4a8




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.