[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 3/9] xen/bitops: Convert 'hweight(x) > 1' to new multiple_bits_set()



Using hweight() is an especially expensive way of determining simply if
multiple bits are set in a value.  Worse, 4 of the 10 hweight() calls in Xen
are of this form.

Switch to the new multiple_bits_set() helper.  This is far more efficient than
the longhand hweight() algorithm and, owing to its simplicity, likely more
efficient than even a dedicated instruction on a superscalar processor.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
CC: Wei Liu <wl@xxxxxxx>
CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
CC: Julien Grall <julien@xxxxxxx>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
CC: Bertrand Marquis <bertrand.marquis@xxxxxxx>
CC: Michal Orzel <michal.orzel@xxxxxxx>
CC: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
CC: Shawn Anastasio <sanastasio@xxxxxxxxxxxxxxxxxxxxx>

On x86, the code reduction speaks for itself:

  add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-240 (-240)
  Function                                     old     new   delta
  vlapic_ipi                                   722     650     -72
  numa_emulation                               577     497     -80
  do_xenpmu_op                                1665    1577     -88

That's an aweful lot of wasted calculation for the same answer.

I can't find a way of enabling CONFIG_NUMA on ARM (yet?) so right now there's
no change in any other architecture.  However, a synthetic helper shows the
following on ARM32:

  add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-128 (-128)
  Function                                     old     new   delta
  test_mbs                                     176      48    -128

and on ARM64:

  add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-44 (-44)
  Function                                     old     new   delta
  test_mbs                                      60      16     -44

PPC64 has POPCNTD in a default build of Xen and by chance both algorithms
compile to the same number of instructions.

RISC-V doesn't have hweight() wired up yet at all.
---
 xen/arch/x86/cpu/vpmu.c   |  2 +-
 xen/arch/x86/hvm/vlapic.c | 10 ++++++----
 xen/common/numa.c         |  2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index b2ba9994129b..a5bb1689c7d5 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -673,7 +673,7 @@ long do_xenpmu_op(
     {
         if ( (pmu_params.val &
               ~(XENPMU_MODE_SELF | XENPMU_MODE_HV | XENPMU_MODE_ALL)) ||
-             (hweight64(pmu_params.val) > 1) )
+             multiple_bits_set(pmu_params.val) )
             return -EINVAL;
 
         /* 32-bit dom0 can only sample itself. */
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 2ec95942713e..4a3e21a65f9d 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -467,12 +467,14 @@ static bool is_multicast_dest(struct vlapic *vlapic, 
unsigned int short_hand,
         return short_hand != APIC_DEST_SELF;
 
     if ( vlapic_x2apic_mode(vlapic) )
-        return dest_mode ? hweight16(dest) > 1 : dest == 0xffffffffU;
+        return dest_mode ? multiple_bits_set((uint16_t)dest)
+                         : dest == 0xffffffffU;
 
     if ( dest_mode )
-        return hweight8(dest &
-                        GET_xAPIC_DEST_FIELD(vlapic_get_reg(vlapic,
-                                                            APIC_DFR))) > 1;
+    {
+        dest &= GET_xAPIC_DEST_FIELD(vlapic_get_reg(vlapic, APIC_DFR));
+        return multiple_bits_set((uint8_t)dest);
+    }
 
     return dest == 0xff;
 }
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 28a09766fabc..ce3991929ce5 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -546,7 +546,7 @@ static int __init numa_emulation(unsigned long start_pfn,
     uint64_t sz = pfn_to_paddr(end_pfn - start_pfn) / numa_fake;
 
     /* Kludge needed for the hash function */
-    if ( hweight64(sz) > 1 )
+    if ( multiple_bits_set(sz) )
     {
         uint64_t x = 1;
 
-- 
2.39.2




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.