
[PATCH] x86/hvm: Improve hvm_set_guest_pat() code generation again


  • To: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Wed, 10 Aug 2022 14:36:55 +0100
  • Cc: Edwin Török <edvin.torok@xxxxxxxxxx>, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Wed, 10 Aug 2022 13:37:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

From: Edwin Török <edvin.torok@xxxxxxxxxx>

This follows on from cset 9ce0a5e207f3 ("x86/hvm: Improve hvm_set_guest_pat()
code generation"), and from the discovery that Clang/LLVM generates
especially poor code for the loop at -O2:

  https://github.com/llvm/llvm-project/issues/54644

Edwin decided to remove the loop entirely by fully vectorising it, i.e.
checking all 8 byte-wide fields at once with bitwise operations on the
whole 64-bit value.  This is substantially more efficient than the loop,
and rather harder for a typical compiler to mess up.

Signed-off-by: Edwin Török <edvin.torok@xxxxxxxxxx>
Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
CC: Wei Liu <wl@xxxxxxx>
CC: Edwin Török <edvin.torok@xxxxxxxxxx>
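
For anyone wanting to convince themselves that the bitwise form accepts
exactly the same values as the old loop, the following is a minimal
standalone sketch, not part of the patch.  pat_valid() is copied from the
diff below; pat_valid_loop(), pat_count_mismatches() and pat_self_test()
are hypothetical helper names invented here, and the loop uses plain
numbers (0, 1, 4-7) in place of the PAT_TYPE_* constants.  It has no
main() and is meant to be called from whatever test harness is to hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Branch-free check, as added by the patch. */
static bool pat_valid(uint64_t val)
{
    uint64_t any_gt_7   =  val & 0xf8f8f8f8f8f8f8f8ULL;
    uint64_t any_le_3   = ~val & 0x0404040404040404ULL;
    uint64_t any_2_or_3 =  val & (any_le_3 >> 1);

    return !(any_gt_7 | any_2_or_3);
}

/* Reference: byte-at-a-time loop, equivalent to the code being removed. */
static bool pat_valid_loop(uint64_t val)
{
    for ( unsigned int i = 0; i < 8; i++, val >>= 8 )
    {
        uint8_t type = val & 0xff;

        /* Valid memory types are 0, 1 and 4-7. */
        if ( type > 7 || type == 2 || type == 3 )
            return false;
    }

    return true;
}

/*
 * Try all 256 byte values in each of the 8 lanes, with the other lanes
 * holding valid types, and count disagreements between the two forms.
 */
static unsigned int pat_count_mismatches(void)
{
    /* Every lane valid: this is the architectural reset value of MSR_PAT. */
    const uint64_t base = 0x0007040600070406ULL;
    unsigned int mismatches = 0;

    for ( unsigned int lane = 0; lane < 8; lane++ )
        for ( unsigned int byte = 0; byte < 256; byte++ )
        {
            uint64_t val = base & ~(0xffULL << (lane * 8));

            val |= (uint64_t)byte << (lane * 8);

            if ( pat_valid(val) != pat_valid_loop(val) )
                mismatches++;
        }

    return mismatches;
}

/* Spot checks plus the per-lane sweep. */
static void pat_self_test(void)
{
    assert(pat_valid(0x0007040600070406ULL));   /* reset value */
    assert(!pat_valid(0x0007040600070402ULL));  /* type 2 in lane 0 */
    assert(!pat_valid(0x0807040600070406ULL));  /* type 8 in lane 7 */
    assert(pat_count_mismatches() == 0);
}
```

The sweep varies one lane at a time rather than covering all 2^64
inputs.  That should be sufficient here because every operation is
bitwise and the 1-bit shift never moves a bit across a lane boundary,
so each lane is evaluated independently and the results are ORed.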
---
 xen/arch/x86/hvm/hvm.c | 51 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0dd320a6a9fc..b63e6073dfd0 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -302,24 +302,43 @@ void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat)
         *guest_pat = v->arch.hvm.pat_cr;
 }
 
-int hvm_set_guest_pat(struct vcpu *v, uint64_t guest_pat)
+/*
+ * MSR_PAT takes 8 uniform fields, each of which must be a valid architectural
+ * memory type (0, 1, 4-7).  This is a fully vectorised form of the
+ * 8-iteration loop over bytes looking for PAT_TYPE_* constants.
+ */
+static bool pat_valid(uint64_t val)
 {
-    unsigned int i;
-    uint64_t tmp;
+    /* Yields a non-zero value in any lane which has a value greater than 7. */
+    uint64_t any_gt_7   =  val & 0xf8f8f8f8f8f8f8f8;
 
-    for ( i = 0, tmp = guest_pat; i < 8; i++, tmp >>= 8 )
-        switch ( tmp & 0xff )
-        {
-        case PAT_TYPE_UC_MINUS:
-        case PAT_TYPE_UNCACHABLE:
-        case PAT_TYPE_WRBACK:
-        case PAT_TYPE_WRCOMB:
-        case PAT_TYPE_WRPROT:
-        case PAT_TYPE_WRTHROUGH:
-            break;
-        default:
-            return 0;
-        }
+    /*
+     * With the > 7 case covered, identify lanes with the value 0-3 by finding
+     * lanes with bit 2 clear.
+     *
+     * Yields bit 2 set in each lane which has a value <= 3.
+     */
+    uint64_t any_le_3   = ~val & 0x0404040404040404;
+
+    /*
+     * Logically, any_2_or_3 is any_le_3 && bit 1 set.
+     *
+     * We could calculate any_gt_1 as val & 0x02 and resolve the two vectors
+     * of booleans (shift one of them until the mask lines up, then bitwise
+     * and), but that is unnecessary calculation.
+     *
+     * Shift any_le_3 so it becomes bit 1 in each lane which has a value <= 3,
+     * and look for bit 1 in a subset of lanes.
+     */
+    uint64_t any_2_or_3 =  val & (any_le_3 >> 1);
+
+    return !(any_gt_7 | any_2_or_3);
+}
+
+int hvm_set_guest_pat(struct vcpu *v, uint64_t guest_pat)
+{
+    if ( !pat_valid(guest_pat) )
+        return 0;
 
     if ( !alternative_call(hvm_funcs.set_guest_pat, v, guest_pat) )
         v->arch.hvm.pat_cr = guest_pat;
-- 
2.11.0