[PATCH] x86/hvm: Improve hvm_set_guest_pat() code generation again
From: Edwin Török <edvin.torok@xxxxxxxxxx>

Following on from cset 9ce0a5e207f3 ("x86/hvm: Improve hvm_set_guest_pat()
code generation"), and the discovery that Clang/LLVM generates especially
disastrous code for the loop at -O2
(https://github.com/llvm/llvm-project/issues/54644), Edvin decided to
remove the loop entirely by fully vectorising it.

This is substantially more efficient than the loop, and rather harder for a
typical compiler to mess up.

Signed-off-by: Edwin Török <edvin.torok@xxxxxxxxxx>
Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
CC: Wei Liu <wl@xxxxxxx>
CC: Edwin Török <edvin.torok@xxxxxxxxxx>
---
 xen/arch/x86/hvm/hvm.c | 51 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0dd320a6a9fc..b63e6073dfd0 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -302,24 +302,43 @@ void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat)
     *guest_pat = v->arch.hvm.pat_cr;
 }
 
-int hvm_set_guest_pat(struct vcpu *v, uint64_t guest_pat)
+/*
+ * MSR_PAT takes 8 uniform fields, each of which must be a valid architectural
+ * memory type (0, 1, 4-7).  This is a fully vectorised form of the
+ * 8-iteration loop over bytes looking for PAT_TYPE_* constants.
+ */
+static bool pat_valid(uint64_t val)
 {
-    unsigned int i;
-    uint64_t tmp;
+    /* Yields a non-zero value in any lane which had a value greater than 7. */
+    uint64_t any_gt_7 = val & 0xf8f8f8f8f8f8f8f8;
 
-    for ( i = 0, tmp = guest_pat; i < 8; i++, tmp >>= 8 )
-        switch ( tmp & 0xff )
-        {
-        case PAT_TYPE_UC_MINUS:
-        case PAT_TYPE_UNCACHABLE:
-        case PAT_TYPE_WRBACK:
-        case PAT_TYPE_WRCOMB:
-        case PAT_TYPE_WRPROT:
-        case PAT_TYPE_WRTHROUGH:
-            break;
-        default:
-            return 0;
-        }
+    /*
+     * With the > 7 case covered, identify lanes with the value 0-3 by finding
+     * lanes with bit 2 clear.
+     *
+     * Yields bit 2 set in each lane which has a value <= 3.
+     */
+    uint64_t any_le_3 = ~val & 0x0404040404040404;
+
+    /*
+     * Logically, any_2_or_3 is any_le_3 && bit 1 set.
+     *
+     * We could calculate any_gt_1 as val & 0x02 and resolve the two vectors
+     * of booleans (shift one of them until the mask lines up, then bitwise
+     * and), but that is unnecessary calculation.
+     *
+     * Shift any_le_3 so it becomes bit 1 in each lane which has a value <= 3,
+     * and look for bit 1 in a subset of lanes.
+     */
+    uint64_t any_2_or_3 = val & (any_le_3 >> 1);
+
+    return !(any_gt_7 | any_2_or_3);
+}
+
+int hvm_set_guest_pat(struct vcpu *v, uint64_t guest_pat)
+{
+    if ( !pat_valid(guest_pat) )
+        return 0;
 
     if ( !alternative_call(hvm_funcs.set_guest_pat, v, guest_pat) )
         v->arch.hvm.pat_cr = guest_pat;
-- 
2.11.0
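
For readers who want to convince themselves of the transformation, below is
a minimal standalone sketch (not part of the patch) cross-checking the
vectorised pat_valid() against the original byte-by-byte loop.  The
PAT_TYPE_* definitions are the architectural memory-type encodings
(0, 1, 4-7) that the patch comment refers to; the harness itself is
hypothetical and exists only for illustration.

/*
 * Standalone sketch (not part of the patch): cross-check the vectorised
 * pat_valid() against the original 8-iteration loop.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural PAT memory-type encodings (reproduced for this sketch). */
#define PAT_TYPE_UNCACHABLE 0 /* UC  */
#define PAT_TYPE_WRCOMB     1 /* WC  */
#define PAT_TYPE_WRTHROUGH  4 /* WT  */
#define PAT_TYPE_WRPROT     5 /* WP  */
#define PAT_TYPE_WRBACK     6 /* WB  */
#define PAT_TYPE_UC_MINUS   7 /* UC- */

/* Vectorised form, as in the patch. */
static bool pat_valid(uint64_t val)
{
    uint64_t any_gt_7   =  val & 0xf8f8f8f8f8f8f8f8ULL;
    uint64_t any_le_3   = ~val & 0x0404040404040404ULL;
    uint64_t any_2_or_3 =  val & (any_le_3 >> 1);

    return !(any_gt_7 | any_2_or_3);
}

/* Original loop form, kept as the reference implementation. */
static bool pat_valid_loop(uint64_t val)
{
    unsigned int i;

    for ( i = 0; i < 8; i++, val >>= 8 )
        switch ( val & 0xff )
        {
        case PAT_TYPE_UC_MINUS:
        case PAT_TYPE_UNCACHABLE:
        case PAT_TYPE_WRBACK:
        case PAT_TYPE_WRCOMB:
        case PAT_TYPE_WRPROT:
        case PAT_TYPE_WRTHROUGH:
            break;
        default:
            return false;
        }

    return true;
}

int main(void)
{
    unsigned int b;

    /*
     * Every lane is validated independently (the >> 1 shift moves bit 2 to
     * bit 1 within a lane, never across lanes), so splatting a single byte
     * across all eight lanes exercises every per-lane behaviour.
     */
    for ( b = 0; b < 256; b++ )
    {
        uint64_t val = b * 0x0101010101010101ULL;

        if ( pat_valid(val) != pat_valid_loop(val) )
        {
            printf("Mismatch for byte %#x\n", b);
            return 1;
        }
    }

    printf("All 256 byte values agree\n");
    return 0;
}

Since each lane's contribution depends only on that lane's byte, the 256
splatted values above cover the whole behaviour of both functions.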