 
	
| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/hvm: Improve hvm_set_guest_pat() code generation again
 On 10.08.2022 15:36, Andrew Cooper wrote:
> From: Edwin Török <edvin.torok@xxxxxxxxxx>
> 
> Following on from cset 9ce0a5e207f3 ("x86/hvm: Improve hvm_set_guest_pat()
> code generation"), and the discovery that Clang/LLVM makes some especially
> disastrous code generation for the loop at -O2
> 
>   https://github.com/llvm/llvm-project/issues/54644
> 
> Edvin decided to remove the loop entirely by fully vectorising it.  This is
> substantially more efficient than the loop, and rather harder for a typical
> compiler to mess up.
> 
> Signed-off-by: Edwin Török <edvin.torok@xxxxxxxxxx>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
The main downside being that changing the code to fit in a new PAT
type will now be harder. I wonder in particular whether with that
in mind it wouldn't be better to express the check not in terms of
relations, but in terms of set / clear bits ("bits 3-7 clear AND
(bit 2 set OR bit 1 clear)"). The code kind of does so already, but
the variable names don't reflect that (and would hence need to
change in such an event).
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -302,24 +302,43 @@ void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat)
>          *guest_pat = v->arch.hvm.pat_cr;
>  }
>  
> -int hvm_set_guest_pat(struct vcpu *v, uint64_t guest_pat)
> +/*
> + * MSR_PAT takes 8 uniform fields, each of which must be a valid 
> architectural
> + * memory type (0, 1, 4-7).  This is a fully vectorised form of the
> + * 8-iteration loop over bytes looking for PAT_TYPE_* constants.
While grep-ing for PAT_TYPE_ will hit this line, I think we want
every individual type to also be found here when grep-ing for one.
The actual values aren't going to change, but perhaps the beast
way to do so would still be by way of BUILD_BUG_ON()s.
> + */
> +static bool pat_valid(uint64_t val)
>  {
> -    unsigned int i;
> -    uint64_t tmp;
> +    /* Yields a non-zero value in any lane which had value greater than 7. */
> +    uint64_t any_gt_7   =  val & 0xf8f8f8f8f8f8f8f8;
This and the other constant want to gain UL suffixes. (While I'm
open to be convinced otherwise on the earlier two points, this one
I'm going to insist on. Yet in case it would end up being the only
change in need of making, it could of course be done while
committing.)
Jan
 
 | 
|  | Lists.xenproject.org is hosted with RackSpace, monitoring our |