[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v8 46/50] x86emul: support GFNI insns
>>> On 21.06.19 at 16:20, <andrew.cooper3@xxxxxxxxxx> wrote: > On 21/06/2019 15:00, Jan Beulich wrote: >>> On 15/03/2019 11:06, Jan Beulich wrote: >>>> @@ -138,6 +141,26 @@ static bool simd_check_avx512vbmi_vl(voi >>>> return cpu_has_avx512_vbmi && cpu_has_avx512vl; >>>> } >>>> >>>> +static bool simd_check_sse2_gf(void) >>>> +{ >>>> + return cpu_has_gfni && cpu_has_sse2; >>> This dependency doesn't match the manual. The legacy encoding needs >>> GFNI alone. >>> >>> gen-cpuid.py is trying to reduce the ability to create totally >>> implausible configurations via levelling, but for software checks, we >>> should follow the manual to the letter. >> This is test harness code - I'd rather be a little more strict here than >> having to needlessly spend time fixing an issue in there. Furthermore >> this matches how gcc enforces dependencies. >>>> +} >>>> + >>>> +static bool simd_check_avx2_gf(void) >>>> +{ >>>> + return cpu_has_gfni && cpu_has_avx2; >>> Here, the dependency is only on AVX, which I think is probably trying to >>> express a dependency on xcr0.ymm >> Mostly as per above, except that here gcc (imo wrongly) enables just >> AVX. >> >>>> +} >>>> + >>>> +static bool simd_check_avx512bw_gf(void) >>>> +{ >>>> + return cpu_has_gfni && cpu_has_avx512bw; >>> I don't see any BW interaction anywhere (in the manual), despite the >>> fact it operates on a datatype of int8. >> But by operating on vectors of bytes, it requires 64 bits wide mask >> registers, which is the connection to BW. Again gcc also does so. > > To be honest, it doesn't matter what GCC does. Coming back to this - it very much matters _here_ (i.e. in test harness code): For one, the checks above need to be in line with the -m<isa> options we pass to the compiler. I.e. if anything the question might be on the Makefile additions why I enable SSE2, AVX2, and AVX512BW respectively. While I could simply claim that this is my choice as to producing sensible test case binary blobs, there's a compiler aspect _there_: gcc's intrinsics header enables SSE2, AVX, and AVX512F / AVX512BW around the definitions on the inline wrappers around the builtins. This in turn is because the availability of the respective builtins depends on these ISAs to be enabled alongside GFNI itself. We therefore may not use any ISA level _lower_ than what the builtins require. As mentioned before, seeing them use SSE2 as prereq for the legacy encoded insns, I question their use of AVX instead of AVX2, and hence I'd prefer to stick with AVX2; if nothing else then simply because of what I've said in the first half sentence of this paragraph. (My guess is that it's the availability of {,,V}MOVDQ{A,U}* that did determine their choice, rather than the general availability of vector operations on the given vector and element types and sizes. I'm sure this would lead to some rather "funny" generated code when inputs come from other transformations rather than straight from memory.) Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |