Re: [Xen-devel] [PATCH v4 06/44] x86emul: test for correct EVEX Disp8 scaling
>>> On 12.11.18 at 18:42, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 25/09/18 14:29, Jan Beulich wrote:
>> Besides the already existing tests (which are going to be extended once
>> respective ISA extension support is complete), let's also ensure for
>> every individual insn that their Disp8 scaling (and memory access width)
>> are correct.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>
> I can see what you're attempting to do, but you now have two
> implementations of the EVEX disp8 logic written by yourself. AFAICT,
> this doesn't actually check that the behaviour of the instruction in
> hardware matches your model of the instruction - it checks that two of
> your models are the same.

Correct, but I've specifically tried to make the two models sufficiently
different.

> The only way I can think of testing the emulator model against hardware
> is to start with two memory areas poisoned with a non-repeating pattern,
> and a src/dst register poisoned with a different non-repeating pattern.
> Then, execute a real instruction stub, emulate the other, and memcmp()
> the two memory regions.

That's what some of the tests added right in patch 5 do. Did you
intentionally skip that patch while reviewing?

> That way, a systematic error in the two models won't cancel out to
> "all ok".

Hence the two different models. I certainly realize the risk you name.

>> --- /dev/null
>> +++ b/tools/tests/x86_emulator/evex-disp8.c
>> @@ -0,0 +1,452 @@
>> +#include <stdarg.h>
>> +#include <stdio.h>
>> +
>> +#include "x86-emulate.h"
>
> This now needs rearranging to avoid:
>
> x86-emulate.h:30:3: error: #error "Must not include <stdio.h> before
> x86-emulate.h"
>  # error "Must not include <stdio.h> before x86-emulate.h"

Yes, I've already re-based over that other change.
>> +enum vl {
>> +    VL_128,
>> +    VL_256,
>> +    VL_512,
>> +};
>> +
>> +enum scale {
>> +    SC_vl,
>> +    SC_el,
>> +};
>> +
>> +enum vsz {
>> +    VSZ_vl,
>> +    VSZ_vl_2, /* VL / 2 */
>> +    VSZ_vl_4, /* VL / 4 */
>> +    VSZ_vl_8, /* VL / 8 */
>> +    /* "no broadcast" implied from here on. */
>> +    VSZ_el,
>> +    VSZ_el_2, /* EL * 2 */
>> +    VSZ_el_4, /* EL * 4 */
>> +    VSZ_el_8, /* EL * 8 */
>> +};
>> +
>
> These acronyms get increasingly difficult to follow.  What is el in this
> context?

VL -> vector length
EL -> element length

>> +static const struct test avx512f_all[] = {
>> +    INSN_SFP(mov, 0f, 10),
>> +    INSN_SFP(mov, 0f, 11),
>> +    INSN_PFP_NB(mova, 0f, 28),
>> +    INSN_PFP_NB(mova, 0f, 29),
>> +    INSN(movdqa32, 66, 0f, 6f, vl, d_nb, vl),
>> +    INSN(movdqa32, 66, 0f, 7f, vl, d_nb, vl),
>> +    INSN(movdqa64, 66, 0f, 6f, vl, q_nb, vl),
>> +    INSN(movdqa64, 66, 0f, 7f, vl, q_nb, vl),
>> +    INSN(movdqu32, f3, 0f, 6f, vl, d_nb, vl),
>> +    INSN(movdqu32, f3, 0f, 7f, vl, d_nb, vl),
>> +    INSN(movdqu64, f3, 0f, 6f, vl, q_nb, vl),
>> +    INSN(movdqu64, f3, 0f, 7f, vl, q_nb, vl),
>> +    INSN(movntdq, 66, 0f, e7, vl, d_nb, vl),
>> +    INSN(movntdqa, 66, 0f38, 2a, vl, d_nb, vl),
>> +    INSN_PFP_NB(movnt, 0f, 2b),
>> +    INSN_PFP_NB(movu, 0f, 10),
>> +    INSN_PFP_NB(movu, 0f, 11),
>> +};
>> +
>> +static const struct test avx512f_128[] = {
>> +    INSN(mov, 66, 0f, 6e, el, dq64, el),
>> +    INSN(mov, 66, 0f, 7e, el, dq64, el),
>> +    INSN(movq, f3, 0f, 7e, el, q, el),
>> +    INSN(movq, 66, 0f, d6, el, q, el),
>> +};
>> +
>> +static const struct test avx512bw_all[] = {
>> +    INSN(movdqu8, f2, 0f, 6f, vl, b, vl),
>> +    INSN(movdqu8, f2, 0f, 7f, vl, b, vl),
>> +    INSN(movdqu16, f2, 0f, 6f, vl, w, vl),
>> +    INSN(movdqu16, f2, 0f, 7f, vl, w, vl),
>> +};
>> +
>> +static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 };
>> +static const unsigned char vl_128[] = { VL_128 };
>
> What are these for, and why is vl_all[]'s VL_128 out of order?
The RUN() macro invocations (further down) reference one of them each, to
indicate what vector lengths to test. The first array entry always gets
used, while subsequent entries (if any) require AVX512VL to be available.
See the conditional at the top of the inner loop in test_group().

>> +
>> +/*
>> + * This table, indicating the presence of an immediate (byte) for an opcode
>> + * space 0f major opcode, is indexed by high major opcode byte nibble, with
>> + * each table element then bit-indexed by low major opcode byte nibble.
>> + */
>> +static const uint16_t imm0f[16] = {
>> +    [0x7] = (1 << 0x0) /* vpshuf* */ |
>> +            (1 << 0x1) /* vps{ll,ra,rl}w */ |
>> +            (1 << 0x2) /* vps{l,r}ld, vp{rol,ror,sra}{d,q} */ |
>> +            (1 << 0x3) /* vps{l,r}l{,d}q */,
>> +    [0xc] = (1 << 0x2) /* vcmp{p,s}{d,s} */ |
>> +            (1 << 0x4) /* vpinsrw */ |
>> +            (1 << 0x5) /* vpextrw */ |
>> +            (1 << 0x6) /* vshufp{d,s} */,
>> +};
>> +
>> +static struct x86_emulate_ops emulops;
>> +
>> +static unsigned int accessed[3 * 64];
>
> What are the expected properties?  Why 3 * ?

See record_access(): The instructions under test all get a Disp8 value of
1 encoded. In order to be able to sensibly see how exactly things go wrong
(during debugging), it simply helps to cover the entire range from zero to
3 times the (maximum) vector length. All accesses farther out of bounds
than by vector length will not be recorded here, and hence fail "silently".

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel