Re: [PATCH 1/2][4.17] x86emul: further correct 64-bit mode zero count repeated string insn handling
On 10.10.2022 20:56, Andrew Cooper wrote:
> On 06/10/2022 14:11, Jan Beulich wrote:
>> In an entirely different context I came across Linux commit 428e3d08574b
>> ("KVM: x86: Fix zero iterations REP-string"), which points out that
>> we're still doing things wrong: For one, there's no zero-extension at
>> all on AMD. And then while RCX is zero-extended from 32 bits uniformly
>> for all string instructions on newer hardware, RSI/RDI are only for MOVS
>> and STOS on the systems I have access to. (On an old family 0xf system
>> I've further found that for REP LODS even RCX is not zero-extended.)
>>
>> Fixes: 79e996a89f69 ("x86emul: correct 64-bit mode repeated string insn
>> handling with zero count")
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> Partly RFC for none of this being documented anywhere (and it partly
>> being model specific); inquiry pending.
>
> None of this surprises me. The rep instructions have always been
> microcoded, and 0 reps is a special case which has been largely ignored
> until recently.
>
> I wouldn't be surprised if the behaviour changes with
> MISC_ENABLE.FAST_STRINGS (given the KVM commit message) and I also
> wouldn't be surprised if it's different between Core and Atom too (given
> the Fam 0xf observation).
>
> It's almost worth executing a zero-length rep stub, except that may
> potentially go very wrong in certain ecx/rcx cases.
>
> I'm not sure how important these cases are to cover. Given that they do
> differ between vendors and generation, and that their use in compiled
> code is not going to consider the registers live after use, is the
> complexity really worth it?

By "complexity", what do you mean? The patch doesn't add new complexity,
it only converts "true" to "false" in several places, plus it updates a
comment.
I don't think we can legitimately simplify things (by removing logic), so
the only thing I can think of is your thought towards executing a
zero-length REP stub (which you say may be problematic in certain cases).

Patch 2 makes clear why this wouldn't be a good idea for INS and OUTS. It
also cannot possibly be got right when emulating 16-bit code (without
switching to a 16-bit code segment), and it's uncertain whether a 32-bit
address size override would actually yield the same behavior as a native
address size operation in 32-bit code. Of course, if limiting this (the
way we currently do) to just 32-bit addressing in 64-bit mode, then this
ought to be representative (with the INS/OUTS caveat remaining), but - as
you say - adding complexity for likely little gain.

Jan
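
For readers following the thread, the behaviour described in the quoted
commit message can be modelled roughly as below. This is only an
illustrative sketch, not the code from the patch: the type and function
names (string_regs, zero_count_rep_fixup, and the enums) are hypothetical,
the vendor split is reduced to "Intel-like newer hardware" versus AMD, the
family 0xf REP LODS exception is left out, and truncating only RDI for
STOS (the one index register it uses) is an assumption rather than
something the message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };
enum string_insn { INSN_MOVS, INSN_STOS, INSN_LODS, INSN_CMPS, INSN_SCAS };

struct string_regs {
    uint64_t rcx, rsi, rdi;
};

/*
 * Model of a REP-prefixed string insn with a 32-bit address size override
 * in 64-bit mode. Returns true if the zero-count path was taken, i.e. the
 * low 32 bits of RCX were zero and no iteration would be executed.
 */
static bool zero_count_rep_fixup(struct string_regs *r,
                                 enum cpu_vendor vendor,
                                 enum string_insn insn)
{
    if ((uint32_t)r->rcx != 0)
        return false;               /* non-zero count: not modelled here */

    if (vendor == VENDOR_AMD)
        return true;                /* no zero-extension at all */

    /* Newer Intel-like hardware: RCX is zero-extended from 32 bits. */
    r->rcx = (uint32_t)r->rcx;

    /* RSI/RDI are zero-extended only for MOVS and STOS. */
    if (insn == INSN_MOVS) {
        r->rsi = (uint32_t)r->rsi;
        r->rdi = (uint32_t)r->rdi;
    } else if (insn == INSN_STOS) {
        r->rdi = (uint32_t)r->rdi;  /* assumption: only the register used */
    }

    return true;
}
```

With the upper halves of all three registers set and ECX equal to zero,
the sketch leaves everything untouched in the AMD case, clears only the
upper half of RCX for LODS/CMPS/SCAS, and additionally truncates the
index registers for MOVS and STOS.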