Xen project Mailing List

Re: [PATCH v4 01/14] x86/pv: Don't assume that INT $imm8 instructions are two bytes long

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Mon, 2 Mar 2026 12:03:57 +0100

Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL

Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Mon, 02 Mar 2026 11:04:00 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28.02.2026 00:16, Andrew Cooper wrote: > For INT $N instructions (besides $0x80 for which there is a dedicated fast > path), handling is mostly fault-based because of DPL0 gates in the IDT. This > means that when the guest kernel allows the instruction too, Xen must > increment %rip to the end of the instruction before passing a trap to the > guest kernel. > > When an INT $N instruction has a prefix, it's longer than two bytes, and Xen > will deliver the "trap" with %rip pointing into the middle of the instruction. > > Introduce a new pv_emulate_sw_interrupt() which uses x86_insn_length() to > determine the instruction length, rather than assuming two. > > This is a change in behaviour for PV guests, but the prior behaviour cannot > reasonably be said to be intentional. > > This change does not affect the INT $0x80 fastpath. Prefixed INT $N > instructions occur almost exclusively in test code or exploits, and INT $0x80 > appears to be the only user-usable interrupt gate in contemporary PV guests. Whereas for the slow path, while the subtracting of 2 from %rip there isn't quite right either, the insn size determination here would then simply yield 2 as well, so all is good for that case as well. > @@ -1401,6 +1402,53 @@ int pv_emulate_privileged_op(struct cpu_user_regs > *regs) > return 0; > } > > +/* > + * Hardware already decoded the INT $N instruction and determinted that there > + * was a DPL issue, hence the #GP. Xen has already determined that the guest > + * kernel has permitted this software interrupt. > + * > + * All that is needed is the instruction length, to turn the fault into a > + * trap. All errors are turned back into the original #GP, as that's the > + * action that really happened. > + */ > +void pv_emulate_sw_interrupt(struct cpu_user_regs *regs) > +{ > + struct vcpu *curr = current; > + struct domain *currd = curr->domain; > + struct priv_op_ctxt ctxt = { > + .ctxt.regs = regs, > + .ctxt.lma = !is_pv_32bit_domain(currd), The difference may not be overly significant here, but 64-bit guests can run 32-bit code, so setting .lma seems wrong in that case. As it ought to be largely benign, perhaps to code could even be left as is, just with a comment to clarify things? > + }; > + struct x86_emulate_state *state; > + uint8_t vector = regs->error_code >> 3; > + unsigned int len, ar; > + > + if ( !pv_emul_read_descriptor(regs->cs, curr, &ctxt.cs.base, > + &ctxt.cs.limit, &ar, 1) || > + !(ar & _SEGMENT_S) || > + !(ar & _SEGMENT_P) || > + !(ar & _SEGMENT_CODE) ) > + goto error; > + > + state = x86_decode_insn(&ctxt.ctxt, insn_fetch); > + if ( IS_ERR_OR_NULL(state) ) > + goto error; > + > + len = x86_insn_length(state, &ctxt.ctxt); > + x86_emulate_free_state(state); > + > + /* Note: Checked slightly late to simplify 'state' handling. */ > + if ( ctxt.ctxt.opcode != 0xcd /* INT $imm8 */ ) > + goto error; > + > + regs->rip += len; > + pv_inject_sw_interrupt(vector); > + return; > + > + error: > + pv_inject_hw_exception(X86_EXC_GP, regs->entry_vector); DYM regs->error_code here? Might it alternatively make sense to return a boolean here, for ... > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -1379,8 +1379,7 @@ void do_general_protection(struct cpu_user_regs *regs) > > if ( permit_softint(TI_GET_DPL(ti), v, regs) ) > { > - regs->rip += 2; > - pv_inject_sw_interrupt(vector); > + pv_emulate_sw_interrupt(regs); > return; ... the return here to become conditional, leveraging the #GP injection at the bottom of this function? Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.