|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v4 01/14] x86/pv: Don't assume that INT $imm8 instructions are two bytes long
On 28.02.2026 00:16, Andrew Cooper wrote:
> For INT $N instructions (besides $0x80 for which there is a dedicated fast
> path), handling is mostly fault-based because of DPL0 gates in the IDT. This
> means that when the guest kernel allows the instruction too, Xen must
> increment %rip to the end of the instruction before passing a trap to the
> guest kernel.
>
> When an INT $N instruction has a prefix, it's longer than two bytes, and Xen
> will deliver the "trap" with %rip pointing into the middle of the instruction.
>
> Introduce a new pv_emulate_sw_interrupt() which uses x86_insn_length() to
> determine the instruction length, rather than assuming two.
>
> This is a change in behaviour for PV guests, but the prior behaviour cannot
> reasonably be said to be intentional.
>
> This change does not affect the INT $0x80 fastpath. Prefixed INT $N
> instructions occur almost exclusively in test code or exploits, and INT $0x80
> appears to be the only user-usable interrupt gate in contemporary PV guests.
Whereas for the slow path, while the subtracting of 2 from %rip there isn't
quite right either, the insn size determination here would then simply yield
2 as well, so all is good for that case as well.
> @@ -1401,6 +1402,53 @@ int pv_emulate_privileged_op(struct cpu_user_regs
> *regs)
> return 0;
> }
>
> +/*
> + * Hardware already decoded the INT $N instruction and determinted that there
> + * was a DPL issue, hence the #GP. Xen has already determined that the guest
> + * kernel has permitted this software interrupt.
> + *
> + * All that is needed is the instruction length, to turn the fault into a
> + * trap. All errors are turned back into the original #GP, as that's the
> + * action that really happened.
> + */
> +void pv_emulate_sw_interrupt(struct cpu_user_regs *regs)
> +{
> + struct vcpu *curr = current;
> + struct domain *currd = curr->domain;
> + struct priv_op_ctxt ctxt = {
> + .ctxt.regs = regs,
> + .ctxt.lma = !is_pv_32bit_domain(currd),
The difference may not be overly significant here, but 64-bit guests can run
32-bit code, so setting .lma seems wrong in that case. As it ought to be
largely benign, perhaps to code could even be left as is, just with a comment
to clarify things?
> + };
> + struct x86_emulate_state *state;
> + uint8_t vector = regs->error_code >> 3;
> + unsigned int len, ar;
> +
> + if ( !pv_emul_read_descriptor(regs->cs, curr, &ctxt.cs.base,
> + &ctxt.cs.limit, &ar, 1) ||
> + !(ar & _SEGMENT_S) ||
> + !(ar & _SEGMENT_P) ||
> + !(ar & _SEGMENT_CODE) )
> + goto error;
> +
> + state = x86_decode_insn(&ctxt.ctxt, insn_fetch);
> + if ( IS_ERR_OR_NULL(state) )
> + goto error;
> +
> + len = x86_insn_length(state, &ctxt.ctxt);
> + x86_emulate_free_state(state);
> +
> + /* Note: Checked slightly late to simplify 'state' handling. */
> + if ( ctxt.ctxt.opcode != 0xcd /* INT $imm8 */ )
> + goto error;
> +
> + regs->rip += len;
> + pv_inject_sw_interrupt(vector);
> + return;
> +
> + error:
> + pv_inject_hw_exception(X86_EXC_GP, regs->entry_vector);
DYM regs->error_code here? Might it alternatively make sense to return a
boolean here, for ...
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1379,8 +1379,7 @@ void do_general_protection(struct cpu_user_regs *regs)
>
> if ( permit_softint(TI_GET_DPL(ti), v, regs) )
> {
> - regs->rip += 2;
> - pv_inject_sw_interrupt(vector);
> + pv_emulate_sw_interrupt(regs);
> return;
... the return here to become conditional, leveraging the #GP injection at
the bottom of this function?
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |