[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 01/14] x86/pv: Don't assume that INT $imm8 instructions are two bytes long


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 2 Mar 2026 12:03:57 +0100
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 02 Mar 2026 11:04:00 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28.02.2026 00:16, Andrew Cooper wrote:
> For INT $N instructions (besides $0x80 for which there is a dedicated fast
> path), handling is mostly fault-based because of DPL0 gates in the IDT.  This
> means that when the guest kernel allows the instruction too, Xen must
> increment %rip to the end of the instruction before passing a trap to the
> guest kernel.
> 
> When an INT $N instruction has a prefix, it's longer than two bytes, and Xen
> will deliver the "trap" with %rip pointing into the middle of the instruction.
> 
> Introduce a new pv_emulate_sw_interrupt() which uses x86_insn_length() to
> determine the instruction length, rather than assuming two.
> 
> This is a change in behaviour for PV guests, but the prior behaviour cannot
> reasonably be said to be intentional.
> 
> This change does not affect the INT $0x80 fastpath.  Prefixed INT $N
> instructions occur almost exclusively in test code or exploits, and INT $0x80
> appears to be the only user-usable interrupt gate in contemporary PV guests.

Whereas for the slow path, while the subtracting of 2 from %rip there isn't
quite right either, the insn size determination here would then simply yield
2 as well, so all is good for that case as well.

> @@ -1401,6 +1402,53 @@ int pv_emulate_privileged_op(struct cpu_user_regs 
> *regs)
>      return 0;
>  }
>  
> +/*
> + * Hardware already decoded the INT $N instruction and determinted that there
> + * was a DPL issue, hence the #GP.  Xen has already determined that the guest
> + * kernel has permitted this software interrupt.
> + *
> + * All that is needed is the instruction length, to turn the fault into a
> + * trap.  All errors are turned back into the original #GP, as that's the
> + * action that really happened.
> + */
> +void pv_emulate_sw_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct vcpu *curr = current;
> +    struct domain *currd = curr->domain;
> +    struct priv_op_ctxt ctxt = {
> +        .ctxt.regs = regs,
> +        .ctxt.lma = !is_pv_32bit_domain(currd),

The difference may not be overly significant here, but 64-bit guests can run
32-bit code, so setting .lma seems wrong in that case. As it ought to be
largely benign, perhaps to code could even be left as is, just with a comment
to clarify things?

> +    };
> +    struct x86_emulate_state *state;
> +    uint8_t vector = regs->error_code >> 3;
> +    unsigned int len, ar;
> +
> +    if ( !pv_emul_read_descriptor(regs->cs, curr, &ctxt.cs.base,
> +                                  &ctxt.cs.limit, &ar, 1) ||
> +         !(ar & _SEGMENT_S) ||
> +         !(ar & _SEGMENT_P) ||
> +         !(ar & _SEGMENT_CODE) )
> +        goto error;
> +
> +    state = x86_decode_insn(&ctxt.ctxt, insn_fetch);
> +    if ( IS_ERR_OR_NULL(state) )
> +        goto error;
> +
> +    len = x86_insn_length(state, &ctxt.ctxt);
> +    x86_emulate_free_state(state);
> +
> +    /* Note: Checked slightly late to simplify 'state' handling. */
> +    if ( ctxt.ctxt.opcode != 0xcd /* INT $imm8 */ )
> +        goto error;
> +
> +    regs->rip += len;
> +    pv_inject_sw_interrupt(vector);
> +    return;
> +
> + error:
> +    pv_inject_hw_exception(X86_EXC_GP, regs->entry_vector);

DYM regs->error_code here? Might it alternatively make sense to return a
boolean here, for ...

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1379,8 +1379,7 @@ void do_general_protection(struct cpu_user_regs *regs)
>  
>          if ( permit_softint(TI_GET_DPL(ti), v, regs) )
>          {
> -            regs->rip += 2;
> -            pv_inject_sw_interrupt(vector);
> +            pv_emulate_sw_interrupt(regs);
>              return;

... the return here to become conditional, leveraging the #GP injection at
the bottom of this function?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.