[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Interrupt issues with hvm_emulate_one_vm_event()

On 29/05/2017 12:46, Razvan Cojocaru wrote:
> On 05/29/17 14:05, Jan Beulich wrote:
>>>>> On 29.05.17 at 11:20, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>> On 05/26/17 18:11, Andrew Cooper wrote:
>>>> On 26/05/17 15:29, Jan Beulich wrote:
>>>>>>>> On 25.05.17 at 11:40, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>>>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>>>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>>>>> Could you explain what would lead to emulation of UD2?
>>>>>> This, in turn, causes a hvm_inject_event() call in the context of
>>>>>> hvm_do_resume(), which can, if there's already a pending event there,
>>>>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>>>>>> input (mouse frozen, keyboard unresponsive).
>>>>>> After much trial and error, I've been able to confirm this by leaving a
>>>>>> guest on for almost a full day with this change:
>>>>>>      case X86EMUL_EXCEPTION:
>>>>>> -        hvm_inject_event(&ctx.ctxt.event);
>>>>>> +        if ( !hvm_event_pending(current) )
>>>>>> +            hvm_inject_event(&ctx.ctxt.event);
>>>>>> and checking that there's been no BSOD or loss of input.
>>>>>> However, just losing the event here, while fine to prove that this is
>>>>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>>>>>> way of fixing this is.
>>>>> Much depends on what the other event is: If it's an interrupt, I'd
>>>>> assume there to be an ordering problem (interrupts shouldn't be
>>>>> injected when there is a pending exception, their delivery instead
>>>>> should be attempted on the first instruction of the exception
>>>>> handler [if interrupts remain on] or whenever interrupts get
>>>>> re-enabled).
>>>> I suspect it is an ordering issue, and something has processed and
>>>> interrupt before the emulation occurs as part of the vm_event reply 
>>>> happens.
>>>> The interrupt ordering spec indicates that external interrupts take
>>>> precedent over faults raised from executing an instruction, on the basis
>>>> that once the interrupt handler returns, the instruction will generate
>>>> the same fault again.  However, its not obvious how this is intended to
>>>> interact with interrupt windows and vmexits.  I expect we can get away
>>>> with ensuring that external interrupts are the final thing considered
>>>> for injection on the return-to-guest path.
>>>> It might be an idea to leave an assert in vmx_inject_event() that an
>>>> event is not already pending, but in the short term, this probably also
>>>> wants debugging by trying to identify what sequence of actions is
>>>> leading us to inject two events in this case (if indeed this is what is
>>>> happening).
>>> With some patience, I've been able to catch the problem: "(XEN)
>>> vmx_inject_event(3, 14) but 0, 225 pending".
>>>  63 /*
>>>  64  * x86 event types. This enumeration is valid for:
>>>  66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
>>>  67  */
>>>  68 enum x86_event_type {
>>>  69     X86_EVENTTYPE_EXT_INTR,         /* External interrupt */
>>>  70     X86_EVENTTYPE_NMI = 2,          /* NMI */
>>>  71     X86_EVENTTYPE_HW_EXCEPTION,     /* Hardware exception */
>>>  72     X86_EVENTTYPE_SW_INTERRUPT,     /* Software interrupt (CD nn) */
>>>  73     X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
>>>  74     X86_EVENTTYPE_SW_EXCEPTION,     /* INT3 (CC), INTO (CE) */
>>>  75 };
>>> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
>>> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.
>> So this confirms our suspicion, but doesn't move us closer to a
>> solution. The question after all is why an external interrupt is
>> being delivered prior to or while emulating. As Andrew did
>> explain, proper behavior would be to check for external
>> interrupts and don't enter emulation if one is pending, or don't
>> check for external interrupts until the _next_ instruction
>> boundary. Correct architectural behavior will result either way;
>> the second variant merely must not continuously defer interrupts
>> (i.e. there need to be instruction boundaries at which hardware
>> of software do check for them).
>> I'm not that familiar with the sequence of steps when dealing
>> with emulation requests from an introspection agent, so I would
>> hope you could go through those code paths to see where
>> external interrupts are being checked for. Or wait - isn't your
>> problem that you invoke emulation out of hvm_do_resume() (via
>> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()),
>> which happens after all other "normal" processing of a VM exit?
>> Perhaps emulation should be skipped there if an event is already
>> pending injection, as emulation not having started means we still
>> are on an instruction boundary?
> Indeed, we are emulating after all the vmexit processing has already
> happened, in the hvm_do_resume() path you've mentioned.
> Not emulating if an event is pending injection is certainly trivial to
> test - however, since then we would have to lose the emulation attempt,
> presumably the EPT fault event would show up again (a "duplicate" event)
> in this case (since it is no longer safe to assume that we land on the
> same RIP after the interrupt has been handled, and thus the next
> hvm_do_resume() may not happen where we wanted to emulate).
> I'll test this scenario out.

The problem is in .Lvmx_do_vmentry.  We call vmx_intr_assist() before

Therefore, when the EPT violation occurs and we break for introspection,
we still process pending interrupts before finally getting desheduled to
wait for the vm_event reply.  On receiving the vm_event reply, we
emulate the instruction, and find that an interrupt has already been
queued from before, but not yet delivered.

Untangling this going to be complicated.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.