Xen project Mailing List

Re: [Xen-devel] Interrupt issues with hvm_emulate_one_vm_event()

To: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Mon, 29 May 2017 13:11:28 +0100

Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Tamas K Lengyel <tamas@xxxxxxxxxxxxx>

Delivery-date: Mon, 29 May 2017 12:11:57 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 29/05/2017 12:46, Razvan Cojocaru wrote: > On 05/29/17 14:05, Jan Beulich wrote: >>>>> On 29.05.17 at 11:20, <rcojocaru@xxxxxxxxxxxxxxx> wrote: >>> On 05/26/17 18:11, Andrew Cooper wrote: >>>> On 26/05/17 15:29, Jan Beulich wrote: >>>>>>>> On 25.05.17 at 11:40, <rcojocaru@xxxxxxxxxxxxxxx> wrote: >>>>>> I've noticed that, with pages marked NX and vm_event emulation, we can >>>>>> end up emulating an ud2, for which hvm_emulate_one() returns >>>>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event(). >>>>> Could you explain what would lead to emulation of UD2? >>>>> >>>>>> This, in turn, causes a hvm_inject_event() call in the context of >>>>>> hvm_do_resume(), which can, if there's already a pending event there, >>>>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of >>>>>> input (mouse frozen, keyboard unresponsive). >>>>>> >>>>>> After much trial and error, I've been able to confirm this by leaving a >>>>>> guest on for almost a full day with this change: >>>>>> >>>>>> case X86EMUL_EXCEPTION: >>>>>> - hvm_inject_event(&ctx.ctxt.event); >>>>>> + if ( !hvm_event_pending(current) ) >>>>>> + hvm_inject_event(&ctx.ctxt.event); >>>>>> >>>>>> and checking that there's been no BSOD or loss of input. >>>>>> >>>>>> However, just losing the event here, while fine to prove that this is >>>>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust >>>>>> way of fixing this is. >>>>> Much depends on what the other event is: If it's an interrupt, I'd >>>>> assume there to be an ordering problem (interrupts shouldn't be >>>>> injected when there is a pending exception, their delivery instead >>>>> should be attempted on the first instruction of the exception >>>>> handler [if interrupts remain on] or whenever interrupts get >>>>> re-enabled). >>>> I suspect it is an ordering issue, and something has processed and >>>> interrupt before the emulation occurs as part of the vm_event reply >>>> happens. >>>> >>>> The interrupt ordering spec indicates that external interrupts take >>>> precedent over faults raised from executing an instruction, on the basis >>>> that once the interrupt handler returns, the instruction will generate >>>> the same fault again. However, its not obvious how this is intended to >>>> interact with interrupt windows and vmexits. I expect we can get away >>>> with ensuring that external interrupts are the final thing considered >>>> for injection on the return-to-guest path. >>>> >>>> It might be an idea to leave an assert in vmx_inject_event() that an >>>> event is not already pending, but in the short term, this probably also >>>> wants debugging by trying to identify what sequence of actions is >>>> leading us to inject two events in this case (if indeed this is what is >>>> happening). >>> With some patience, I've been able to catch the problem: "(XEN) >>> vmx_inject_event(3, 14) but 0, 225 pending". >>> >>> 63 /* >>> 64 * x86 event types. This enumeration is valid for: >>> 65 * Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8] >>> 66 * AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only) >>> 67 */ >>> 68 enum x86_event_type { >>> 69 X86_EVENTTYPE_EXT_INTR, /* External interrupt */ >>> 70 X86_EVENTTYPE_NMI = 2, /* NMI */ >>> 71 X86_EVENTTYPE_HW_EXCEPTION, /* Hardware exception */ >>> 72 X86_EVENTTYPE_SW_INTERRUPT, /* Software interrupt (CD nn) */ >>> 73 X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */ >>> 74 X86_EVENTTYPE_SW_EXCEPTION, /* INT3 (CC), INTO (CE) */ >>> 75 }; >>> >>> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an >>> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above. >> So this confirms our suspicion, but doesn't move us closer to a >> solution. The question after all is why an external interrupt is >> being delivered prior to or while emulating. As Andrew did >> explain, proper behavior would be to check for external >> interrupts and don't enter emulation if one is pending, or don't >> check for external interrupts until the _next_ instruction >> boundary. Correct architectural behavior will result either way; >> the second variant merely must not continuously defer interrupts >> (i.e. there need to be instruction boundaries at which hardware >> of software do check for them). >> >> I'm not that familiar with the sequence of steps when dealing >> with emulation requests from an introspection agent, so I would >> hope you could go through those code paths to see where >> external interrupts are being checked for. Or wait - isn't your >> problem that you invoke emulation out of hvm_do_resume() (via >> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()), >> which happens after all other "normal" processing of a VM exit? >> Perhaps emulation should be skipped there if an event is already >> pending injection, as emulation not having started means we still >> are on an instruction boundary? > Indeed, we are emulating after all the vmexit processing has already > happened, in the hvm_do_resume() path you've mentioned. > > Not emulating if an event is pending injection is certainly trivial to > test - however, since then we would have to lose the emulation attempt, > presumably the EPT fault event would show up again (a "duplicate" event) > in this case (since it is no longer safe to assume that we land on the > same RIP after the interrupt has been handled, and thus the next > hvm_do_resume() may not happen where we wanted to emulate). > > I'll test this scenario out. The problem is in .Lvmx_do_vmentry. We call vmx_intr_assist() before vmx_process_softirqs() Therefore, when the EPT violation occurs and we break for introspection, we still process pending interrupts before finally getting desheduled to wait for the vm_event reply. On receiving the vm_event reply, we emulate the instruction, and find that an interrupt has already been queued from before, but not yet delivered. Untangling this going to be complicated. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.