Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
On 22/10/2018 08:33, Chao Gao wrote:
> On Mon, Oct 15, 2018 at 01:06:12PM +0100, Andrew Cooper wrote:
>> On 15/10/18 11:30, Roger Pau Monné wrote:
>>> Hello,
>>>
>>> Wei recently discovered an issue when running a Linux PVH Dom0 on a
>>> box with an Intel Family 6 (0x6), Model 158 (0x9e), Stepping 9 (raw
>>> 000906e9) CPU. We are not sure whether the issue is limited to a PVH
>>> Dom0, or it just happens to be easier to trigger in this scenario.
>> This issue has been seen very occasionally for years. My debugging
>> patch dates back to 2013, and it has been observed on Haswell systems as
>> well. There have also been a handful of reports on xen-devel over the
>> years.
>>
>> Wei is the first person to get a reliable enough repro to debug. It is
>> not exclusive to PVH Dom0, but that appears to be the easiest way to
>> tickle the problem.
>>
>>> The issue is caused by what seems to be an interrupt injection while
>>> Xen is still servicing a previous interrupt (ie: the interrupt hasn't
>>> been EOI'ed and ISR for the vector is set) with the same or lower
>>> priority than the interrupt currently being serviced. This injection
>>> always happens when returning from idle from a state ACPI_STATE_C3 or
>>> lower.
>> As a bit of background, for some guest irqs, we need to inject the
>> interrupt into the guest and wait for an explicit ack.
>>
>> If the irq source doesn't have a mask bit which Xen can use, the only
>> option we have to avoid repeated interruption is to leave the irq in
>> service at the LAPIC. The purpose of the Pending EOI stack is to manage
>> these as acks arrive back from guest context.
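The bookkeeping described above can be sketched as a small stack. This is a minimal illustration and not Xen's actual code: the type and function names, the fixed depth, and the return-value convention are all hypothetical. The one invariant it checks (a new entry must have a strictly higher vector than the current top) reflects what the LAPIC priority rules are supposed to guarantee while a vector is left in service.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PEOI_DEPTH 16            /* hypothetical capacity */

struct peoi_entry {
    unsigned int irq;
    uint8_t vector;
    bool ready;                  /* guest has acked, so the EOI may be sent */
};

struct peoi_stack {
    struct peoi_entry e[PEOI_DEPTH];
    size_t sp;                   /* number of entries in use */
};

/*
 * Record a vector that is being left in service at the LAPIC.
 * Returns false if the stack is full or if the new vector is not strictly
 * higher than the top entry (the hardware should never deliver that).
 */
static bool peoi_push(struct peoi_stack *s, unsigned int irq, uint8_t vector)
{
    if (s->sp == PEOI_DEPTH)
        return false;
    if (s->sp > 0 && s->e[s->sp - 1].vector >= vector)
        return false;
    s->e[s->sp] = (struct peoi_entry){ .irq = irq, .vector = vector,
                                       .ready = false };
    s->sp++;
    return true;
}

/* Mark the entry for this irq as acked by the guest. */
static bool peoi_set_ready(struct peoi_stack *s, unsigned int irq)
{
    for (size_t i = s->sp; i-- > 0; ) {
        if (s->e[i].irq == irq) {
            s->e[i].ready = true;
            return true;
        }
    }
    return false;
}

/*
 * Pop entries from the top while they are ready. Only the top can be
 * retired, because a LAPIC EOI clears the highest in-service vector.
 * Returns how many EOIs a real implementation would have written.
 */
static size_t peoi_flush(struct peoi_stack *s)
{
    size_t n = 0;

    while (s->sp > 0 && s->e[s->sp - 1].ready) {
        s->sp--;
        n++;
    }
    return n;
}
```

With this model, pushing irq 30 / vector 0x21 twice without an intervening ready-and-flush fails on the second push, which is the situation the "Pending EOI error" report describes.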
>>
>> For reasons which aren't clear, guest-bound MSI vectors which don't have
>> a mask bit also use this PEOI stack mechanism. I think this is probably
>> a Xen bug, but it is also relevant to the issue.
>>
>> In Wei's case, the interrupt in question is a non-maskable MSI (i.e.
>> one without a mask bit) from the USB controller.
>>
>>> Note that I haven't been able to reproduce this issue when using
>>> mwait-idle=0 or max_cstate=2 on the Xen command line, but again
>>> without knowing the underlying issue it's impossible to tell whether
>>> it's relevant.
>>>
>>> Andrew provided a debug patch which I've expanded to also log power
>>> state transitions, and which is attached to this email.
>>>
>>> Here is a trace of a crash, together with the debug info.
>>>
>>> (XEN) *** Pending EOI error ***
>>> (XEN) cpu #1, irq 30, vector 0x21, sp 1
>>> (XEN) Peoi stack: sp 1
>>> (XEN) [ 0] irq 30, vec 0x21, ready 0, ISR 1, TMR 0, IRR 0
>>> (XEN) Peoi stack trace records:
>>> (XEN) [22619] POP {sp 1, irq 30, vec 0x21}
>>> (XEN) [22620] POWER TYPE 4
>>> (XEN) [22621] IDLE PPR 0x00000010
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22622] WAKE PPR 0x00000010
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000004
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22623] ACK_PRE PPR 0x000000f0
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000004
>>> (XEN) [22624] ACK_POST PPR 0x00000010
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22625] POWER TYPE 5
>>> (XEN) [22626] IDLE PPR 0x00000010
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22627] WAKE PPR 0x00000010
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22628] PUSH {sp 0, irq 30, vec 0x21}
>>> (XEN) [22629] POWER TYPE 5
>>> (XEN) [22630] IDLE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22631] WAKE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22632] POWER TYPE 5
>>> (XEN) [22633] IDLE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22634] WAKE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000004
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22635] ACK_PRE PPR 0x000000f0
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000004
>>> (XEN) [22636] ACK_POST PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22637] READY {sp 1, irq 30, vec 0x21}
>>> (XEN) [22638] ACK_PRE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22639] ACK_POST PPR 0x00000010
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22640] POP {sp 1, irq 30, vec 0x21}
>>> (XEN) [22641] PUSH {sp 0, irq 30, vec 0x21}
>>> (XEN) [22642] POWER TYPE 4
>>> (XEN) [22643] IDLE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22644] WAKE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22645] POWER TYPE 3
>>> (XEN) [22646] IDLE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22647] WAKE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22648] POWER TYPE 3
>>> (XEN) [22649] IDLE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22650] WAKE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>> What has happened here is that, despite vector 0x21 being in service
>> (starting at the PUSH), we see it injected a second time. The ASSERT()
>> fires because we find this vector still on the pending EOI stack.
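For reading the IRR/ISR dumps in the trace, here is a small decoder sketch. The layout is an assumption inferred from the trace itself, not taken from the debug patch: each dump is read as 32 bytes printed as two hex digits each, lowest vectors first. Under that reading the set bit in "0000000002..." is vector 0x21, and the bit in "...04" is vector 0xfa, which agrees with the PPR of 0xf0 logged at ACK_PRE. The function names are hypothetical.

```c
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Return the highest vector set in a 64-hex-digit IRR/ISR dump,
 * -1 if no bit is set, or -2 if the string is malformed.
 * Assumed layout: byte 0 (vectors 0-7) is printed first.
 */
static int highest_vector(const char *dump)
{
    int best = -1;

    if (strlen(dump) != 64)
        return -2;

    for (int byte = 0; byte < 32; byte++) {
        int hi = hex_nibble(dump[2 * byte]);
        int lo = hex_nibble(dump[2 * byte + 1]);

        if (hi < 0 || lo < 0)
            return -2;

        int value = (hi << 4) | lo;

        /* Bytes and bits are visited in ascending order, so the last hit wins. */
        for (int bit = 0; bit < 8; bit++)
            if (value & (1 << bit))
                best = byte * 8 + bit;
    }
    return best;
}
```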
>>
>> After that, we go idle a few times, but still haven't acked the
>> vector (i.e. whatever we're waiting for the guest to acknowledge hasn't
>> happened yet, and Xen has nothing else to do on this CPU).
>>
>> From the debugging, we see that PPR/IRR/ISR appear to retain their state
>> across the mwait, and there is nothing in the manual which I can see
>> discussing the interaction of LAPIC state and C states.
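As a reference for the PPR values in the trace, the architectural rule can be sketched as follows: the PPR priority class is the larger of the TPR class and the class of the highest in-service vector, and a fixed interrupt is only dispatched when its class is strictly greater than the PPR class. The function names are hypothetical, and the TPR value of 0x10 used in the usage note is an inference from the trace (PPR reads 0x10 while ISR is empty), not something stated in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute PPR from TPR and ISRV (the highest in-service vector, 0 if none).
 * When both classes are equal, the low nibble is model-specific; this
 * sketch picks 0.
 */
static uint8_t lapic_ppr(uint8_t tpr, uint8_t isrv)
{
    uint8_t tpr_class = tpr >> 4;
    uint8_t isr_class = isrv >> 4;

    if (tpr_class > isr_class)
        return tpr;
    return (uint8_t)(isr_class << 4);
}

/* A fixed interrupt is dispatched only if its class exceeds the PPR class. */
static bool lapic_can_deliver(uint8_t vector, uint8_t ppr)
{
    return (vector >> 4) > (ppr >> 4);
}
```

With TPR 0x10 and vector 0x21 in service, lapic_ppr() gives 0x20, matching the logged PPR, and lapic_can_deliver(0x21, 0x20) is false. By this rule a second 0x21 should stay pending in IRR until the EOI, whereas vector 0xfa is still deliverable.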
>>
>> However, from the behaviour seen here, we occasionally get woken from
>> mwait by an interrupt which is already pending. I can only conclude that
>> there is some issue with priority calculations for edge triggered
>> interrupts when idle, which allows another one to slip in. The fact
> Hi, Roger, Andrew and Wei,
>
> Jan's patch
> (https://lists.xen.org/archives/html/xen-devel/2018-10/msg01031.html)
> fixes an issue in handling SVI. Currently, when dealing with an EOI from
> the guest, SVI is cleared. But the correct way is to clear the
> corresponding bit in VISR and then set SVI to the highest index of any bit
> set in VISR (please refer to SDM 29.1.4). If SVI is set to a value lower
> than the vector of the highest priority interrupt that is in service, the
> PPR virtualization (29.1.3) might set the VPPR to a lower value on VMEntry
> too. Thus an interrupt with the same or lower priority, which should be
> blocked by VPPR, slips in.
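The SVI update described above can be sketched like this. It is a minimal model following the description in this message, not the code from Jan's patch: the struct, its field names, and the helpers are hypothetical, and VISR is modelled as eight 32-bit words.

```c
#include <stdint.h>

struct vapic {
    uint32_t visr[8];   /* 256 virtual in-service bits, word 0 = vectors 0-31 */
    uint8_t svi;        /* vector of the highest in-service interrupt, 0 if none */
};

/* Return the highest set bit index in a 256-bit register, or -1 if empty. */
static int highest_set(const uint32_t reg[8])
{
    for (int word = 7; word >= 0; word--) {
        if (!reg[word])
            continue;
        for (int bit = 31; bit >= 0; bit--)
            if (reg[word] & (UINT32_C(1) << bit))
                return word * 32 + bit;
    }
    return -1;
}

/* Mark a vector as in service and keep SVI pointing at the highest one. */
static void vapic_set_in_service(struct vapic *v, uint8_t vector)
{
    v->visr[vector / 32] |= UINT32_C(1) << (vector % 32);
    v->svi = (uint8_t)highest_set(v->visr);
}

/*
 * Handle a guest EOI: clear the bit for the current SVI, then recompute
 * SVI from what is left in VISR instead of simply zeroing it.
 */
static void vapic_eoi(struct vapic *v)
{
    int top;

    v->visr[v->svi / 32] &= ~(UINT32_C(1) << (v->svi % 32));
    top = highest_set(v->visr);
    v->svi = top < 0 ? 0 : (uint8_t)top;
}
```

With vectors 0x21 and 0x31 both in service, one vapic_eoi() leaves SVI at 0x21 rather than 0, so a VPPR derived from SVI keeps blocking class-2 vectors.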
>
> Could you apply Jan's patch and try to reproduce it again?
Hello,
I'm aware of Jan's patch, but it pertains to Xen's emulation of the virtual
Local APIC for a guest.
This bug is with the real hardware APIC, as it pertains to waking from
MWAIT. At the point that things go wrong, there is no VT-x involved at all.
~Andrew