Hi Thimo,
From your previous reports and logs, I see the following:
1. The interrupt that triggers the issue is an MSI.
2. MSIs are normally treated as edge-triggered interrupts, except when there is no way to mask the device. In this case, your previous log
indicates the device is unmaskable. (What special device are you using? A modern PCI device should be maskable.)
3. IRQ 29 belongs to dom0, so this does not seem to be an HVM-related issue.
4. The status of IRQ 29 is 10, which means the guest has already issued the EOI, because the IRQ_GUEST_EOI_PENDING bit is cleared. So there should
be no pending EOI in the EOI stack. If possible, can you add some debug messages in the guest EOI code path (for example in _irq_guest_eoi()) to track the EOI?
5. Both logs show that when the issue occurred, most of the other interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is that
a coincidence? Or does it happen only under special conditions, such as heavy IRQ migration? Perhaps you can disable IRQ balancing in dom0 and pin the IRQ manually.
6. I guess interrupt remapping (IR) is enabled on your machine. Can you try disabling IR to see whether the issue is still reproducible?
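To make the status values in points 4 and 5 easier to read, here is a small sketch that decodes the hex status field printed in the IRQ dump. The bit positions are an assumption taken from my reading of xen/include/xen/irq.h in the Xen 4.3 timeframe, so please check them against your tree. The helper name decode_irq_status is hypothetical.

```python
# Assumed bit layout of the IRQ status word (from xen/include/xen/irq.h,
# Xen 4.3 era). Verify against the tree you are actually running.
IRQ_STATUS_BITS = {
    0: "IRQ_INPROGRESS",
    1: "IRQ_DISABLED",
    2: "IRQ_PENDING",
    3: "IRQ_REPLAY",
    4: "IRQ_GUEST",
    5: "IRQ_MOVE_PENDING",
    6: "IRQ_PER_CPU",
    7: "IRQ_GUEST_EOI_PENDING",
}


def decode_irq_status(status_hex):
    """Return the names of the flags set in a hex status string such as '00000010'."""
    value = int(status_hex, 16)
    return [
        name
        for bit, name in sorted(IRQ_STATUS_BITS.items())
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # "10" is the status reported for IRQ 29 in the log.
    print(decode_irq_status("10"))
```

Under that assumed layout, a status of 10 has only IRQ_GUEST set, so IRQ_GUEST_EOI_PENDING is clear. An interrupt that is guest-owned and also being migrated would carry IRQ_MOVE_PENDING as well.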
Also, please provide the whole Xen log.
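For the manual pinning suggested in point 5, a minimal sketch follows. It assumes a Linux dom0 that exposes /proc/irq/<n>/smp_affinity, that the irqbalance daemon has been stopped first (how to do that depends on the distribution), and that only CPUs 0 to 31 are used so that a single hex word is enough. The function names cpu_mask and pin_irq are hypothetical, and pin_irq needs root.

```python
import os


def cpu_mask(cpus):
    """Build the hex bitmask string written to /proc/irq/<n>/smp_affinity.

    Only CPUs 0..31 are accepted, so the mask fits in one 32-bit word.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0..31")
        mask |= 1 << cpu
    return format(mask, "x")


def pin_irq(irq, cpus, proc_root="/proc"):
    """Write the affinity mask for one IRQ and return the path that was written."""
    path = os.path.join(proc_root, "irq", str(irq), "smp_affinity")
    with open(path, "w") as f:
        f.write(cpu_mask(cpus))
    return path
```

For example, pin_irq(29, [0]) would write the mask 1 for IRQ 29, keeping it on CPU 0 in dom0. Whether the issue still reproduces with the IRQ pinned should show whether IRQ migration is involved.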
Best regards,
Yang
From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Thimo E.
Sent: Monday, August 12, 2013 1:47 AM
To: Andrew Cooper
Cc: Keir Fraser; Jan Beulich; Dong, Eddie; Xen-develList; Nakajima, Jun; Zhang, Xiantao
Subject: Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic
Hello again,
Attached you'll find another crash dump from today. I don't know whether it gives you more information than the last one.
Just FYI, this is a system with an Intel Mainboard (H87 chipset) and a Core i5-4670 CPU.
Best regards
Thimo
On 09.08.2013 23:44, Andrew Cooper wrote:
On 09/08/13 22:40, Andrew Cooper wrote:
So according to my debugging, we really have just pushed the same irq which we have subsequently seen again unexpectedly.
This bug has only ever been seen on Haswell hardware, and appears linked to running HVM guests.
So either there is an erroneous ACK to the LAPIC which is clearing the ISR before the PEOI stack is expecting (which I "can't" obviously see, looking at the code), or something more funky is going on with the hardware.
CC'ing in the Intel maintainers: Do you have any ideas? Could this be related to APICv?
~Andrew
Apologies for the confusion.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel