
Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic



Hi Thimo,

Based on your previous reports and logs:

1.       The interrupt that triggers the issue is an MSI.

2.       MSIs are normally treated as edge-triggered interrupts, except when the device cannot be masked. Here, your previous log indicates the device is unmaskable. (What special device are you using? Modern PCI devices should be maskable.)

3.       IRQ 29 belongs to dom0, so this does not seem to be an HVM-related issue.

4.       The status of IRQ 29 is 10, which means the guest has already issued the EOI, because the IRQ_GUEST_EOI_PENDING bit is clear; so there should be no pending EOI in the EOI stack. If possible, can you add some debug messages in the guest EOI code path (e.g. _irq_guest_eoi()) to track the EOI?

5.       Both logs show that, when the issue occurred, most of the other interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is this a coincidence, or does it happen only under special conditions such as heavy IRQ migration? Perhaps you can disable irqbalance in dom0 and pin the IRQs manually.

6.       I guess interrupt remapping is enabled on your machine. Can you try disabling IR to see whether the issue is still reproducible?

Also, please provide the whole Xen log.

 

Best regards,

Yang

 

From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Thimo E.
Sent: Monday, August 12, 2013 1:47 AM
To: Andrew Cooper
Cc: Keir Fraser; Jan Beulich; Dong, Eddie; Xen-develList; Nakajima, Jun; Zhang, Xiantao
Subject: Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic

 

Hello again,

Attached you'll find another crash dump from today. I don't know whether it gives you more information than the last one.

Just FYI, this is a system with an Intel Mainboard (H87 chipset) and a Core i5-4670 CPU.

Best regards
  Thimo

On 09.08.2013 23:44, Andrew Cooper wrote:

On 09/08/13 22:40, Andrew Cooper wrote:


> So according to my debugging, we really have just pushed the same irq which we have subsequently seen again unexpectedly.
>
> This bug has only ever been seen on Haswell hardware, and appears linked to running HVM guests.
>
> So either there is an erroneous ACK to the LAPIC which is clearing the ISR before the PEOI stack is expecting (which I


"can't"

Apologies for the confusion.

~Andrew


> obviously see, looking at the code), or something more funky is going on with the hardware.
>
> CC'ing in the Intel maintainers:  Do you have any ideas?  Could this be related to APICv?
>
> ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


 

