[Xen-devel] IO-APIC interrupts getting stuck


While working on the FreeBSD PVH Dom0 port I've noticed that IO-APIC 
interrupts very easily get stuck in a strange state with the current 
PIRQ implementation that I'm using on FreeBSD.

Since I'm not sure what is going on, I would like to ask for some 
feedback and possible solutions, because at this point I'm running out 
of ideas about what is happening.

I'm going to use IRQ 17 as an example, which is shared between an 
Intel(R) PRO/1000 NIC, a Broadcom NetXtreme Gigabit NIC and an Intel 
82801JI (ICH10) USB controller.

Usually during the boot process, or very shortly after it, Dom0 loses 
interrupts from IRQ 17. Dumping IRQ information from Xen ('i' key) 
gives the following output:

(XEN)    IRQ:  17 affinity:00000001 vec:a8 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 17(---),
(XEN)     IRQ 17 Vec168:
(XEN)       Apic 0x00, Pin 17: vec=a8 delivery=LoPri dest=L status=1 polarity=1 irr=1 trig=L mask=0 dest_id:1

I've also added some event channel debug functions to the FreeBSD 
in-kernel debugger in order to print the status of event channels:

Port 15 Type: PIRQ
        Pirq: 17 ActiveHi: 0 EdgeTrigger: 0 NeedsEOI: 1
        Masked: 0 Pending: 0
        Per-CPU Masks: cpu#0: 0 cpu#1: 0 cpu#2: 1 cpu#3: 0 cpu#4: 0 cpu#5: 0 cpu#6: 0 cpu#7: 0

And the corresponding line from the Xen 'e' debug key:

(XEN)       15 [0/0/1]: s=4 n=2 x=0 p=17 i=17

This makes me think that the FreeBSD kernel is failing to EOI the 
vector (because of the irr=1 in the Xen IRQ debug info), so I've also 
added a function to the debugger that allows me to EOI a vector from 
it. But even after issuing a PHYSDEVOP_eoi hypercall on the affected 
PIRQ (17), the status is exactly the same: because pirq->masked == 0, 
desc_guest_eoi returns early without EOIing the vector (see 
xen/arch/x86/irq.c:1433).
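
To make the failure mode concrete, here is a simplified C model of the 
early-exit check as I read it in desc_guest_eoi. This is only a sketch, 
not the Xen code: the names model_pirq, model_action and model_guest_eoi 
are made up for illustration, locking and the actual IO-APIC EOI are 
left out, and I'm assuming that Xen sets the masked flag when it 
delivers the PIRQ to the guest and counts it in in_flight.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-PIRQ state kept by Xen. */
struct model_pirq {
    bool masked;     /* assumed: set on delivery, cleared by the guest EOI */
};

/* Hypothetical stand-in for the guest action attached to the IRQ. */
struct model_action {
    int in_flight;   /* deliveries still waiting for a guest EOI */
};

/*
 * Model of the guest EOI path. Returns true only if the EOI would be
 * propagated to the IO-APIC, false if the function bails out early.
 */
static bool model_guest_eoi(struct model_pirq *pirq,
                            struct model_action *action)
{
    /* Test-and-clear of the masked flag. */
    bool was_masked = pirq->masked;
    pirq->masked = false;

    /* Not marked as delivered: nothing to EOI from Xen's point of view. */
    if (!was_masked)
        return false;

    /* Other deliveries are still outstanding. */
    if (--action->in_flight != 0)
        return false;

    /* Only here would the real code go on to EOI the IO-APIC pin. */
    return true;
}
```

With the state from the dumps above (masked == 0, in-flight=0) the model 
returns false on the first check, which matches what I see: the 
PHYSDEVOP_eoi hypercall completes but irr stays at 1.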

So now I'm wondering: how can I "unstick" this IRQ, and how did it get 
into this strange state?


