
Re: [Xen-devel] IO-APIC interrupts getting stuck



On 16/12/14 at 18:59, Andrew Cooper wrote:
> On 16/12/14 17:34, Roger Pau Monné wrote:
>> Hello,
>>
>> While working on the FreeBSD PVH Dom0 port I've realized that IO-APIC 
>> interrupts get stuck in a very strange state very easily with the 
>> current PIRQ implementation that I'm using on FreeBSD.
>>
>> Since I'm not sure what is going on, I would like to ask for some 
>> feedback and possible solutions, because at this point I'm running out 
>> of ideas of what's happening.
>>
>> In this case I'm going to use IRQ 17 as an example, which is shared 
>> between an Intel(R) PRO/1000 nic, a Broadcom NetXtreme Gigabit nic and 
>> an Intel 82801JI (ICH10) USB controller.
>>
>> Usually during the boot process, or very shortly after it, Dom0 loses 
>> interrupts from IRQ 17. Dumping IRQ information from Xen ('i' key) 
>> gives the following output:
>>
>> (XEN)    IRQ:  17 affinity:00000001 vec:a8 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 17(---),
>> ...
>> (XEN)     IRQ 17 Vec168:
>> (XEN)       Apic 0x00, Pin 17: vec=a8 delivery=LoPri dest=L status=1 polarity=1 irr=1 trig=L mask=0 dest_id:1
>>
>> I've also added some event channel debug functions to the FreeBSD 
>> in-kernel debugger in order to print the status of event channels:
>>
>> Port 15 Type: PIRQ
>>         Pirq: 17 ActiveHi: 0 EdgeTrigger: 0 NeedsEOI: 1
>>         Masked: 0 Pending: 0
>>         Per-CPU Masks: cpu#0: 0 cpu#1: 0 cpu#2: 1 cpu#3: 0 cpu#4: 0 cpu#5: 0 cpu#6: 0 cpu#7: 0
>>
>> And the corresponding line from the Xen 'e' debug key:
>>
>> (XEN)       15 [0/0/1]: s=4 n=2 x=0 p=17 i=17
>>
>> This makes me think that the FreeBSD kernel is failing to EOI the 
>> vector (because of the irr=1 in the Xen IRQ debug info), so I've also 
>> added a function to the debugger that allows me to EOI a vector from 
>> it. But even after issuing a PHYSDEVOP_eoi hypercall on the affected 
>> PIRQ (17), the status is exactly the same, because pirq->masked == 0, 
>> so desc_guest_eoi fails to EOI the vector (see xen/arch/x86/irq.c:1433).
>>
>> So now I'm wondering, how can I "unstick" this IRQ, and how did it get 
>> into this strange state?
>>
>> Roger.
> 
> Do you have a xen dmesg with full debugging?  According to the first
> line from 'i', Xen believes that the irq in question is not in need of
> an EOI, which is clearly contrary to the IOAPIC's view of the world.
> 
> Some random suggestions: does altering interrupt remapping make a
> difference? Does altering the ioapic_ack_mode make a difference?
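
For reference, the behaviour described in the quoted message (a PHYSDEVOP_eoi that changes nothing because pirq->masked == 0) can be pictured with a toy model. This is only a sketch under that stated assumption: the struct and function names below are hypothetical stand-ins, not the Xen structures or the real desc_guest_eoi, and everything except the masked check is left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; NOT the real Xen data structures. */
struct toy_pirq {
    bool masked;      /* models pirq->masked as seen by the hypervisor */
};

struct toy_line {
    bool remote_irr;  /* models the irr bit of the IO-APIC pin */
};

/*
 * Toy guest EOI: the request is dropped when the hypervisor does not
 * think the PIRQ is waiting for an EOI (masked == false). Returns true
 * only if the EOI was forwarded to the modelled IO-APIC pin.
 */
static bool toy_guest_eoi(struct toy_pirq *pirq, struct toy_line *line)
{
    if (!pirq->masked)
        return false;          /* early return: irr stays set */

    pirq->masked = false;
    line->remote_irr = false;  /* EOI reaches the IO-APIC */
    return true;
}
```

In this model, calling toy_guest_eoi() with masked == false and remote_irr == true leaves remote_irr set no matter how often it is repeated, which matches the symptom reported above.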

I've also added the following patch to Xen. The ASSERT it introduces 
triggers reliably with FreeBSD, while with Linux everything seems to 
work fine:

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 6f8f62c..70977dc 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1729,6 +1729,8 @@ static void end_level_ioapic_irq_new(struct irq_desc *desc, u8 vector)
  * The idea is from Manfred Spraul.  --macro
  */
     unsigned int v, i = desc->arch.vector;
+    struct IO_APIC_route_entry rte;
+    struct irq_pin_list *entry = irq_2_pin + desc->irq;
 
     /* Manually EOI the old vector if we are moving to the new */
     if ( vector && i != vector )
@@ -1751,6 +1753,9 @@ static void end_level_ioapic_irq_new(struct irq_desc *desc, u8 vector)
             __unmask_IO_APIC_irq(desc->irq);
         spin_unlock(&ioapic_lock);
     }
+
+    rte = ioapic_read_entry(entry->apic, entry->pin, 0);
+    ASSERT(rte.irr == 0 || rte.mask != 0);
 }
 
 /*
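
As a rough illustration of the condition the ASSERT above checks, here is a standalone sketch that decodes a raw 64-bit IO-APIC redirection table entry and flags the "remote IRR set while unmasked" state. The bit positions follow the standard 82093AA entry layout; the helper names are made up for this example and are not Xen functions. The sample value 0x010000000000F9A8 is my own reconstruction from the fields in the 'i' dump (vec=a8, LoPri, logical destination, status=1, polarity=1, irr=1, level, mask=0, dest_id 1), not something read from the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field accessors for a raw 64-bit redirection table entry. */
static inline unsigned rte_vector(uint64_t rte)
{
    return (unsigned)(rte & 0xff);          /* bits 0-7 */
}

static inline bool rte_remote_irr(uint64_t rte)
{
    return (rte >> 14) & 1;                 /* bit 14 */
}

static inline bool rte_level_triggered(uint64_t rte)
{
    return (rte >> 15) & 1;                 /* bit 15 */
}

static inline bool rte_masked(uint64_t rte)
{
    return (rte >> 16) & 1;                 /* bit 16 */
}

static inline unsigned rte_dest(uint64_t rte)
{
    return (unsigned)((rte >> 56) & 0xff);  /* bits 56-63 */
}

/*
 * Negation of the patch's condition (irr == 0 || mask != 0):
 * true when remote IRR is still set on an unmasked pin.
 */
static inline bool rte_looks_stuck(uint64_t rte)
{
    return rte_remote_irr(rte) && !rte_masked(rte);
}
```

With the reconstructed sample value, rte_looks_stuck() returns true; setting bit 16 (mask) in the same value makes it return false, which is the state the ASSERT accepts.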



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

