
Re: [Xen-devel] dom0 kernel - irq nobody cared ... the continuing saga ..



Tuesday, February 10, 2015, 9:48:09 AM, you wrote:

>>>> On 09.02.15 at 18:13, <linux@xxxxxxxxxxxxxx> wrote:
>> Yes, the device that tries to handle the interrupt seems to change;
>> however, it is always a device that is not actively used.
>> This time it was an unused IDE controller with its driver loaded in dom0
>> and a mini-PCIe wifi card passed through to a guest.
>> 
>> But I did a BIOS update like David suggested and now the IDE controller
>> is gone (it was chipset only, since it lacks a physical connector anyway).
>> 
>> Now it is sharing the IRQ with the SMbus:
>> 
>> 00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
>>         Subsystem: Intel Corporation Device 204f
>>         Flags: medium devsel, IRQ 18
>>         Memory at f7d35000 (64-bit, non-prefetchable) [size=256]
>>         I/O ports at f040 [size=32]
>> 
>> 02:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
>>         Subsystem: Lenovo Device 30a1
>>         Flags: bus master, fast devsel, latency 0, IRQ 18
>>         Memory at f7c00000 (64-bit, non-prefetchable) [size=64K]
>>         Capabilities: [40] Power Management version 3
>>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
>>         Capabilities: [60] Express Legacy Endpoint, MSI 00
>>         Capabilities: [100] Advanced Error Reporting
>>         Capabilities: [140] Virtual Channel
>>         Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
>>         Capabilities: [170] Power Budgeting <?>
>>         Kernel driver in use: pciback
>> 
>> Why it seems so keen on interrupt sharing eludes me completely.

> Coming back to the /proc/interrupts output you posted earlier:

> /proc/interrupts shows the high count:

>            CPU0       CPU1       CPU2       CPU3
>   8:          0          0          0          0  xen-pirq-ioapic-edge  rtc0
>   9:          1          0          0          0  xen-pirq-ioapic-level  acpi
>  16:         29          0          0          0  xen-pirq-ioapic-level  ehci_hcd:usb3
>  18:     200000          0          0          0  xen-pirq-ioapic-level  ata_generic

> I would have thought that xen-pciback would install an interrupt
> handler here too when a device using IRQ 18 gets handed to a
> guest. May there be something broken in xen_pcibk_control_isr()?

It seems to only do that for PV guests, not for HVM.

I don't know how it will go now after the BIOS update. lspci lists the
SMBus controller as also using IRQ 18 now, but no driver is registered for
it (according to lspci -k) and it doesn't appear in dom0's /proc/interrupts.
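
To check what dom0 actually has registered on a given line, one row of /proc/interrupts can be split into the IRQ number, the summed per-CPU count and the trailing "chip  handlers" text. Below is a minimal userspace sketch of that, assuming each row has been joined back onto a single line and has the layout shown earlier in this thread. The names struct irq_row and parse_interrupts_row() are made up for illustration; they are not kernel or Xen APIs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: one numeric row of /proc/interrupts. */
struct irq_row {
    int irq;              /* IRQ number before the ':' */
    unsigned long total;  /* sum of the per-CPU counters */
    const char *tail;     /* points at "chip  handler names" inside line */
};

/*
 * Parse one row such as
 *   " 18:     200000          0          0          0  xen-pirq-ioapic-level  ata_generic"
 * Returns 0 on success, -1 for rows that do not start with a number
 * (for example "NMI:" or the CPU header line).
 */
int parse_interrupts_row(const char *line, struct irq_row *row)
{
    const char *p = line;
    char *end;
    long irq;

    assert(line != NULL && row != NULL);

    irq = strtol(p, &end, 10);
    if (end == p || *end != ':')
        return -1;

    row->irq = (int)irq;
    row->total = 0;
    p = end + 1;

    for (;;) {
        unsigned long v;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;
        v = strtoul(p, &end, 10);
        /* A counter must be a whole token; "2-edge" style names are not. */
        if (*end != '\0' && !isspace((unsigned char)*end))
            break;
        row->total += v;
        p = end;
    }

    row->tail = p;  /* chip name followed by the registered handler names */
    return 0;
}
```

Calling this on every row and then running strstr(row.tail, "...") for a driver name on the rows where row.irq is 18 would show whether that driver has a handler installed on the shared line.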

How are things supposed to work on a machine with:
- a shared IRQ
- IOMMU + interrupt remapping enabled
- HVM guests

Would dom0 always see the legacy IRQ, or is Xen or the IOMMU routing it
directly to the guest?
And what am I supposed to see when using Xen's debug key 'i'? Should there
be an entry routing it to both guests?

In dom0 I see:
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
        Subsystem: Intel Corporation Device 204f
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin C routed to IRQ 18
        Region 0: Memory at f7d35000 (64-bit, non-prefetchable) [size=256]
        Region 4: I/O ports at f040 [size=32]

02:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
        Subsystem: Lenovo Device 30a1
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 18

But Xen says:

(XEN) [2015-02-10 09:10:27.040]    IRQ:   0 affinity:1 vec:f0 type=IO-APIC-edge    status=00000000 timer_interrupt()
(XEN) [2015-02-10 09:10:27.040]    IRQ:   1 affinity:1 vec:38 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:   3 affinity:1 vec:40 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:   4 affinity:1 vec:48 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:   5 affinity:1 vec:50 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:   6 affinity:1 vec:58 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:   7 affinity:1 vec:60 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:   8 affinity:1 vec:68 type=IO-APIC-edge    status=00000030 in-flight=0 domain-list=0:  8(---),
(XEN) [2015-02-10 09:10:27.040]    IRQ:   9 affinity:1 vec:70 type=IO-APIC-level   status=00000030 in-flight=0 domain-list=0:  9(---),
(XEN) [2015-02-10 09:10:27.040]    IRQ:  10 affinity:1 vec:78 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:  11 affinity:1 vec:88 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:  12 affinity:1 vec:90 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:  13 affinity:f vec:98 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:  14 affinity:1 vec:a0 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:  15 affinity:1 vec:a8 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:  16 affinity:1 vec:b0 type=IO-APIC-level   status=00000030 in-flight=0 domain-list=0: 16(---),
(XEN) [2015-02-10 09:10:27.040]    IRQ:  18 affinity:1 vec:c0 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=3: 18(---),
(XEN) [2015-02-10 09:10:27.040]    IRQ:  19 affinity:f vec:d0 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN) [2015-02-10 09:10:27.040]    IRQ:  20 affinity:2 vec:c8 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=3: 20(-M-),
(XEN) [2015-02-10 09:10:27.040]    IRQ:  23 affinity:8 vec:b8 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 23(---),
(XEN) [2015-02-10 09:10:27.040]    IRQ:  24 affinity:f vec:28 type=DMA_MSI         status=00000000 iommu_page_fault()
(XEN) [2015-02-10 09:10:27.040]    IRQ:  25 affinity:f vec:30 type=DMA_MSI         status=00000000 iommu_page_fault()
(XEN) [2015-02-10 09:10:27.040]    IRQ:  26 affinity:1 vec:d8 type=PCI-MSI         status=00000030 in-flight=0 domain-list=0:599(---),
(XEN) [2015-02-10 09:10:27.040]    IRQ:  27 affinity:2 vec:21 type=PCI-MSI         status=00000030 in-flight=0 domain-list=0:598(---),
(XEN) [2015-02-10 09:10:27.040]    IRQ:  28 affinity:1 vec:29 type=PCI-MSI         status=00000030 in-flight=0 domain-list=0:597(---),
(XEN) [2015-02-10 09:10:27.040]    IRQ:  29 affinity:1 vec:59 type=PCI-MSI         status=00000010 in-flight=0 domain-list=3: 54(---),

Since the domain-list for IRQ 18 only lists domain 3, I would interpret that
as meaning that Xen (or the IOMMU, if it's involved) should never let IRQ 18
arrive at dom0. But my interpretations of how IRQs work have proven to be
quite flaky :-)
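
To make that reading of the dump less error-prone, the domain-list field can be pulled out mechanically. This is a minimal userspace sketch, assuming each IRQ entry has been joined back onto a single line and that the bindings have the form domid:pirq(flags) separated by commas, as in the output above. The names struct irq_binding and parse_xen_irq_line() are hypothetical helpers, not part of Xen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: one entry of a "domain-list=" field. */
struct irq_binding {
    int domid;  /* domain the IRQ is bound to */
    int pirq;   /* pirq number as seen by that domain */
};

/*
 * Parse one line of the 'i' debug-key output.
 * Stores the host IRQ number in *irq and up to max bindings in out[].
 * Returns the number of bindings found (0 for lines such as
 * "mapped, unbound"), or -1 if the line has no "IRQ:" field.
 */
int parse_xen_irq_line(const char *line, int *irq,
                       struct irq_binding *out, int max)
{
    const char *p;
    int n = 0;

    assert(line != NULL && irq != NULL && out != NULL);

    p = strstr(line, "IRQ:");
    if (p == NULL || sscanf(p + 4, "%d", irq) != 1)
        return -1;

    p = strstr(p, "domain-list=");
    if (p == NULL)
        return 0;
    p += strlen("domain-list=");

    while (n < max) {
        int domid, pirq;
        const char *close;

        /* Matches e.g. "3: 18(" or "0:599("; %d skips the padding. */
        if (sscanf(p, "%d:%d(", &domid, &pirq) != 2)
            break;
        out[n].domid = domid;
        out[n].pirq = pirq;
        n++;

        /* Skip the "(---)" flags and the separating comma. */
        close = strchr(p, ')');
        if (close == NULL)
            break;
        p = close + 1;
        if (*p == ',')
            p++;
    }
    return n;
}
```

Run over the dump above, the IRQ 18 line would yield a single binding for domain 3. A line shared between dom0 and a guest would presumably yield two bindings, but that is my expectation and not something the output here confirms.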

At the moment I have annotated the dom0 kernel with the following (in the
hope that it triggers again some day):

@@ -215,6 +216,13 @@ __report_bad_irq(unsigned int irq, struct irq_desc *desc,
        action = desc->action;
        while (action) {
                printk(KERN_ERR "[<%p>] %pf", action->handler, action->handler);
+
+               if(action->name){
+                       printk(KERN_ERR "?!?!?!? action->name: %s shared:%d dev_id:%d\n", action->name, (action->flags & IRQF_SHARED) ? 1 : 0, action->dev_id ? 1 : 0);
+               } else {
+                       printk(KERN_ERR "?!?!?!? action->name: No name  shared:%d dev_id:%d\n", (action->flags & IRQF_SHARED) ? 1 : 0, action->dev_id ? 1 : 0);
+               }
+
                if (action->thread_fn)
                        printk(KERN_CONT " threaded [<%p>] %pf",
                                        action->thread_fn, action->thread_fn);

The dev_id cookie appears to be device specific, so it is probably a bit
harder to print in a meaningful way. If you know a better way to figure out
where the IRQ storm is coming from or headed to, I'm all ears (or eyes).

--
Sander 

>> Also wondering why it doesn't enable MSI on the WIFI NIC, but perhaps the
>> driver doesn't support it .. will have to look at that later and see what
>> it does when booting baremetal.

> Did you check whether the respective driver is capable of using
> MSI?

> Jan



