
Re: [Xen-devel] Xen-unstable: pci-passthrough "irq 16: nobody cared" on HVM guest shutdown on irq of device not passed through.



Friday, September 26, 2014, 11:43:04 AM, you wrote:

>>>> On 26.09.14 at 11:18, <linux@xxxxxxxxxxxxxx> wrote:
>> Friday, September 26, 2014, 8:59:22 AM, you wrote:
>>>>>> On 26.09.14 at 00:09, <linux@xxxxxxxxxxxxxx> wrote:
>>>> - Tried switching off the onboard soundcard in the bios. Now irq16 is not 
>>>> bound 
>>>>   to any device, but the machine still freezes without any error (on 
>>>> serial 
>>>>   console with sync-console on, triple ctrl-a also doesn't work anymore)
>> 
>>> I suppose that hang is with irqpoll still in use? Ctrl-a not working anymore
>>> makes me wonder whether you use a PCI serial card sitting on that same
>>> IRQ for the Xen console... 
>> 
>> With and without irqpoll in use. 

> Oh, even without. That's worrying indeed. But again, the main thing
> to understand is who sets up and unmasks IRQ 16 when and for
> what reason.

>> I don't know what happens if there is a race / ordering problem in say 
>> xen/iommu/pciback pulling 
>> the device from the guest on shutdown while there are still irq's pending ?
>> 
>> (the code of xen_pciback's release and resetting function for instance seems 
>> to do a different 
>> ordering compared to vfio-pci's)

> That's a different aspect, as long as I recall correctly that the passed
> through device isn't itself sitting on IRQ 16.

>>> Furthermore in that mode (with supposedly no
>>> handler set up for IRQ 16) monitoring (with a little bit of debugging code)
>>> how/when IRQ 16 gets setup and unmasked may provide further hints.
>> 
>> I don't know if i made it clear enough, but without the device occupying 
>> irq16 
>> it doesn't give the irq16 nobody cared (or any such message or error), it 
>> just 
>> freezes.

> Because likely it doesn't even get that far.

> Jan

OK, I have done some more testing and it seems really borked to me.

What I have done is:
- I re-enabled the soundcard on 00:14.2, and according to lspci it is again
  set up with irq 16.
- I enabled initcall_debug and driver/bus debug in the dom0 kernel.
- I added some extra debug info around the code paths involved in setting up
  the irqs, and added a WARN_ON(gsi==16) in xen_register_gsi().

The result at dom0 boot, the first time gsi/irq 16 is registered:

[   66.819396] bus: 'pci': really_probe: bound device 0000:00:0d.0 to driver pcieport
[   66.842286] bus: 'pci': driver_probe_device: matched device 0000:00:15.0 with driver pcieport
[   66.868014] bus: 'pci': really_probe: probing driver pcieport with device 0000:00:15.0
[   66.892106] xen: registering gsi 16 triggering 0 polarity 1
[   66.908861] ------------[ cut here ]------------
[   66.922902] WARNING: CPU: 2 PID: 1 at arch/x86/pci/xen.c:131 xen_register_gsi.part.5+0x40/0xc4()
[   66.949416] Modules linked in:
[   66.958775] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc6-20140925-vanilla-printk6+ #1
[   66.984510] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[   67.008433]  0000000000000009 ffff880059b6fa18 ffffffff81b3164e ffff880059b90000
[   67.030783]  0000000000000000 ffff880059b6fa58 ffffffff810c42c2 ffff880059b6fa78
[   67.053152]  0000000000000010 0000000000000000 0000000000000001 00000000ffffffff
[   67.075503] Call Trace:
[   67.083051]  [<ffffffff81b3164e>] dump_stack+0x46/0x58
[   67.098651]  [<ffffffff810c42c2>] warn_slowpath_common+0x82/0xb0
[   67.116845]  [<ffffffff810c4305>] warn_slowpath_null+0x15/0x20
[   67.134525]  [<ffffffff81b35207>] xen_register_gsi.part.5+0x40/0xc4
[   67.153507]  [<ffffffff819466fd>] acpi_register_gsi_xen+0x1d/0x20
[   67.171967]  [<ffffffff8103c8ca>] acpi_register_gsi+0xa/0x10
[   67.189124]  [<ffffffff814ebdf7>] acpi_pci_irq_enable+0xed/0x1bd
[   67.207327]  [<ffffffff81499688>] ? pci_bus_read_config_word+0x88/0xa0
[   67.227084]  [<ffffffff81949139>] pcibios_enable_device+0x39/0x40
[   67.245543]  [<ffffffff814a122d>] do_pci_enable_device+0x4d/0x100
[   67.263997]  [<ffffffff814a2116>] pci_enable_device_flags+0xc6/0x120
[   67.283236]  [<ffffffff814a217e>] pci_enable_device+0xe/0x10
[   67.300398]  [<ffffffff814b0ea4>] pcie_port_device_register+0x24/0x4f0
[   67.320156]  [<ffffffff8110176a>] ? lock_release+0x12a/0x240
[   67.337315]  [<ffffffff814b1533>] pcie_portdrv_probe+0x43/0x70
[   67.354994]  [<ffffffff814a3428>] local_pci_probe+0x28/0x70
[   67.371892]  [<ffffffff814a3701>] pci_device_probe+0xd1/0x120
[   67.389316]  [<ffffffff8169f5f8>] driver_probe_device+0xe8/0x340
[   67.407509]  [<ffffffff8169f8f3>] __driver_attach+0xa3/0xb0
[   67.424407]  [<ffffffff8169f850>] ? driver_probe_device+0x340/0x340
[   67.443392]  [<ffffffff8169d7ad>] bus_for_each_dev+0x5d/0xa0
[   67.460550]  [<ffffffff8169f039>] driver_attach+0x19/0x20
[   67.476927]  [<ffffffff8169ec3d>] bus_add_driver+0xfd/0x210
[   67.493829]  [<ffffffff82343237>] ? dmi_pcie_pme_disable_msi+0x1a/0x1a
[   67.513584]  [<ffffffff816a000f>] driver_register+0x5f/0xf0
[   67.530479]  [<ffffffff814a383f>] __pci_register_driver+0x5f/0x70
[   67.548946]  [<ffffffff8234329d>] pcie_portdrv_init+0x66/0x77
[   67.566362]  [<ffffffff810021bb>] do_one_initcall+0x13b/0x1f0
[   67.583782]  [<ffffffff8231528a>] kernel_init_freeable+0x204/0x298
[   67.602503]  [<ffffffff82314971>] ? do_early_param+0x8c/0x8c
[   67.619661]  [<ffffffff810e4e2f>] ? finish_task_switch+0x7f/0xf0
[   67.637858]  [<ffffffff81b27f30>] ? rest_init+0xc0/0xc0
[   67.653721]  [<ffffffff81b27f39>] kernel_init+0x9/0xf0
[   67.669314]  [<ffffffff81b3cefc>] ret_from_fork+0x7c/0xb0
[   67.685697]  [<ffffffff81b27f30>] ? rest_init+0xc0/0xc0
[   67.701561] ---[ end trace c3cbc73c89ab2f64 ]---
[   67.715662] xen: --> pirq=16 -> irq=16 (gsi=16)
(XEN) [2014-09-27 13:04:15.981] IOAPIC[0]: Set PCI routing entry (6-16 -> 0x71 -> IRQ 16 Mode:1 Active:1)
[   67.757210] pcieport 0000:00:15.0: ?!?!? acpi_pci_irq_enable: PCI INT A -> GSI 16 (level, low) -> IRQ/rc 16
[   67.786816] device: '0000:00:15.0:pcie01': device_add
[   67.801916] bus: 'pci_express': add device 0000:00:15.0:pcie01
[   67.819592] PM: Adding info for pci_express:0000:00:15.0:pcie01
[   67.837603] driver: 'pcieport': driver_bound: bound to device '0000:00:15.0'
[   67.858843] bus: 'pci': really_probe: bound device 0000:00:15.0 to driver pcieport


The second time gsi/irq 16 is registered:
[  104.432241] calling  patch_hdmi_init+0x0/0x12 @ 1
[  104.432243] initcall patch_hdmi_init+0x0/0x12 returned 0 after 0 usecs
[  104.432245] calling  azx_driver_init+0x0/0x1b @ 1
[  104.432249] bus: 'pci': add driver snd_hda_intel
[  104.432267] bus: 'pci': driver_probe_device: matched device 0000:00:14.2 with driver snd_hda_intel
[  104.432268] bus: 'pci': really_probe: probing driver snd_hda_intel with device 0000:00:14.2
[  104.432490] xen: registering gsi 16 triggering 0 polarity 1
[  104.432491] ------------[ cut here ]------------
[  104.432494] WARNING: CPU: 1 PID: 1 at arch/x86/pci/xen.c:131 xen_register_gsi.part.5+0x40/0xc4()
[  104.432495] Modules linked in:
[  104.432498] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W      3.17.0-rc6-20140925-vanilla-printk6+ #1
[  104.432499] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[  104.432502]  0000000000000009 ffff880059b6fa78 ffffffff81b3164e ffff880059b90000
[  104.432503]  0000000000000000 ffff880059b6fab8 ffffffff810c42c2 ffff880059b6fad8
[  104.432505]  0000000000000010 0000000000000000 0000000000000001 00000000ffffffff
[  104.432505] Call Trace:
[  104.432508]  [<ffffffff81b3164e>] dump_stack+0x46/0x58
[  104.432510]  [<ffffffff810c42c2>] warn_slowpath_common+0x82/0xb0
[  104.432512]  [<ffffffff810c4305>] warn_slowpath_null+0x15/0x20
[  104.432514]  [<ffffffff81b35207>] xen_register_gsi.part.5+0x40/0xc4
[  104.432517]  [<ffffffff819466fd>] acpi_register_gsi_xen+0x1d/0x20
[  104.432519]  [<ffffffff8103c8ca>] acpi_register_gsi+0xa/0x10
[  104.432522]  [<ffffffff814ebdf7>] acpi_pci_irq_enable+0xed/0x1bd
[  104.432524]  [<ffffffff81499688>] ? pci_bus_read_config_word+0x88/0xa0
[  104.432526]  [<ffffffff81949139>] pcibios_enable_device+0x39/0x40
[  104.432529]  [<ffffffff814a122d>] do_pci_enable_device+0x4d/0x100
[  104.432530]  [<ffffffff814a2116>] pci_enable_device_flags+0xc6/0x120
[  104.432532]  [<ffffffff814a217e>] pci_enable_device+0xe/0x10
[  104.432533]  [<ffffffff81943f51>] azx_probe+0x81/0x700
[  104.432535]  [<ffffffff814a3428>] local_pci_probe+0x28/0x70
[  104.432536]  [<ffffffff814a3701>] pci_device_probe+0xd1/0x120
[  104.432538]  [<ffffffff8169f5f8>] driver_probe_device+0xe8/0x340
[  104.432540]  [<ffffffff8169f8f3>] __driver_attach+0xa3/0xb0
[  104.432541]  [<ffffffff8169f850>] ? driver_probe_device+0x340/0x340
[  104.432543]  [<ffffffff8169d7ad>] bus_for_each_dev+0x5d/0xa0
[  104.432544]  [<ffffffff8169f039>] driver_attach+0x19/0x20
[  104.432545]  [<ffffffff8169ec3d>] bus_add_driver+0xfd/0x210
[  104.432547]  [<ffffffff8235765a>] ? patch_hdmi_init+0x12/0x12
[  104.432548]  [<ffffffff816a000f>] driver_register+0x5f/0xf0
[  104.432550]  [<ffffffff814a383f>] __pci_register_driver+0x5f/0x70
[  104.432552]  [<ffffffff810021b5>] ? do_one_initcall+0x135/0x1f0
[  104.432554]  [<ffffffff82357673>] azx_driver_init+0x19/0x1b
[  104.432555]  [<ffffffff810021bb>] do_one_initcall+0x13b/0x1f0
[  104.432558]  [<ffffffff8231528a>] kernel_init_freeable+0x204/0x298
[  104.432560]  [<ffffffff82314971>] ? do_early_param+0x8c/0x8c
[  104.432562]  [<ffffffff810e4e2f>] ? finish_task_switch+0x7f/0xf0
[  104.432564]  [<ffffffff81b27f30>] ? rest_init+0xc0/0xc0
[  104.432568]  [<ffffffff81b27f39>] kernel_init+0x9/0xf0
[  104.432570]  [<ffffffff81b3cefc>] ret_from_fork+0x7c/0xb0
[  104.432572]  [<ffffffff81b27f30>] ? rest_init+0xc0/0xc0
[  104.432573] ---[ end trace c3cbc73c89ab2f65 ]---
[  104.432576] ?!?!? xen_register_pirq 1: --> pirq=-30720 -> irq=16 (gsi=16)
[  104.432578] Already setup the GSI :16
[  104.432580] snd_hda_intel 0000:00:14.2: ?!?!? acpi_pci_irq_enable: PCI INT A -> GSI 16 (level, low) -> IRQ/rc 16
[  104.432594] driver: 'snd_hda_intel': driver_bound: bound to device '0000:00:14.2'
[  104.432815] bus: 'pci': really_probe: bound device 0000:00:14.2 to driver snd_hda_intel
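
To see at a glance which driver probe triggered each "registering gsi" message
in a long boot log, something like the following could be used. It is only a
minimal sketch in Python: the function name gsi_registrations is made up for
illustration, and it assumes the log contains the "really_probe: probing
driver ... with device ..." lines that driver/bus debug prints, as in the
excerpts above. It joins wrapped lines first, since mail clients tend to break
them.

```python
import re

# Matches either a driver probe line or a gsi registration line, using the
# message formats seen in the dom0 log excerpts above.
EVENT_RE = re.compile(
    r"really_probe: probing driver (?P<driver>\S+) with device (?P<device>\S+)"
    r"|xen: registering gsi (?P<gsi>\d+)"
)


def gsi_registrations(log_text):
    """Return a list of (gsi, driver, device) tuples.

    Each 'xen: registering gsi N' message is paired with the most recent
    'probing driver' line before it; driver and device are None when no
    probe line preceded the registration.
    """
    # Collapse all whitespace so that lines wrapped by a mail client are
    # rejoined before matching.
    flat = " ".join(log_text.split())
    last_probe = (None, None)
    result = []
    for match in EVENT_RE.finditer(flat):
        if match.group("gsi") is not None:
            result.append((int(match.group("gsi")),) + last_probe)
        else:
            last_probe = (match.group("driver"), match.group("device"))
    return result
```

Fed the two excerpts above, this would pair gsi 16 first with pcieport on
0000:00:15.0 and then with snd_hda_intel on 0000:00:14.2.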

But in the lspci output only 00:14.2 lists IRQ 16; 00:15.0 lists no IRQ at all:
00:14.2 Audio device: Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA) (rev 40)
        Subsystem: Micro-Star International Co., Ltd. Device 7640
        Flags: bus master, slow devsel, latency 64, IRQ 16
        Memory at fdbf8000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [50] Power Management version 2
        Kernel driver in use: snd_hda_intel

00:15.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Root Port (Slot+), MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [b0] Subsystem: Advanced Micro Devices [AMD] nee ATI Device 0000
        Capabilities: [b8] HyperTransport: MSI Mapping Enable+ Fixed+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Kernel driver in use: pcieport
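
For comparing what lspci reports against the kernel log over many devices, a
small helper could list the IRQ shown on each device's "Flags:" line. This is a
rough sketch only: irq_per_device is a made-up name, and it assumes "lspci -v"
style text where each device starts with a bus:dev.fn address in the first
column (no PCI domain prefix), as in the output above.

```python
import re

# A device header starts in the first column with an address like "00:14.2".
HEADER_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s")
# The IRQ, when present, appears on the "Flags:" line of the device.
FLAGS_IRQ_RE = re.compile(r"Flags:.*\bIRQ (\d+)")


def irq_per_device(lspci_text):
    """Map each PCI address to the IRQ on its Flags line, or None if absent."""
    irqs = {}
    current = None
    for line in lspci_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            irqs[current] = None  # stays None unless a Flags line shows an IRQ
            continue
        flags = FLAGS_IRQ_RE.search(line)
        if flags and current is not None:
            irqs[current] = int(flags.group(1))
    return irqs
```

For the two devices above this gives IRQ 16 for 00:14.2 and None for 00:15.0,
which is the mismatch with the "registering gsi 16" messages in the log.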

And in tree form:
-[0000:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 Northbridge only single slot PCI-e GFX Hydra part
           +-00.2  Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory Management Unit (IOMMU)
           +-02.0-[0f]--+-00.0  Advanced Micro Devices [AMD] nee ATI RV620 LE [Radeon HD 3450]
           |            \-00.1  Advanced Micro Devices [AMD] nee ATI RV620 HDMI Audio [Radeon HD 3400 Series]
           +-03.0-[0e]--+-00.0  Advanced Micro Devices [AMD] nee ATI Turks [Radeon HD 6570]
           |            \-00.1  Advanced Micro Devices [AMD] nee ATI Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
           +-05.0-[0d]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
           +-06.0-[0c]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
           +-09.0-[0b]----00.0  NEC Corporation uPD720200 USB 3.0 Host Controller
           +-0a.0-[0a]----00.0  Conexant Systems, Inc. Device 8210
           +-0b.0-[09]--+-00.0  Advanced Micro Devices [AMD] nee ATI Turks [Radeon HD 6570]
           |            \-00.1  Advanced Micro Devices [AMD] nee ATI Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
           +-0c.0-[05-08]----00.0-[06-08]--+-01.0-[08]----00.0  NEC Corporation uPD720200 USB 3.0 Host Controller
           |                               \-02.0-[07]----00.0  Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller
           +-0d.0-[04]----00.0  NEC Corporation uPD720200 USB 3.0 Host Controller
           +-11.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
           +-12.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
           +-12.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
           +-13.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
           +-13.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
           +-14.0  Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller
           +-14.2  Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA)
           +-14.3  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller
           +-14.4-[03]----06.0  C-Media Electronics Inc CMI8738/CMI8768 PCI Audio
           +-14.5  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
           +-15.0-[02]--
           +-16.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
           +-16.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
           +-18.0  Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
           +-18.1  Advanced Micro Devices [AMD] Family 10h Processor Address Map
           +-18.2  Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
           +-18.3  Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
           \-18.4  Advanced Micro Devices [AMD] Family 10h Processor Link Control



However, I still don't see a clear connection with device 09:00.0, which I'm
passing through, which functions in the guest, and which gives the "irq 16
nobody cared" on shutdown.


--
Sander


