On 04/10/13 08:44, Kristoffer Egefelt wrote:

I'm trying to pass through a NIC (Intel X520 with the ixgbevf driver) to a domU, but with dom0 kernels newer than 3.8 this has not worked.

The dom0 kernel seems to cause the problem.
Xen version, domU kernel version and driver version seem to be unrelated to this bug:
it works as long as the dom0 kernel is 3.8.
I tried kernel versions 3.9, 3.10 and 3.11; all show the same bug pattern when used as dom0.

The BUG appears on xl pci-attach.
On xl pci-detach, dom0 panics.

I have attached logs from a working setup (kernel 3.8) and from a non-working setup (kernel 3.11), as well as the kernel config for 3.11.

In short, this is what domU logs after pci attach:

 BUG: unable to handle kernel paging request at ffffc9000030200c
 IP: [<ffffffff81205812>] __msix_mask_irq+0x21/0x24
 PGD 75a40067 PUD 75a41067 PMD 75b44067 PTE 8010000000000464
 Oops: 0003 [#1] SMP
 Modules linked in: ixgbevf(+) xen_pcifront nfnetlink_log nfnetlink ipt_ULOG x_tables x86_pkg_temp_thermal thermal_sys coretemp crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper microcode ext4 crc16 jbd2 mbcache xen_blkfront
 CPU: 0 PID: 2122 Comm: modprobe Not tainted 3.11.3-kernel-v1.0.0.21+ #1

And this is dom0 on pci detach:

(XEN) Assertion '_raw_spin_is_locked(lock)' failed at /usr/src/xen/xen/include/asm/spinlock.h:16402
(XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d0801258ef>] _spin_unlock_irqrestore+0x27/0x32
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
(XEN) rax: 0000000000000001   rbx: ffff83201ba07724   rcx: 0000000000000001
(XEN) rdx: ffff83201bb97020   rsi: 0000000000000286   rdi: ffff83201ba07724
(XEN) rbp: ffff83203ffcfdd8   rsp: ffff83203ffcfdd8   r8:  ffff8141002000e0
(XEN) r9:  000000000000001c   r10: 0000000000000082   r11: 0000000000000001
(XEN) r12: 0000000000000000   r13: ffff8320e13c8240   r14: ffff880148047df4
(XEN) r15: 0000000000000286   cr0: 0000000080050033   cr4: 00000000000426f0
(XEN) cr3: 000000206f3ff000   cr2: 00007fa5ec560c49
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83203ffcfdd8:
(XEN)    ffff83203ffcfe68 ffff82d080166a4f ffff83203ffcfe18 0000000280118988
(XEN)    0000000000000cfe 0000000000000cfe ffff832015d3b8a0 ffff8320e13c83f0
(XEN)    ffff832015d3b880 0000000000000001 00000000fee00678 0000000000000000
(XEN)    ffff83200000f800 000000000000001b ffff8300bcef5000 ffffffffffffffed
(XEN)    ffff880148047df4 ffffffff814530e0 ffff83203ffcfef8 ffff82d08017dee4
(XEN)    ffff832000000002 0000000000000008 ffff83203ffcfef8 ffff82d000a0fb00
(XEN)    0000000000000000 ffffffff93010000 ffff82d0802e8000 ffff83203ffc80ef
(XEN)    82d080222c00b948 c390ef66d1ffffff ffff83203ffcfef8 ffff8300bcef5000
(XEN)    ffff880145951868 ffff880145bb2a60 ffff880148047f50 ffffffff814530e0
(XEN)    00007cdfc00300c7 ffff82d08022213b ffffffff8100142a 0000000000000021
(XEN)    ffffffff814530e0 ffff880148047f50 000000000000c002 0000000000009300
(XEN)    ffff88013faf1a80 ffff880145951000 0000000000000202 0000000000000093
(XEN)    ffff880148047df4 0000000000000002 0000000000000021 ffffffff8100142a
(XEN)    0000000000000000 ffff880148047df4 000000000000001b 0001010000000000
(XEN)    ffffffff8100142a 000000000000e033 0000000000000202 ffff880148047dc8
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000001 ffff8300bcef5000 0000004f9b885e00
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d0801258ef>] _spin_unlock_irqrestore+0x27/0x32
(XEN)    [<ffff82d080166a4f>] pci_restore_msi_state+0x1c9/0x2f0
(XEN)    [<ffff82d08017dee4>] do_physdev_op+0xe4f/0x114f
(XEN)    [<ffff82d08022213b>] syscall_enter+0xeb/0x145
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '_raw_spin_is_locked(lock)' failed at /usr/src/xen/xen/include/asm/spinlock.h:16402
(XEN) ****************************************
(XEN) Manual reset required ('noreboot' specified)

This Xen crash appears to be a result of d1b6d0a0248 "x86: enable multi-vector MSI", and is likely a mishandling of some error state caused by the initial pci attach failure/partial setup.

Can you add a printk() in xen/arch/x86/msi.c:pci_restore_msi_state(), just before the list_for_each_entry_safe() loop, to work out which PCI device is the problematic one, then run `xl debug-keys iQM` and provide the output?  An lspci of the affected device might also help.
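
Something along these lines would do for the debug print. This is only a standalone sketch, not a patch against msi.c: the struct below is a stand-in that assumes the device is identified by seg, bus and devfn fields, and format_sbdf() and trace_restore() are hypothetical helper names. In the hypervisor the same format string would be passed to printk() rather than printf().

```c
#include <stdio.h>

/* Stand-in for the fields of interest; NOT Xen's real struct pci_dev.
 * The field names (seg, bus, devfn) are an assumption for this sketch. */
struct pci_dev {
    unsigned short seg;   /* PCI segment (domain) */
    unsigned char bus;    /* bus number */
    unsigned char devfn;  /* slot in bits 7:3, function in bits 2:0 */
};

/* Standard PCI devfn encoding */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* Format the device as seg:bus:slot.func, matching `lspci -D` output,
 * so the line in the Xen log can be matched against lspci directly. */
int format_sbdf(char *buf, size_t len, const struct pci_dev *pdev)
{
    return snprintf(buf, len, "%04x:%02x:%02x.%u",
                    (unsigned)pdev->seg, (unsigned)pdev->bus,
                    (unsigned)PCI_SLOT(pdev->devfn),
                    (unsigned)PCI_FUNC(pdev->devfn));
}

/* Hypothetical trace call: in pci_restore_msi_state() the equivalent
 * printk() would sit just before the list_for_each_entry_safe() loop. */
void trace_restore(const struct pci_dev *pdev)
{
    char sbdf[16];

    format_sbdf(sbdf, sizeof(sbdf), pdev);
    printf("pci_restore_msi_state: %s\n", sbdf);
}
```

The last device printed before the assertion fires is the one to look up with lspci -vvv -s <seg:bus:slot.func>.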

