
Re: [Xen-devel] [BUG] pci-passthrough on dom0 kernel versions above 3.8 crashes dom0



>>> On 04.10.13 at 09:44, Kristoffer Egefelt <kristoffer@xxxxxxx> wrote:
> Hi,
> 
> I'm trying to pass through a NIC (Intel X520 with the ixgbevf driver) to 
> domU, but with dom0 kernels newer than 3.8 this has not worked.
> 
> The dom0 kernel seems to cause the problem.
> The Xen version, domU kernel version and driver version seem to be 
> unrelated to this bug, meaning it works as long as the dom0 kernel is 3.8.
> I tried kernel versions 3.9, 3.10 and 3.11 - all show the same bug pattern 
> when used as dom0.
> 
> The BUG appears on xl pci-attach.
> On pci-detach, dom0 panics.
> 
> I have attached logs from a working setup (kernel 3.8), from a non-working 
> setup (kernel 3.11), and the kernel config for 3.11.
> 
> In short, this is what domU logs after pci-attach:
> 
>  BUG: unable to handle kernel paging request at ffffc9000030200c
>  IP: [<ffffffff81205812>] __msix_mask_irq+0x21/0x24
>  PGD 75a40067 PUD 75a41067 PMD 75b44067 PTE 8010000000000464
>  Oops: 0003 [#1] SMP 
>  Modules linked in: ixgbevf(+) xen_pcifront nfnetlink_log nfnetlink ipt_ULOG 
> x_tables x86_pkg_temp_thermal thermal_sys coretemp crc32c_intel 
> ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul 
> glue_helper microcode ext4 crc16 jbd2 mbcache xen_blkfront
>  CPU: 0 PID: 2122 Comm: modprobe Not tainted 3.11.3-kernel-v1.0.0.21+ #1

Are you certain this is kernel (rather than hypervisor) version
dependent? Iirc this is a manifestation of a guest kernel not being
permitted to write to the MSI-X mask bit.
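
For reference, a rough sketch (illustrative only, with locally defined
constants; not the actual __msix_mask_irq() body) of what the guest's
masking path boils down to: a 32-bit store to the Vector Control dword of
the mapped MSI-X table entry. If Xen maps that table read-only into the
guest, the store faults, which is consistent with the oops above: the fault
address ends in 0xc, the Vector Control offset within a 16-byte entry.

#include <linux/io.h>
#include <linux/types.h>

/* MSI-X table layout per the PCI spec: 16-byte entries, Vector Control last. */
#define MSIX_ENTRY_SIZE          16
#define MSIX_ENTRY_VECTOR_CTRL   12

/*
 * Hypothetical helper (name and signature invented for this example):
 * write the Vector Control dword of one MSI-X table entry.  When the
 * hypervisor has mapped the MSI-X table read-only for the guest, this
 * writel() is the kind of store that takes the paging fault seen above.
 */
static void example_msix_write_vector_ctrl(void __iomem *msix_table,
                                           unsigned int entry_nr, u32 ctrl)
{
    void __iomem *addr = msix_table +
                         entry_nr * MSIX_ENTRY_SIZE +
                         MSIX_ENTRY_VECTOR_CTRL;

    writel(ctrl, addr);   /* the faulting store if the mapping is read-only */
    readl(addr);          /* flush the posted write (illustrative) */
}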

> And this is dom0 on pci-detach:
> 
> (XEN) Assertion '_raw_spin_is_locked(lock)' failed at 
> /usr/src/xen/xen/include/asm/spinlock.h:16402
> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    1
> (XEN) RIP:    e008:[<ffff82d0801258ef>] _spin_unlock_irqrestore+0x27/0x32
> (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
> (XEN) rax: 0000000000000001   rbx: ffff83201ba07724   rcx: 0000000000000001
> (XEN) rdx: ffff83201bb97020   rsi: 0000000000000286   rdi: ffff83201ba07724
> (XEN) rbp: ffff83203ffcfdd8   rsp: ffff83203ffcfdd8   r8:  ffff8141002000e0
> (XEN) r9:  000000000000001c   r10: 0000000000000082   r11: 0000000000000001
> (XEN) r12: 0000000000000000   r13: ffff8320e13c8240   r14: ffff880148047df4
> (XEN) r15: 0000000000000286   cr0: 0000000080050033   cr4: 00000000000426f0
> (XEN) cr3: 000000206f3ff000   cr2: 00007fa5ec560c49
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff83203ffcfdd8:
> (XEN)    ffff83203ffcfe68 ffff82d080166a4f ffff83203ffcfe18 0000000280118988
> (XEN)    0000000000000cfe 0000000000000cfe ffff832015d3b8a0 ffff8320e13c83f0
> (XEN)    ffff832015d3b880 0000000000000001 00000000fee00678 0000000000000000
> (XEN)    ffff83200000f800 000000000000001b ffff8300bcef5000 ffffffffffffffed
> (XEN)    ffff880148047df4 ffffffff814530e0 ffff83203ffcfef8 ffff82d08017dee4
> (XEN)    ffff832000000002 0000000000000008 ffff83203ffcfef8 ffff82d000a0fb00
> (XEN)    0000000000000000 ffffffff93010000 ffff82d0802e8000 ffff83203ffc80ef
> (XEN)    82d080222c00b948 c390ef66d1ffffff ffff83203ffcfef8 ffff8300bcef5000
> (XEN)    ffff880145951868 ffff880145bb2a60 ffff880148047f50 ffffffff814530e0
> (XEN)    00007cdfc00300c7 ffff82d08022213b ffffffff8100142a 0000000000000021
> (XEN)    ffffffff814530e0 ffff880148047f50 000000000000c002 0000000000009300
> (XEN)    ffff88013faf1a80 ffff880145951000 0000000000000202 0000000000000093
> (XEN)    ffff880148047df4 0000000000000002 0000000000000021 ffffffff8100142a
> (XEN)    0000000000000000 ffff880148047df4 000000000000001b 0001010000000000
> (XEN)    ffffffff8100142a 000000000000e033 0000000000000202 ffff880148047dc8
> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000001 ffff8300bcef5000 0000004f9b885e00
> (XEN)    0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0801258ef>] _spin_unlock_irqrestore+0x27/0x32
> (XEN)    [<ffff82d080166a4f>] pci_restore_msi_state+0x1c9/0x2f0
> (XEN)    [<ffff82d08017dee4>] do_physdev_op+0xe4f/0x114f
> (XEN)    [<ffff82d08022213b>] syscall_enter+0xeb/0x145
> (XEN)    
> (XEN) 
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion '_raw_spin_is_locked(lock)' failed at 
> /usr/src/xen/xen/include/asm/spinlock.h:16402
> (XEN) ****************************************
> (XEN) 
> (XEN) Manual reset required ('noreboot' specified)

This, otoh, is clearly a hypervisor bug. Afaict the patch below
should help.

But this code is supposed to be executed only on host S3 resume
(i.e. there might also be some kernel flaw involved here).

Jan

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1158,11 +1158,11 @@ int pci_restore_msi_state(struct pci_dev
         for ( i = 0; ; )
         {
             msi_set_mask_bit(desc, entry[i].msi_attrib.masked);
-            spin_unlock_irqrestore(&desc->lock, flags);
 
             if ( !--nr )
                 break;
 
+            spin_unlock_irqrestore(&desc->lock, flags);
             desc = &irq_desc[entry[++i].irq];
             spin_lock_irqsave(&desc->lock, flags);
             if ( desc->msi_desc != entry + i )
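
The assertion fires because spin_unlock_irqrestore() is reached for a
descriptor lock that is not held; the patch keeps the current descriptor's
lock held until the loop either finishes with that entry or hands over to
the next one. Below is a self-contained sketch of that hand-over pattern
(plain C with pthread spinlocks rather than Xen's spinlock API; all names
are invented for the example, and it assumes, as the patch implies, that
the last lock is released after the loop).

#include <pthread.h>
#include <stdio.h>

#define NR_DESCS 4

struct demo_desc {
    pthread_spinlock_t lock;
    int masked;
};

static struct demo_desc descs[NR_DESCS];

/* Process entries [first, first + nr), holding exactly one lock at a time. */
static void restore_entries(unsigned int first, unsigned int nr)
{
    struct demo_desc *desc = &descs[first];
    unsigned int i = first;

    pthread_spin_lock(&desc->lock);

    for ( ; ; )
    {
        desc->masked = 1;                   /* stands in for msi_set_mask_bit() */

        if ( !--nr )
            break;                          /* last entry: keep its lock held */

        pthread_spin_unlock(&desc->lock);   /* hand over ... */
        desc = &descs[++i];
        pthread_spin_lock(&desc->lock);     /* ... to the next descriptor */
    }

    pthread_spin_unlock(&desc->lock);       /* the single unlock after the loop */
}

int main(void)
{
    for (unsigned int i = 0; i < NR_DESCS; i++)
        pthread_spin_init(&descs[i].lock, PTHREAD_PROCESS_PRIVATE);

    restore_entries(0, NR_DESCS);

    for (unsigned int i = 0; i < NR_DESCS; i++)
        printf("desc %u masked=%d\n", i, descs[i].masked);
    return 0;
}

With the unlock placed before the "!--nr" check (as in the unpatched code),
the loop would exit with no lock held and the unlock after the loop would
operate on an unlocked lock, which is exactly what the assertion catches.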

