
Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq



Tuesday, November 18, 2014, 12:07:41 PM, you wrote:
> Tuesday, November 18, 2014, 3:49:27 AM, you wrote:
>> On Mon, Nov 17, 2014 at 11:40:11PM +0100, Sander Eikelenboom wrote:
>>> 
>>> Monday, November 17, 2014, 9:43:47 PM, you wrote:
>>> 

<BIG SNIP>

>>> 
>>> > I am puzzled by the driver binding twice to the same interrupt, but
>>> > perhaps that is just a buggy driver.
>>> 
>>> Doesn't that happen more often, like with integrated USB controllers?
>>>   17:          4          0          0          0          0          0  xen-pirq-ioapic-level  ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
>>>   18:       4385          0          0          0          0          0  xen-pirq-ioapic-level  ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7

>> That was my thinking too. I passed in all my USB devices that looked
>> like that to my guest, but instead of putting them on the same
>> IRQ line, QEMU put them on separate IRQs!
>  
>> And even with that I couldn't reproduce this crash.
> Hmm, I am now testing with qemu-xen-traditional. I just noticed that the output
> at guest start differs between qemu-xen-traditional and qemu-xen:

> qemu-xen-traditional gives:
> (XEN) [2014-11-18 08:46:33.409] io.c:550: d16: bind: m_gsi=87 g_gsi=36 dev=00.00.5 intx=0
> (XEN) [2014-11-18 08:46:33.798] AMD-Vi: Disable: device id = 0x800, domain = 0, paging mode = 3
> (XEN) [2014-11-18 08:46:33.798] AMD-Vi: Setup I/O page table: device id = 0x800, type = 0x1, root table = 0x3fab6a000, domain = 16, paging mode = 3
> (XEN) [2014-11-18 08:46:33.798] AMD-Vi: Re-assign 0000:08:00.0 from dom0 to dom16
> (XEN) [2014-11-18 08:46:34.917] io.c:550: d16: bind: m_gsi=86 g_gsi=40 dev=00.00.6 intx=0
> (XEN) [2014-11-18 08:46:34.923] AMD-Vi: Disable: device id = 0xa00, domain = 0, paging mode = 3
> (XEN) [2014-11-18 08:46:34.923] AMD-Vi: Setup I/O page table: device id = 0xa00, type = 0x1, root table = 0x3fab6a000, domain = 16, paging mode = 3
> (XEN) [2014-11-18 08:46:34.923] AMD-Vi: Re-assign 0000:0a:00.0 from dom0 to dom16
> and when the guest is booting it gives:
> (XEN) [2014-11-18 08:47:02.128] io.c:584: d16: unbind: m_gsi=87 g_gsi=36 dev=00:00.5 intx=0
> (XEN) [2014-11-18 08:47:02.128] io.c:684: d16 final unmap: m_irq=87 dev=00:00.5 intx=0
> (XEN) [2014-11-18 08:47:02.128] io.c:550: d16: bind: m_gsi=37 g_gsi=16 dev=00.00.0 intx=0

> with qemu-xen it only gives the first part:
> (XEN) [2014-11-18 10:51:18.481] io.c:550: d16: bind: m_gsi=37 g_gsi=36 dev=00.00.5 intx=0
> (XEN) [2014-11-18 10:51:18.889] AMD-Vi: Disable: device id = 0x800, domain = 0, paging mode = 3
> (XEN) [2014-11-18 10:51:18.889] AMD-Vi: Setup I/O page table: device id = 0x800, type = 0x1, root table = 0x5071a6000, domain = 16, paging mode = 3
> (XEN) [2014-11-18 10:51:18.889] AMD-Vi: Re-assign 0000:08:00.0 from dom0 to dom16
> (XEN) [2014-11-18 10:51:20.016] io.c:550: d16: bind: m_gsi=47 g_gsi=40 dev=00.00.6 intx=0
> (XEN) [2014-11-18 10:51:20.022] AMD-Vi: Disable: device id = 0xa00, domain = 0, paging mode = 3
> (XEN) [2014-11-18 10:51:20.022] AMD-Vi: Setup I/O page table: device id = 0xa00, type = 0x1, root table = 0x5071a6000, domain = 16, paging mode = 3
> (XEN) [2014-11-18 10:51:20.022] AMD-Vi: Re-assign 0000:0a:00.0 from dom0 to dom16

> Looking at the m_gsi numbers: could it be that "pci_msitranslate=1" is not
> working for qemu-xen, and that this causes the difference in output?
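
To compare the m_gsi/g_gsi pairs between the two logs without eyeballing them, something like the following could be used. This is only a rough sketch: parse_bind_line() is a made-up helper name, and it assumes the "bind: m_gsi=... g_gsi=..." format printed by io.c in the logs above, with each log entry on one line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Xen): pull m_gsi and g_gsi out of one
 * "bind:" line of xl dmesg output. Returns 1 on success, 0 otherwise.
 * The leading space in the search string keeps "unbind:" lines from matching.
 */
static int parse_bind_line(const char *line, int *m_gsi, int *g_gsi)
{
    const char *p;

    assert(line != NULL && m_gsi != NULL && g_gsi != NULL);

    p = strstr(line, " bind: m_gsi=");
    if (p == NULL)
        return 0;

    /* Whitespace in the format matches any run of whitespace in the input. */
    return sscanf(p, " bind: m_gsi=%d g_gsi=%d", m_gsi, g_gsi) == 2;
}
```

Fed the first bind line of each log, it would return m_gsi=87 for qemu-xen-traditional and m_gsi=37 for qemu-xen, both with g_gsi=36, which is the difference I am asking about.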


> Another strange thing I noticed with qemu-xen-traditional: after a while the
> interrupt count in /proc/interrupts is "stuck", it doesn't increase anymore:
>  40:      10851          0          0          0  xen-pirq-ioapic-level  cx25821[1]
> However, the device still continues to grab video.

> I left it running for 2 hours, and for at least 1 hour of that the interrupt
> count in /proc/interrupts did not change for legacy IRQ 40 of the videograbber.
> The other interrupt counts in /proc/interrupts do keep increasing (also for the
> passed-through USB device, which has MSI-X enabled).
> There is no crash and no debug output or errors in xl dmesg or guest dmesg,
> and the device was still working until shutdown.
> This is not good for one's sanity .. :-)

I have to amend this one: the videograbber was still working, but it seemed to
have some timing issues in the video stream. So it was working, but less well
than with qemu-xen.
I don't know whether that is due to the interrupt count anomaly or to something
else related to qemu-xen-traditional (irrespective of the dpci patches and this
issue).

>> Anyhow I was wondering if you could send (or point me to)
>> your xen-syms file(s). I've also attached an extra debug code that
>> should give me an idea if the crash/issue shows up in certain
>> situations - when we have_two_entries to deal with on one CPU.

I have xen-syms available with your latest patch, both with and without the
#define DIFF_LIST 1, in a tarball at:
http://www.eikelenboom.it/xen-syms.tar.gz

>> It should apply cleanly on top of the other one.

> This one included your previous debug patch, so I had to revert that one;
> then it applied cleanly, so no problem!

>> Oh, and the xen-syms  - it can be either before this patch or
>> after - it won't matter much as I will be looking at the
>> assembler code.

>> Also what version of GCC compiler are you using ?

> # gcc -v
> Using built-in specs.
> COLLECT_GCC=/usr/bin/gcc-4.7.real
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5' 
> --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs 
> --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr 
> --program-suffix=-4.7 --enable-shared --enable-linker-build-id 
> --with-system-zlib --libexecdir=/usr/lib --without-included-gettext 
> --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 
> --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
> --enable-libstdcxx-debug --enable-libstdcxx-time=yes 
> --enable-gnu-unique-object --enable-plugin --enable-objc-gc 
> --with-arch-32=i586 --with-tune=generic --enable-checking=release 
> --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 4.7.2 (Debian 4.7.2-5)

> That's default Debian wheezy/stable.

>> And lastly, the code also has an #ifdef DIFF_LIST - if you
>> want to turn that on (just add #define DIFF_LIST 1 at the top of 
>> the file) - it might stop the crash. Or not  :-(

>> If it does stop the crash then I think we are looking at a
>> GCC bug - in which case the xen-syms of that build (with
>> the DIFF_LIST) would also be interesting!

> Will give this patch with and without the #define DIFF_LIST 1 a shot with 
> qemu-xen and report back.

Well 3 results here:

Without #define DIFF_LIST 1:
1) The guest still crashes (xl-dmesg-not-defined.txt)

With #define DIFF_LIST 1:
2) Neither the guest nor the host crashed (I let it run for about an hour and
   15 minutes), but the USB XHCI driver bailed out quickly after guest boot, so
   there were no MSI-X interrupts anymore.
   (xl-dmesg-defined-nousb.txt, dmesg-guest-defined-nousb.txt)
3) On another boot the USB XHCI driver didn't bail out, and after a while the
   host crashed. (serial.log)
   (XEN) [2014-11-18 14:53:37.364] RIP:    e008:[<ffff82d08014a4de>] hvm_do_IRQ_dpci+0xf4/0x131
   which resolves to:
   # addr2line -e xen-syms ffff82d08014a4de
   /usr/src/new/xen-unstable-vanilla/xen/include/xen/list.h:67
   which is:
   static inline void __list_add(struct list_head *new,
                                 struct list_head *prev,
                                 struct list_head *next)
   {
       next->prev = new;
       new->next = next;
       new->prev = prev;
       prev->next = new;    /* <- here (list.h:67) */
   }
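
To make it easier to reason about how a list can end up in a state where one of those stores goes wrong, here is a small stand-alone sketch. It is not Xen code: list_add_between() just repeats the four stores quoted above, and list_init(), list_add_tail() and list_is_consistent() are names I made up for illustration. It only shows that queueing the same node a second time, while it is still on the list, breaks the prev/next links. Whether that is what actually happens in hvm_do_IRQ_dpci is not established by this.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* An empty circular list points at itself in both directions. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* The same four stores, in the same order, as the quoted __list_add(). */
static void list_add_between(struct list_head *new,
                             struct list_head *prev,
                             struct list_head *next)
{
    assert(new != NULL && prev != NULL && next != NULL);
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

/* Queue a node just before the head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
    list_add_between(new, head->prev, head);
}

/*
 * Hypothetical checker: walk at most `limit` links and return 1 only if
 * every node's successor points back at it and the walk returns to head.
 */
static int list_is_consistent(const struct list_head *head, size_t limit)
{
    const struct list_head *pos = head;
    size_t i;

    for (i = 0; i <= limit; i++) {
        if (pos->next == NULL || pos->next->prev != pos)
            return 0;
        pos = pos->next;
        if (pos == head)
            return 1;
    }
    return 0;
}
```

With a head and two nodes a and b, adding a, then b, leaves the list consistent. Adding a again without removing it first makes head.next point at a while a.prev points at b, so the checker reports the list as broken and b can no longer be reached by walking forward from the head.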

Looking at the combination of (2) and (3), it seems this could be an
interaction between the two passed-through devices and/or the different IRQ
types.

So I will now test without #define DIFF_LIST 1 and without passing through the
USB controller, to see if that still crashes. If it doesn't, I will try passing
through a device that also uses only legacy interrupts instead of MSI / MSI-X,
and see whether that crashes or not.

>> Thank you.

Attachment: dmesg-guest-defined-nousb.txt
Description: Text document

Attachment: serial.log
Description: Binary data

Attachment: xl-dmesg-defined-nousb.txt
Description: Text document

Attachment: xl-dmesg-not-defined.txt
Description: Text document
