
Re: [Xen-devel] Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.



On 18/07/2016 20:26, Sander Eikelenboom wrote:
> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>
>> On 18/07/16 11:21, linux@xxxxxxxxxxxxxx wrote:
>>> Hi Jan,
>>>
>>> It seems that since your patch series starting with commit:
>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>
>>> The shutdown of a guest which has a PCI device passed through which
>>> uses MSI-X interrupts causes
>>> a host crash, see the splat below. Somehow it also doesn't reboot in 5
>>> seconds as it is supposed to (i don't have no-reboot on the command
>>> line).
>>>
>>> -- 
>>> Sander
>>>
>>>
>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable  x86_64 
>>> debug=y  Not tainted ]----
>>> (XEN) [2016-07-16 16:03:17.069] CPU:    0
>>> (XEN) [2016-07-16 16:03:17.069] RIP:    e008:[<ffff82d0801e39de>]
>>> msixtbl_pt_unregister+0x7b/0xd9
>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082   CONTEXT:
>>> hypervisor (d0v0)
>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40   rbx:
>>> ffff83055c685500   rcx: 0000000000000001
>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000   rsi:
>>> 0000000000001ab0   rdi: ffff8305313b85a0
>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78   rsp:
>>> ffff83009fd07c68   r8:  ffff8305356dfff0
>>> (XEN) [2016-07-16 16:03:17.069] r9:  ffff8305356df480   r10:
>>> ffff830503420c50   r11: 0000000000000282
>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000   r13:
>>> ffff83009fd07e48   r14: ffff8305313b8000
>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8   cr0:
>>> 0000000080050033   cr4: 00000000000006e0
>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000   cr2:
>>> 0000000000000000
>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000   es: 0000   fs: 0000   gs:
>>> 0000   ss: e010   cs: e008
>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de>
>>> (msixtbl_pt_unregister+0x7b/0xd9):
>>> (XEN) [2016-07-16 16:03:17.069]  39 42 18 74 19 48 89 ca <48> 8b 0a 0f
>>> 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from
>>> rsp=ffff83009fd07c68:
>>> (XEN) [2016-07-16 16:03:17.069]    0000000000000000 ffff8305356df480
>>> ffff83009fd07ce8 ffff82d08014c394
>>> (XEN) [2016-07-16 16:03:17.069]    0000000000000001 ffff8305356df480
>>> 0000000000000293 ffff8305313b80cc
>>> (XEN) [2016-07-16 16:03:17.069]    000000568012ffe5 ffff8305313b8000
>>> ffff83009fd07cd8 ffff83009fd07e38
>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000000 ffff83054e5fc000
>>> 00007fc25a33e004 ffff8305313b8000
>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07da8 ffff82d0801629c8
>>> 0000000000000000 ffff83053b1191f0
>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000246 ffff83009fd07d28
>>> ffff82d0801300ae 000000000000000e
>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d78 ffff82d080171497
>>> ffff83009fd07d78 000000020001d17b
>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d68 0000000000000000
>>> ffff83009fd07d68 ffff82d080130280
>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d78 ffff82d08014d0aa
>>> 0000000000000202 0000000000000000
>>> (XEN) [2016-07-16 16:03:17.070]    ffff8305313b8000 ffff88005716d320
>>> 0000000000305000 00007fc25a33e004
>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07ef8 ffff82d080104b2c
>>> 0000000000000206 0000000000000002
>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07df8 ffff82d08018c9db
>>> 0000000000000cfe 0000000000000002
>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000002 ffff83054e5fc000
>>> ffff83009fd07e48 ffff82d08019c119
>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07e38 0000000080121177
>>> ffff83009fd07e38 0000000000000cfe
>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07f18 0000000000000206
>>> 0000000c00000030 000056082bb90013
>>> (XEN) [2016-07-16 16:03:17.070]    0000000200000056 00007fc200000013
>>> 0000305600000000 000056082b87465d
>>> (XEN) [2016-07-16 16:03:17.070]    00007ffe268206e0 00007fc25606b31f
>>> 0000000000000000 000056082b8746cf
>>> (XEN) [2016-07-16 16:03:17.070]    0000000000001000 fee5600026820730
>>> 00007ffe26820740 000056082b8797be
>>> (XEN) [2016-07-16 16:03:17.070]    00000000fee56000 0000430026820772
>>> 00007ffe26820740 0000000000003056
>>> (XEN) [2016-07-16 16:03:17.070]    00007ffe268206e0 ffff83009ff8a000
>>> 00007ffe26820580 ffff88005716d320
>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace:
>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0801e39de>]
>>> msixtbl_pt_unregister+0x7b/0xd9
>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d08014c394>]
>>> pt_irq_destroy_bind+0x2be/0x3f0
>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0801629c8>]
>>> arch_do_domctl+0xc77/0x2414
>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d080104b2c>]
>>> do_domctl+0x19db/0x1d26
>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0802426bd>]
>>> lstar_enter+0xdd/0x137
>>> (XEN) [2016-07-16 16:03:17.070]
>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
>>> (XEN) [2016-07-16 16:03:17.070]  L4[0x000] = 0000000000000000
>>> ffffffffffffffff
>>> (XEN) [2016-07-16 16:03:18.147]
>>> (XEN) [2016-07-16 16:03:18.155] ****************************************
>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000]
>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address: 0000000000000000
>>> (XEN) [2016-07-16 16:03:18.233] ****************************************
>>> (XEN) [2016-07-16 16:03:18.252]
>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
>>>
>> Can you paste the disassembly of msixtbl_pt_unregister() please?  That
>> is a dereference of %rdx which is NULL at this point, but I need to
>> figure out which pointer it is supposed to be.
> Hi Andrew,

<snip>

Thanks.  What has happened is that the msixtbl linked list is still
uninitialised at this point.  The only way I can see for this to happen
is that msixtbl_init() hasn't been called, or hasn't got past its first
if condition.  The INIT_LIST_HEAD() visible in the context of the second
hunk of the identified changeset is the line of code which takes the list
head from zeroed to initialised, and I don't see anything which re-zeroes
it later.

This alone suggests that the VM in question isn't actually using MSI-X
interrupts, even if the device passed through is capable.
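
As a standalone illustration of that failure mode (plain C, not the Xen
sources: struct list_head here is a reduced stand-in, and init_list_head()
and walk_steps() are hypothetical helpers), a zero-filled list head fails
the "back at the head" test on the very first step, so a walk goes on to
dereference NULL, whereas an initialised empty head terminates at once:

```c
#include <stddef.h>

/* Reduced stand-in for a kernel-style doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Same effect as INIT_LIST_HEAD(): an empty list points back at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Walk the list the way a list_for_each-style loop does.
 * Returns the number of entries visited, or -1 if the walk reached a NULL
 * pointer (the point at which the hypervisor would fault on address 0).
 */
static int walk_steps(const struct list_head *head)
{
    int steps = 0;

    for (const struct list_head *pos = head->next; pos != head; pos = pos->next) {
        if (pos == NULL)
            return -1;   /* zeroed head: next is NULL, and NULL != head */
        steps++;
    }
    return steps;
}
```

With a zero-filled head walk_steps() reports the NULL step; after
init_list_head() the same call visits no entries.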

Following the style of the identified changeset,

andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index e418b98..c533719 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     ASSERT(pcidevs_locked());
     ASSERT(spin_is_locked(&d->event_lock));

-    if ( !has_vlapic(d) )
+    if ( !d->arch.hvm_domain.msixtbl_list.next )
         return;

     irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);

should resolve your issue, although I am very tempted to replace the
open-coded list check with an msixtbl_initialised() predicate instead.
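
Such a predicate might look roughly like the following.  This is only a
sketch in plain C under stated assumptions: struct mock_domain,
mock_msixtbl_init() and mock_msixtbl_pt_unregister() are hypothetical
stand-ins for illustration, not the real Xen structures or functions, and
the locking and irq_desc handling of the real code are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for a kernel-style doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical, heavily reduced stand-in for the per-domain state. */
struct mock_domain {
    struct list_head msixtbl_list;   /* zero-filled until initialised */
};

/* The predicate: a zero-filled list head has never been initialised. */
static bool msixtbl_initialised(const struct mock_domain *d)
{
    return d->msixtbl_list.next != NULL;
}

/* Stand-in for the initialisation step: make the list empty but valid. */
static void mock_msixtbl_init(struct mock_domain *d)
{
    d->msixtbl_list.next = &d->msixtbl_list;
    d->msixtbl_list.prev = &d->msixtbl_list;
}

/*
 * Stand-in for the unregister path.  Returns false if it bailed out early
 * because the list was never set up, true if it went on to walk the list.
 */
static bool mock_msixtbl_pt_unregister(struct mock_domain *d)
{
    if ( !msixtbl_initialised(d) )
        return false;

    for ( struct list_head *pos = d->msixtbl_list.next;
          pos != &d->msixtbl_list; pos = pos->next )
    {
        /* The real code would look for the matching entry here. */
    }
    return true;
}
```

The point of the helper is only to give the NULL check a name, so that
every caller tests the same condition.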

~Andrew

