[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 11/04/15 15:11, Sander Eikelenboom wrote: > Friday, April 10, 2015, 8:55:27 PM, you wrote: > >> On 10/04/15 11:24, Sander Eikelenboom wrote: >>> Hi Andrew, >>> >>> Finally got some time to figure this out .. and i have narrowed it down to: >>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git >>> commit 7665d6ba98e20fb05c420de947c1750fd47e5c07 "Xen: Use the ioreq-server >>> API when available" >>> A straight revert of this commit prevents the issue from happening. >>> >>> The reason i had a hard time figuring this out was: >>> - I wasn't aware of this earlier, since git pulling the main xen tree, >>> doesn't >>> auto update the qemu-* trees. >> This has caught me out so many times. It is very non-obvious behaviour. >>> - So i happen to get this when i cloned a fresh tree to try to figure out >>> the >>> other issue i was seeing. >>> - After that checking out previous versions of the main xen tree didn't >>> resolve >>> this new issue, because the qemu tree doesn't get auto updated and is set >>> "master". >>> - Cloning a xen-stable-4.5.0 made it go away .. because that has a specific >>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git tag which is not >>> master. >>> >>> *sigh* >>> >>> This is tested with xen main tree at last commit >>> 3a28f760508fb35c430edac17a9efde5aff6d1d5 >>> (normal xen-unstable, not the staging branch) >>> >>> Ok so i have added some extra debug info (see attached diff) and this is >>> the >>> output when it crashes due to something the commit above triggered, the >>> level is out of bounds and the pfn looks fishy too. >>> Complete serial log from both bad and good (specific commit reverted) are >>> attached. >> Just to confirm, you are positively identifying a qemu changeset as >> causing this crash? >> If so, the qemu change has discovered a pre-existing issue in the >> toolstack pci-passthrough interface. Whatever qemu is or isn't doing, >> it should not be able to cause a crash like this. >> With this in mind, I need to brush up on my AMD-Vi details. >> In the meantime, can you run with the following patch to identify what >> is going on, domctl wise? I assume it is the assign_device which is >> failing, but it will be nice to observe the differences between the >> working and failing case, which might offer a hint. > Hrrm with your patch i end up with a fatal page fault in iommu_do_pci_domctl: > > (XEN) [2015-04-11 14:03:31.833] ----[ Xen-4.6-unstable x86_64 debug=y > Tainted: C ]---- > (XEN) [2015-04-11 14:03:31.857] CPU: 5 > (XEN) [2015-04-11 14:03:31.868] RIP: e008:[<ffff82d08014c52c>] > iommu_do_pci_domctl+0x2dc/0x740 > (XEN) [2015-04-11 14:03:31.894] RFLAGS: 0000000000010256 CONTEXT: hypervisor > (XEN) [2015-04-11 14:03:31.915] rax: 0000000000000008 rbx: 0000000000000800 > rcx: ffffffffffebe5ed > (XEN) [2015-04-11 14:03:31.942] rdx: 0000000000000800 rsi: 0000000000000000 > rdi: ffff830256ef7e38 > (XEN) [2015-04-11 14:03:31.968] rbp: ffff830256ef7c98 rsp: ffff830256ef7c08 > r8: 00000000deadbeef > (XEN) [2015-04-11 14:03:31.995] r9: 00000000deadbeef r10: ffff82d08024e500 > r11: 0000000000000282 > (XEN) [2015-04-11 14:03:32.022] r12: 0000000000000000 r13: 0000000000000008 > r14: 0000000000000000 > (XEN) [2015-04-11 14:03:32.049] r15: 0000000000000000 cr0: 0000000080050033 > cr4: 00000000000006f0 > (XEN) [2015-04-11 14:03:32.076] cr3: 00000002336a6000 cr2: 0000000000000000 > (XEN) [2015-04-11 14:03:32.096] ds: 0000 es: 0000 fs: 0000 gs: 0000 > ss: e010 cs: e008 > (XEN) [2015-04-11 14:03:32.121] Xen stack trace from rsp=ffff830256ef7c08: > (XEN) [2015-04-11 14:03:32.141] ffff830256ef7c78 ffff82d08012c178 > ffff830256ef7c28 ffff830256ef7c28 > (XEN) [2015-04-11 14:03:32.168] 0000000000000010 0000000000000000 > 0000000000000000 0000000000000000 > (XEN) [2015-04-11 14:03:32.195] 00000000000006f0 00007fe300000000 > ffff830256eb7790 ffff83025cc6d300 > (XEN) [2015-04-11 14:03:32.222] ffff82d080330c60 00007fe396bab004 > 0000000000000000 00007fe396bab004 > (XEN) [2015-04-11 14:03:32.249] 0000000000000000 0000000000000005 > ffff830256ef7ca8 ffff82d08014900b > (XEN) [2015-04-11 14:03:32.276] ffff830256ef7d98 ffff82d080161f2d > 0000000000000010 0000000000000000 > (XEN) [2015-04-11 14:03:32.303] 0000000000000000 ffff830256ef7ce8 > ffff82d08018b655 ffff830256ef7d48 > (XEN) [2015-04-11 14:03:32.330] ffff830256ef7cf8 ffff82d08018b66a > ffff830256ef7d38 ffff82d08012925e > (XEN) [2015-04-11 14:03:32.357] ffff830256efc068 0000000800000001 > 800000022e12c167 0000000000000000 > (XEN) [2015-04-11 14:03:32.384] 0000000000000002 ffff830256ef7e38 > 0000000800000000 800000022e12c167 > (XEN) [2015-04-11 14:03:32.411] 0000000000000003 ffff830256ef7db8 > 0000000000000000 00007fe396780eb0 > (XEN) [2015-04-11 14:03:32.439] 0000000000000202 ffffffffffffffff > 0000000000000000 00007fe396bab004 > (XEN) [2015-04-11 14:03:32.466] 0000000000000000 0000000000000005 > ffff830256ef7ef8 ffff82d08010497f > (XEN) [2015-04-11 14:03:32.493] 0000000000000001 0000000000100001 > 800000022e12c167 ffff88001f7ecc00 > (XEN) [2015-04-11 14:03:32.520] 00007fe396780eb0 ffff88001c849508 > 0000000e00000007 ffffffff8105594a > (XEN) [2015-04-11 14:03:32.547] 000000000000e033 0000000000000202 > ffff88001ece3d40 000000000000e02b > (XEN) [2015-04-11 14:03:32.574] ffff830256ef7e28 ffff82d080194933 > 000000000000beef ffffffff81bd6c85 > (XEN) [2015-04-11 14:03:32.601] ffff830256ef7f08 ffff82d080193edd > 0000000b0000002d 0000000000000001 > (XEN) [2015-04-11 14:03:32.628] 0000000100000800 00007fe3962abbd0 > ffff000a81050001 00007fe39656ce6e > (XEN) [2015-04-11 14:03:32.655] 00007ffdf2a654f0 00007fe39656d0c9 > 00007fe39656ce6e 00007fe3969a9a55 > (XEN) [2015-04-11 14:03:32.682] Xen call trace: > (XEN) [2015-04-11 14:03:32.695] [<ffff82d08014c52c>] > iommu_do_pci_domctl+0x2dc/0x740 > (XEN) [2015-04-11 14:03:32.718] [<ffff82d08014900b>] > iommu_do_domctl+0x17/0x1a > (XEN) [2015-04-11 14:03:32.739] [<ffff82d080161f2d>] > arch_do_domctl+0x2469/0x26e1 > (XEN) [2015-04-11 14:03:32.762] [<ffff82d08010497f>] > do_domctl+0x1a1f/0x1d60 > (XEN) [2015-04-11 14:03:32.783] [<ffff82d080234c6b>] > syscall_enter+0xeb/0x145 > (XEN) [2015-04-11 14:03:32.804] > (XEN) [2015-04-11 14:03:32.813] Pagetable walk from 0000000000000000: > (XEN) [2015-04-11 14:03:32.831] L4[0x000] = 0000000234075067 000000000001f2a8 > (XEN) [2015-04-11 14:03:32.852] L3[0x000] = 0000000229ad4067 0000000000014c49 > (XEN) [2015-04-11 14:03:32.873] L2[0x000] = 0000000000000000 > ffffffffffffffff > (XEN) [2015-04-11 14:03:32.894] > (XEN) [2015-04-11 14:03:32.903] **************************************** > (XEN) [2015-04-11 14:03:32.922] Panic on CPU 5: > (XEN) [2015-04-11 14:03:32.935] FATAL PAGE FAULT > (XEN) [2015-04-11 14:03:32.948] [error_code=0000] > (XEN) [2015-04-11 14:03:32.961] Faulting linear address: 0000000000000000 > (XEN) [2015-04-11 14:03:32.981] **************************************** > (XEN) [2015-04-11 14:03:33.000] > (XEN) [2015-04-11 14:03:33.009] Reboot in five seconds... > > The RIP resolves to the prink added by your patch in: > > case XEN_DOMCTL_test_assign_device: > ret = xsm_test_assign_device(XSM_HOOK, > domctl->u.assign_device.machine_sbdf); > if ( ret ) > break; > > seg = domctl->u.assign_device.machine_sbdf >> 16; > bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; > devfn = domctl->u.assign_device.machine_sbdf & 0xff; > > printk("*** %pv->d%d: test_assign_device({%04x:%02x:%02x.%u})\n", > current, d->domain_id, > seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); > > if ( device_assigned(seg, bus, devfn) ) > { > printk(XENLOG_G_INFO > "%04x:%02x:%02x.%u already assigned, or non-existent\n", > seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); > ret = -EINVAL; > } > break; hmm - 'd' is NULL. This ought to work better. diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 9f3413c..85ff1fc 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1532,6 +1532,11 @@ int iommu_do_pci_domctl( max_sdevs = domctl->u.get_device_group.max_sdevs; sdevs = domctl->u.get_device_group.sdev_array; + printk("*** %pv->d%d: get_device_group({%04x:%02x:%02x.%u, %u})\n", + current, d->domain_id, + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), + max_sdevs); + ret = iommu_get_device_group(d, seg, bus, devfn, sdevs, max_sdevs); if ( ret < 0 ) { @@ -1558,6 +1563,9 @@ int iommu_do_pci_domctl( bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; devfn = domctl->u.assign_device.machine_sbdf & 0xff; + printk("*** %pv: test_assign_device({%04x:%02x:%02x.%u})\n", + current, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); + if ( device_assigned(seg, bus, devfn) ) { printk(XENLOG_G_INFO @@ -1582,6 +1590,10 @@ int iommu_do_pci_domctl( bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; devfn = domctl->u.assign_device.machine_sbdf & 0xff; + printk("*** %pv->d%d: assign_device({%04x:%02x:%02x.%u})\n", + current, d->domain_id, + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); + ret = device_assigned(seg, bus, devfn) ?: assign_device(d, seg, bus, devfn); if ( ret == -ERESTART ) @@ -1604,6 +1616,10 @@ int iommu_do_pci_domctl( bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; devfn = domctl->u.assign_device.machine_sbdf & 0xff; + printk("*** %pv->d%d: deassign_device({%04x:%02x:%02x.%u})\n", + current, d->domain_id, + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); + spin_lock(&pcidevs_lock); ret = deassign_device(d, seg, bus, devfn); spin_unlock(&pcidevs_lock); _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |