
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455



Saturday, April 11, 2015, 6:38:17 PM, you wrote:

> On 11/04/15 17:32, Andrew Cooper wrote:
>> On 11/04/15 17:21, Sander Eikelenboom wrote:
>>> Saturday, April 11, 2015, 4:21:56 PM, you wrote:
>>>
>>>> On 11/04/15 15:11, Sander Eikelenboom wrote:
>>>>> Friday, April 10, 2015, 8:55:27 PM, you wrote:
>>>>>
>>>>>> On 10/04/15 11:24, Sander Eikelenboom wrote:
>>>>>>> Hi Andrew,
>>>>>>>
>>>>>>> Finally got some time to figure this out .. and I have narrowed it
>>>>>>> down to:
>>>>>>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git
>>>>>>> commit 7665d6ba98e20fb05c420de947c1750fd47e5c07 "Xen: Use the
>>>>>>> ioreq-server API when available"
>>>>>>> A straight revert of this commit prevents the issue from happening.
>>>>>>>
>>>>>>> The reason I had a hard time figuring this out was:
>>>>>>> - I wasn't aware of this earlier, since pulling the main xen tree with
>>>>>>>   git doesn't auto-update the qemu-* trees.
>>>>>> This has caught me out so many times.  It is very non-obvious behaviour.
>>>>>>> - So I happened to run into this when I cloned a fresh tree to try to
>>>>>>>   figure out the other issue I was seeing.
>>>>>>> - After that, checking out previous versions of the main xen tree
>>>>>>>   didn't resolve this new issue, because the qemu tree doesn't get
>>>>>>>   auto-updated and stays at "master".
>>>>>>> - Cloning xen-stable-4.5.0 made it go away .. because that checks out
>>>>>>>   a specific git://xenbits.xen.org/staging/qemu-upstream-unstable.git
>>>>>>>   tag which is not master.
>>>>>>>
>>>>>>> *sigh* 
>>>>>>>
>>>>>>> This was tested with the main xen tree at its latest commit,
>>>>>>> 3a28f760508fb35c430edac17a9efde5aff6d1d5
>>>>>>> (normal xen-unstable, not the staging branch).
>>>>>>>
>>>>>>> Ok, so I have added some extra debug info (see attached diff); this is
>>>>>>> the output when it crashes due to something the commit above triggered.
>>>>>>> The level is out of bounds and the pfn looks fishy too.
>>>>>>> Complete serial logs from both the bad and the good case (with the
>>>>>>> specific commit reverted) are attached.
>>>>>> Just to confirm, you are positively identifying a qemu changeset as
>>>>>> causing this crash?
>>>>>> If so, the qemu change has discovered a pre-existing issue in the
>>>>>> toolstack pci-passthrough interface.  Whatever qemu is or isn't doing,
>>>>>> it should not be able to cause a crash like this.
>>>>>> With this in mind, I need to brush up on my AMD-Vi details.
>>>>>> In the meantime, can you run with the following patch to identify what
>>>>>> is going on, domctl-wise?  I assume it is the assign_device which is
>>>>>> failing, but it would be nice to observe the differences between the
>>>>>> working and failing cases, which might offer a hint.
>>>>> Hrrm, with your patch I end up with a fatal page fault in
>>>>> iommu_do_pci_domctl:
>>>>>
>>>>> (XEN) [2015-04-11 14:03:31.833] ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>>>>> (XEN) [2015-04-11 14:03:31.857] CPU:    5
>>>>> (XEN) [2015-04-11 14:03:31.868] RIP:    e008:[<ffff82d08014c52c>] iommu_do_pci_domctl+0x2dc/0x740
>>>>> (XEN) [2015-04-11 14:03:31.894] RFLAGS: 0000000000010256   CONTEXT: hypervisor
>>>>> (XEN) [2015-04-11 14:03:31.915] rax: 0000000000000008   rbx: 0000000000000800   rcx: ffffffffffebe5ed
>>>>> (XEN) [2015-04-11 14:03:31.942] rdx: 0000000000000800   rsi: 0000000000000000   rdi: ffff830256ef7e38
>>>>> (XEN) [2015-04-11 14:03:31.968] rbp: ffff830256ef7c98   rsp: ffff830256ef7c08   r8:  00000000deadbeef
>>>>> (XEN) [2015-04-11 14:03:31.995] r9:  00000000deadbeef   r10: ffff82d08024e500   r11: 0000000000000282
>>>>> (XEN) [2015-04-11 14:03:32.022] r12: 0000000000000000   r13: 0000000000000008   r14: 0000000000000000
>>>>> (XEN) [2015-04-11 14:03:32.049] r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000000006f0
>>>>> (XEN) [2015-04-11 14:03:32.076] cr3: 00000002336a6000   cr2: 0000000000000000
>>>>> (XEN) [2015-04-11 14:03:32.096] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>>>>> (XEN) [2015-04-11 14:03:32.121] Xen stack trace from rsp=ffff830256ef7c08:
>>>>> (XEN) [2015-04-11 14:03:32.141]    ffff830256ef7c78 ffff82d08012c178 
>>>>> ffff830256ef7c28 ffff830256ef7c28
>>>>> (XEN) [2015-04-11 14:03:32.168]    0000000000000010 0000000000000000 
>>>>> 0000000000000000 0000000000000000
>>>>> (XEN) [2015-04-11 14:03:32.195]    00000000000006f0 00007fe300000000 
>>>>> ffff830256eb7790 ffff83025cc6d300
>>>>> (XEN) [2015-04-11 14:03:32.222]    ffff82d080330c60 00007fe396bab004 
>>>>> 0000000000000000 00007fe396bab004
>>>>> (XEN) [2015-04-11 14:03:32.249]    0000000000000000 0000000000000005 
>>>>> ffff830256ef7ca8 ffff82d08014900b
>>>>> (XEN) [2015-04-11 14:03:32.276]    ffff830256ef7d98 ffff82d080161f2d 
>>>>> 0000000000000010 0000000000000000
>>>>> (XEN) [2015-04-11 14:03:32.303]    0000000000000000 ffff830256ef7ce8 
>>>>> ffff82d08018b655 ffff830256ef7d48
>>>>> (XEN) [2015-04-11 14:03:32.330]    ffff830256ef7cf8 ffff82d08018b66a 
>>>>> ffff830256ef7d38 ffff82d08012925e
>>>>> (XEN) [2015-04-11 14:03:32.357]    ffff830256efc068 0000000800000001 
>>>>> 800000022e12c167 0000000000000000
>>>>> (XEN) [2015-04-11 14:03:32.384]    0000000000000002 ffff830256ef7e38 
>>>>> 0000000800000000 800000022e12c167
>>>>> (XEN) [2015-04-11 14:03:32.411]    0000000000000003 ffff830256ef7db8 
>>>>> 0000000000000000 00007fe396780eb0
>>>>> (XEN) [2015-04-11 14:03:32.439]    0000000000000202 ffffffffffffffff 
>>>>> 0000000000000000 00007fe396bab004
>>>>> (XEN) [2015-04-11 14:03:32.466]    0000000000000000 0000000000000005 
>>>>> ffff830256ef7ef8 ffff82d08010497f
>>>>> (XEN) [2015-04-11 14:03:32.493]    0000000000000001 0000000000100001 
>>>>> 800000022e12c167 ffff88001f7ecc00
>>>>> (XEN) [2015-04-11 14:03:32.520]    00007fe396780eb0 ffff88001c849508 
>>>>> 0000000e00000007 ffffffff8105594a
>>>>> (XEN) [2015-04-11 14:03:32.547]    000000000000e033 0000000000000202 
>>>>> ffff88001ece3d40 000000000000e02b
>>>>> (XEN) [2015-04-11 14:03:32.574]    ffff830256ef7e28 ffff82d080194933 
>>>>> 000000000000beef ffffffff81bd6c85
>>>>> (XEN) [2015-04-11 14:03:32.601]    ffff830256ef7f08 ffff82d080193edd 
>>>>> 0000000b0000002d 0000000000000001
>>>>> (XEN) [2015-04-11 14:03:32.628]    0000000100000800 00007fe3962abbd0 
>>>>> ffff000a81050001 00007fe39656ce6e
>>>>> (XEN) [2015-04-11 14:03:32.655]    00007ffdf2a654f0 00007fe39656d0c9 
>>>>> 00007fe39656ce6e 00007fe3969a9a55
>>>>> (XEN) [2015-04-11 14:03:32.682] Xen call trace:
>>>>> (XEN) [2015-04-11 14:03:32.695]    [<ffff82d08014c52c>] iommu_do_pci_domctl+0x2dc/0x740
>>>>> (XEN) [2015-04-11 14:03:32.718]    [<ffff82d08014900b>] iommu_do_domctl+0x17/0x1a
>>>>> (XEN) [2015-04-11 14:03:32.739]    [<ffff82d080161f2d>] arch_do_domctl+0x2469/0x26e1
>>>>> (XEN) [2015-04-11 14:03:32.762]    [<ffff82d08010497f>] do_domctl+0x1a1f/0x1d60
>>>>> (XEN) [2015-04-11 14:03:32.783]    [<ffff82d080234c6b>] syscall_enter+0xeb/0x145
>>>>> (XEN) [2015-04-11 14:03:32.804] 
>>>>> (XEN) [2015-04-11 14:03:32.813] Pagetable walk from 0000000000000000:
>>>>> (XEN) [2015-04-11 14:03:32.831]  L4[0x000] = 0000000234075067 000000000001f2a8
>>>>> (XEN) [2015-04-11 14:03:32.852]  L3[0x000] = 0000000229ad4067 0000000000014c49
>>>>> (XEN) [2015-04-11 14:03:32.873]  L2[0x000] = 0000000000000000 ffffffffffffffff
>>>>> (XEN) [2015-04-11 14:03:32.894] 
>>>>> (XEN) [2015-04-11 14:03:32.903] ****************************************
>>>>> (XEN) [2015-04-11 14:03:32.922] Panic on CPU 5:
>>>>> (XEN) [2015-04-11 14:03:32.935] FATAL PAGE FAULT
>>>>> (XEN) [2015-04-11 14:03:32.948] [error_code=0000]
>>>>> (XEN) [2015-04-11 14:03:32.961] Faulting linear address: 0000000000000000
>>>>> (XEN) [2015-04-11 14:03:32.981] ****************************************
>>>>> (XEN) [2015-04-11 14:03:33.000] 
>>>>> (XEN) [2015-04-11 14:03:33.009] Reboot in five seconds...
>>>>>
>>>>> The RIP resolves to the printk added by your patch in:
>>>>>
>>>>>     case XEN_DOMCTL_test_assign_device:
>>>>>         ret = xsm_test_assign_device(XSM_HOOK, domctl->u.assign_device.machine_sbdf);
>>>>>         if ( ret )
>>>>>             break;
>>>>>
>>>>>         seg = domctl->u.assign_device.machine_sbdf >> 16;
>>>>>         bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>>         devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>>
>>>>>         printk("*** %pv->d%d: test_assign_device({%04x:%02x:%02x.%u})\n",
>>>>>                current, d->domain_id,
>>>>>                seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>
>>>>>         if ( device_assigned(seg, bus, devfn) )
>>>>>         {
>>>>>             printk(XENLOG_G_INFO
>>>>>                    "%04x:%02x:%02x.%u already assigned, or non-existent\n",
>>>>>                    seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>             ret = -EINVAL;
>>>>>         }
>>>>>         break;
>>>> hmm - 'd' is NULL.  This ought to work better.
>>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>>>> index 9f3413c..85ff1fc 100644
>>>> --- a/xen/drivers/passthrough/pci.c
>>>> +++ b/xen/drivers/passthrough/pci.c
>>>> @@ -1532,6 +1532,11 @@ int iommu_do_pci_domctl(
>>>>          max_sdevs = domctl->u.get_device_group.max_sdevs;
>>>>          sdevs = domctl->u.get_device_group.sdev_array;
>>>>  
>>>> +        printk("*** %pv->d%d: get_device_group({%04x:%02x:%02x.%u, %u})\n",
>>>> +               current, d->domain_id,
>>>> +               seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>>>> +               max_sdevs);
>>>> +
>>>>          ret = iommu_get_device_group(d, seg, bus, devfn, sdevs, max_sdevs);
>>>>          if ( ret < 0 )
>>>>          {
>>>> @@ -1558,6 +1563,9 @@ int iommu_do_pci_domctl(
>>>>          bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>          devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>  
>>>> +        printk("*** %pv: test_assign_device({%04x:%02x:%02x.%u})\n",
>>>> +               current, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>> +
>>>>          if ( device_assigned(seg, bus, devfn) )
>>>>          {
>>>>              printk(XENLOG_G_INFO
>>>> @@ -1582,6 +1590,10 @@ int iommu_do_pci_domctl(
>>>>          bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>          devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>  
>>>> +        printk("*** %pv->d%d: assign_device({%04x:%02x:%02x.%u})\n",
>>>> +               current, d->domain_id,
>>>> +               seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>> +
>>>>          ret = device_assigned(seg, bus, devfn) ?:
>>>>                assign_device(d, seg, bus, devfn);
>>>>          if ( ret == -ERESTART )
>>>> @@ -1604,6 +1616,10 @@ int iommu_do_pci_domctl(
>>>>          bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>          devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>  
>>>> +        printk("*** %pv->d%d: deassign_device({%04x:%02x:%02x.%u})\n",
>>>> +               current, d->domain_id,
>>>> +               seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>> +
>>>>          spin_lock(&pcidevs_lock);
>>>>          ret = deassign_device(d, seg, bus, devfn);
>>>>          spin_unlock(&pcidevs_lock);
>>> Hi Andrew,
>>>
>>> Attached are the serial logs, good (with the revert) and bad (without):
>>>
>>> Some things that seem strange to me:
>>> - The numerous calls to get the device 08:00.0 assigned ... for 0a:00.0
>>>   there was only one call each to test_assign and assign.
>>> - However, these numerous calls are there in both the good and the bad
>>>   case, so perhaps it's strange and wrong .. but not the cause ..
>>> - I had a hunch it could be due to 08:00.0 using MSI-X, but when only
>>>   passing through 0a:00.0, I get the same numerous calls, now for the
>>>   0a:00.0 which uses INTx, so I think it is more related to being the
>>>   *first* device to be passed through to a guest.
>> I have also observed this behaviour, but have not had time to investigate.
>> It doesn't appear problematic in the long run, but it is probably a
>> toolstack issue which wants fixing (if only in the name of efficiency).

> And just after I sent this email, I have realised why.

> The first assign device will have to build IO pagetables, which is a
> long operation and subject to hypercall continuations.  The second
> device will reuse the same pagetables, so it is quick to complete.
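
Ah, that would explain the repeated test_assign/assign entries in the log
then: the same domctl just keeps re-entering until the IO pagetables are
built.  For my own understanding, a rough sketch of that pattern in C
(purely illustrative, made-up names, not the actual Xen code):

#define ERESTART 85   /* placeholder for this sketch; the real value comes from Xen's errno headers */

struct dom { unsigned long next_gfn; };            /* stand-in for struct domain */

static void map_one_page(struct dom *d, unsigned long gfn)
{
    (void)d; (void)gfn;                            /* pretend to build one IO-PTE here */
}

/* Each pass does a bounded amount of pagetable work and then returns
 * -ERESTART so the same domctl re-enters later.  That is why the first
 * assign_device shows up many times in the log, while the second device
 * reuses the already-built pagetables and completes in a single pass. */
static int assign_device_sketch(struct dom *d, unsigned long total_pages)
{
    unsigned long batch = 0;

    while ( d->next_gfn < total_pages )
    {
        map_one_page(d, d->next_gfn++);

        if ( ++batch >= 64 && d->next_gfn < total_pages )
            return -ERESTART;                      /* continuation: come back and carry on */
    }

    return 0;
}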

So .. is the ioreq patch from Paul involved in providing something used in
building the pagetables .. and could it have, say, some off-by-one resulting
in the 0xffffffffffff .. which could lead to the pagetable building going
berserk, requiring a paging_mode far greater than would normally be needed ..
which gets set, since that isn't checked properly .. leading to things
breaking a bit further on, when it does get checked?
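
To make that concrete, a quick back-of-envelope sketch in C (again only
illustrative, not the code in iommu_map.c) of how the required paging mode
scales with the highest pfn that has to be mapped:

#include <stdio.h>

/* AMD-Vi I/O page tables resolve 9 bits of the pfn per level (512 entries
 * per table), so the paging mode is roughly the number of 9-bit groups
 * needed to cover the highest pfn to be mapped. */
static unsigned int level_needed(unsigned long long pfn)
{
    unsigned int level = 1;

    while ( pfn >> (9 * level) )
        level++;

    return level;
}

int main(void)
{
    /* A ~4GiB guest tops out around pfn 0x100000 -> 3 levels, the sort of
     * paging_mode a normal guest ends up with. */
    printf("pfn 0x100000       -> level %u\n", level_needed(0x100000ULL));

    /* The fishy all-ones pfn from the crash -> 6 levels, already the
     * architectural maximum for AMD-Vi; anything beyond that (like the
     * invalid 8 that update_paging_mode() apparently ends up with) is out
     * of bounds. */
    printf("pfn 0xffffffffffff -> level %u\n", level_needed(0xffffffffffffULL));

    return 0;
}

So if the ioreq-server change somehow ends up requesting a mapping for a pfn
like that, an update_paging_mode() without a sanity check would happily chase
it, which would fit what the bad log shows.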

--
Sander

> ~Andrew

>>
>>> - In both the good and the bad case the "current" indicates more than one
>>>   vcpu was used, so that doesn't seem to be a pointer either.
>> That is to be expected.  It will be whichever vcpu the toolstack process
>> happened to be scheduled on at the point that it made the hypercall, and
>> is liable to change at the whim of the dom0 scheduler.
>>
>>> - The bad log seems to indicate it crashes on or before
>>>   hd->arch.paging_mode gets to 3.
>>>
>>> Is there anything useful to dump in update_paging_mode() when it updates
>>> the paging mode to the invalid value (8)?
>>> (And shouldn't that be denied in the first place?)
>>>
>>> To me that looks like the first indication that things are starting to
>>> go wrong.
>> I would concur.  I am currently trying to develop another debugging
>> patch, but to do that I also want to understand what I am debugging,
>> which is why I have the AMD-Vi spec open :)
>>
>> ~Andrew
>>




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

