[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Saturday, April 11, 2015, 7:35:57 PM, you wrote: > On 11/04/15 18:25, Sander Eikelenboom wrote: >> Saturday, April 11, 2015, 6:38:17 PM, you wrote: >> >>> On 11/04/15 17:32, Andrew Cooper wrote: >>>> On 11/04/15 17:21, Sander Eikelenboom wrote: >>>>> Saturday, April 11, 2015, 4:21:56 PM, you wrote: >>>>> >>>>>> On 11/04/15 15:11, Sander Eikelenboom wrote: >>>>>>> Friday, April 10, 2015, 8:55:27 PM, you wrote: >>>>>>> >>>>>>>> On 10/04/15 11:24, Sander Eikelenboom wrote: >>>>>>>>> Hi Andrew, >>>>>>>>> >>>>>>>>> Finally got some time to figure this out .. and i have narrowed it >>>>>>>>> down to: >>>>>>>>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git >>>>>>>>> commit 7665d6ba98e20fb05c420de947c1750fd47e5c07 "Xen: Use the >>>>>>>>> ioreq-server API when available" >>>>>>>>> A straight revert of this commit prevents the issue from happening. >>>>>>>>> >>>>>>>>> The reason i had a hard time figuring this out was: >>>>>>>>> - I wasn't aware of this earlier, since git pulling the main xen >>>>>>>>> tree, doesn't >>>>>>>>> auto update the qemu-* trees. >>>>>>>> This has caught me out so many times. It is very non-obvious >>>>>>>> behaviour. >>>>>>>>> - So i happen to get this when i cloned a fresh tree to try to figure >>>>>>>>> out the >>>>>>>>> other issue i was seeing. >>>>>>>>> - After that checking out previous versions of the main xen tree >>>>>>>>> didn't resolve >>>>>>>>> this new issue, because the qemu tree doesn't get auto updated and >>>>>>>>> is set >>>>>>>>> "master". >>>>>>>>> - Cloning a xen-stable-4.5.0 made it go away .. because that has a >>>>>>>>> specific >>>>>>>>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git tag which >>>>>>>>> is not >>>>>>>>> master. >>>>>>>>> >>>>>>>>> *sigh* >>>>>>>>> >>>>>>>>> This is tested with xen main tree at last commit >>>>>>>>> 3a28f760508fb35c430edac17a9efde5aff6d1d5 >>>>>>>>> (normal xen-unstable, not the staging branch) >>>>>>>>> >>>>>>>>> Ok so i have added some extra debug info (see attached diff) and this >>>>>>>>> is the >>>>>>>>> output when it crashes due to something the commit above triggered, >>>>>>>>> the >>>>>>>>> level is out of bounds and the pfn looks fishy too. >>>>>>>>> Complete serial log from both bad and good (specific commit reverted) >>>>>>>>> are >>>>>>>>> attached. >>>>>>>> Just to confirm, you are positively identifying a qemu changeset as >>>>>>>> causing this crash? >>>>>>>> If so, the qemu change has discovered a pre-existing issue in the >>>>>>>> toolstack pci-passthrough interface. Whatever qemu is or isn't doing, >>>>>>>> it should not be able to cause a crash like this. >>>>>>>> With this in mind, I need to brush up on my AMD-Vi details. >>>>>>>> In the meantime, can you run with the following patch to identify what >>>>>>>> is going on, domctl wise? I assume it is the assign_device which is >>>>>>>> failing, but it will be nice to observe the differences between the >>>>>>>> working and failing case, which might offer a hint. >>>>>>> Hrrm with your patch i end up with a fatal page fault in >>>>>>> iommu_do_pci_domctl: >>>>>>> >>>>>>> (XEN) [2015-04-11 14:03:31.833] ----[ Xen-4.6-unstable x86_64 debug=y >>>>>>> Tainted: C ]---- >>>>>>> (XEN) [2015-04-11 14:03:31.857] CPU: 5 >>>>>>> (XEN) [2015-04-11 14:03:31.868] RIP: e008:[<ffff82d08014c52c>] >>>>>>> iommu_do_pci_domctl+0x2dc/0x740 >>>>>>> (XEN) [2015-04-11 14:03:31.894] RFLAGS: 0000000000010256 CONTEXT: >>>>>>> hypervisor >>>>>>> (XEN) [2015-04-11 14:03:31.915] rax: 0000000000000008 rbx: >>>>>>> 0000000000000800 rcx: ffffffffffebe5ed >>>>>>> (XEN) [2015-04-11 14:03:31.942] rdx: 0000000000000800 rsi: >>>>>>> 0000000000000000 rdi: ffff830256ef7e38 >>>>>>> (XEN) [2015-04-11 14:03:31.968] rbp: ffff830256ef7c98 rsp: >>>>>>> ffff830256ef7c08 r8: 00000000deadbeef >>>>>>> (XEN) [2015-04-11 14:03:31.995] r9: 00000000deadbeef r10: >>>>>>> ffff82d08024e500 r11: 0000000000000282 >>>>>>> (XEN) [2015-04-11 14:03:32.022] r12: 0000000000000000 r13: >>>>>>> 0000000000000008 r14: 0000000000000000 >>>>>>> (XEN) [2015-04-11 14:03:32.049] r15: 0000000000000000 cr0: >>>>>>> 0000000080050033 cr4: 00000000000006f0 >>>>>>> (XEN) [2015-04-11 14:03:32.076] cr3: 00000002336a6000 cr2: >>>>>>> 0000000000000000 >>>>>>> (XEN) [2015-04-11 14:03:32.096] ds: 0000 es: 0000 fs: 0000 gs: >>>>>>> 0000 ss: e010 cs: e008 >>>>>>> (XEN) [2015-04-11 14:03:32.121] Xen stack trace from >>>>>>> rsp=ffff830256ef7c08: >>>>>>> (XEN) [2015-04-11 14:03:32.141] ffff830256ef7c78 ffff82d08012c178 >>>>>>> ffff830256ef7c28 ffff830256ef7c28 >>>>>>> (XEN) [2015-04-11 14:03:32.168] 0000000000000010 0000000000000000 >>>>>>> 0000000000000000 0000000000000000 >>>>>>> (XEN) [2015-04-11 14:03:32.195] 00000000000006f0 00007fe300000000 >>>>>>> ffff830256eb7790 ffff83025cc6d300 >>>>>>> (XEN) [2015-04-11 14:03:32.222] ffff82d080330c60 00007fe396bab004 >>>>>>> 0000000000000000 00007fe396bab004 >>>>>>> (XEN) [2015-04-11 14:03:32.249] 0000000000000000 0000000000000005 >>>>>>> ffff830256ef7ca8 ffff82d08014900b >>>>>>> (XEN) [2015-04-11 14:03:32.276] ffff830256ef7d98 ffff82d080161f2d >>>>>>> 0000000000000010 0000000000000000 >>>>>>> (XEN) [2015-04-11 14:03:32.303] 0000000000000000 ffff830256ef7ce8 >>>>>>> ffff82d08018b655 ffff830256ef7d48 >>>>>>> (XEN) [2015-04-11 14:03:32.330] ffff830256ef7cf8 ffff82d08018b66a >>>>>>> ffff830256ef7d38 ffff82d08012925e >>>>>>> (XEN) [2015-04-11 14:03:32.357] ffff830256efc068 0000000800000001 >>>>>>> 800000022e12c167 0000000000000000 >>>>>>> (XEN) [2015-04-11 14:03:32.384] 0000000000000002 ffff830256ef7e38 >>>>>>> 0000000800000000 800000022e12c167 >>>>>>> (XEN) [2015-04-11 14:03:32.411] 0000000000000003 ffff830256ef7db8 >>>>>>> 0000000000000000 00007fe396780eb0 >>>>>>> (XEN) [2015-04-11 14:03:32.439] 0000000000000202 ffffffffffffffff >>>>>>> 0000000000000000 00007fe396bab004 >>>>>>> (XEN) [2015-04-11 14:03:32.466] 0000000000000000 0000000000000005 >>>>>>> ffff830256ef7ef8 ffff82d08010497f >>>>>>> (XEN) [2015-04-11 14:03:32.493] 0000000000000001 0000000000100001 >>>>>>> 800000022e12c167 ffff88001f7ecc00 >>>>>>> (XEN) [2015-04-11 14:03:32.520] 00007fe396780eb0 ffff88001c849508 >>>>>>> 0000000e00000007 ffffffff8105594a >>>>>>> (XEN) [2015-04-11 14:03:32.547] 000000000000e033 0000000000000202 >>>>>>> ffff88001ece3d40 000000000000e02b >>>>>>> (XEN) [2015-04-11 14:03:32.574] ffff830256ef7e28 ffff82d080194933 >>>>>>> 000000000000beef ffffffff81bd6c85 >>>>>>> (XEN) [2015-04-11 14:03:32.601] ffff830256ef7f08 ffff82d080193edd >>>>>>> 0000000b0000002d 0000000000000001 >>>>>>> (XEN) [2015-04-11 14:03:32.628] 0000000100000800 00007fe3962abbd0 >>>>>>> ffff000a81050001 00007fe39656ce6e >>>>>>> (XEN) [2015-04-11 14:03:32.655] 00007ffdf2a654f0 00007fe39656d0c9 >>>>>>> 00007fe39656ce6e 00007fe3969a9a55 >>>>>>> (XEN) [2015-04-11 14:03:32.682] Xen call trace: >>>>>>> (XEN) [2015-04-11 14:03:32.695] [<ffff82d08014c52c>] >>>>>>> iommu_do_pci_domctl+0x2dc/0x740 >>>>>>> (XEN) [2015-04-11 14:03:32.718] [<ffff82d08014900b>] >>>>>>> iommu_do_domctl+0x17/0x1a >>>>>>> (XEN) [2015-04-11 14:03:32.739] [<ffff82d080161f2d>] >>>>>>> arch_do_domctl+0x2469/0x26e1 >>>>>>> (XEN) [2015-04-11 14:03:32.762] [<ffff82d08010497f>] >>>>>>> do_domctl+0x1a1f/0x1d60 >>>>>>> (XEN) [2015-04-11 14:03:32.783] [<ffff82d080234c6b>] >>>>>>> syscall_enter+0xeb/0x145 >>>>>>> (XEN) [2015-04-11 14:03:32.804] >>>>>>> (XEN) [2015-04-11 14:03:32.813] Pagetable walk from 0000000000000000: >>>>>>> (XEN) [2015-04-11 14:03:32.831] L4[0x000] = 0000000234075067 >>>>>>> 000000000001f2a8 >>>>>>> (XEN) [2015-04-11 14:03:32.852] L3[0x000] = 0000000229ad4067 >>>>>>> 0000000000014c49 >>>>>>> (XEN) [2015-04-11 14:03:32.873] L2[0x000] = 0000000000000000 >>>>>>> ffffffffffffffff >>>>>>> (XEN) [2015-04-11 14:03:32.894] >>>>>>> (XEN) [2015-04-11 14:03:32.903] **************************************** >>>>>>> (XEN) [2015-04-11 14:03:32.922] Panic on CPU 5: >>>>>>> (XEN) [2015-04-11 14:03:32.935] FATAL PAGE FAULT >>>>>>> (XEN) [2015-04-11 14:03:32.948] [error_code=0000] >>>>>>> (XEN) [2015-04-11 14:03:32.961] Faulting linear address: >>>>>>> 0000000000000000 >>>>>>> (XEN) [2015-04-11 14:03:32.981] **************************************** >>>>>>> (XEN) [2015-04-11 14:03:33.000] >>>>>>> (XEN) [2015-04-11 14:03:33.009] Reboot in five seconds... >>>>>>> >>>>>>> The RIP resolves to the prink added by your patch in: >>>>>>> >>>>>>> case XEN_DOMCTL_test_assign_device: >>>>>>> ret = xsm_test_assign_device(XSM_HOOK, >>>>>>> domctl->u.assign_device.machine_sbdf); >>>>>>> if ( ret ) >>>>>>> break; >>>>>>> >>>>>>> seg = domctl->u.assign_device.machine_sbdf >> 16; >>>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>>>> >>>>>>> printk("*** %pv->d%d: >>>>>>> test_assign_device({%04x:%02x:%02x.%u})\n", >>>>>>> current, d->domain_id, >>>>>>> seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>>> >>>>>>> if ( device_assigned(seg, bus, devfn) ) >>>>>>> { >>>>>>> printk(XENLOG_G_INFO >>>>>>> "%04x:%02x:%02x.%u already assigned, or >>>>>>> non-existent\n", >>>>>>> seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>>> ret = -EINVAL; >>>>>>> } >>>>>>> break; >>>>>> hmm - 'd' is NULL. This ought to work better. >>>>>> diff --git a/xen/drivers/passthrough/pci.c >>>>>> b/xen/drivers/passthrough/pci.c >>>>>> index 9f3413c..85ff1fc 100644 >>>>>> --- a/xen/drivers/passthrough/pci.c >>>>>> +++ b/xen/drivers/passthrough/pci.c >>>>>> @@ -1532,6 +1532,11 @@ int iommu_do_pci_domctl( >>>>>> max_sdevs = domctl->u.get_device_group.max_sdevs; >>>>>> sdevs = domctl->u.get_device_group.sdev_array; >>>>>> >>>>>> + printk("*** %pv->d%d: get_device_group({%04x:%02x:%02x.%u, >>>>>> %u})\n", >>>>>> + current, d->domain_id, >>>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), >>>>>> + max_sdevs); >>>>>> + >>>>>> ret = iommu_get_device_group(d, seg, bus, devfn, sdevs, >>>>>> max_sdevs); >>>>>> if ( ret < 0 ) >>>>>> { >>>>>> @@ -1558,6 +1563,9 @@ int iommu_do_pci_domctl( >>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>>> >>>>>> + printk("*** %pv: test_assign_device({%04x:%02x:%02x.%u})\n", >>>>>> + current, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>> + >>>>>> if ( device_assigned(seg, bus, devfn) ) >>>>>> { >>>>>> printk(XENLOG_G_INFO >>>>>> @@ -1582,6 +1590,10 @@ int iommu_do_pci_domctl( >>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>>> >>>>>> + printk("*** %pv->d%d: assign_device({%04x:%02x:%02x.%u})\n", >>>>>> + current, d->domain_id, >>>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>> + >>>>>> ret = device_assigned(seg, bus, devfn) ?: >>>>>> assign_device(d, seg, bus, devfn); >>>>>> if ( ret == -ERESTART ) >>>>>> @@ -1604,6 +1616,10 @@ int iommu_do_pci_domctl( >>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>>> >>>>>> + printk("*** %pv->d%d: deassign_device({%04x:%02x:%02x.%u})\n", >>>>>> + current, d->domain_id, >>>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>> + >>>>>> spin_lock(&pcidevs_lock); >>>>>> ret = deassign_device(d, seg, bus, devfn); >>>>>> spin_unlock(&pcidevs_lock); >>>>> Hi Andrew, >>>>> >>>>> Attached are the serial logs good (with revert) and bad (without): >>>>> >>>>> Some things that seems strange to me: >>>>> - The numerous calls to get the device 08:00.0 assigned ... for 0a:00.0 >>>>> there >>>>> was only one call to both test assign and assign. >>>>> - However these numerous calls are there in both the good and the bad >>>>> case, >>>>> so perhaps it's strange and wrong .. but not the cause .. >>>>> - I had a hunch it could be due to the 08:00.0 using MSI-X, but when only >>>>> passing through 0a:00.0, i get the same numerous calls but now for the >>>>> 0a:00.0 which uses IntX, so I think that is more related to being the >>>>> *first* >>>>> device to be passed through to a guest. >>>> I have also observed this behaviour, but not had time to investigate. >>>> It doesn't appear problematic in the longrun but it probably a toolstack >>>> issue which wants fixing (if only in the name of efficiency). >>> And just after I sent this email, I have realised why. >>> The first assign device will have to build IO pagetables, which is a >>> long operation and subject to hypercall continuations. The second >>> device will reused the same pagetables, so is quick to complete. >> So .. is the ioreq patch from Paul involved in providing something used in >> building >> the pagetables .. and could it have say some off by one resulting in the >> 0xffffffffffff .. which could lead to the pagetable building going beserk, >> requiring a paging_mode far greater than normally would be required .. which >> get's set .. since that isn't checked properly .. leading to things breaking >> a bit further when it does get checked ? > A -1 is slipping in somewhere and ending up in the gfn field. > The result is that update_paging_mode() attempts to construct > iopagetables to cover a 76bit address space, which is how level ends up > at 8. (Note that a level of 7 is reserved, and a level of anything > greater than 4 is implausible on your system.) > I think the crash is collateral damage following on from > update_paging_mode() not properly sanitising its input, but that there > is still some other issue causing -1 to be passed in the first place. > I am still trying to locate where a -1 might plausibly be coming from. I have just added some extra debug code to store the values from the start of update_paging_mode() .. so i can print them at the end if the paging_mode gets out of band and do a dump_stack() as well. Hopefully is will confirm this. -- Sander > ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |