Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Saturday, April 11, 2015, 10:22:16 PM, you wrote: > On 11/04/2015 20:33, Sander Eikelenboom wrote: >> Saturday, April 11, 2015, 8:25:52 PM, you wrote: >> >>> On 11/04/15 18:42, Sander Eikelenboom wrote: >>>> Saturday, April 11, 2015, 7:35:57 PM, you wrote: >>>> >>>>> On 11/04/15 18:25, Sander Eikelenboom wrote: >>>>>> Saturday, April 11, 2015, 6:38:17 PM, you wrote: >>>>>> >>>>>>> On 11/04/15 17:32, Andrew Cooper wrote: >>>>>>>> On 11/04/15 17:21, Sander Eikelenboom wrote: >>>>>>>>> Saturday, April 11, 2015, 4:21:56 PM, you wrote: >>>>>>>>> >>>>>>>>>> On 11/04/15 15:11, Sander Eikelenboom wrote: >>>>>>>>>>> Friday, April 10, 2015, 8:55:27 PM, you wrote: >>>>>>>>>>> >>>>>>>>>>>> On 10/04/15 11:24, Sander Eikelenboom wrote: >>>>>>>>>>>>> Hi Andrew, >>>>>>>>>>>>> >>>>>>>>>>>>> Finally got some time to figure this out .. and i have narrowed >>>>>>>>>>>>> it down to: >>>>>>>>>>>>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git >>>>>>>>>>>>> commit 7665d6ba98e20fb05c420de947c1750fd47e5c07 "Xen: Use the >>>>>>>>>>>>> ioreq-server API when available" >>>>>>>>>>>>> A straight revert of this commit prevents the issue from >>>>>>>>>>>>> happening. >>>>>>>>>>>>> >>>>>>>>>>>>> The reason i had a hard time figuring this out was: >>>>>>>>>>>>> - I wasn't aware of this earlier, since git pulling the main xen >>>>>>>>>>>>> tree, doesn't >>>>>>>>>>>>> auto update the qemu-* trees. >>>>>>>>>>>> This has caught me out so many times. It is very non-obvious >>>>>>>>>>>> behaviour. >>>>>>>>>>>>> - So i happen to get this when i cloned a fresh tree to try to >>>>>>>>>>>>> figure out the >>>>>>>>>>>>> other issue i was seeing. >>>>>>>>>>>>> - After that checking out previous versions of the main xen tree >>>>>>>>>>>>> didn't resolve >>>>>>>>>>>>> this new issue, because the qemu tree doesn't get auto updated >>>>>>>>>>>>> and is set >>>>>>>>>>>>> "master". >>>>>>>>>>>>> - Cloning a xen-stable-4.5.0 made it go away .. because that has >>>>>>>>>>>>> a specific >>>>>>>>>>>>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git tag >>>>>>>>>>>>> which is not >>>>>>>>>>>>> master. >>>>>>>>>>>>> >>>>>>>>>>>>> *sigh* >>>>>>>>>>>>> >>>>>>>>>>>>> This is tested with xen main tree at last commit >>>>>>>>>>>>> 3a28f760508fb35c430edac17a9efde5aff6d1d5 >>>>>>>>>>>>> (normal xen-unstable, not the staging branch) >>>>>>>>>>>>> >>>>>>>>>>>>> Ok so i have added some extra debug info (see attached diff) and >>>>>>>>>>>>> this is the >>>>>>>>>>>>> output when it crashes due to something the commit above >>>>>>>>>>>>> triggered, the >>>>>>>>>>>>> level is out of bounds and the pfn looks fishy too. >>>>>>>>>>>>> Complete serial log from both bad and good (specific commit >>>>>>>>>>>>> reverted) are >>>>>>>>>>>>> attached. >>>>>>>>>>>> Just to confirm, you are positively identifying a qemu changeset as >>>>>>>>>>>> causing this crash? >>>>>>>>>>>> If so, the qemu change has discovered a pre-existing issue in the >>>>>>>>>>>> toolstack pci-passthrough interface. Whatever qemu is or isn't >>>>>>>>>>>> doing, >>>>>>>>>>>> it should not be able to cause a crash like this. >>>>>>>>>>>> With this in mind, I need to brush up on my AMD-Vi details. >>>>>>>>>>>> In the meantime, can you run with the following patch to identify >>>>>>>>>>>> what >>>>>>>>>>>> is going on, domctl wise? I assume it is the assign_device which >>>>>>>>>>>> is >>>>>>>>>>>> failing, but it will be nice to observe the differences between the >>>>>>>>>>>> working and failing case, which might offer a hint. 
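For context when reading the traces that follow: the assign/test-assign domctls being instrumented here identify the device by a packed machine_sbdf value, which the hypervisor code quoted further down splits back into segment, bus and devfn. A minimal standalone sketch of that packing, assuming the same field layout as the decoding code in this thread (the helper name is made up for illustration):

#include <stdint.h>
#include <stdio.h>

/* Pack segment/bus/devfn the way the domctl handler quoted below unpacks it:
 * seg = sbdf >> 16, bus = (sbdf >> 8) & 0xff, devfn = sbdf & 0xff,
 * with devfn itself being (slot << 3) | function. */
static uint32_t make_machine_sbdf(uint16_t seg, uint8_t bus, uint8_t slot, uint8_t func)
{
    uint8_t devfn = (uint8_t)((slot << 3) | (func & 0x7));
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) | devfn;
}

int main(void)
{
    /* 0000:0a:00.0 -- one of the devices being passed through in this report. */
    uint32_t sbdf = make_machine_sbdf(0x0000, 0x0a, 0x00, 0);

    printf("machine_sbdf = 0x%06x -> %04x:%02x:%02x.%u\n",
           sbdf, sbdf >> 16, (sbdf >> 8) & 0xff,
           (sbdf >> 3) & 0x1f, sbdf & 0x7);
    return 0;
}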
>>>>>>>>>>> Hrrm with your patch i end up with a fatal page fault in >>>>>>>>>>> iommu_do_pci_domctl: >>>>>>>>>>> >>>>>>>>>>> (XEN) [2015-04-11 14:03:31.833] ----[ Xen-4.6-unstable x86_64 >>>>>>>>>>> debug=y Tainted: C ]---- >>>>>>>>>>> (XEN) [2015-04-11 14:03:31.857] CPU: 5 >>>>>>>>>>> (XEN) [2015-04-11 14:03:31.868] RIP: e008:[<ffff82d08014c52c>] >>>>>>>>>>> iommu_do_pci_domctl+0x2dc/0x740 >>>>>>>>>>> (XEN) [2015-04-11 14:03:31.894] RFLAGS: 0000000000010256 CONTEXT: >>>>>>>>>>> hypervisor >>>>>>>>>>> (XEN) [2015-04-11 14:03:31.915] rax: 0000000000000008 rbx: >>>>>>>>>>> 0000000000000800 rcx: ffffffffffebe5ed >>>>>>>>>>> (XEN) [2015-04-11 14:03:31.942] rdx: 0000000000000800 rsi: >>>>>>>>>>> 0000000000000000 rdi: ffff830256ef7e38 >>>>>>>>>>> (XEN) [2015-04-11 14:03:31.968] rbp: ffff830256ef7c98 rsp: >>>>>>>>>>> ffff830256ef7c08 r8: 00000000deadbeef >>>>>>>>>>> (XEN) [2015-04-11 14:03:31.995] r9: 00000000deadbeef r10: >>>>>>>>>>> ffff82d08024e500 r11: 0000000000000282 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.022] r12: 0000000000000000 r13: >>>>>>>>>>> 0000000000000008 r14: 0000000000000000 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.049] r15: 0000000000000000 cr0: >>>>>>>>>>> 0000000080050033 cr4: 00000000000006f0 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.076] cr3: 00000002336a6000 cr2: >>>>>>>>>>> 0000000000000000 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.096] ds: 0000 es: 0000 fs: 0000 >>>>>>>>>>> gs: 0000 ss: e010 cs: e008 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.121] Xen stack trace from >>>>>>>>>>> rsp=ffff830256ef7c08: >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.141] ffff830256ef7c78 >>>>>>>>>>> ffff82d08012c178 ffff830256ef7c28 ffff830256ef7c28 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.168] 0000000000000010 >>>>>>>>>>> 0000000000000000 0000000000000000 0000000000000000 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.195] 00000000000006f0 >>>>>>>>>>> 00007fe300000000 ffff830256eb7790 ffff83025cc6d300 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.222] ffff82d080330c60 >>>>>>>>>>> 00007fe396bab004 0000000000000000 00007fe396bab004 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.249] 0000000000000000 >>>>>>>>>>> 0000000000000005 ffff830256ef7ca8 ffff82d08014900b >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.276] ffff830256ef7d98 >>>>>>>>>>> ffff82d080161f2d 0000000000000010 0000000000000000 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.303] 0000000000000000 >>>>>>>>>>> ffff830256ef7ce8 ffff82d08018b655 ffff830256ef7d48 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.330] ffff830256ef7cf8 >>>>>>>>>>> ffff82d08018b66a ffff830256ef7d38 ffff82d08012925e >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.357] ffff830256efc068 >>>>>>>>>>> 0000000800000001 800000022e12c167 0000000000000000 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.384] 0000000000000002 >>>>>>>>>>> ffff830256ef7e38 0000000800000000 800000022e12c167 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.411] 0000000000000003 >>>>>>>>>>> ffff830256ef7db8 0000000000000000 00007fe396780eb0 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.439] 0000000000000202 >>>>>>>>>>> ffffffffffffffff 0000000000000000 00007fe396bab004 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.466] 0000000000000000 >>>>>>>>>>> 0000000000000005 ffff830256ef7ef8 ffff82d08010497f >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.493] 0000000000000001 >>>>>>>>>>> 0000000000100001 800000022e12c167 ffff88001f7ecc00 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.520] 00007fe396780eb0 >>>>>>>>>>> ffff88001c849508 0000000e00000007 ffffffff8105594a >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.547] 000000000000e033 >>>>>>>>>>> 0000000000000202 ffff88001ece3d40 000000000000e02b >>>>>>>>>>> (XEN) [2015-04-11 
14:03:32.574] ffff830256ef7e28 >>>>>>>>>>> ffff82d080194933 000000000000beef ffffffff81bd6c85 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.601] ffff830256ef7f08 >>>>>>>>>>> ffff82d080193edd 0000000b0000002d 0000000000000001 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.628] 0000000100000800 >>>>>>>>>>> 00007fe3962abbd0 ffff000a81050001 00007fe39656ce6e >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.655] 00007ffdf2a654f0 >>>>>>>>>>> 00007fe39656d0c9 00007fe39656ce6e 00007fe3969a9a55 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.682] Xen call trace: >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.695] [<ffff82d08014c52c>] >>>>>>>>>>> iommu_do_pci_domctl+0x2dc/0x740 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.718] [<ffff82d08014900b>] >>>>>>>>>>> iommu_do_domctl+0x17/0x1a >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.739] [<ffff82d080161f2d>] >>>>>>>>>>> arch_do_domctl+0x2469/0x26e1 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.762] [<ffff82d08010497f>] >>>>>>>>>>> do_domctl+0x1a1f/0x1d60 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.783] [<ffff82d080234c6b>] >>>>>>>>>>> syscall_enter+0xeb/0x145 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.804] >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.813] Pagetable walk from >>>>>>>>>>> 0000000000000000: >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.831] L4[0x000] = 0000000234075067 >>>>>>>>>>> 000000000001f2a8 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.852] L3[0x000] = 0000000229ad4067 >>>>>>>>>>> 0000000000014c49 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.873] L2[0x000] = 0000000000000000 >>>>>>>>>>> ffffffffffffffff >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.894] >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.903] >>>>>>>>>>> **************************************** >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.922] Panic on CPU 5: >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.935] FATAL PAGE FAULT >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.948] [error_code=0000] >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.961] Faulting linear address: >>>>>>>>>>> 0000000000000000 >>>>>>>>>>> (XEN) [2015-04-11 14:03:32.981] >>>>>>>>>>> **************************************** >>>>>>>>>>> (XEN) [2015-04-11 14:03:33.000] >>>>>>>>>>> (XEN) [2015-04-11 14:03:33.009] Reboot in five seconds... >>>>>>>>>>> >>>>>>>>>>> The RIP resolves to the prink added by your patch in: >>>>>>>>>>> >>>>>>>>>>> case XEN_DOMCTL_test_assign_device: >>>>>>>>>>> ret = xsm_test_assign_device(XSM_HOOK, >>>>>>>>>>> domctl->u.assign_device.machine_sbdf); >>>>>>>>>>> if ( ret ) >>>>>>>>>>> break; >>>>>>>>>>> >>>>>>>>>>> seg = domctl->u.assign_device.machine_sbdf >> 16; >>>>>>>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>>>>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>>>>>>>> >>>>>>>>>>> printk("*** %pv->d%d: >>>>>>>>>>> test_assign_device({%04x:%02x:%02x.%u})\n", >>>>>>>>>>> current, d->domain_id, >>>>>>>>>>> seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>>>>>>> >>>>>>>>>>> if ( device_assigned(seg, bus, devfn) ) >>>>>>>>>>> { >>>>>>>>>>> printk(XENLOG_G_INFO >>>>>>>>>>> "%04x:%02x:%02x.%u already assigned, or >>>>>>>>>>> non-existent\n", >>>>>>>>>>> seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>>>>>>> ret = -EINVAL; >>>>>>>>>>> } >>>>>>>>>>> break; >>>>>>>>>> hmm - 'd' is NULL. This ought to work better. 
>>>>>>>>>> diff --git a/xen/drivers/passthrough/pci.c >>>>>>>>>> b/xen/drivers/passthrough/pci.c >>>>>>>>>> index 9f3413c..85ff1fc 100644 >>>>>>>>>> --- a/xen/drivers/passthrough/pci.c >>>>>>>>>> +++ b/xen/drivers/passthrough/pci.c >>>>>>>>>> @@ -1532,6 +1532,11 @@ int iommu_do_pci_domctl( >>>>>>>>>> max_sdevs = domctl->u.get_device_group.max_sdevs; >>>>>>>>>> sdevs = domctl->u.get_device_group.sdev_array; >>>>>>>>>> >>>>>>>>>> + printk("*** %pv->d%d: get_device_group({%04x:%02x:%02x.%u, >>>>>>>>>> %u})\n", >>>>>>>>>> + current, d->domain_id, >>>>>>>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), >>>>>>>>>> + max_sdevs); >>>>>>>>>> + >>>>>>>>>> ret = iommu_get_device_group(d, seg, bus, devfn, sdevs, >>>>>>>>>> max_sdevs); >>>>>>>>>> if ( ret < 0 ) >>>>>>>>>> { >>>>>>>>>> @@ -1558,6 +1563,9 @@ int iommu_do_pci_domctl( >>>>>>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>>>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>>>>>>> >>>>>>>>>> + printk("*** %pv: test_assign_device({%04x:%02x:%02x.%u})\n", >>>>>>>>>> + current, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>>>>>> + >>>>>>>>>> if ( device_assigned(seg, bus, devfn) ) >>>>>>>>>> { >>>>>>>>>> printk(XENLOG_G_INFO >>>>>>>>>> @@ -1582,6 +1590,10 @@ int iommu_do_pci_domctl( >>>>>>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>>>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>>>>>>> >>>>>>>>>> + printk("*** %pv->d%d: assign_device({%04x:%02x:%02x.%u})\n", >>>>>>>>>> + current, d->domain_id, >>>>>>>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>>>>>> + >>>>>>>>>> ret = device_assigned(seg, bus, devfn) ?: >>>>>>>>>> assign_device(d, seg, bus, devfn); >>>>>>>>>> if ( ret == -ERESTART ) >>>>>>>>>> @@ -1604,6 +1616,10 @@ int iommu_do_pci_domctl( >>>>>>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>>>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>>>>>>> >>>>>>>>>> + printk("*** %pv->d%d: >>>>>>>>>> deassign_device({%04x:%02x:%02x.%u})\n", >>>>>>>>>> + current, d->domain_id, >>>>>>>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>>>>>> + >>>>>>>>>> spin_lock(&pcidevs_lock); >>>>>>>>>> ret = deassign_device(d, seg, bus, devfn); >>>>>>>>>> spin_unlock(&pcidevs_lock); >>>>>>>>> Hi Andrew, >>>>>>>>> >>>>>>>>> Attached are the serial logs good (with revert) and bad (without): >>>>>>>>> >>>>>>>>> Some things that seems strange to me: >>>>>>>>> - The numerous calls to get the device 08:00.0 assigned ... for >>>>>>>>> 0a:00.0 there >>>>>>>>> was only one call to both test assign and assign. >>>>>>>>> - However these numerous calls are there in both the good and the bad >>>>>>>>> case, >>>>>>>>> so perhaps it's strange and wrong .. but not the cause .. >>>>>>>>> - I had a hunch it could be due to the 08:00.0 using MSI-X, but when >>>>>>>>> only >>>>>>>>> passing through 0a:00.0, i get the same numerous calls but now for >>>>>>>>> the >>>>>>>>> 0a:00.0 which uses IntX, so I think that is more related to being >>>>>>>>> the *first* >>>>>>>>> device to be passed through to a guest. >>>>>>>> I have also observed this behaviour, but not had time to investigate. >>>>>>>> It doesn't appear problematic in the longrun but it probably a >>>>>>>> toolstack >>>>>>>> issue which wants fixing (if only in the name of efficiency). >>>>>>> And just after I sent this email, I have realised why. >>>>>>> The first assign device will have to build IO pagetables, which is a >>>>>>> long operation and subject to hypercall continuations. 
The second >>>>>>> device will reuse the same pagetables, so is quick to complete. >>>>>> So .. is the ioreq patch from Paul involved in providing something used >>>>>> in building >>>>>> the pagetables .. and could it have say some off-by-one resulting in the >>>>>> 0xffffffffffff .. which could lead to the pagetable building going >>>>>> berserk, >>>>>> requiring a paging_mode far greater than normally would be required .. >>>>>> which >>>>>> gets set .. since that isn't checked properly .. leading to things >>>>>> breaking >>>>>> a bit further when it does get checked? >>>>> A -1 is slipping in somewhere and ending up in the gfn field. >>>>> The result is that update_paging_mode() attempts to construct >>>>> iopagetables to cover a 76-bit address space, which is how level ends up >>>>> at 8. (Note that a level of 7 is reserved, and a level of anything >>>>> greater than 4 is implausible on your system.) >>>>> I think the crash is collateral damage following on from >>>>> update_paging_mode() not properly sanitising its input, but that there >>>>> is still some other issue causing -1 to be passed in the first place. >>>>> I am still trying to locate where a -1 might plausibly be coming from. >>>> I have just added some extra debug code to store the values from the start >>>> of update_paging_mode() .. so I can print them at the end if the >>>> paging_mode >>>> goes out of bounds and do a dump_stack() as well. Hopefully it will confirm >>>> this. >>> Right - arch_iommu_populate_page_table() is falling over a page >>> allocated to the domain which doesn't have a valid gfn. >>> The ioreq server allocates itself some guest pages and then shoots them >>> out as part of setting the server up. (This is a kludge to work around >>> the fact that Xen doesn't have an interface for device models etc. to >>> allocate memory on behalf of the domain which will strictly never find >>> its way into the guest physmap.) >>> Can you try this patch and see whether some of the numbers printed out >>> start matching up? >>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c >>> index bfde380..d1adfa7 100644 >>> --- a/xen/arch/x86/hvm/hvm.c >>> +++ b/xen/arch/x86/hvm/hvm.c >>> @@ -534,6 +534,10 @@ static int hvm_map_ioreq_page( >>> static void hvm_remove_ioreq_gmfn( >>> struct domain *d, struct hvm_ioreq_page *iorp) >>> { >>> + printk("*** %s() d%d, page %p, mfn %lx, gfn %lx, va %p\n", >>> + __func__, d->domain_id, >>> + iorp->page, page_to_mfn(iorp->page), iorp->gmfn, iorp->va); >>> + >>> guest_physmap_remove_page(d, iorp->gmfn, >>> page_to_mfn(iorp->page), 0); >>> clear_page(iorp->va); >>> diff --git a/xen/drivers/passthrough/x86/iommu.c >>> b/xen/drivers/passthrough/x86/iommu.c >>> index 9eb8d33..048a1a9 100644 >>> --- a/xen/drivers/passthrough/x86/iommu.c >>> +++ b/xen/drivers/passthrough/x86/iommu.c >>> @@ -59,7 +59,16 @@ int arch_iommu_populate_page_table(struct domain *d) >>> if ( has_hvm_container_domain(d) || >>> (page->u.inuse.type_info & PGT_type_mask) == >>> PGT_writable_page ) >>> { >>> - BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page)))); >>> + unsigned long mfn = page_to_mfn(page); >>> + unsigned long gfn = mfn_to_gmfn(d, mfn); >>> + >>> + BUG_ON(SHARED_M2P(gfn)); >>> + >>> + if ( gfn == INVALID_MFN ) >>> + { >>> + printk("*** %s() d%d, page %p, mfn %lx, gfn %lx - about >>> to break\n", >>> + __func__, d->domain_id, page, mfn, gfn); >>> + } >>> rc = hd->platform_ops->map_page( >>> d, mfn_to_gmfn(d, page_to_mfn(page)), page_to_mfn(page), >>> IOMMUF_readable|IOMMUF_writable); >> >> Ok ..
so here we go: >> >> (XEN) [2015-04-11 19:24:59.418] *** hvm_remove_ioreq_gmfn() d1, page >> ffff82e0049f7700, mfn 24fbb8, gfn feff0, va ffff82c00082b000 >> (XEN) [2015-04-11 19:24:59.452] *** hvm_remove_ioreq_gmfn() d1, page >> ffff82e0049f76e0, mfn 24fbb7, gfn feff1, va ffff82c00082d000 >> (XEN) [2015-04-11 19:25:00.158] *** d0v5: test_assign_device({0000:0a:00.0}) >> (XEN) [2015-04-11 19:25:02.221] io.c:429: d1: bind: m_gsi=47 g_gsi=36 >> dev=00.00.5 intx=0 >> (XEN) [2015-04-11 19:25:02.248] *** d0v1->d1: assign_device({0000:0a:00.0}) >> (XEN) [2015-04-11 19:25:02.268] ?!?!? d1: pci_dev:0000:0a:00.0 >> hd->arch.paging_mode:2 >> (XEN) [2015-04-11 19:25:02.290] *** d0v1->d1: assign_device({0000:0a:00.0}) >> (XEN) [2015-04-11 19:25:02.310] ?!?!? d1: pci_dev:0000:0a:00.0 >> hd->arch.paging_mode:2 >> (XEN) [2015-04-11 19:25:02.333] *** d0v1->d1: assign_device({0000:0a:00.0}) >> (XEN) [2015-04-11 19:25:02.353] ?!?!? d1: pci_dev:0000:0a:00.0 >> hd->arch.paging_mode:2 >> (XEN) [2015-04-11 19:25:02.375] *** d0v1->d1: assign_device({0000:0a:00.0}) >> (XEN) [2015-04-11 19:25:02.395] ?!?!? d1: pci_dev:0000:0a:00.0 >> hd->arch.paging_mode:2 >> <BIG SNIP> >> (XEN) [2015-04-11 19:25:45.444] *** d0v1->d1: assign_device({0000:0a:00.0}) >> (XEN) [2015-04-11 19:25:45.464] ?!?!? d1: pci_dev:0000:0a:00.0 >> hd->arch.paging_mode:2 >> (XEN) [2015-04-11 19:25:45.486] *** d0v1->d1: assign_device({0000:0a:00.0}) >> (XEN) [2015-04-11 19:25:45.506] ?!?!? d1: pci_dev:0000:0a:00.0 >> hd->arch.paging_mode:2 >> (XEN) [2015-04-11 19:25:45.529] *** d0v1->d1: assign_device({0000:0a:00.0}) >> (XEN) [2015-04-11 19:25:45.549] ?!?!? d1: pci_dev:0000:0a:00.0 >> hd->arch.paging_mode:2 >> (XEN) [2015-04-11 19:25:45.571] *** arch_iommu_populate_page_table() d1, >> page ffff82e0049f7700, mfn 24fbb8, gfn ffffffffffffffff - about to break >> (XEN) [2015-04-11 19:25:45.610] AMD-Vi: ?!?!? amd_iommu_map_page level >> before:3 gfn:0xffffffffffffffff mfn:0x24fbb8 flags:3 >> (XEN) [2015-04-11 19:25:45.642] AMD-Vi: ?!?!? update_paging_mode level >> before:3 gfn:0xffffffffffffffff >> (XEN) [2015-04-11 19:25:45.669] AMD-Vi: ?!?!? update_paging_mode end: >> paging_mode:6 offset:31 root: old_root_mfn:0x11d55b new_root_mfn:0x11d55a >> gfn:0xffffffffffffffff req_id:0 PTE_PER_TABLE_SIZE:512 >> (XEN) [2015-04-11 19:25:45.722] AMD-Vi: ?!?!? update_paging_mode end: values >> at start: paging_mode:3 offset:-1 gfn:0xffffffffffffffff >> (XEN) [2015-04-11 19:25:45.757] AMD-Vi: ?!?!? amd_iommu_map_page level after >> update paging mode:6 gfn:0xffffffffffffffff mfn:0x24fbb8 flags:3 >> (XEN) [2015-04-11 19:25:45.794] AMD-Vi: ?!?!? amd_iommu_map_page level end:6 >> gfn:0xffffffffffffffff mfn:0x24fbb8 flags:3 >> (XEN) [2015-04-11 19:25:45.826] *** arch_iommu_populate_page_table() d1, >> page ffff82e0049f76e0, mfn 24fbb7, gfn ffffffffffffffff - about to break >> (XEN) [2015-04-11 19:25:45.864] AMD-Vi: ?!?!? amd_iommu_map_page level >> before:6 gfn:0xffffffffffffffff mfn:0x24fbb7 flags:3 >> (XEN) [2015-04-11 19:25:45.897] AMD-Vi: ?!?!? update_paging_mode level >> before:6 gfn:0xffffffffffffffff >> (XEN) [2015-04-11 19:25:45.924] AMD-Vi: ?!?!? update_paging_mode end: >> paging_mode:8 offset:1 root: old_root_mfn:0x11d554 new_root_mfn:0x11d553 >> gfn:0xffffffffffffffff req_id:0 PTE_PER_TABLE_SIZE:512 >> (XEN) [2015-04-11 19:25:45.976] AMD-Vi: ?!?!? update_paging_mode end: values >> at start: paging_mode:6 offset:524287 gfn:0xffffffffffffffff >> (XEN) [2015-04-11 19:25:46.013] AMD-Vi: ?!?!? 
amd_iommu_map_page level after >> update paging mode:8 gfn:0xffffffffffffffff mfn:0x24fbb7 flags:3 >> (XEN) [2015-04-11 19:25:46.050] AMD-Vi: ?!?!? iommu_pde_from_gfn: domid:1 >> table:1 level:8 pfn:0xffffffffffffffff >> (XEN) [2015-04-11 19:25:46.079] Xen BUG at iommu_map.c:459 >> (XEN) [2015-04-11 19:25:46.095] ----[ Xen-4.6-unstable x86_64 debug=y >> Tainted: C ]---- >> (XEN) [2015-04-11 19:25:46.119] CPU: 2 >> (XEN) [2015-04-11 19:25:46.131] RIP: e008:[<ffff82d080155d03>] >> iommu_pde_from_gfn+0x82/0x47a >> (XEN) [2015-04-11 19:25:46.156] RFLAGS: 0000000000010202 CONTEXT: >> hypervisor >> (XEN) [2015-04-11 19:25:46.177] rax: 0000000000000000 rbx: >> 0000000000000008 rcx: 0000000000000000 >> (XEN) [2015-04-11 19:25:46.203] rdx: ffff830256f20000 rsi: >> 000000000000000a rdi: ffff82d0802986c0 >> (XEN) [2015-04-11 19:25:46.230] rbp: ffff830256f27ad8 rsp: >> ffff830256f27a78 r8: ffff830256f30000 >> (XEN) [2015-04-11 19:25:46.257] r9: 0000000000000002 r10: >> 0000000000000032 r11: 0000000000000002 >> (XEN) [2015-04-11 19:25:46.284] r12: ffff82e0023aaa60 r13: >> 000000000024fb00 r14: 00000000000000e9 >> (XEN) [2015-04-11 19:25:46.311] r15: 00007d2000000000 cr0: >> 0000000080050033 cr4: 00000000000006f0 >> (XEN) [2015-04-11 19:25:46.337] cr3: 000000025f176000 cr2: ffff8000007f6800 >> (XEN) [2015-04-11 19:25:46.358] ds: 0000 es: 0000 fs: 0000 gs: 0000 >> ss: e010 cs: e008 >> (XEN) [2015-04-11 19:25:46.383] Xen stack trace from rsp=ffff830256f27a78: >> (XEN) [2015-04-11 19:25:46.403] ffff83025f7bd000 ffff830256f27b30 >> ffffffffffffffff ffff830200000030 >> (XEN) [2015-04-11 19:25:46.430] ffff830256f27ae8 ffff830256f27aa8 >> 0000000000000000 ffff83025f7bd000 >> (XEN) [2015-04-11 19:25:46.457] ffff82e0049f76e0 000000000024fbb7 >> 00000000000000e9 00007d2000000000 >> (XEN) [2015-04-11 19:25:46.484] ffff830256f27b98 ffff82d0801562a9 >> 0000000000000206 ffff830256f27b08 >> (XEN) [2015-04-11 19:25:46.511] 000000000024fbb7 0000000000000003 >> ffff83025f7bd938 ffffffffffffffff >> (XEN) [2015-04-11 19:25:46.538] ffff83025f7bd000 ffff83025f7bd000 >> 000000000024fbb7 0000000000000000 >> (XEN) [2015-04-11 19:25:46.565] 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 >> (XEN) [2015-04-11 19:25:46.592] 0000000000000000 0000000000000000 >> ffff83025f7bd938 ffff83025f7bd000 >> (XEN) [2015-04-11 19:25:46.619] ffff82e0049f76e0 000000000024fbb7 >> 00000000000000e9 00007d2000000000 >> (XEN) [2015-04-11 19:25:46.646] ffff830256f27bf8 ffff82d08015a7f8 >> 0000000000000000 ffff83025f7bd020 >> (XEN) [2015-04-11 19:25:46.673] 000000000024fbb7 ffff830256f20000 >> ffff830256f27bf8 0000000000000000 >> (XEN) [2015-04-11 19:25:46.700] 000000000000000a 00007f017c8d1004 >> 0000000000000000 ffff83025f7bd000 >> (XEN) [2015-04-11 19:25:46.727] ffff830256f27c98 ffff82d08014c6e1 >> ffff830200000002 ffff82d08012c178 >> (XEN) [2015-04-11 19:25:46.754] 0000000000000000 ffff830256f27c28 >> 0000000000000001 0000000000000000 >> (XEN) [2015-04-11 19:25:46.781] 0000000000000000 0000000000000000 >> 00007f017c8d1004 0000000000000000 >> (XEN) [2015-04-11 19:25:46.808] ffff82d080331034 ffff830256f20000 >> 000000000025f176 00007f017c8d1004 >> (XEN) [2015-04-11 19:25:46.835] ffff83025f7bd000 00007f017c8d1004 >> ffff83025f7bd000 0000000000000005 >> (XEN) [2015-04-11 19:25:46.863] ffff830256f27ca8 ffff82d08014900b >> ffff830256f27d98 ffff82d080161f2d >> (XEN) [2015-04-11 19:25:46.890] 000000000023468c 0000000000000002 >> 0000000000000005 0000000000000001 >> (XEN) [2015-04-11 19:25:46.917] ffff82d080331bb8 0000000000000001 >> 
ffff830256f27de8 ffff82d080120c10 >> (XEN) [2015-04-11 19:25:46.944] Xen call trace: >> (XEN) [2015-04-11 19:25:46.956] [<ffff82d080155d03>] >> iommu_pde_from_gfn+0x82/0x47a >> (XEN) [2015-04-11 19:25:46.979] [<ffff82d0801562a9>] >> amd_iommu_map_page+0x1ae/0x5ec >> (XEN) [2015-04-11 19:25:47.002] [<ffff82d08015a7f8>] >> arch_iommu_populate_page_table+0x164/0x4c3 >> (XEN) [2015-04-11 19:25:47.028] [<ffff82d08014c6e1>] >> iommu_do_pci_domctl+0x491/0x740 >> (XEN) [2015-04-11 19:25:47.051] [<ffff82d08014900b>] >> iommu_do_domctl+0x17/0x1a >> (XEN) [2015-04-11 19:25:47.073] [<ffff82d080161f2d>] >> arch_do_domctl+0x2469/0x26e1 >> (XEN) [2015-04-11 19:25:47.095] [<ffff82d08010497f>] >> do_domctl+0x1a1f/0x1d60 >> (XEN) [2015-04-11 19:25:47.116] [<ffff82d080234c6b>] >> syscall_enter+0xeb/0x145 >> (XEN) [2015-04-11 19:25:47.137] >> (XEN) [2015-04-11 19:25:47.146] >> (XEN) [2015-04-11 19:25:47.155] **************************************** >> (XEN) [2015-04-11 19:25:47.174] Panic on CPU 2: >> (XEN) [2015-04-11 19:25:47.187] Xen BUG at iommu_map.c:459 >> (XEN) [2015-04-11 19:25:47.203] **************************************** >> (XEN) [2015-04-11 19:25:47.222] >> (XEN) [2015-04-11 19:25:47.231] Reboot in five seconds... >> >> > Right - does this fix the issue for you? Affirmative :) It survives and the device seems to work properly as well, will do some more tests tomorrow. Thanks for tracking it down! -- Sander > diff --git a/xen/drivers/passthrough/x86/iommu.c > b/xen/drivers/passthrough/x86/iommu.c > index 9eb8d33..6094ba1 100644 > --- a/xen/drivers/passthrough/x86/iommu.c > +++ b/xen/drivers/passthrough/x86/iommu.c > @@ -56,8 +56,9 @@ int arch_iommu_populate_page_table(struct domain *d) > while ( !rc && (page = page_list_remove_head(&d->page_list)) ) > { > - if ( has_hvm_container_domain(d) || > - (page->u.inuse.type_info & PGT_type_mask) == > PGT_writable_page ) > + if ( (mfn_to_gmfn(d, page_to_mfn(page)) != INVALID_MFN) && > + (has_hvm_container_domain(d) || > + ((page->u.inuse.type_info & PGT_type_mask) == > PGT_writable_page)) ) > { > BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page)))); > rc = hd->platform_ops->map_page( _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
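As a footnote to the diagnosis above: the reason an all-ones gfn drives the paging mode to 8 is that update_paging_mode() keeps adding page-table levels until the gfn's index at the top level fits in a single 512-entry table (PTE_PER_TABLE_SIZE:512 in the log). A standalone sketch of that growth loop, simplified from the behaviour described in the thread and not the actual Xen code:

#include <stdio.h>

#define PTE_PER_TABLE_SHIFT 9                      /* 512 entries per table */
#define PTE_PER_TABLE_SIZE  (1UL << PTE_PER_TABLE_SHIFT)

/* Grow 'level' until the gfn's index at the top level fits in one table. */
static unsigned int level_needed(unsigned long gfn, unsigned int level)
{
    unsigned long offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));

    while ( offset >= PTE_PER_TABLE_SIZE )
    {
        offset >>= PTE_PER_TABLE_SHIFT;
        level++;
    }
    return level;
}

int main(void)
{
    /* A real guest gfn (e.g. the ioreq page at 0xfeff0) fits in 3 levels... */
    printf("gfn 0xfeff0 -> level %u\n", level_needed(0xfeff0UL, 3));

    /* ...but the -1 returned by mfn_to_gmfn() for a page that has been
     * removed from the physmap demands 8 levels -- beyond anything the
     * hardware can express (7 is reserved), which is what trips the BUG
     * in iommu_map.c. */
    printf("gfn 0x%lx -> level %u\n", ~0UL, level_needed(~0UL, 3));
    return 0;
}

With the fix quoted above, pages whose mfn_to_gmfn() result is INVALID_MFN are skipped by arch_iommu_populate_page_table(), so the map path is never reached with a -1 gfn in the first place.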