|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Saturday, April 11, 2015, 7:35:57 PM, you wrote:
> On 11/04/15 18:25, Sander Eikelenboom wrote:
>> Saturday, April 11, 2015, 6:38:17 PM, you wrote:
>>
>>> On 11/04/15 17:32, Andrew Cooper wrote:
>>>> On 11/04/15 17:21, Sander Eikelenboom wrote:
>>>>> Saturday, April 11, 2015, 4:21:56 PM, you wrote:
>>>>>
>>>>>> On 11/04/15 15:11, Sander Eikelenboom wrote:
>>>>>>> Friday, April 10, 2015, 8:55:27 PM, you wrote:
>>>>>>>
>>>>>>>> On 10/04/15 11:24, Sander Eikelenboom wrote:
>>>>>>>>> Hi Andrew,
>>>>>>>>>
>>>>>>>>> Finally got some time to figure this out .. and i have narrowed it
>>>>>>>>> down to:
>>>>>>>>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git
>>>>>>>>> commit 7665d6ba98e20fb05c420de947c1750fd47e5c07 "Xen: Use the
>>>>>>>>> ioreq-server API when available"
>>>>>>>>> A straight revert of this commit prevents the issue from happening.
>>>>>>>>>
>>>>>>>>> The reason i had a hard time figuring this out was:
>>>>>>>>> - I wasn't aware of this earlier, since git pulling the main xen
>>>>>>>>> tree, doesn't
>>>>>>>>> auto update the qemu-* trees.
>>>>>>>> This has caught me out so many times. It is very non-obvious
>>>>>>>> behaviour.
>>>>>>>>> - So i happen to get this when i cloned a fresh tree to try to figure
>>>>>>>>> out the
>>>>>>>>> other issue i was seeing.
>>>>>>>>> - After that checking out previous versions of the main xen tree
>>>>>>>>> didn't resolve
>>>>>>>>> this new issue, because the qemu tree doesn't get auto updated and
>>>>>>>>> is set
>>>>>>>>> "master".
>>>>>>>>> - Cloning a xen-stable-4.5.0 made it go away .. because that has a
>>>>>>>>> specific
>>>>>>>>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git tag which
>>>>>>>>> is not
>>>>>>>>> master.
>>>>>>>>>
>>>>>>>>> *sigh*
>>>>>>>>>
>>>>>>>>> This is tested with xen main tree at last commit
>>>>>>>>> 3a28f760508fb35c430edac17a9efde5aff6d1d5
>>>>>>>>> (normal xen-unstable, not the staging branch)
>>>>>>>>>
>>>>>>>>> Ok so i have added some extra debug info (see attached diff) and this
>>>>>>>>> is the
>>>>>>>>> output when it crashes due to something the commit above triggered,
>>>>>>>>> the
>>>>>>>>> level is out of bounds and the pfn looks fishy too.
>>>>>>>>> Complete serial log from both bad and good (specific commit reverted)
>>>>>>>>> are
>>>>>>>>> attached.
>>>>>>>> Just to confirm, you are positively identifying a qemu changeset as
>>>>>>>> causing this crash?
>>>>>>>> If so, the qemu change has discovered a pre-existing issue in the
>>>>>>>> toolstack pci-passthrough interface. Whatever qemu is or isn't doing,
>>>>>>>> it should not be able to cause a crash like this.
>>>>>>>> With this in mind, I need to brush up on my AMD-Vi details.
>>>>>>>> In the meantime, can you run with the following patch to identify what
>>>>>>>> is going on, domctl wise? I assume it is the assign_device which is
>>>>>>>> failing, but it will be nice to observe the differences between the
>>>>>>>> working and failing case, which might offer a hint.
>>>>>>> Hrrm with your patch i end up with a fatal page fault in
>>>>>>> iommu_do_pci_domctl:
>>>>>>>
>>>>>>> (XEN) [2015-04-11 14:03:31.833] ----[ Xen-4.6-unstable x86_64 debug=y
>>>>>>> Tainted: C ]----
>>>>>>> (XEN) [2015-04-11 14:03:31.857] CPU: 5
>>>>>>> (XEN) [2015-04-11 14:03:31.868] RIP: e008:[<ffff82d08014c52c>]
>>>>>>> iommu_do_pci_domctl+0x2dc/0x740
>>>>>>> (XEN) [2015-04-11 14:03:31.894] RFLAGS: 0000000000010256 CONTEXT:
>>>>>>> hypervisor
>>>>>>> (XEN) [2015-04-11 14:03:31.915] rax: 0000000000000008 rbx:
>>>>>>> 0000000000000800 rcx: ffffffffffebe5ed
>>>>>>> (XEN) [2015-04-11 14:03:31.942] rdx: 0000000000000800 rsi:
>>>>>>> 0000000000000000 rdi: ffff830256ef7e38
>>>>>>> (XEN) [2015-04-11 14:03:31.968] rbp: ffff830256ef7c98 rsp:
>>>>>>> ffff830256ef7c08 r8: 00000000deadbeef
>>>>>>> (XEN) [2015-04-11 14:03:31.995] r9: 00000000deadbeef r10:
>>>>>>> ffff82d08024e500 r11: 0000000000000282
>>>>>>> (XEN) [2015-04-11 14:03:32.022] r12: 0000000000000000 r13:
>>>>>>> 0000000000000008 r14: 0000000000000000
>>>>>>> (XEN) [2015-04-11 14:03:32.049] r15: 0000000000000000 cr0:
>>>>>>> 0000000080050033 cr4: 00000000000006f0
>>>>>>> (XEN) [2015-04-11 14:03:32.076] cr3: 00000002336a6000 cr2:
>>>>>>> 0000000000000000
>>>>>>> (XEN) [2015-04-11 14:03:32.096] ds: 0000 es: 0000 fs: 0000 gs:
>>>>>>> 0000 ss: e010 cs: e008
>>>>>>> (XEN) [2015-04-11 14:03:32.121] Xen stack trace from
>>>>>>> rsp=ffff830256ef7c08:
>>>>>>> (XEN) [2015-04-11 14:03:32.141] ffff830256ef7c78 ffff82d08012c178
>>>>>>> ffff830256ef7c28 ffff830256ef7c28
>>>>>>> (XEN) [2015-04-11 14:03:32.168] 0000000000000010 0000000000000000
>>>>>>> 0000000000000000 0000000000000000
>>>>>>> (XEN) [2015-04-11 14:03:32.195] 00000000000006f0 00007fe300000000
>>>>>>> ffff830256eb7790 ffff83025cc6d300
>>>>>>> (XEN) [2015-04-11 14:03:32.222] ffff82d080330c60 00007fe396bab004
>>>>>>> 0000000000000000 00007fe396bab004
>>>>>>> (XEN) [2015-04-11 14:03:32.249] 0000000000000000 0000000000000005
>>>>>>> ffff830256ef7ca8 ffff82d08014900b
>>>>>>> (XEN) [2015-04-11 14:03:32.276] ffff830256ef7d98 ffff82d080161f2d
>>>>>>> 0000000000000010 0000000000000000
>>>>>>> (XEN) [2015-04-11 14:03:32.303] 0000000000000000 ffff830256ef7ce8
>>>>>>> ffff82d08018b655 ffff830256ef7d48
>>>>>>> (XEN) [2015-04-11 14:03:32.330] ffff830256ef7cf8 ffff82d08018b66a
>>>>>>> ffff830256ef7d38 ffff82d08012925e
>>>>>>> (XEN) [2015-04-11 14:03:32.357] ffff830256efc068 0000000800000001
>>>>>>> 800000022e12c167 0000000000000000
>>>>>>> (XEN) [2015-04-11 14:03:32.384] 0000000000000002 ffff830256ef7e38
>>>>>>> 0000000800000000 800000022e12c167
>>>>>>> (XEN) [2015-04-11 14:03:32.411] 0000000000000003 ffff830256ef7db8
>>>>>>> 0000000000000000 00007fe396780eb0
>>>>>>> (XEN) [2015-04-11 14:03:32.439] 0000000000000202 ffffffffffffffff
>>>>>>> 0000000000000000 00007fe396bab004
>>>>>>> (XEN) [2015-04-11 14:03:32.466] 0000000000000000 0000000000000005
>>>>>>> ffff830256ef7ef8 ffff82d08010497f
>>>>>>> (XEN) [2015-04-11 14:03:32.493] 0000000000000001 0000000000100001
>>>>>>> 800000022e12c167 ffff88001f7ecc00
>>>>>>> (XEN) [2015-04-11 14:03:32.520] 00007fe396780eb0 ffff88001c849508
>>>>>>> 0000000e00000007 ffffffff8105594a
>>>>>>> (XEN) [2015-04-11 14:03:32.547] 000000000000e033 0000000000000202
>>>>>>> ffff88001ece3d40 000000000000e02b
>>>>>>> (XEN) [2015-04-11 14:03:32.574] ffff830256ef7e28 ffff82d080194933
>>>>>>> 000000000000beef ffffffff81bd6c85
>>>>>>> (XEN) [2015-04-11 14:03:32.601] ffff830256ef7f08 ffff82d080193edd
>>>>>>> 0000000b0000002d 0000000000000001
>>>>>>> (XEN) [2015-04-11 14:03:32.628] 0000000100000800 00007fe3962abbd0
>>>>>>> ffff000a81050001 00007fe39656ce6e
>>>>>>> (XEN) [2015-04-11 14:03:32.655] 00007ffdf2a654f0 00007fe39656d0c9
>>>>>>> 00007fe39656ce6e 00007fe3969a9a55
>>>>>>> (XEN) [2015-04-11 14:03:32.682] Xen call trace:
>>>>>>> (XEN) [2015-04-11 14:03:32.695] [<ffff82d08014c52c>]
>>>>>>> iommu_do_pci_domctl+0x2dc/0x740
>>>>>>> (XEN) [2015-04-11 14:03:32.718] [<ffff82d08014900b>]
>>>>>>> iommu_do_domctl+0x17/0x1a
>>>>>>> (XEN) [2015-04-11 14:03:32.739] [<ffff82d080161f2d>]
>>>>>>> arch_do_domctl+0x2469/0x26e1
>>>>>>> (XEN) [2015-04-11 14:03:32.762] [<ffff82d08010497f>]
>>>>>>> do_domctl+0x1a1f/0x1d60
>>>>>>> (XEN) [2015-04-11 14:03:32.783] [<ffff82d080234c6b>]
>>>>>>> syscall_enter+0xeb/0x145
>>>>>>> (XEN) [2015-04-11 14:03:32.804]
>>>>>>> (XEN) [2015-04-11 14:03:32.813] Pagetable walk from 0000000000000000:
>>>>>>> (XEN) [2015-04-11 14:03:32.831] L4[0x000] = 0000000234075067
>>>>>>> 000000000001f2a8
>>>>>>> (XEN) [2015-04-11 14:03:32.852] L3[0x000] = 0000000229ad4067
>>>>>>> 0000000000014c49
>>>>>>> (XEN) [2015-04-11 14:03:32.873] L2[0x000] = 0000000000000000
>>>>>>> ffffffffffffffff
>>>>>>> (XEN) [2015-04-11 14:03:32.894]
>>>>>>> (XEN) [2015-04-11 14:03:32.903] ****************************************
>>>>>>> (XEN) [2015-04-11 14:03:32.922] Panic on CPU 5:
>>>>>>> (XEN) [2015-04-11 14:03:32.935] FATAL PAGE FAULT
>>>>>>> (XEN) [2015-04-11 14:03:32.948] [error_code=0000]
>>>>>>> (XEN) [2015-04-11 14:03:32.961] Faulting linear address:
>>>>>>> 0000000000000000
>>>>>>> (XEN) [2015-04-11 14:03:32.981] ****************************************
>>>>>>> (XEN) [2015-04-11 14:03:33.000]
>>>>>>> (XEN) [2015-04-11 14:03:33.009] Reboot in five seconds...
>>>>>>>
>>>>>>> The RIP resolves to the prink added by your patch in:
>>>>>>>
>>>>>>> case XEN_DOMCTL_test_assign_device:
>>>>>>> ret = xsm_test_assign_device(XSM_HOOK,
>>>>>>> domctl->u.assign_device.machine_sbdf);
>>>>>>> if ( ret )
>>>>>>> break;
>>>>>>>
>>>>>>> seg = domctl->u.assign_device.machine_sbdf >> 16;
>>>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>>>>
>>>>>>> printk("*** %pv->d%d:
>>>>>>> test_assign_device({%04x:%02x:%02x.%u})\n",
>>>>>>> current, d->domain_id,
>>>>>>> seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>>>
>>>>>>> if ( device_assigned(seg, bus, devfn) )
>>>>>>> {
>>>>>>> printk(XENLOG_G_INFO
>>>>>>> "%04x:%02x:%02x.%u already assigned, or
>>>>>>> non-existent\n",
>>>>>>> seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>>> ret = -EINVAL;
>>>>>>> }
>>>>>>> break;
>>>>>> hmm - 'd' is NULL. This ought to work better.
>>>>>> diff --git a/xen/drivers/passthrough/pci.c
>>>>>> b/xen/drivers/passthrough/pci.c
>>>>>> index 9f3413c..85ff1fc 100644
>>>>>> --- a/xen/drivers/passthrough/pci.c
>>>>>> +++ b/xen/drivers/passthrough/pci.c
>>>>>> @@ -1532,6 +1532,11 @@ int iommu_do_pci_domctl(
>>>>>> max_sdevs = domctl->u.get_device_group.max_sdevs;
>>>>>> sdevs = domctl->u.get_device_group.sdev_array;
>>>>>>
>>>>>> + printk("*** %pv->d%d: get_device_group({%04x:%02x:%02x.%u,
>>>>>> %u})\n",
>>>>>> + current, d->domain_id,
>>>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>>>>>> + max_sdevs);
>>>>>> +
>>>>>> ret = iommu_get_device_group(d, seg, bus, devfn, sdevs,
>>>>>> max_sdevs);
>>>>>> if ( ret < 0 )
>>>>>> {
>>>>>> @@ -1558,6 +1563,9 @@ int iommu_do_pci_domctl(
>>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>>>
>>>>>> + printk("*** %pv: test_assign_device({%04x:%02x:%02x.%u})\n",
>>>>>> + current, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>> +
>>>>>> if ( device_assigned(seg, bus, devfn) )
>>>>>> {
>>>>>> printk(XENLOG_G_INFO
>>>>>> @@ -1582,6 +1590,10 @@ int iommu_do_pci_domctl(
>>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>>>
>>>>>> + printk("*** %pv->d%d: assign_device({%04x:%02x:%02x.%u})\n",
>>>>>> + current, d->domain_id,
>>>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>> +
>>>>>> ret = device_assigned(seg, bus, devfn) ?:
>>>>>> assign_device(d, seg, bus, devfn);
>>>>>> if ( ret == -ERESTART )
>>>>>> @@ -1604,6 +1616,10 @@ int iommu_do_pci_domctl(
>>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>>>
>>>>>> + printk("*** %pv->d%d: deassign_device({%04x:%02x:%02x.%u})\n",
>>>>>> + current, d->domain_id,
>>>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>> +
>>>>>> spin_lock(&pcidevs_lock);
>>>>>> ret = deassign_device(d, seg, bus, devfn);
>>>>>> spin_unlock(&pcidevs_lock);
>>>>> Hi Andrew,
>>>>>
>>>>> Attached are the serial logs good (with revert) and bad (without):
>>>>>
>>>>> Some things that seems strange to me:
>>>>> - The numerous calls to get the device 08:00.0 assigned ... for 0a:00.0
>>>>> there
>>>>> was only one call to both test assign and assign.
>>>>> - However these numerous calls are there in both the good and the bad
>>>>> case,
>>>>> so perhaps it's strange and wrong .. but not the cause ..
>>>>> - I had a hunch it could be due to the 08:00.0 using MSI-X, but when only
>>>>> passing through 0a:00.0, i get the same numerous calls but now for the
>>>>> 0a:00.0 which uses IntX, so I think that is more related to being the
>>>>> *first*
>>>>> device to be passed through to a guest.
>>>> I have also observed this behaviour, but not had time to investigate.
>>>> It doesn't appear problematic in the longrun but it probably a toolstack
>>>> issue which wants fixing (if only in the name of efficiency).
>>> And just after I sent this email, I have realised why.
>>> The first assign device will have to build IO pagetables, which is a
>>> long operation and subject to hypercall continuations. The second
>>> device will reused the same pagetables, so is quick to complete.
>> So .. is the ioreq patch from Paul involved in providing something used in
>> building
>> the pagetables .. and could it have say some off by one resulting in the
>> 0xffffffffffff .. which could lead to the pagetable building going beserk,
>> requiring a paging_mode far greater than normally would be required .. which
>> get's set .. since that isn't checked properly .. leading to things breaking
>> a bit further when it does get checked ?
> A -1 is slipping in somewhere and ending up in the gfn field.
> The result is that update_paging_mode() attempts to construct
> iopagetables to cover a 76bit address space, which is how level ends up
> at 8. (Note that a level of 7 is reserved, and a level of anything
> greater than 4 is implausible on your system.)
> I think the crash is collateral damage following on from
> update_paging_mode() not properly sanitising its input, but that there
> is still some other issue causing -1 to be passed in the first place.
> I am still trying to locate where a -1 might plausibly be coming from.
I have just added some extra debug code to store the values from the start
of update_paging_mode() .. so i can print them at the end if the paging_mode
gets out of band and do a dump_stack() as well. Hopefully is will confirm this.
--
Sander
> ~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |