
Re: [Xen-devel] Xen pci-passthrough problem with pci-detach and pci-assignable-remove



Tuesday, April 1, 2014, 6:13:09 PM, you wrote:

> On Thu, Feb 20, 2014 at 05:18:46PM +0100, Sander Eikelenboom wrote:
>> 
>> Thursday, February 20, 2014, 9:53:59 AM, you wrote:
>> 
>> 
>> > Friday, January 24, 2014, 6:48:06 PM, you wrote:
>> 
>> >> On Fri, Jan 24, 2014 at 02:36:02PM +0100, Sander Eikelenboom wrote:
>> >>> 
>> >>> Friday, January 10, 2014, 6:38:10 PM, you wrote:
>> >>> 
>> >>> >> > Wow. You just walked in a pile of bugs didn't you? And on Friday
>> >>> >> > nonetheless.
>> >>> >> 
>> >>> >> As usual ;-)
>> >>> 
>> >>> > Ha!
>> >>> > ..snip..
>> >>> >> >> [  489.082358]  [<ffffffff81087ac6>] ? mutex_spin_on_owner+0x38/0x45
>> >>> >> >> [  489.106272]  [<ffffffff818e5e22>] ? schedule_preempt_disabled+0x6/0x9
>> >>> >> >> [  489.130158]  [<ffffffff818e7034>] ? __mutex_lock_slowpath+0x159/0x1b5
>> >>> >> >> [  489.154147]  [<ffffffff818e70a6>] ? mutex_lock+0x16/0x25
>> >>> >> >> [  489.177890]  [<ffffffff8135972d>] ? pci_reset_function+0x26/0x4e
>> >>> >> 
>> >>> >> > Yeah, my RFC patchset (the one that does the slot/bus 
>> >>> >> > reset) should also fix that bug.
>> >>> >> > I totally forgot about it !
>> >>> >> 
>> >>> >> Got a link to that patchset ?
>> >>> 
>> >>> > https://lkml.org/lkml/2013/12/13/315
>> >>> 
>> >>> >> I at least could give it a spin .. you never know when fortune is on 
>> >>> >> your side :-)
>> >>> 
>> >>> > It is also at this git tree:
>> >>> 
>> >>> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git and the
>> >>> > branch name is "devel/xen-pciback.slot_and_bus.v0". You will likely
>> >>> > want to merge it in your current Linus tree.
>> >>> 
>> >>> > Thank you!
>> >>> 
>> >>> 
>> >>> Hi Konrad,
>> >>> 
>> >>> Just got time to test this some more, when merging this branch *except* 
>> >>> the last commit (9599a5ad38a3bb250e996ccb2cdaab6fb68aaacd)
>> >>> seems to help with my problem, i'm now capable of using:
>> >>> - xl pci-detach
>> >>> - xl pci-assignable-remove
>> >>> - echo "BDF" > /sys/bus/pci/drivers/<devicename>/bind
>> >>> 
>> >>> to remove a pci device from a running HVM guest and rebinding it to a 
>> >>> driver in dom0 without those nasty stacktraces :-)
>> >>> So the first 4 seem to be an improvement.
>> >>> 
>> >>> That last commit (9599a5ad38a3bb250e996ccb2cdaab6fb68aaacd) seems to 
>> >>> give troubles of its own.
>> 
>> >> Could you email me your lspci output and also which devices you 
>> >> move/switch etc?
>> 
>> > Hi Konrad,
>> 
>> > At the moment i found some time to figure out what goes wrong with the xl 
>> > pci-detach and xl pci-assignable-remove, i have been
>> > able to narrow it down a bit:
>> 
>> > The problem only occurs when you:
>> > - passthrough 2 (or more?) pci devices assigned to a guest ..
>> > - and only remove 1 of those devices with "xl pci-detach" followed by a 
>> > "xl pci-assignable-remove"
>> > - when you first detach both devices with "xl pci-detach" before doing the 
>> > "xl pci-assignable-remove" it works ok.
>> 
>> > In my case i'm passing through 2 devices (02:00.0 and 00:19.0)
>> 
>> > I added some printk's and what i found out is that:
>> > - after doing the pci-detach of 02:00.0, it doesn't call 
>> > pcistub_put_pci_dev for that device ...
>> > - but when i subsequently pci-detach the second (and last) device 00:19.0 
>> > .. it does call it for both 02:00.0 and 00:19.0 ...
>> > - so somehow that call for the first detached device gets deferred .. but 
>> > since they are different devices and not functions of the same device i don't
>> >   see any reason for it to wait until all other devices would have been 
>> > detached ...
>> 
>> 
>> > I tried to capture the console output but somehow that didn't work out, 
>> > so i attached a screenshot of what happens when:
>> > - doing a xl pci-list for the guest
>> > - doing a xl pci-assignable-list
>> 
>> > - doing the xl pci-detach for 02:00.0
>> 
>> > - doing a xl pci-list for the guest
>> > - doing a xl pci-assignable-list
>> 
>> > - waiting some time ...
>> 
>> > - doing the xl pci-detach for 00:19.0
>> 
>> > - doing a xl pci-list for the guest
>> > - doing a xl pci-assignable-list
>> 
>> > There you can see this strange sequence of events :-)
>> 
>> > But i haven't been able to spot the culprit
>> 
>> Enabled some extra debugging and added some more printk's .. (see new 
>> screenshot)
>> 
>> From what it seems .. the frontend state for the first device isn't changed 
>> on the first pci-detach ..
>> 
>> Is the signaling on pci-detach the guest's (pcifront) responsibility or the 
>> toolstack's (libxl) ?

> It usually is pcifront. And in the screenshot I see:
> .. frontend is gone! unregister device
> which should trigger the process. And it does look to do that.
> Hm, I am wondering what the toolstack is waiting for.
> Time to debug.

Ok thx :-)
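
For anyone who wants to reproduce this, the two orderings described in the quoted report can be sketched as a small Python helper that only builds the xl command lines and does not run anything. The domain name "guest" is a placeholder (an assumption, it does not appear in the thread); the BDFs are the two devices mentioned above. This is a rough sketch of the sequence as described, not a tested reproducer.

```python
import shlex


def xl(*args):
    """Return one xl invocation as an argument list (nothing is executed)."""
    return ["xl", *args]


def failing_order(domain, bdf):
    """Problem case from the report: detach only one of several
    passed-through devices, then try to hand it back to dom0."""
    return [
        xl("pci-detach", domain, bdf),
        xl("pci-assignable-remove", bdf),
    ]


def working_order(domain, all_bdfs, bdf):
    """Reported as working: detach every passed-through device first,
    then remove the wanted one from the assignable pool."""
    cmds = [xl("pci-detach", domain, d) for d in all_bdfs]
    cmds.append(xl("pci-assignable-remove", bdf))
    return cmds


def render(cmds):
    """Turn argument lists into shell-style strings for printing."""
    return [shlex.join(c) for c in cmds]


if __name__ == "__main__":
    # "guest" is a hypothetical domain name.
    for line in render(failing_order("guest", "02:00.0")):
        print(line)
    print("--")
    for line in render(working_order("guest", ["02:00.0", "00:19.0"], "02:00.0")):
        print(line)
```

Running "xl pci-list" for the guest and "xl pci-assignable-list" between the steps, as in the screenshot, shows at which point the device state changes.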


>> 
>> 
>> 
>> > attached: screenshot.jpg

> and thanks for the screenshot (didn't have copy-n-paste option handy :-))

Well i didn't have KVM/SOL working on the intel NUC .. busy with that today .. 
it's a nifty little machine ..
just got to get the AMT/vPro stuff working. So although second best, the 
screenshot was all i had at that moment.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

