
Re: [Xen-devel] Xen pci-passthrough problem with pci-detach and pci-assignable-remove



Wednesday, April 16, 2014, 5:30:57 PM, you wrote:

> On Wed, Apr 02, 2014 at 12:43:12PM +0200, Sander Eikelenboom wrote:
>> 
>> Tuesday, April 1, 2014, 6:13:09 PM, you wrote:
>> 
>> > On Thu, Feb 20, 2014 at 05:18:46PM +0100, Sander Eikelenboom wrote:
>> >> 
>> >> Thursday, February 20, 2014, 9:53:59 AM, you wrote:
>> >> 
>> >> 
>> >> > Friday, January 24, 2014, 6:48:06 PM, you wrote:
>> >> 
>> >> >> On Fri, Jan 24, 2014 at 02:36:02PM +0100, Sander Eikelenboom wrote:
>> >> >>> 
>> >> >>> Friday, January 10, 2014, 6:38:10 PM, you wrote:
>> >> >>> 
>> >> >>> >> > Wow. You just walked into a pile of bugs, didn't you? And on a Friday
>> >> >>> >> > nonetheless.
>> >> >>> >> 
>> >> >>> >> As usual ;-)
>> >> >>> 
>> >> >>> > Ha!
>> >> >>> > ..snip..
>> >> >>> >> >> [  489.082358]  [<ffffffff81087ac6>] ? mutex_spin_on_owner+0x38/0x45
>> >> >>> >> >> [  489.106272]  [<ffffffff818e5e22>] ? schedule_preempt_disabled+0x6/0x9
>> >> >>> >> >> [  489.130158]  [<ffffffff818e7034>] ? __mutex_lock_slowpath+0x159/0x1b5
>> >> >>> >> >> [  489.154147]  [<ffffffff818e70a6>] ? mutex_lock+0x16/0x25
>> >> >>> >> >> [  489.177890]  [<ffffffff8135972d>] ? pci_reset_function+0x26/0x4e
>> >> >>> >> 
>> >> >>> >> > Yeah, that bug my RFC patchset (the one that does the slot/bus 
>> >> >>> >> > reset) should also fix.
>> >> >>> >> > I totally forgot about it !
>> >> >>> >> 
>> >> >>> >> Got a link to that patchset ?
>> >> >>> 
>> >> >>> > https://lkml.org/lkml/2013/12/13/315
>> >> >>> 
>> >> >>> >> I at least could give it a spin .. you never know when fortune is 
>> >> >>> >> on your side :-)
>> >> >>> 
>> >> >>> > It is also at this git tree:
>> >> >>> 
>> >> >>> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git and the
>> >> >>> > branch name is "devel/xen-pciback.slot_and_bus.v0". You will likely
>> >> >>> > want to merge it in your current Linus tree.
>> >> >>> 
>> >> >>> > Thank you!
>> >> >>> 
>> >> >>> 
>> >> >>> Hi Konrad,
>> >> >>> 
>> >> >>> Just got time to test this some more. Merging this branch *except*
>> >> >>> the last commit (9599a5ad38a3bb250e996ccb2cdaab6fb68aaacd) seems to
>> >> >>> help with my problem; I'm now able to use:
>> >> >>> - xl pci-detach
>> >> >>> - xl pci-assignable-remove
>> >> >>> - echo "BDF" > /sys/bus/pci/drivers/<devicename>/bind
>> >> >>> 
>> >> >>> to remove a PCI device from a running HVM guest and rebind it to a
>> >> >>> driver in dom0 without those nasty stack traces :-)
>> >> >>> So the first 4 commits seem to be an improvement.
>> >> >>> 
>> >> >>> That last commit (9599a5ad38a3bb250e996ccb2cdaab6fb68aaacd) seems to
>> >> >>> give troubles of its own.
>> >> 
>> >> >> Could you email me your lspci output and also which devices you 
>> >> >> move/switch etc?
>> >> 
>> >> > Hi Konrad,
>> >> 
>> >> > Now that I have found some time to figure out what goes wrong with
>> >> > xl pci-detach and xl pci-assignable-remove, I have been able to
>> >> > narrow it down a bit:
>> >> 
>> >> > The problem only occurs when you:
>> >> > - pass through 2 (or more?) PCI devices to a guest,
>> >> > - and then remove only 1 of those devices with "xl pci-detach" followed
>> >> >   by "xl pci-assignable-remove".
>> >> > When you first detach both devices with "xl pci-detach" before doing
>> >> > the "xl pci-assignable-remove", it works OK.
>> >> 
>> >> > In my case I'm passing through 2 devices (02:00.0 and 00:19.0).
>> >> 
>> >> > I added some printk's and found out that:
>> >> > - after doing the pci-detach of 02:00.0, pcistub_put_pci_dev is not
>> >> >   called for that device ...
>> >> > - but when I subsequently pci-detach the second (and last) device,
>> >> >   00:19.0, it is called for both 02:00.0 and 00:19.0 ...
>> >> > - so somehow the call for the first detached device gets deferred.
>> >> >   But since these are different devices, and not functions of the same
>> >> >   device, I don't see any reason for it to wait until all other devices
>> >> >   have been detached ...
>> >> 
>> >> 
>> >> > I tried to capture the console output but somehow that didn't work
>> >> > out, so I attached a screenshot of what happens when:
>> >> > - running xl pci-list for the guest
>> >> > - running xl pci-assignable-list
>> >> 
>> >> > - running xl pci-detach for 02:00.0
>> >> 
>> >> > - running xl pci-list for the guest
>> >> > - running xl pci-assignable-list
>> >> 
>> >> > - waiting some time ...
>> >> 
>> >> > - running xl pci-detach for 00:19.0
>> >> 
>> >> > - running xl pci-list for the guest
>> >> > - running xl pci-assignable-list
>> >> 
>> >> > There you can see this strange sequence of events :-)
>> >> 
>> >> > But I haven't been able to spot the culprit.
>> >> 
>> >> I enabled some extra debugging and added some more printk's (see the new
>> >> screenshot).
>> >> 
>> >> From what it seems, the frontend state for the first device isn't
>> >> changed on the first pci-detach ...
>> >> 
>> >> Is signaling the pci-detach the guest's (pcifront) responsibility or
>> >> the toolstack's (libxl)?
>> 
>> > It usually is pcifront. And in the screenshot I see:
>> > .. frontend is gone! unregister device
>> > which should trigger the process. And it does look to do that.
>> > Hm, I am wondering what the toolstack is waiting for.
>> > Time to debug.
>> 
>> Ok thx :-)

> Just to make sure - you are not using the xen-pciback.hide parameter right?
> Just doing the /sysfs dance of 'echo BDF'> to various places.

Nope, I always use xen-pciback.hide.
And normally I only create, shut down or destroy guests, and all goes well.

As said, it only crashes the host when you detach *not all* of the devices
from a guest but only some of them (so with a single device it's also never
a problem).

Somehow the detach isn't completed while there are still other devices attached.

(What I'm trying to do is give the only Ethernet NIC the machine has back to
dom0, and leave the wireless NIC passed through to the OpenWrt router guest.
The whole detach and rebind works perfectly as long as it's the only device
passed through to the guest, so yes, this is going to work ;-) )
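A minimal sketch of the failing sequence as a shell script, for anyone trying to reproduce it. The guest name `openwrt`, the driver name `DRIVER_NAME` and the script name are hypothetical placeholders, not taken from the setup in this thread; 0000:02:00.0 is simply the device detached first in the test described earlier. Because the partial detach can crash the host on affected kernels, the sketch only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch reproducing the partial pci-detach sequence from this thread.
# All defaults below are hypothetical placeholders; override via environment:
#   DOMAIN  - HVM guest that has two (or more) PCI devices passed through
#   DEV     - the one device to detach and hand back to dom0
#   DRIVER  - dom0 driver to rebind DEV to after pci-assignable-remove
#   DRY_RUN - 1 (default) only prints commands; 0 actually runs them
DOMAIN=${DOMAIN:-openwrt}
DEV=${DEV:-0000:02:00.0}
DRIVER=${DRIVER:-DRIVER_NAME}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The sysfs bind needs a redirection, so it cannot go through run().
bind_in_dom0() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ echo $DEV > /sys/bus/pci/drivers/$DRIVER/bind"
    else
        echo "$DEV" > "/sys/bus/pci/drivers/$DRIVER/bind"
    fi
}

repro_partial_detach() {
    run xl pci-list "$DOMAIN"           # both devices still attached
    run xl pci-assignable-list
    run xl pci-detach "$DOMAIN" "$DEV"  # detach only ONE of the devices
    run xl pci-list "$DOMAIN"
    run xl pci-assignable-list
    run xl pci-assignable-remove "$DEV"
    bind_in_dom0                        # hand the device back to a dom0 driver
}

repro_partial_detach
```

To actually execute it, something like `DRY_RUN=0 DOMAIN=<guest> DRIVER=<driver> sh repro.sh` would do; the dry-run default is a deliberate safety choice given the host crashes reported here.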



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

