
Re: [Xen-devel] Xen pci-passthrough problem with pci-detach and pci-assignable-remove



On Fri, Jan 10, 2014 at 04:57:29PM +0100, Sander Eikelenboom wrote:
> 
> Friday, January 10, 2014, 4:12:18 PM, you wrote:
> 
> > On Fri, Jan 10, 2014 at 03:51:57PM +0100, Sander Eikelenboom wrote:
> >> Hi Konrad,
> >> 
> >> Normally I never reattach PCI devices to dom0, but at the moment I have
> >> some use for it.
> >> 
> >> But it seems pci-detach isn't completely detaching the device from the 
> >> guest.
> >> 
> >> - Say I have a guest (HVM) with domid=2 and a PCI device passed through
> >> with BDF 00:19.0; the device is hidden at boot with
> >> xen-pciback.hide=(00:19.0) in grub.
> >> 
> >> - Now I do an "xl pci-assignable-list".
> >>   This returns nothing, which is correct since all hidden devices have 
> >> already been assigned to guests.
> >> 
> >> - Then I do "xl -v pci-detach 2 00:19.0".
> >>   Which also returns nothing ...
> >> 
> >> - Now I do an "xl pci-assignable-list" again ..
> >>   This returns:
> >>   "0000:00:19.0"
> >>   So the pci-detach does seem to have done *something* :-)
> 
> > Or it thinks it has :-)
> 
> Well it has .. but probably not enough ;-)
> 
> >> 
> >> - But when I then try to remove the device from pciback and return it
> >> to dom0 with "xl pci-assignable-remove 00:19.0", it gives an error,
> >>   and later it gives some stack traces ..
> >> 
> >>   xen_pciback: ****** removing device 0000:00:19.0 while still in-use! ******
> >>   xen_pciback: ****** driver domain may still access this device's i/o resources!
> >>   xen_pciback: ****** shutdown driver domain before binding device
> >>   xen_pciback: ****** to other drivers or domains
> 
> > What about /var/log/xen/qemu-dm* and the 'lspci' in the guest? Is the
> > PCI device removed from there?
> 
> Oh, I should have thought of that ...
> 
> In the guest I get an "e1000e 0000:00:06.0 removed PHC" and it's gone from
> lspci ..
> In /var/log/xen/qemu-dm* .. I get nothing .. but I was using qemu-xen ..
> which is totally non-verbose ..
> 
> So let's try with qemu-xen-traditional .. which I also forgot to test ...
> 
> Which gives exactly the same error / warning as above, but it has some
> output in /var/log/xen/qemu-dm*:
> 
> pt_msgctrl_reg_write: setup msi for dev 30
> pt_msi_setup: pt_msi_setup requested pirq = 54
> pt_msi_setup: msi mapped with pirq 36
> pt_msi_update: Update msi with pirq 36 gvec 0 gflags 3036
> pt_msgctrl_reg_write: setup msi for dev 28
> pt_msi_setup: pt_msi_setup requested pirq = 53
> pt_msi_setup: msi mapped with pirq 35
> pt_msi_update: Update msi with pirq 35 gvec 0 gflags 3035
> pt_msi_update: Update msi with pirq 36 gvec 0 gflags 3034
> dm-command: hot remove pass-through pci dev
> generate a sci for PHP.
> deassert due to disable GPE bit.
> ACPI:debug: write addr=0xb044, val=0x30.
> ACPI:debug: write addr=0xb045, val=0x3.
> ACPI:debug: write addr=0xb044, val=0x30.
> ACPI:debug: write addr=0xb045, val=0x88.
> ACPI PCI hotplug: write devfn=0x30.
> pci_intx: intx=1
> pci_intx: intx=1
> pt_msi_disable: Unbind msi with pirq 36, gvec 0
> pt_msi_disable: Unmap msi with pirq 36

Good, so the device is safely removed from the guest.
QEMU acted on the 'libxl' command to remove it.
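
For reference, a quick way to double-check both sides after a detach is
something like this (BDFs taken from this thread; the log path assumes
qemu-xen-traditional, since qemu-xen logs nothing here):

    # In the guest: the passed-through device (00:06.0 there) should be gone.
    lspci -s 00:06.0

    # In dom0: the detached device should show up as assignable again.
    xl pci-assignable-list

    # qemu-xen-traditional logs the hot-remove in the device model log.
    grep "hot remove" /var/log/xen/qemu-dm-*.log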

> 
> 
> 
> Also worth mentioning: the console on which the "xl
> pci-assignable-remove 00:19.0" command is issued keeps hanging, and
> eventually the hung-task stack trace appears.
> 
> >> 
> >> 
> >> When I shut the guest down instead of using pci-detach, the "xl
> >> pci-assignable-remove" works fine and I can rebind the device to its
> >> driver in dom0.
> >> 
> >> So am I misreading the wiki .. and is it not possible to detach a device
> >> from a running domain, or ... ?
> >> 
> >> Oh yes, this is running xen-unstable and a 3.13-rc7 kernel.
> 
> > Do you see the same issue with 'xend'?
> 
> Erhmmm, I haven't used that in what seems like ages .. :-)

Heh.
> 
> Hmm, I also forgot the hung-task stack trace I get some time after the "xl
> pci-assignable-remove 00:19.0" ...


Wow. You just walked into a pile of bugs, didn't you? And on a Friday,
nonetheless.
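
For the record, the reproducer boils down to this (device hidden with
xen-pciback.hide=(00:19.0) on the dom0 command line, HVM guest is domid 2):

    xl pci-assignable-list             # empty: the device is assigned to the guest
    xl -v pci-detach 2 00:19.0         # silent, but the guest-side detach works
    xl pci-assignable-list             # now lists 0000:00:19.0
    xl pci-assignable-remove 00:19.0   # hangs; pciback complains, hung task later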

> 
> It seems to be the pci_reset_function ...
> 
> [   52.099144] xen_bridge: port 4(vif2.0-emu) entered forwarding state
> [   55.683141] xen_bridge: port 1(vif1.0) entered forwarding state
> [   59.861385] xen-blkback:ring-ref 8, event-channel 22, protocol 1 (x86_64-abi) persistent grants
> [   66.043965] xen_bridge: port 3(vif2.0) entered forwarding state
> [   66.044549] xen_bridge: port 3(vif2.0) entered forwarding state
> [   81.091149] xen_bridge: port 3(vif2.0) entered forwarding state
> [  227.441191] xen_pciback: ****** removing device 0000:00:19.0 while still in-use! ******
> [  227.443482] xen_pciback: ****** driver domain may still access this device's i/o resources!
> [  227.445811] xen_pciback: ****** shutdown driver domain before binding device
> [  227.447811] xen_pciback: ****** to other drivers or domains
> [  368.859343] INFO: task xl:3675 blocked for more than 120 seconds.
> [  368.860447]       Not tainted 3.13.0-rc7-20140110-creabox-nuc+ #1
> [  368.860990] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  368.861682] xl              D ffff88003fd93f00     0  3675   3489 0x00000000
> [  368.862319]  ffff880038c0e880 0000000000000282 0000000000000000 ffff880038fd03d0
> [  368.863035]  0000000000013f00 0000000000013f00 ffff880038c0e880 ffff880036abffd8
> [  368.863802]  ffffffff81087ac6 ffff88003a0f00f8 ffff88003a0f00fc ffff880038c0e880
> [  368.864514] Call Trace:
> [  368.864744]  [<ffffffff81087ac6>] ? mutex_spin_on_owner+0x38/0x45
> [  368.865273]  [<ffffffff818e5e22>] ? schedule_preempt_disabled+0x6/0x9
> [  368.865851]  [<ffffffff818e7034>] ? __mutex_lock_slowpath+0x159/0x1b5
> [  368.866409]  [<ffffffff818e70a6>] ? mutex_lock+0x16/0x25
> [  368.866892]  [<ffffffff8135972d>] ? pci_reset_function+0x26/0x4e
> [  368.867430]  [<ffffffff818e7dc1>] ? _raw_spin_lock_irqsave+0x14/0x36
> [  368.867996]  [<ffffffff818e7238>] ? down_write+0x9/0x26
> [  368.868467]  [<ffffffff813f1863>] ? pcistub_put_pci_dev+0x7b/0xe0
> [  368.868991]  [<ffffffff813f14a7>] ? pcistub_remove+0xd0/0x127
> [  368.869506]  [<ffffffff8135b5b8>] ? pci_device_remove+0x38/0x83
> [  368.870017]  [<ffffffff814cb37f>] ? __device_release_driver+0x82/0xdb
> [  368.870593]  [<ffffffff814cb602>] ? device_release_driver+0x1a/0x25
> [  368.871152]  [<ffffffff814ca993>] ? unbind_store+0x59/0x89
> [  368.871659]  [<ffffffff81178aa0>] ? sysfs_write_file+0x13f/0x18f
> [  368.872173]  [<ffffffff81122aa6>] ? vfs_write+0x95/0xfb
> [  368.872641]  [<ffffffff81122d8a>] ? SyS_write+0x51/0x85
> [  368.873087]  [<ffffffff818ed179>] ? system_call_fastpath+0x16/0x1b
> [  488.871331] INFO: task xl:3675 blocked for more than 120 seconds.
> [  488.913929]       Not tainted 3.13.0-rc7-20140110-creabox-nuc+ #1
> [  488.937031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  488.960945] xl              D ffff88003fd93f00     0  3675   3489 0x00000004
> [  488.986090]  ffff880038c0e880 0000000000000282 0000000000000000 ffff880038fd03d0
> [  489.010383]  0000000000013f00 0000000000013f00 ffff880038c0e880 ffff880036abffd8
> [  489.034456]  ffffffff81087ac6 ffff88003a0f00f8 ffff88003a0f00fc ffff880038c0e880
> [  489.058621] Call Trace:
> [  489.082358]  [<ffffffff81087ac6>] ? mutex_spin_on_owner+0x38/0x45
> [  489.106272]  [<ffffffff818e5e22>] ? schedule_preempt_disabled+0x6/0x9
> [  489.130158]  [<ffffffff818e7034>] ? __mutex_lock_slowpath+0x159/0x1b5
> [  489.154147]  [<ffffffff818e70a6>] ? mutex_lock+0x16/0x25
> [  489.177890]  [<ffffffff8135972d>] ? pci_reset_function+0x26/0x4e

Yeah, that is a bug my RFC patchset (the one that does the slot/bus reset)
should also fix; presumably pci_reset_function() is trying to take the device
lock that the sysfs unbind path already holds, hence the hang.
I totally forgot about it!


I hope.
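
Until that is sorted out, the sequence you found to work, shutting the guest
down first, would look something like this (the e1000e rebind at the end is
an assumption based on the driver name in your guest log):

    xl shutdown 2                      # detaching from a running guest trips the bug
    xl pci-assignable-remove 00:19.0   # completes once the guest is gone
    # Hand the device back to its dom0 driver:
    echo 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/bind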

> [  489.200927]  [<ffffffff818e7dc1>] ? _raw_spin_lock_irqsave+0x14/0x36
> [  489.224076]  [<ffffffff818e7238>] ? down_write+0x9/0x26
> [  489.246898]  [<ffffffff813f1863>] ? pcistub_put_pci_dev+0x7b/0xe0
> [  489.270086]  [<ffffffff813f14a7>] ? pcistub_remove+0xd0/0x127
> [  489.293053]  [<ffffffff8135b5b8>] ? pci_device_remove+0x38/0x83
> [  489.316068]  [<ffffffff814cb37f>] ? __device_release_driver+0x82/0xdb
> [  489.338896]  [<ffffffff814cb602>] ? device_release_driver+0x1a/0x25
> [  489.362459]  [<ffffffff814ca993>] ? unbind_store+0x59/0x89
> [  489.385396]  [<ffffffff81178aa0>] ? sysfs_write_file+0x13f/0x18f
> [  489.408605]  [<ffffffff81122aa6>] ? vfs_write+0x95/0xfb
> [  489.431407]  [<ffffffff81122d8a>] ? SyS_write+0x51/0x85
> [  489.454251]  [<ffffffff818ed179>] ? system_call_fastpath+0x16/0x1b
> 
> 
> >> 
> >> --
> >> Sander
> >> 
> >> 
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

