[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/2] libxl: Wait until QEMU removed the device before tearing it down



On 11/14/2014 02:24 PM, Sander Eikelenboom wrote:
Friday, November 14, 2014, 7:07:46 PM, you wrote:

Sander Eikelenboom writes ("Re: [Xen-devel] [PATCH 1/2] libxl: Wait until QEMU 
removed the device before tearing it down"):
1) xc_physdev_unmap_pirq does get called when destroying a HVM guest.
Yes, but I think that is so only because of the block structure bug
introduced in abfb006f.
OK, if this code is only for PV guests, this problem could also still be
there for PV guests then (they can also enable MSI AFAIK).

What i saw was that when a guest enables MSI for a device, the value of
/sys/bus/pci/devices/<BDF>/irq in dom0 changes from the intx to the msi irq
value. This occurs after libxl has read the irq and has called:

When shutting down the guest disables MSI for that device and the value of
/sys/bus/pci/devices/<BDF>/irq in dom0 is changed back to the intx irq before
libxl reads it again and calls:

So there is no error.
However in the destroy case the guest doesn't disable MSI, so the value of
/sys/bus/pci/devices/<BDF>/irq in dom0 isn't changed, so libxl tries to operate
on the wrong value.

I don't know if it is a kernel bug that /sys/bus/pci/devices/<BDF>/irq changes
when MSI are enabled, or that libxl shouldn't depend on that value.

I am unable to reproduce irq value change. But I can see that if it changes you will see libxl complain since we'd be trying to unmap an irq that has not been mapped.

What device are you passing through?



2) When setting and updating the device states in xenstore for
a[...]
The rest of your message is difficult to read on my screen due to wrap
damage.  Can you wrap it to <75 characters ?
Hmm it could do with some restructuring an rephrasing as well, sorry for that,
hope it's a bit clearer now (combined note 2 and 3):

2) When setting and updating the device states in xenstore for
    a passed through device, libxl doesn't mimic the states that
    pciback and pcifront set and use in their "dance", very well.
    (see f.e. 
http://lists.xen.org/archives/html/xen-devel/2014-08/msg00970.html )

    When a HVM guest has multiple pci devices passed through,
    pciback doesn't currently get a signal *what* pci-device has
    been removed from the guest on a 'xl pci-detach'.
Pciback currently only has a watch on the whole "pci-root"
    of a guest in xenstore, so it only knows *something* has changed:
    - in or under the "pci-root" entry
    - In the PV guest case, the xenbus-state of a individual pci device
      node is set to a different state by pcifront, so pciback can take
      action on cleaning up / resetting this particular physical device.
    - In the HVM guest case however, the xenbus-states are don't mimic
      what pcifront does, so pciback doesn't do all the clean up and
      resetting on the physical device  when doing a "xl pci-detach".
      It *does* cleanup when *all* devices and there for the complete"
      "pci-root" entry for a device is yanked out of xenstore. That's why
      you don't get intro trouble when you shutdown a guest with multiple
      devices passed through. You also don't get into trouble when a
      single device is passed through and removed (the "pci-root" entry in
      xenstore is removed when the last pci device of a guest is removed
      from the guest.
      But you *do* get into trouble when hot-unplugging
      any but the last device from a guest which has multiple pci devices
      passed through to it. You also won't notice it while it is still
      owned by pciback. You do notice it when you try to do a
      "xl pci-assignable-remove".

I don't know about detach but I apparently can't even properly attach a second device --- I get complaints about it already being in xenstore. But device does show up in the guest.

And then I can't remove it because, well, it's not in xenstore.

Maybe it has something to do with the fact that both devices are virtual functions of the same physical device. I might look at this sometime next week.

-boris



      I tried to fix this my self .. but ran into trouble because when you
      signal pciback via xenstore of the intent of removing a device from the
      guest, you need a callback but you can also run in to "timeout" issues.
      Another issue was that on shutdown you will remove multiple devices in
      short succession and i ran into locking/race issues, so it clearly
      became something beyond my skills at that point.

Thanks,
Ian.



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.