[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device
Hello Pasi, Unfortunately, I am not able to continue the work on the Xen patches in the foreseeable future. For what it's worth: the xen-pciback workaround from this thread solves my current issue as confirmed by internal testing. -- Stanislav (apologies for ugly footer injected below by company SMTP server due to local laws) On Thu, 2019-09-26 at 13:13 +0300, Pasi Kärkkäinen wrote: > Hello Stanislav, > > On Fri, Sep 13, 2019 at 11:28:20PM +0800, Chao Gao wrote: > > On Fri, Sep 13, 2019 at 10:02:24AM +0000, Spassov, Stanislav wrote: > > > On Thu, Dec 13, 2018 at 07:54, Chao Gao wrote: > > > > On Thu, Dec 13, 2018 at 12:54:52AM -0700, Jan Beulich wrote: > > > > > > > > On 13.12.18 at 04:46, <chao.gao@xxxxxxxxx> wrote: > > > > > > > > > > > > On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich > > > > > > wrote: > > > > > > > > > > On 12.12.18 at 16:18, <chao.gao@xxxxxxxxx> wrote: > > > > > > > > > > > > > > > > On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich > > > > > > > > wrote: > > > > > > > > > > > > On 12.12.18 at 08:06, <chao.gao@xxxxxxxxx> > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris > > > > > > > > > > Ostrovsky wrote: > > > > > > > > > > > On 12/5/18 4:32 AM, Roger Pau Monné wrote: > > > > > > > > > > > > On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao > > > > > > > > > > > > Gao wrote: > > > > > > > > > > > > > I find some pass-thru devices don't work any > > > > > > > > > > > > > more across guest reboot. > > > > > > > > > > > > > Assigning it to another guest also meets the > > > > > > > > > > > > > same issue. And the only > > > > > > > > > > > > > way to make it work again is un-binding and > > > > > > > > > > > > > binding it to pciback. > > > > > > > > > > > > > Someone reported this issue one year ago [1]. > > > > > > > > > > > > > More detail also can be > > > > > > > > > > > > > found in [2]. > > > > > > > > > > > > > > > > > > > > > > > > > > The root-cause is Xen's internal MSI-X state > > > > > > > > > > > > > isn't reset properly > > > > > > > > > > > > > during reboot or re-assignment. In the above > > > > > > > > > > > > > case, Xen set maskall bit > > > > > > > > > > > > > to mask all MSI interrupts after it detected > > > > > > > > > > > > > a potential security > > > > > > > > > > > > > issue. Even after device reset, Xen didn't > > > > > > > > > > > > > reset its internal maskall > > > > > > > > > > > > > bit. As a result, maskall bit would be set > > > > > > > > > > > > > again in next write to > > > > > > > > > > > > > MSI-X message control register. > > > > > > > > > > > > > > > > > > > > > > > > > > Given that PHYSDEVOPS_prepare_msix() also > > > > > > > > > > > > > triggers Xen resetting MSI-X > > > > > > > > > > > > > internal state of a device, we employ it to > > > > > > > > > > > > > fix this issue rather than > > > > > > > > > > > > > introducing another dedicated sub-hypercall. > > > > > > > > > > > > > > > > > > > > > > > > > > Note that PHYSDEVOPS_release_msix() will fail > > > > > > > > > > > > > if the mapping between > > > > > > > > > > > > > the device's msix and pirq has been created. > > > > > > > > > > > > > This limitation prevents > > > > > > > > > > > > > us calling this function when detaching a > > > > > > > > > > > > > device from a guest during > > > > > > > > > > > > > guest shutdown. Thus it is called right > > > > > > > > > > > > > before calling > > > > > > > > > > > > > PHYSDEVOPS_prepare_msix(). > > > > > > > > > > > > > > > > > > > > > > > > s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then > > > > > > > > > > > > I would also drop the > > > > > > > > > > > > () at the end of the hypercall name since it's > > > > > > > > > > > > not a function. > > > > > > > > > > > > > > > > > > > > > > > > I'm also wondering why the release can't be > > > > > > > > > > > > done when the device is > > > > > > > > > > > > detached from the guest (or the guest has been > > > > > > > > > > > > shut down). This makes > > > > > > > > > > > > me worry about the raciness of the > > > > > > > > > > > > attach/detach procedure: if there's > > > > > > > > > > > > a state where pciback assumes the device has > > > > > > > > > > > > been detached from the > > > > > > > > > > > > guest, but there are still pirqs bound, an > > > > > > > > > > > > attempt to attach to > > > > > > > > > > > > another guest in such state will fail. > > > > > > > > > > > > > > > > > > > > > > I wonder whether this additional reset > > > > > > > > > > > functionality could be done out > > > > > > > > > > > of xen_pcibk_xenbus_remove(). We first do a (best > > > > > > > > > > > effort) device reset > > > > > > > > > > > and then do the extra things that are not > > > > > > > > > > > properly done there. > > > > > > > > > > > > > > > > > > > > No. It cannot be done in xen_pcibk_xenbus_remove() > > > > > > > > > > without modifying > > > > > > > > > > the handler of PHYSDEVOP_release_msix. To do a > > > > > > > > > > successful Xen internal > > > > > > > > > > MSI-X state reset, PHYSDEVOP_{release, > > > > > > > > > > prepare}_msix should be finished > > > > > > > > > > without error. But ATM, xen expects that no msi is > > > > > > > > > > bound to pirq when > > > > > > > > > > doing PHYSDEVOP_release_msix. Otherwise it fails > > > > > > > > > > with error code -EBUSY. > > > > > > > > > > However, the expectation isn't guaranteed in > > > > > > > > > > xen_pcibk_xenbus_remove(). > > > > > > > > > > In some cases, if qemu fails to unmap MSIs, MSIs > > > > > > > > > > are unmapped by Xen > > > > > > > > > > at last minute, which happens after device reset > > > > > > > > > > in > > > > > > > > > > xen_pcibk_xenbus_remove(). > > > > > > > > > > > > > > > > > > But that may need taking care of: I don't think it is > > > > > > > > > a good idea to have > > > > > > > > > anything left from the prior owning domain when the > > > > > > > > > device gets reset. > > > > > > > > > I.e. left over IRQ bindings should perhaps be > > > > > > > > > forcibly cleared before > > > > > > > > > invoking the reset; > > > > > > > > > > > > > > > > Agree. How about pciback to track the established IRQ > > > > > > > > bindings? Then > > > > > > > > pciback can clear irq binding before invoking the > > > > > > > > reset. > > > > > > > > > > > > > > How would pciback even know of those mappings, when it's > > > > > > > qemu > > > > > > > who establishes (and manages) them? > > > > > > > > > > > > I meant to expose some interfaces from pciback. And pciback > > > > > > serves > > > > > > as the proxy of IRQ (un)binding APIs. > > > > > > > > > > If at all possible we should avoid having to change more > > > > > parties (qemu, > > > > > libxc, kernel, hypervisor) than really necessary. Remember > > > > > that such > > > > > a bug fix may want backporting, and making sure affected > > > > > people have > > > > > all relevant components updated is increasingly difficult > > > > > with their > > > > > number growing. > > > > > > > > > > > > > > in fact I'd expect this to happen in the course of > > > > > > > > > domain destruction, and I'd expect the device reset > > > > > > > > > to come after the > > > > > > > > > domain was cleaned up. Perhaps simply an ordering > > > > > > > > > issue in the tool > > > > > > > > > stack? > > > > > > > > > > > > > > > > I don't think reversing the sequences of device reset > > > > > > > > and domain > > > > > > > > destruction would be simple. Furthermore, during device > > > > > > > > hot-unplug, > > > > > > > > device reset is done when the owner is alive. So if we > > > > > > > > use domain > > > > > > > > destruction to enforce all irq binding cleared, in > > > > > > > > theory, it won't be > > > > > > > > applicable to hot-unplug case (if qemu's hot-unplug > > > > > > > > logic is > > > > > > > > compromised). > > > > > > > > > > > > > > Even in the hot-unplug case the tool stack could issue > > > > > > > unbind > > > > > > > requests, behind the back of the possibly compromised > > > > > > > qemu, > > > > > > > once neither the guest nor qemu have access to the device > > > > > > > anymore. > > > > > > > > > > > > But currently, tool stack doesn't know the remaining IRQ > > > > > > bindings. > > > > > > If tool stack can maintaine IRQ binding information of a > > > > > > pass-thru > > > > > > device (stored in Xenstore?), we can come up with a clean > > > > > > solution > > > > > > without modifying linux kernel and Xen. > > > > > > > > > > If there's no way for the tool stack to either find out the > > > > > bindings > > > > > or "blindly" issue unbind requests (accepting them to fail), > > > > > then a > > > > > "wildcard" unbind operation may want adding. Or, perhaps even > > > > > better, XEN_DOMCTL_deassign_device could unbind anything left > > > > > in place for the specified device. > > > > > > > > Good idea. I will take this advice. > > > > > > > > Thanks > > > > Chao > > > > > > I am having the same issue, and cannot find a fix in either xen- > > > pciback or the Xen codebase. > > > Was a solution ever pushed as a result of this thread? > > > > > > > I submitted patches [1] to Xen community. But I didn't get it > > merged. > > We made a change in device driver to disable MSI-X during guest OS > > shutdown to mitigate the issue. But when guest or qemu was crashed, > > we > > encountered this issue again. I have no plan to get back to these > > patches. But if you want to fix the issue completely along what the > > patches below did, please go ahead. > > > > [1]: > > https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg01227.html > > > > Thanks > > Chao > > > > Stanislav: Are you able to continue the work with these patches, to > get them merged? > > > Thanks, > > -- Pasi > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879 _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |