[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] xen/pt: fix some pass-thru devices don't work across reboot
>>> On 12.11.18 at 05:58, <chao.gao@xxxxxxxxx> wrote: > On Fri, Nov 09, 2018 at 02:14:04AM -0700, Jan Beulich wrote: >>>>> On 09.11.18 at 01:11, <chao.gao@xxxxxxxxx> wrote: >>> I find some pass-thru devices don't work any more across guest >>> reboot. Assigning it to another domain also meets the same issue. And >>> the only way to make it work again is un-binding and binding it to >>> pciback. Someone reported this issue one year ago [1]. >> >>I find "some" above and in the title misleading. According to the >>following description, the issue ought to affect exactly all MSI-X >>capable devices. > > Some devices whose driver doesn't disable MSI-X when shutdown. But Xen can't > rely on the guest driver to do this. The please clarify this in the description. > That's why I think this issue should be handled in Xen. I fully agree. >>Next I'm weary of you clearing the device's maskall >>bit without also clearing the enable bit (or ASSERT()ing that it's >>cleared, if that's guaranteed). > > I don't get why claring maskall bit without clearing enable bit would be an > issue. Because this means that not individually masked MSI-X entries would become usable for interrupt delivery by the device again. A misbehaving device (perhaps by having been programmed wrongly or maliciously) would then become a risk to the entire system. >>I also don't follow why you OR in >>->guest_maskall: Isn't it supposed to be clear anyway? > > Guest's first write to msixctrl register would override this value. > So don't clear it is also fine. Considering that do_pci_add() issues QMP > command to add pci device to qemu prior to xc_assign_device(), the question > here is whether there is any chance qemu's first write has happened. If some more general re-work (as discussed further down) was not going to happen, then this would need to be investigated in more detail. For now I'll leave this aside. >>From a more general perspective, forcing ->host_maskall to true >>in msi_set_mask_bit() when memory decoding is disabled is a >>safety measure. I'd like to see justification (in the description) why >>it is safe to clear the bit again at a later point. > > Currently, no idea how to prove it. My throught is simple: make sure > all status is benign. The host_maskall bit is set due to some actions of the > last owner. Clear it to avoid it affecting the new domain. But that's safe only if the device as a whole is in a sane state. One reason could therefore be that the clearing of the bit happens strictly after the device was reset. >>Thing is that _if_ it >>is safe to clear the bit when assigning the device to another guest, >>why wouldn't it even more so be safe to do so when assigning it >>back to Dom0 (i.e. in deassign_device())? > > I considered it. But deassign_device() happens before destroying domain during > which host_maskall is set. Meaning at that point the owner is Dom0 again. Perhaps there'd be a way for pciback to signal it has taken control of the device again and (as per above) reset it? >>And why would it not be safe to clear the bit perhaps already at >>the point MSI-X gets disabled? > > Yes, I also thought it might be an option. But doing this definitely > conflicts with the intention of the first if() in __pci_disable_msix() and > msi_set_mask_bit() which is trying to enable MSI-x to access MSI-x table to > set the per-vector mask bit. If it is safe, we don't need that also. But that's transient enabling/disabling. I'm only talking about the permanent disabling case. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |