[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen/pt: fix some pass-thru devices don't work across reboot



>>> On 12.11.18 at 05:58, <chao.gao@xxxxxxxxx> wrote:
> On Fri, Nov 09, 2018 at 02:14:04AM -0700, Jan Beulich wrote:
>>>>> On 09.11.18 at 01:11, <chao.gao@xxxxxxxxx> wrote:
>>> I find some pass-thru devices don't work any more across guest
>>> reboot. Assigning it to another domain also meets the same issue. And
>>> the only way to make it work again is un-binding and binding it to
>>> pciback. Someone reported this issue one year ago [1].
>>
>>I find "some" above and in the title misleading. According to the
>>following description, the issue ought to affect exactly all MSI-X
>>capable devices.
> 
> Some devices whose driver doesn't disable MSI-X when shutdown. But Xen can't
> rely on the guest driver to do this.

The please clarify this in the description.

> That's why I think this issue should be handled in Xen.

I fully agree.

>>Next I'm weary of you clearing the device's maskall
>>bit without also clearing the enable bit (or ASSERT()ing that it's
>>cleared, if that's guaranteed).
> 
> I don't get why claring maskall bit without clearing enable bit would be an
> issue.

Because this means that not individually masked MSI-X entries would
become usable for interrupt delivery by the device again. A misbehaving
device (perhaps by having been programmed wrongly or maliciously)
would then become a risk to the entire system.

>>I also don't follow why you OR in
>>->guest_maskall: Isn't it supposed to be clear anyway?
> 
> Guest's first write to msixctrl register would override this value.
> So don't clear it is also fine. Considering that do_pci_add() issues QMP
> command to add pci device to qemu prior to xc_assign_device(), the question
> here is whether there is any chance qemu's first write has happened.

If some more general re-work (as discussed further down) was not
going to happen, then this would need to be investigated in more
detail. For now I'll leave this aside.

>>From a more general perspective, forcing ->host_maskall to true
>>in msi_set_mask_bit() when memory decoding is disabled is a
>>safety measure. I'd like to see justification (in the description) why
>>it is safe to clear the bit again at a later point.
> 
> Currently, no idea how to prove it. My throught is simple: make sure
> all status is benign. The host_maskall bit is set due to some actions of the
> last owner. Clear it to avoid it affecting the new domain.

But that's safe only if the device as a whole is in a sane state. One
reason could therefore be that the clearing of the bit happens
strictly after the device was reset.

>>Thing is that _if_ it
>>is safe to clear the bit when assigning the device to another guest,
>>why wouldn't it even more so be safe to do so when assigning it
>>back to Dom0 (i.e. in deassign_device())?
> 
> I considered it. But deassign_device() happens before destroying domain during
> which host_maskall is set.

Meaning at that point the owner is Dom0 again. Perhaps there'd be
a way for pciback to signal it has taken control of the device again
and (as per above) reset it?

>>And why would it not be safe to clear the bit perhaps already at
>>the point MSI-X gets disabled?
> 
> Yes, I also thought it might be an option. But doing this definitely
> conflicts with the intention of the first if() in __pci_disable_msix() and
> msi_set_mask_bit() which is trying to enable MSI-x to access MSI-x table to
> set the per-vector mask bit. If it is safe, we don't need that also. 

But that's transient enabling/disabling. I'm only talking about the
permanent disabling case.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.