[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.

> On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > Sent: Tuesday, January 26, 2016 11:53 PM

> > Once again: Before getting started, please assess which route is going
> > to be the better one. Remember that we had already discussed and put
> > aside some form of deferring the hiding of devices, so if you come
> > back with a patch doing that again, you'll have to be able to explain
> > why the alternative(s) are worse.
> >
> Quan, could you list pros/cons of those alternatives based on discussion so 
> far?
> Then we can decide which way should be done before you go to actual coding.
> Earlier suggestion on hiding device immediately is under the assumption that 
> all
> locks have been held. If this part becomes too complex, and you can explain
> clearly that deferring the hiding action doesn't lead to any race condition, 
> then
> people can see why you are proposing defer again.

The following are pros/cons of those alternatives. It is also why I propose 
defer again.

-- --
1. Hiding the devices immediately
     * it makes whatever changes are ASAP after the Device-TLB flush error.
     * It may break the code path.
     * It may lead to any race condition.
     * Hiding the devices immediately is under the assumption that all locks 
have been held.
      Different locking state is possible for different call trees. This part 
becomes too complex.
2. deferring the hiding of devices
    *It doesn't lead to any race condition.
    *It makes existing calling chains with all required error handling 
    * It needs to maintain a flag.
-- --

How to defer the hiding of devices:
    When Device-TLB flush is error, it is to set a new flag and then continue 
existing calling chains with all required error handling completed. Only at 
safe place we can safely invoke pci_hide_device() variant to hide devices. We 
do a lazy hide until next time when it's assigned to another guest while the 
new flag is recognized.

Furthermore explanation:
A new flag:
         In my previous email, I introduced a "pci_dev->info.is_unassignable" 
flag. When Device-TLB flush is error, I think it is not a good idea to modify 
the pci_dev,
         As it is uncertain whether pcidevs_lock is acquired.
         I'd better introduce a new iommu bitmap array to register the device 
with BDF. It may be quite similar to 'iommu->domid_bitmap'

Safe place: 
         I think the beginning of reassign_device_ownership() is a safe place 
to hide the device which is registered with the new flag.
         reassign_device_ownership() is under lock.

pci_hide_device() variant:
         Remove the pcidevs_lock, as the pcidevs_lock has been acquired at call 
tree of reassign_device_ownership().


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.