
Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.



> On January 25, 2016 at 10:09pm, <JBeulich@xxxxxxxx> wrote:
> >>> On 22.01.16 at 16:57, <quan.xu@xxxxxxxxx> wrote:
> >>  On January 22, 2016 at 12:31am, <JBeulich@xxxxxxxx> wrote:
> >> >>> On 21.01.16 at 17:16, <quan.xu@xxxxxxxxx> wrote:
> >> >>  On January 20, 2016 at 7:29 pm,  <JBeulich@xxxxxxxx> wrote:
> >> >> >>> On 20.01.16 at 11:26, <quan.xu@xxxxxxxxx> wrote:
> >> >> >> On January 15, 2016 at 9:10, <JBeulich@xxxxxxxx> wrote:
> >> >> >> >>> On 23.12.15 at 09:25, <quan.xu@xxxxxxxxx> wrote:
> >> >> >> Also note that unused slots hold zero, i.e. there's at least a
> >> >> >> theoretical
> >> >> > risk of
> >> >> >> problems here when you don't also look at
> >> >> >> iommu->domid_bitmap.
> >> >> >>
> >> >> > I am not clear how to fix this point. Do you have good idea?
> >> >> > Add a lock to 'iommu->domid_bitmap'?
> >> >>
> >> >> How would a lock help avoiding mistaking an unused slot to mean Dom0?
> >> >> As already suggested - I think you simply need to consult the
> >> >> bitmap along with
> >> > the
> >> >> domain ID array.
> >> >>
> >> >
> >> > Try to get domain id with iommu->domid_map[did] ?
> >>
> >> ???
> >>
> >         +if ( iommu->domid_map )
> >         +   d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> >
> > Is it right?
> 
> I don't see what this changes. Again - what your code has been lacking so far 
> is
> some mechanism to guarantee that what you read from domid_map[] is
> actually a valid domain ID. I can only once again point your attention to
> domid_bitmap, which afaics gives you exactly that valid/invalid indication.
> 
Jan,
Sorry, I was confused on this point because I didn't understand domid_bitmap.
I had printed out iommu->domid_bitmap[did], which misled me.
Now I get it.
As you mentioned, I simply need to consult the bitmap along with the domain ID
array.

+if ( test_bit(did, iommu->domid_bitmap) && iommu->domid_map[did] >= 0 )
+    d = rcu_lock_domain_by_id(iommu->domid_map[did]);

Is it right now?
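
To spell out the full check I have in mind, here is a rough sketch (only a
sketch; the helper name did_to_domain() is made up and it is not compiled):

/*
 * Map a DID back to its domain, treating a domid_map[] slot as valid only
 * if the corresponding bit in domid_bitmap is set.  Unused slots hold
 * zero, so without the bitmap check a free slot would be mistaken for
 * Dom0.  The caller must rcu_unlock_domain() the result when non-NULL.
 */
static struct domain *did_to_domain(struct iommu *iommu, u16 did)
{
    if ( did >= cap_ndoms(iommu->cap) )
        return NULL;

    if ( !test_bit(did, iommu->domid_bitmap) )
        return NULL;    /* slot not in use */

    return rcu_lock_domain_by_id(iommu->domid_map[did]);
}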

> >> >> >> > +            {
> >> >> >> > +                list_del(&pdev->domain_list);
> >> >> >>
> >> >> >> This should happen under pcidevs_lock - you need to either
> >> >> >> acquire it or
> >> >> >> ASSERT() it being held.
> >> >> >>
> >> >> >
> >> >> > I should check whether it is under pcidevs_lock -- with
> >> >> > spin_is_locked(&pcidevs_lock)
> >> >> > If it is not under pcidevs_lock, I should acquire it.
> >> >> > I think ASSERT() is not a good idea. Hypervisor acquires this
> >> >> > lock and then remove the resource.
> >> >>
> >> >> I don't understand this last sentence.
> >> >>
> >> > For example: in
> >> > pci_remove_device()
> >> > {
> >> > ...
> >> >     spin_lock(&pcidevs_lock);
> >> >     ..
> >> >     iommu_remove_device()..
> >> >     ..
> >> >     spin_unlock(&pcidevs_lock);
> >> > }
> >> >
> >> > Device-TLB is maybe flush error in iommu_remove_device()..
> >> > Then it is under pcidevs_lock..
> >> > In this case, I need to check whether it is under pcidevs_lock.
> >> > If not, I need to acquire the pcidevs_lock.
> >>
> >> Ah, okay. But you can't use spin_is_locked() for that purpose.
> >>
> > If I introduce a new parameter 'lock'.
> > + int lock = spin_is_locked(&pcidevs_lock);
> >
> >
> > + if ( !lock )
> > +    spin_lock(&pcidevs_lock);
> > ...
> > + if ( !lock )
> > +    spin_unlock(&pcidevs_lock);
> >
> > Is it right?
> > Jan, do you have some better idea?
> 
> If indeed different locking state is possible for different call trees, 

Indeed, the locking state differs between call trees. For example,
pcidevs_lock is _not_ held in the following call tree:
  flush_iotlb_qi() <- iommu_flush_iotlb_psi() <- __intel_iommu_iotlb_flush()
    <- intel_iommu_iotlb_flush() <- iommu_iotlb_flush()
    <- xenmem_add_to_physmap() <- do_memory_op()

but it _is_ held in this one:
  flush_iotlb_qi() <- iommu_flush_iotlb_dsi() <- domain_context_unmap_one()
    <- domain_context_unmap() <- reassign_device_ownership()
    <- deassign_device() <- iommu_do_pci_domctl()



> then I'm
> afraid you won't get around passing down flags to indicate whether the needed
> lock is already being held.
> 
Agreed.
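
If we go the flag route, I imagine something along the following lines (only
a rough sketch; the function and parameter names are made up and it is not
compiled):

/*
 * Sketch only: the caller tells the timeout path whether pcidevs_lock is
 * already held, so we never have to guess with spin_is_locked().
 */
static void hide_timed_out_device(struct pci_dev *pdev, bool_t pcidevs_locked)
{
    if ( !pcidevs_locked )
        spin_lock(&pcidevs_lock);

    list_del(&pdev->domain_list);
    /* ... the rest of hiding the timed-out device would go here ... */

    if ( !pcidevs_locked )
        spin_unlock(&pcidevs_lock);
}

The flag would have to be threaded down both call trees above, i.e. from
iommu_iotlb_flush() (not holding the lock) and from deassign_device()
(holding it) all the way down to flush_iotlb_qi().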

That said, I am open to any solution.
pcidevs_lock is quite a coarse-grained lock; given that, it looks better to
add a new flag that delays hiding the device instead. However, I am also
afraid that this may raise further security issues.


Jan, thanks!!
-Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

