|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
Jan, thanks for your comments.
> On January 15, 2016 at 9:10, <JBeulich@xxxxxxxx> wrote:
> >>> On 23.12.15 at 09:25, <quan.xu@xxxxxxxxx> wrote:
> > --- a/xen/drivers/passthrough/vtd/qinval.c
> > +++ b/xen/drivers/passthrough/vtd/qinval.c
> > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu
> > *iommu, static int invalidate_sync(struct iommu *iommu) {
> > struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> > + int rc;
> >
> > if ( qi_ctrl->qinval_maddr )
> > - return queue_invalidate_wait(iommu, 0, 1, 1);
> > + {
> > + rc = queue_invalidate_wait(iommu, 0, 1, 1);
> > + if ( rc )
> > + {
> > + if ( rc == -ETIMEDOUT )
> > + panic("Queue invalidate wait descriptor was
> > + timeout.\n");
>
> Unless I'm overlooking something this doesn't replace and existing panic(),
> and I
> think I had indicated quite clearly that this series shouldn't add any new
> ones,
> unless the alternative of crashing the owning domain is completely unfeasible.
>
I will remove it in next v5.
[...]
> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
> > return 0;
> > }
> >
> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> > + u16 seg, u8 bus, u8 devfn)
> {
> > + struct domain *d;
> > + struct pci_dev *pdev;
> > +
> > + d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> > + ASSERT(d);
>
> Don't you need to acquire some lock in order to safely assert this?
Referred to the other use case of 'rcu_lock_domain_by_id()', Xen checks whether
The domain is 'NULL'.
Could I also replace the 'ASSERT(d)' with
+ If ( d == NULL )
+ return;
?
> Also note that unused slots hold zero, i.e. there's at least a theoretical
> risk of
> problems here when you don't also look at
> iommu->domid_bitmap.
>
I am not clear how to fix this point. Do you have good idea?
Add a lock to 'iommu->domid_bitmap'?
> > + for_each_pdev(d, pdev)
> > + {
> > + if ( (pdev->seg == seg) &&
> > + (pdev->bus == bus) &&
> > + (pdev->devfn == devfn) )
> > + {
> > + if ( pdev->domain )
>
> If the device is on the domain's list, can this reasonably be false?
I can't find the false case.
> I.e. don't you rather mean ASSERT() here?
I'd better replace 'if' with ASSERT() for this case.
>
> > + {
> > + list_del(&pdev->domain_list);
>
> This should happen under pcidevs_lock - you need to either acquire it or
> ASSERT() it being held.
>
I should check whether it is under pcidevs_lock -- with
spin_is_locked(&pcidevs_lock)
If it is not under pcidevs_lock, I should acquire it.
I think ASSERT() is not a good idea. Hypervisor acquires this lock and then
remove the resource.
> > + }
> > +
> > + if ( !is_hardware_domain(d) )
> > + domain_crash(d);
>
> I wonder whether the device hiding shouldn't also be avoided if the device is
> owned by hwdom.
>
I think it should be avoided if the device is owned by hwdom.
Otherwise, it is quite similar to panic..
> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
> > queue_invalidate_iotlb(iommu,
> > type >>
> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
> > dw, did, size_order, 0, addr);
> > +
> > + /*
> > + * Synchronize with hardware for invalidation request descriptors
> > + * submitted before Device-TLB invalidate descriptor.
> > + */
> > + rc = invalidate_sync(iommu);
> > + if ( rc )
> > + return rc;
> > +
> > if ( flush_dev_iotlb )
> > ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
> > - rc = invalidate_sync(iommu);
> > +
> > if ( !ret )
> > ret = rc;
> > }
>
> This change is because of ...?
>
As Kevin's comments,
I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb().
Then i can know which device is flush timeout.
> > --- a/xen/drivers/passthrough/vtd/x86/ats.c
> > +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> > @@ -162,6 +162,19 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16
> did,
> > return -EOPNOTSUPP;
> > }
> >
> > + /*
> > + * Synchronize with hardware for Device-TLB invalidate
> > + * descriptor.
> > + */
> > + ret = dev_invalidate_iotlb_sync(iommu, did, pdev->seg,
> > + pdev->bus, pdev->devfn);
> > + if ( ret )
> > + {
> > + dprintk(XENLOG_WARNING VTDPREFIX,
> > + "Device-TLB flush timeout.\n");
>
> Are you aware that dprintk() compiles to nothing in non-debug builds?
To be honest, I was not aware.
> Plus logging the error code and device coordinates might turn out helpful in
> such cases. But first of all - if there was a timeout, we'd make it here only
> for
> Dom0. Perhaps the printing should move into an else to the domain_crash()?
> And if there was another error, the message would be outright wrong.
>
IMO, the print for Dom0 is enough.
I think it is not a good idea to move into an else to domain_crash().
Domain_crash is quite a common part.
Anyway I can improve it in low priority.
Jan, thanks..
-Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |