[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
Jan, thanks for your comments. > On January 15, 2016 at 9:10, <JBeulich@xxxxxxxx> wrote: > >>> On 23.12.15 at 09:25, <quan.xu@xxxxxxxxx> wrote: > > --- a/xen/drivers/passthrough/vtd/qinval.c > > +++ b/xen/drivers/passthrough/vtd/qinval.c > > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu > > *iommu, static int invalidate_sync(struct iommu *iommu) { > > struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu); > > + int rc; > > > > if ( qi_ctrl->qinval_maddr ) > > - return queue_invalidate_wait(iommu, 0, 1, 1); > > + { > > + rc = queue_invalidate_wait(iommu, 0, 1, 1); > > + if ( rc ) > > + { > > + if ( rc == -ETIMEDOUT ) > > + panic("Queue invalidate wait descriptor was > > + timeout.\n"); > > Unless I'm overlooking something this doesn't replace and existing panic(), > and I > think I had indicated quite clearly that this series shouldn't add any new > ones, > unless the alternative of crashing the owning domain is completely unfeasible. > I will remove it in next v5. [...] > > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu, > > return 0; > > } > > > > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did, > > + u16 seg, u8 bus, u8 devfn) > { > > + struct domain *d; > > + struct pci_dev *pdev; > > + > > + d = rcu_lock_domain_by_id(iommu->domid_map[did]); > > + ASSERT(d); > > Don't you need to acquire some lock in order to safely assert this? Referred to the other use case of 'rcu_lock_domain_by_id()', Xen checks whether The domain is 'NULL'. Could I also replace the 'ASSERT(d)' with + If ( d == NULL ) + return; ? > Also note that unused slots hold zero, i.e. there's at least a theoretical > risk of > problems here when you don't also look at > iommu->domid_bitmap. > I am not clear how to fix this point. Do you have good idea? Add a lock to 'iommu->domid_bitmap'? > > + for_each_pdev(d, pdev) > > + { > > + if ( (pdev->seg == seg) && > > + (pdev->bus == bus) && > > + (pdev->devfn == devfn) ) > > + { > > + if ( pdev->domain ) > > If the device is on the domain's list, can this reasonably be false? I can't find the false case. > I.e. don't you rather mean ASSERT() here? I'd better replace 'if' with ASSERT() for this case. > > > + { > > + list_del(&pdev->domain_list); > > This should happen under pcidevs_lock - you need to either acquire it or > ASSERT() it being held. > I should check whether it is under pcidevs_lock -- with spin_is_locked(&pcidevs_lock) If it is not under pcidevs_lock, I should acquire it. I think ASSERT() is not a good idea. Hypervisor acquires this lock and then remove the resource. > > + } > > + > > + if ( !is_hardware_domain(d) ) > > + domain_crash(d); > > I wonder whether the device hiding shouldn't also be avoided if the device is > owned by hwdom. > I think it should be avoided if the device is owned by hwdom. Otherwise, it is quite similar to panic.. > > @@ -349,9 +416,18 @@ static int flush_iotlb_qi( > > queue_invalidate_iotlb(iommu, > > type >> > DMA_TLB_FLUSH_GRANU_OFFSET, dr, > > dw, did, size_order, 0, addr); > > + > > + /* > > + * Synchronize with hardware for invalidation request descriptors > > + * submitted before Device-TLB invalidate descriptor. > > + */ > > + rc = invalidate_sync(iommu); > > + if ( rc ) > > + return rc; > > + > > if ( flush_dev_iotlb ) > > ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type); > > - rc = invalidate_sync(iommu); > > + > > if ( !ret ) > > ret = rc; > > } > > This change is because of ...? > As Kevin's comments, I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb(). Then i can know which device is flush timeout. > > --- a/xen/drivers/passthrough/vtd/x86/ats.c > > +++ b/xen/drivers/passthrough/vtd/x86/ats.c > > @@ -162,6 +162,19 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 > did, > > return -EOPNOTSUPP; > > } > > > > + /* > > + * Synchronize with hardware for Device-TLB invalidate > > + * descriptor. > > + */ > > + ret = dev_invalidate_iotlb_sync(iommu, did, pdev->seg, > > + pdev->bus, pdev->devfn); > > + if ( ret ) > > + { > > + dprintk(XENLOG_WARNING VTDPREFIX, > > + "Device-TLB flush timeout.\n"); > > Are you aware that dprintk() compiles to nothing in non-debug builds? To be honest, I was not aware. > Plus logging the error code and device coordinates might turn out helpful in > such cases. But first of all - if there was a timeout, we'd make it here only > for > Dom0. Perhaps the printing should move into an else to the domain_crash()? > And if there was another error, the message would be outright wrong. > IMO, the print for Dom0 is enough. I think it is not a good idea to move into an else to domain_crash(). Domain_crash is quite a common part. Anyway I can improve it in low priority. Jan, thanks.. -Quan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |