[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
>>> On 23.12.15 at 09:25, <quan.xu@xxxxxxxxx> wrote: > --- a/xen/drivers/passthrough/vtd/qinval.c > +++ b/xen/drivers/passthrough/vtd/qinval.c > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu *iommu, > static int invalidate_sync(struct iommu *iommu) > { > struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu); > + int rc; > > if ( qi_ctrl->qinval_maddr ) > - return queue_invalidate_wait(iommu, 0, 1, 1); > + { > + rc = queue_invalidate_wait(iommu, 0, 1, 1); > + if ( rc ) > + { > + if ( rc == -ETIMEDOUT ) > + panic("Queue invalidate wait descriptor was timeout.\n"); Unless I'm overlooking something this doesn't replace and existing panic(), and I think I had indicated quite clearly that this series shouldn't add any new ones, unless the alternative of crashing the owning domain is completely unfeasible. > + panic("Queue invalidate wait descriptor was timeout.\n"); > + return rc; > + } > + } > + > return 0; > } Please avoid introducing multiple return points when this can be trivially avoided. > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu, > return 0; > } > > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did, > + u16 seg, u8 bus, u8 devfn) > +{ > + struct domain *d; > + struct pci_dev *pdev; > + > + d = rcu_lock_domain_by_id(iommu->domid_map[did]); > + ASSERT(d); Don't you need to acquire some lock in order to safely assert this? Also note that unused slots hold zero, i.e. there's at least a theoretical risk of problems here when you don't also look at iommu->domid_bitmap. > + for_each_pdev(d, pdev) > + { > + if ( (pdev->seg == seg) && > + (pdev->bus == bus) && > + (pdev->devfn == devfn) ) > + { > + if ( pdev->domain ) If the device is on the domain's list, can this reasonably be false? I.e. don't you rather mean ASSERT() here? > + { > + list_del(&pdev->domain_list); This should happen under pcidevs_lock - you need to either acquire it or ASSERT() it being held. > + pdev->domain = NULL; > + } > + > + if ( pci_hide_device(bus, devfn) ) > + { > + printk(XENLOG_ERR > + "IOMMU hide device %04x:%02x:%02x error.", > + seg, bus, devfn); > + break; > + } > + > + break; > + } Redundant breaks. > + } > + > + if ( !is_hardware_domain(d) ) > + domain_crash(d); I wonder whether the device hiding shouldn't also be avoided if the device is owned by hwdom. > @@ -349,9 +416,18 @@ static int flush_iotlb_qi( > queue_invalidate_iotlb(iommu, > type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr, > dw, did, size_order, 0, addr); > + > + /* > + * Synchronize with hardware for invalidation request descriptors > + * submitted before Device-TLB invalidate descriptor. > + */ > + rc = invalidate_sync(iommu); > + if ( rc ) > + return rc; > + > if ( flush_dev_iotlb ) > ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type); > - rc = invalidate_sync(iommu); > + > if ( !ret ) > ret = rc; > } This change is because of ...? > --- a/xen/drivers/passthrough/vtd/x86/ats.c > +++ b/xen/drivers/passthrough/vtd/x86/ats.c > @@ -162,6 +162,19 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 did, > return -EOPNOTSUPP; > } > > + /* > + * Synchronize with hardware for Device-TLB invalidate > + * descriptor. > + */ > + ret = dev_invalidate_iotlb_sync(iommu, did, pdev->seg, > + pdev->bus, pdev->devfn); > + if ( ret ) > + { > + dprintk(XENLOG_WARNING VTDPREFIX, > + "Device-TLB flush timeout.\n"); Are you aware that dprintk() compiles to nothing in non-debug builds? Plus logging the error code and device coordinates might turn out helpful in such cases. But first of all - if there was a timeout, we'd make it here only for Dom0. Perhaps the printing should move into an else to the domain_crash()? And if there was another error, the message would be outright wrong. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |