Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
>>> On 20.01.16 at 11:26, <quan.xu@xxxxxxxxx> wrote:
>> On January 15, 2016 at 9:10, <JBeulich@xxxxxxxx> wrote:
>> >>> On 23.12.15 at 09:25, <quan.xu@xxxxxxxxx> wrote:
>> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
>> >      return 0;
>> >  }
>> >
>> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
>> > +                                         u16 seg, u8 bus, u8 devfn)
>> > +{
>> > +    struct domain *d;
>> > +    struct pci_dev *pdev;
>> > +
>> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
>> > +    ASSERT(d);
>>
>> Don't you need to acquire some lock in order to safely assert this?
>
> Referring to the other use cases of 'rcu_lock_domain_by_id()', Xen checks
> whether the domain is NULL.
> Could I also replace the 'ASSERT(d)' with
> +    if ( d == NULL )
> +        return;
> ?

At a first glance this doesn't look right, but in the end that's something
you need to answer.

>> Also note that unused slots hold zero, i.e. there's at least a theoretical
>> risk of problems here when you don't also look at iommu->domid_bitmap.
>>
> I am not clear how to fix this point. Do you have a good idea?
> Add a lock to 'iommu->domid_bitmap'?

How would a lock help avoid mistaking an unused slot for Dom0? As already
suggested - I think you simply need to consult the bitmap along with the
domain ID array.

>> > +    {
>> > +        list_del(&pdev->domain_list);
>>
>> This should happen under pcidevs_lock - you need to either acquire it or
>> ASSERT() it being held.
>>
>
> I should check whether it is under pcidevs_lock -- with
> spin_is_locked(&pcidevs_lock).
> If it is not under pcidevs_lock, I should acquire it.
> I think ASSERT() is not a good idea. The hypervisor acquires this lock and
> then removes the resource.

I don't understand this last sentence.

>> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
>> >          queue_invalidate_iotlb(iommu,
>> >                                 type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
>> >                                 dw, did, size_order, 0, addr);
>> > +
>> > +        /*
>> > +         * Synchronize with hardware for invalidation request descriptors
>> > +         * submitted before Device-TLB invalidate descriptor.
>> > +         */
>> > +        rc = invalidate_sync(iommu);
>> > +        if ( rc )
>> > +            return rc;
>> > +
>> >          if ( flush_dev_iotlb )
>> >              ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
>> > -        rc = invalidate_sync(iommu);
>> > +
>> >          if ( !ret )
>> >              ret = rc;
>> >      }
>>
>> This change is because of ...?
>>
> As per Kevin's comments,
> I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb().
> Then I can know which device's flush timed out.

I don't understand: How is your reply related to you moving up the
invocation of invalidate_sync()?

>> Plus logging the error code and device coordinates might turn out helpful
>> in such cases. But first of all - if there was a timeout, we'd make it
>> here only for Dom0. Perhaps the printing should move into an else to the
>> domain_crash()? And if there was another error, the message would be
>> outright wrong.
>>
> IMO, the print for Dom0 is enough.
> I think it is not a good idea to move it into an else to domain_crash().
> domain_crash() is quite a common code path.
> Anyway, I can improve it at low priority.

At the very least the message should not end up being actively misleading.

Jan
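
To make Jan's point about consulting the bitmap concrete: a minimal sketch of
the timeout handler with that check added could look as below. This is only an
illustration of the suggestion, not the eventual patch; it reuses the names
already visible in the quoted hunk (domid_bitmap, domid_map,
rcu_lock_domain_by_id) and elides the per-device handling.

    static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
                                             u16 seg, u8 bus, u8 devfn)
    {
        struct domain *d = NULL;

        /* Unused domid_map[] slots hold zero, which would be read as Dom0. */
        if ( test_bit(did, iommu->domid_bitmap) )
            d = rcu_lock_domain_by_id(iommu->domid_map[did]);
        if ( d == NULL )
            return;

        /* ... locate the device and handle the flush timeout ... */

        rcu_unlock_domain(d);
    }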
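
For the list_del() question, the two options Jan names - acquiring
pcidevs_lock around the list manipulation, or asserting that the caller
already holds it - would look roughly like this, assuming pcidevs_lock is
still the plain spinlock it was at the time:

    /* Option 1: take the lock here. */
    spin_lock(&pcidevs_lock);
    list_del(&pdev->domain_list);
    spin_unlock(&pcidevs_lock);

    /* Option 2: make it the callers' responsibility and enforce it. */
    ASSERT(spin_is_locked(&pcidevs_lock));
    list_del(&pdev->domain_list);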
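
And the "printing in an else to domain_crash()" idea - emit the message only
for the hardware domain, which cannot reasonably be crashed - might be
sketched as follows; the message text here is made up for illustration:

    if ( !is_hardware_domain(d) )
        domain_crash(d);
    else
        printk(XENLOG_WARNING VTDPREFIX
               " dom%d: Device-TLB flush timed out for %04x:%02x:%02x.%u\n",
               d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));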