Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
> On January 22, 2016 at 12:31am, <JBeulich@xxxxxxxx> wrote:
> >>> On 21.01.16 at 17:16, <quan.xu@xxxxxxxxx> wrote:
> >> On January 20, 2016 at 7:29 pm, <JBeulich@xxxxxxxx> wrote:
> >> >>> On 20.01.16 at 11:26, <quan.xu@xxxxxxxxx> wrote:
> >> >> On January 15, 2016 at 9:10, <JBeulich@xxxxxxxx> wrote:
> >> >> >>> On 23.12.15 at 09:25, <quan.xu@xxxxxxxxx> wrote:
> >> >> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
> >> >> >      return 0;
> >> >> >  }
> >> >> >
> >> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> >> >> > +                                         u16 seg, u8 bus, u8 devfn)
> >> >> > +{
> >> >> > +    struct domain *d;
> >> >> > +    struct pci_dev *pdev;
> >> >> > +
> >> >> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> >> >> > +    ASSERT(d);
> >> >>
> >> >> Don't you need to acquire some lock in order to safely assert this?
> >> >
> >> > Referring to the other call sites of rcu_lock_domain_by_id(), Xen
> >> > checks whether the domain is NULL.
> >> > Could I also replace the ASSERT(d) with
> >> > +    if ( d == NULL )
> >> > +        return;
> >> > ?
> >>
> >> At a first glance this doesn't look right, but in the end that's
> >> something you need to answer.
> >>
> >
> > Is this similar to asking whether the domain may have been destroyed
> > while the Device-TLB is being flushed?
>
> Going back to the original question I raised, you need to ask
> yourself: Can d validly be NULL when you get here (e.g. due to some
> other processor changing something in parallel)?

I think the answer is yes: not all of the callers acquire the RCU lock.
For example:

    flush_iotlb_qi()
        <- iommu_flush_iotlb_psi()
        <- __intel_iommu_iotlb_flush()
        <- intel_iommu_iotlb_flush()
        <- iommu_iotlb_flush()
        <- xenmem_add_to_physmap()
        <- do_memory_op()

If the domain is destroyed in parallel while the Device-TLB flush is in
flight on the above call path, then d is NULL.

> If yes, you can't ASSERT(), and the next question then is what exactly
> it means that it got ripped off the table.
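To make the question concrete, here is a minimal standalone sketch of what a non-ASSERT lookup could look like, also consulting the in-use bitmap Jan mentions elsewhere in this thread. This is an illustrative user-space mock, not the real driver code: `did_to_domid()` and `test_bit_sketch()` are stand-ins I made up; only the field names `domid_map`/`domid_bitmap` mirror the Xen code.

```c
#include <limits.h>

/*
 * Standalone sketch (NOT the real Xen driver) of a did -> domid lookup
 * that both tolerates a vanished/unused slot and consults the in-use
 * bitmap: unused domid_map slots hold zero, which would otherwise be
 * mistaken for Dom0.  test_bit_sketch() stands in for Xen's test_bit().
 */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static int test_bit_sketch(unsigned int nr, const unsigned long *addr)
{
    return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

/* Return the mapped domain id, or -1 if the slot is not in use
 * (the caller would then simply return instead of asserting). */
static int did_to_domid(const unsigned long *domid_bitmap,
                        const unsigned short *domid_map, unsigned int did)
{
    if ( !domid_bitmap || !test_bit_sketch(did, domid_bitmap) )
        return -1;
    return domid_map[did];
}
```
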
> The answer to that determines whether simply returning would be okay.

IMO, simply returning would be okay, but then the device would not be
hidden.

> >> >> Also note that unused slots hold zero, i.e. there's at least a
> >> >> theoretical risk of problems here when you don't also look at
> >> >> iommu->domid_bitmap.
> >> >>
> >> > I am not clear on how to fix this point. Do you have a good idea?
> >> > Add a lock to 'iommu->domid_bitmap'?
> >>
> >> How would a lock help avoiding mistaking an unused slot to mean Dom0?
> >> As already suggested - I think you simply need to consult the bitmap
> >> along with the domain ID array.
> >>
> >
> > Try to get the domain id with iommu->domid_map[did]?
>
> ???

Something like:

+    if ( iommu->domid_map )
+        d = rcu_lock_domain_by_id(iommu->domid_map[did]);

Is that right?

> >> >> > +    {
> >> >> > +        list_del(&pdev->domain_list);
> >> >>
> >> >> This should happen under pcidevs_lock - you need to either acquire
> >> >> it or ASSERT() it being held.
> >> >>
> >> >
> >> > I should check whether it is under pcidevs_lock -- with
> >> > spin_is_locked(&pcidevs_lock).
> >> > If it is not under pcidevs_lock, I should acquire it.
> >> > I think ASSERT() is not a good idea: the hypervisor may acquire
> >> > this lock and then remove the resource.
> >>
> >> I don't understand this last sentence.
> >>
> > For example, in:
> >
> > pci_remove_device()
> > {
> >     ...
> >     spin_lock(&pcidevs_lock);
> >     ...
> >     iommu_remove_device();
> >     ...
> >     spin_unlock(&pcidevs_lock);
> > }
> >
> > A Device-TLB flush error may occur in iommu_remove_device(), and then
> > we are already under pcidevs_lock.
> > In this case, I need to check whether pcidevs_lock is held; if not, I
> > need to acquire it.
>
> Ah, okay. But you can't use spin_is_locked() for that purpose.

Why? What if I introduce a new local variable 'lock':

+    int lock = spin_is_locked(&pcidevs_lock);
...
+    if ( !lock )
+        spin_lock(&pcidevs_lock);
...
+    if ( !lock )
+        spin_unlock(&pcidevs_lock);

Is that right?
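For illustration, the pattern in question as a minimal user-space mock. The `mock_*` names and the plain boolean flag are my own stand-ins, not the Xen API; note the caveat in the comment, which is presumably the basis of Jan's objection.

```c
#include <stdbool.h>

/*
 * User-space mock of the proposed "conditionally acquire" pattern
 * (illustrative only, NOT the Xen spinlock API).  Caveat: the real
 * spin_is_locked() only reports that *some* CPU holds the lock, not
 * that the *current* CPU does, so keying acquisition off it is racy
 * in general.
 */
static bool pcidevs_locked;   /* stands in for pcidevs_lock */

static bool mock_spin_is_locked(void) { return pcidevs_locked; }
static void mock_spin_lock(void)      { pcidevs_locked = true; }
static void mock_spin_unlock(void)    { pcidevs_locked = false; }

static void timeout_path(void)
{
    bool lock = mock_spin_is_locked();  /* the proposed check */

    if ( !lock )
        mock_spin_lock();
    /* ... list_del(&pdev->domain_list) would happen here ... */
    if ( !lock )
        mock_spin_unlock();
}
```
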
Jan, do you have a better idea? I think ASSERT is not right here either.

> >> >> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
> >> >> >          queue_invalidate_iotlb(iommu,
> >> >> >              type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
> >> >> >              dw, did, size_order, 0, addr);
> >> >> > +
> >> >> > +        /*
> >> >> > +         * Synchronize with hardware for invalidation request
> >> >> > +         * descriptors submitted before Device-TLB invalidate
> >> >> > +         * descriptor.
> >> >> > +         */
> >> >> > +        rc = invalidate_sync(iommu);
> >> >> > +        if ( rc )
> >> >> > +            return rc;
> >> >> > +
> >> >> >          if ( flush_dev_iotlb )
> >> >> >              ret = dev_invalidate_iotlb(iommu, did, addr,
> >> >> >                                         size_order, type);
> >> >> > -        rc = invalidate_sync(iommu);
> >> >> > +
> >> >> >          if ( !ret )
> >> >> >              ret = rc;
> >> >> >      }
> >> >>
> >> >> This change is because of ...?
> >> >>
> >> > As per Kevin's comments, I added dev_invalidate_iotlb_sync() to
> >> > dev_invalidate_iotlb(). Then I can know which device's flush timed
> >> > out.
> >>
> >> I don't understand: How is your reply related to you moving up the
> >> invocation of invalidate_sync()?
> >
> > This invalidate_sync() synchronizes with hardware for the
> > invalidation request descriptors submitted before the Device-TLB
> > invalidate descriptor.
> >
> > If I don't add this invalidate_sync(),
> > dev_invalidate_iotlb_timeout() cannot tell whether the flush timeout
> > is a Device-TLB flush error or some other previous iotlb/iec/context
> > flush error.
>
> This being anything but obvious, maybe a brief comment would help?

Change:
    "Synchronize with hardware for invalidation request descriptors
     submitted before Device-TLB invalidate descriptor."
to:
    "Synchronize invalidation completions before Device-TLB invalidation."
?

> >> >> Plus logging the error code and device coordinates might turn out
> >> >> helpful in such cases. But first of all - if there was a timeout,
> >> >> we'd make it here only for Dom0.
> >> >> Perhaps the printing should move into an else to the
> >> >> domain_crash()? And if there was another error, the message would
> >> >> be outright wrong.
> >> >>
> >> > IMO, the print for Dom0 is enough.
> >> > I think it is not a good idea to move it into an else to
> >> > domain_crash(): domain_crash() is quite a common path.
> >> > Anyway, I can improve it at low priority.
> >>
> >> At the very least the message should not end up being actively
> >> misleading.
> >>
> > What about the below?
> >
> > +    printk(XENLOG_ERR
> > +           "Device %04x:%02x:%02x is flush error.",
> > +           seg, bus, devfn);
>
> Well, for one you once again omit the error code. And then (I think
> others had pointed that out already) a device cannot be flush error.
> Perhaps you mean "Flush error %d on device ...\n"? Plus you absolutely
> should print PCI device coordinates in the canonical way,

What does 'print PCI device coordinates in the canonical way' mean?
Something like "0000:00:02.1"?

> so that grep-ing a log file for issues with a specific device will
> find everything related to that device.

I appreciate your kindness.

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
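[As a footnote to the canonical-format question above, a standalone sketch of seg:bus:dev.fn formatting in the "%04x:%02x:%02x.%u" style (e.g. "0000:00:02.1"). The `SKETCH_*` macros and `format_sbdf()` helper are illustrative stand-ins mirroring the usual PCI_SLOT/PCI_FUNC splits, not Xen code.]

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Sketch of canonical PCI coordinate formatting: a devfn byte packs a
 * 5-bit slot and a 3-bit function.  Illustrative helper, not Xen code.
 */
#define SKETCH_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define SKETCH_PCI_FUNC(devfn) ((devfn) & 0x07)

static void format_sbdf(char *buf, size_t len,
                        uint16_t seg, uint8_t bus, uint8_t devfn)
{
    /* e.g. seg=0, bus=0, devfn=0x11 -> "0000:00:02.1" */
    snprintf(buf, len, "%04x:%02x:%02x.%u",
             (unsigned)seg, (unsigned)bus,
             (unsigned)SKETCH_PCI_SLOT(devfn),
             (unsigned)SKETCH_PCI_FUNC(devfn));
}
```
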