Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
On January 06, 2016 7:26 PM, <quan.xu@xxxxxxxxx> wrote:
> > > diff --git a/xen/drivers/passthrough/vtd/qinval.c
> > > b/xen/drivers/passthrough/vtd/qinval.c
> > > index b227e4e..7330c5d 100644
> > > --- a/xen/drivers/passthrough/vtd/qinval.c
> > > +++ b/xen/drivers/passthrough/vtd/qinval.c
> > > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu *iommu,
> > >  static int invalidate_sync(struct iommu *iommu)
> > >  {
> > >      struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> > > +    int rc;
> > >
> > >      if ( qi_ctrl->qinval_maddr )
> > > -        return queue_invalidate_wait(iommu, 0, 1, 1);
> > > +    {
> > > +        rc = queue_invalidate_wait(iommu, 0, 1, 1);
> > > +        if ( rc )
> > > +        {
> > > +            if ( rc == -ETIMEDOUT )
> > > +                panic("Queue invalidate wait descriptor was timeout.\n");
> > > +            return rc;
> > > +        }
> > > +    }
> > > +
> >
> > Why do you still panic here? With PATCH 1/3 you should return rc to the
> > caller directly, right?
>
> This panic already existed in queue_invalidate_wait(); patch 1/3 is just for
> the Device-TLB flush error (actually it covers the IOTLB flush error).
> If I also propagate the VT-d iotlb/context/iec flush errors to their callers,
> then I can remove this panic and return rc to the caller directly.
>
> > > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> > > +                                         u16 seg, u8 bus, u8 devfn)
> > > +{
> > > +    struct domain *d;
> > > +    struct pci_dev *pdev;
> > > +
> > > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> > > +    ASSERT(d);
> > > +    for_each_pdev(d, pdev)
> > > +    {
> > > +        if ( (pdev->seg == seg) &&
> > > +             (pdev->bus == bus) &&
> > > +             (pdev->devfn == devfn) )
> > > +        {
> > > +            if ( pdev->domain )
> > > +            {
> > > +                list_del(&pdev->domain_list);
> > > +                pdev->domain = NULL;
> > > +            }
> > > +
> > > +            if ( pci_hide_device(bus, devfn) )
> > > +            {
> > > +                printk(XENLOG_ERR
> > > +                       "IOMMU hide device %04x:%02x:%02x error.",
> > > +                       seg, bus, devfn);
> > > +                break;
> > > +            }
> >
> > I'm not sure whether using pci_hide_device here is enough, or whether it
> > breaks other code paths. For example, now you have errors propagated to
> > callers.
>
> Some more information: after calling pci_hide_device(), ..
>     pdev->domain = dom_xen;
>     list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
> ..
> so hidden PCI devices are associated with 'dom_xen'.
> I found this may not clear the RMRR mapping / context mapping, i.e.
> iommu_remove_device() -- [hd->platform_ops is NULL for dom_xen].
> But I think that is acceptable: IMO dom_xen is part of the hypervisor, and
> those mappings would not be a security issue.
> Or I can remove these mappings before hiding the device.
> So I think it will not break other code paths.
>
> > What's the final policy for flush errors?
>
> If a Device-TLB flush times out, we'll hide the target ATS device and crash
> the domain owning this ATS device.
> If the impacted domain is the hardware domain, we just throw out a warning
> instead of crashing it.
> The hidden device will be disallowed from being further assigned to any
> domain.
>
> > If they will finally go on to clean up some more state relying on
> > pdev->domain, then that code path might be broken. If it's fine to do the
> > cleanup at this point, then just hiding is not enough. If you look at
> > pci_remove_device, there's quite some state to be further cleaned up...
>
> As pdev->domain is 'dom_xen', this case still works. Correct me if that is
> not right.
> BTW, a quick question: in which cases is pci_remove_device() called?
> > I didn't think through the full logic now, but it would always be good
> > not to hide the device until the point where all related state has been
> > cleaned up along the chained-up error handling path.

Jan, could you help me to double-check it? Thanks.

Quan
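For reference, a minimal sketch of the timeout policy described above (hide the timed-out ATS device, then crash the owning domain unless it is the hardware domain) could look roughly like the following. The helper name ats_flush_timeout_policy and its exact call site are assumptions for illustration only, not code from the posted series:

    #include <xen/lib.h>
    #include <xen/sched.h>
    #include <xen/pci.h>

    /*
     * Illustrative sketch only, not the posted patch.  Assumed to be called
     * from the Device-TLB flush path after queue_invalidate_wait() has
     * returned -ETIMEDOUT for device seg:bus:devfn, owned by domain d.
     */
    static void ats_flush_timeout_policy(struct domain *d,
                                         u16 seg, u8 bus, u8 devfn)
    {
        /* Hide the device so it can never be assigned to a domain again. */
        if ( pci_hide_device(bus, devfn) )
            printk(XENLOG_ERR
                   "IOMMU: failed to hide device %04x:%02x:%02x.%u\n",
                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));

        /* Crash the owning guest; only warn for the hardware domain. */
        if ( !is_hardware_domain(d) )
            domain_crash(d);
        else
            printk(XENLOG_WARNING
                   "Device-TLB flush timed out for a hardware domain device\n");
    }

The asymmetry follows the policy stated in the thread: a guest owning the device is crashed, while the hardware domain only gets a warning, since crashing it would bring down the whole host.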
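On the earlier point about the panic in invalidate_sync(): once the iotlb/context/iec flush errors are all propagated to their callers, the function could shrink to something like the sketch below. This is only an illustration of the idea discussed in the thread, assuming the surrounding qinval.c definitions (struct iommu, iommu_qi_ctrl(), queue_invalidate_wait()); it is not the eventual upstream change:

    /*
     * Sketch of invalidate_sync() with the timeout panic dropped and the
     * error simply propagated; the caller decides how to react, e.g. by
     * hiding the ATS device and crashing the owning domain as above.
     */
    static int invalidate_sync(struct iommu *iommu)
    {
        struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);

        if ( qi_ctrl->qinval_maddr )
            /* May now return -ETIMEDOUT instead of panicking. */
            return queue_invalidate_wait(iommu, 0, 1, 1);

        return 0;
    }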