[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
> From: Xu, Quan > Sent: Wednesday, December 23, 2015 4:26 PM > > Now if IOTLB/Context/IETC flush is timeout, panic hypervisor. > The coming patch set will fix it. again, please adjust comment to reflect what this patch is doing, i.e. only about improvement on Device-TLB flush. > > If Device-TLB flush is timeout, we'll hide the target ATS > device and crash the domain owning this ATS device. > > If impacted domain is hardware domain, just throw out a warning. > > The hided Device will be disallowed to be further assigned to > any domain. hided Device -> hidden device > > Signed-off-by: Quan Xu <quan.xu@xxxxxxxxx> > --- > xen/drivers/passthrough/pci.c | 2 +- > xen/drivers/passthrough/vtd/extern.h | 2 + > xen/drivers/passthrough/vtd/qinval.c | 80 > ++++++++++++++++++++++++++++++++++- > xen/drivers/passthrough/vtd/x86/ats.c | 13 ++++++ > 4 files changed, 94 insertions(+), 3 deletions(-) > > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > index 27b3ca7..2d7dc59 100644 > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -407,7 +407,7 @@ static void _pci_hide_device(struct pci_dev *pdev) > list_add(&pdev->domain_list, &dom_xen->arch.pdev_list); > } > > -int __init pci_hide_device(int bus, int devfn) > +int pci_hide_device(int bus, int devfn) > { > struct pci_dev *pdev; > int rc = -ENOMEM; > diff --git a/xen/drivers/passthrough/vtd/extern.h > b/xen/drivers/passthrough/vtd/extern.h > index ec9c513..a2123ce 100644 > --- a/xen/drivers/passthrough/vtd/extern.h > +++ b/xen/drivers/passthrough/vtd/extern.h > @@ -58,6 +58,8 @@ int ats_device(const struct pci_dev *, const struct > acpi_drhd_unit *); > > int dev_invalidate_iotlb(struct iommu *iommu, u16 did, > u64 addr, unsigned int size_order, u64 type); > +int dev_invalidate_iotlb_sync(struct iommu *iommu, u16 did, > + u16 seg, u8 bus, u8 devfn); > > int qinval_device_iotlb(struct iommu *iommu, > u32 max_invs_pend, u16 sid, u16 size, u64 addr); > diff --git a/xen/drivers/passthrough/vtd/qinval.c > b/xen/drivers/passthrough/vtd/qinval.c > index b227e4e..7330c5d 100644 > --- a/xen/drivers/passthrough/vtd/qinval.c > +++ b/xen/drivers/passthrough/vtd/qinval.c > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu *iommu, > static int invalidate_sync(struct iommu *iommu) > { > struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu); > + int rc; > > if ( qi_ctrl->qinval_maddr ) > - return queue_invalidate_wait(iommu, 0, 1, 1); > + { > + rc = queue_invalidate_wait(iommu, 0, 1, 1); > + if ( rc ) > + { > + if ( rc == -ETIMEDOUT ) > + panic("Queue invalidate wait descriptor was timeout.\n"); > + return rc; > + } > + } > + why do you still do panic here? w/ PATCH 1/3 you should return rc to caller directly, right? > return 0; > } > > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu, > return 0; > } > > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did, > + u16 seg, u8 bus, u8 devfn) > +{ > + struct domain *d; > + struct pci_dev *pdev; > + > + d = rcu_lock_domain_by_id(iommu->domid_map[did]); > + ASSERT(d); > + for_each_pdev(d, pdev) > + { > + if ( (pdev->seg == seg) && > + (pdev->bus == bus) && > + (pdev->devfn == devfn) ) > + { > + if ( pdev->domain ) > + { > + list_del(&pdev->domain_list); > + pdev->domain = NULL; > + } > + > + if ( pci_hide_device(bus, devfn) ) > + { > + printk(XENLOG_ERR > + "IOMMU hide device %04x:%02x:%02x error.", > + seg, bus, devfn); > + break; > + } I'm not sure whether using pci_hide_device is enough or not break other code path here? For example, now you have errors propagated to callers. What's the final policy against flush error? If they will finally go to cleanup some more state relying on pdev->domain, then that code path might be broken. If it's fine to do cleanup at this point, then just hidden is not enough. If you look at pci_remove_device, there's quite some state to be further cleaned up... I didn't think about the full logic thoroughly now. But it would always be good to not hide device now until a point where all related states have been cleaned up in error handling path chained up. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |