[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
> On December 25 2015, 11:06 AM, <Tian, Kevin> wrote: > > From: Xu, Quan > > Sent: Wednesday, December 23, 2015 4:26 PM > > > > Now if IOTLB/Context/IETC flush is timeout, panic hypervisor. > > The coming patch set will fix it. > > again, please adjust comment to reflect what this patch is doing, i.e. only > about > improvement on Device-TLB flush. > Agreed. > > > > If Device-TLB flush is timeout, we'll hide the target ATS device and > > crash the domain owning this ATS device. > > > > If impacted domain is hardware domain, just throw out a warning. > > > > The hided Device will be disallowed to be further assigned to any > > domain. > > hided Device -> hidden device > Agreed. [...] > > diff --git a/xen/drivers/passthrough/vtd/qinval.c > > b/xen/drivers/passthrough/vtd/qinval.c > > index b227e4e..7330c5d 100644 > > --- a/xen/drivers/passthrough/vtd/qinval.c > > +++ b/xen/drivers/passthrough/vtd/qinval.c > > @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu > > *iommu, static int invalidate_sync(struct iommu *iommu) { > > struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu); > > + int rc; > > > > if ( qi_ctrl->qinval_maddr ) > > - return queue_invalidate_wait(iommu, 0, 1, 1); > > + { > > + rc = queue_invalidate_wait(iommu, 0, 1, 1); > > + if ( rc ) > > + { > > + if ( rc == -ETIMEDOUT ) > > + panic("Queue invalidate wait descriptor was > timeout.\n"); > > + return rc; > > + } > > + } > > + > > why do you still do panic here? w/ PATCH 1/3 you should return rc to caller > directly, right? > This panic is original in queue_invalidate_wait(). Patch 1/3 is just for Device-TLB flush error ( Actually it covers iotlb flush error). If I fix VT-d iotlb/context/iec flush error to propagated callers, then I can remove this panic and return rc to propagated caller directly. > > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did, > > + u16 seg, u8 bus, u8 devfn) > { > > + struct domain *d; > > + struct pci_dev *pdev; > > + > > + d = rcu_lock_domain_by_id(iommu->domid_map[did]); > > + ASSERT(d); > > + for_each_pdev(d, pdev) > > + { > > + if ( (pdev->seg == seg) && > > + (pdev->bus == bus) && > > + (pdev->devfn == devfn) ) > > + { > > + if ( pdev->domain ) > > + { > > + list_del(&pdev->domain_list); > > + pdev->domain = NULL; > > + } > > + > > + if ( pci_hide_device(bus, devfn) ) > > + { > > + printk(XENLOG_ERR > > + "IOMMU hide device %04x:%02x:%02x error.", > > + seg, bus, devfn); > > + break; > > + } > > I'm not sure whether using pci_hide_device is enough or not break other code > path here? For example, now you have errors propagated to callers. More information, after call pci_hide_device(), .. pdev->domain = dom_xen; list_add(&pdev->domain_list, &dom_xen->arch.pdev_list); .. Hidden PCI devices are associated with 'dom_xen'. Now I found it may not clear rmrr mapping / context mapping. .i.e iommu_remove_device() -- [hd->platform_ops is NULL for dom_xen]. But I think it is acceptable, IMO dom_xen is a part hypervisor, and there mappings would not a security issue. Or I can remove these mappings before to hide this device. So I think it will not break other code break other code. > What's the > final policy against flush error? > If Device-TLB flush is timeout, we'll hide the target ATS device and crash the domain owning this ATS device. If impacted domain is hardware domain, just throw out a warning, instead of crash the hardware domain. The hided Device will be disallowed to be further assigned to any domain. >If they will finally go to cleanup some more state > relying on pdev->domain, then that code path might be broken. If it's fine to > do > cleanup at this point, then just hidden is not enough. If you look at > pci_remove_device, there's quite some state to be further cleaned up... > As the pdev->domain is 'dom_xen'; So for this case, it is still working. Correct me if it is not correct. BTW, a quick question, which case to call pci_remove_device()? > I didn't think about the full logic thoroughly now. But it would always be > good > to not hide device now until a point where all related states have been > cleaned > up in error handling path chained up. > Quan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |