[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx] > Sent: Thursday, January 14, 2016 5:00 PM > > >>> On 14.01.16 at 02:22, <kevin.tian@xxxxxxxxx> wrote: > >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx] > >> Sent: Wednesday, January 13, 2016 7:03 PM > >> > >> >> Jan, > >> >> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's patch > >> >> 1/2 > >> > and patch 2/2). > >> >> We have discussed how to hide a device with pci_hide_device() when > > Device-TLB > >> > flush is > >> >> error. > >> >> > >> >> Now there are 2 concerns: > >> >> 1. Hide the PCI device may break the code path. (then the > >> >> pdev->domain > >> > is > >> >> dom_xen) > >> >> 2. Is the blew logic right? > >> >> --If Device-TLB flush is timeout, we'll hide the target ATS > >> >> device > >> > and crash the > >> >> domain owning this ATS device. > >> >> If impacted domain is hardware domain, just throw out a > >> > warning, instead of > >> >> crash the hardware domain. > >> >> The hided Device will be disallowed to be further assigned > >> >> to > >> > any domain. > >> >> > >> >> Kevin, correct me if I am wrong. > >> >> > >> >> > >> > > >> > for 2, yes it's the policy we agreed in previous discussion. > >> > > >> > for 1, after more thinking I think the concern is real. pci_hide_device > >> > is used once in early boot-up phase. At that time, it's simple to just > >> > have right owner configured. However in the path of normal device > >> > assign/deassign, there are tons of more state associated which may > >> > be related to the owner. Though we may do some special handling > >> > in related code paths to have dom_xen specially handled, it's way > >> > tricky and not easy to maintain. > >> > >> I don't buy this without one of you pointing out the actual > >> difficulties: The domain is being crashed anyway, so there's no > >> true device de-assignment needed (IOMMU tables will get torn > >> down no matter what). > >> > > > > Here is one example of a quick glimpse: > > > > static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) > > { > > [...] > > > > ret = domain_context_mapping(pdev->domain, devfn, pdev); > > if ( ret ) > > { > > dprintk(XENLOG_ERR VTDPREFIX, "d%d: context mapping failed\n", > > pdev->domain->domain_id); > > return ret; > > } > > > > If pdev->domain is changed inside due to pci_hide_device, the error > > message will be inaccurate. > > But that's a thing trivial to fix up. You had talked about tricky things... > It's not about how this specific thing can be fixed. I didn't check all the code. There could be more examples, and in the future all new code need to be aware that the majority of IOMMU functions may have pdev->domain changed due to error. My concern is more that it's not a good design. I think it's natural to have a state changed only after all existing references in the call chain have been completed. Adding a new flag to delay hiding device looks much clearer to me. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |