[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
> -----Original Message----- > From: Tian, Kevin > Sent: Thursday, January 14, 2016 5:13 PM > To: Jan Beulich; Xu, Quan > Cc: andrew.cooper3@xxxxxxxxxx; george.dunlap@xxxxxxxxxxxxx; Wu, Feng; > Nakajima, Jun; xen-devel@xxxxxxxxxxxxx; keir@xxxxxxx; tim@xxxxxxx > Subject: RE: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush > timeout > issue. > > > From: Jan Beulich [mailto:JBeulich@xxxxxxxx] > > Sent: Thursday, January 14, 2016 5:00 PM > > > > >>> On 14.01.16 at 02:22, <kevin.tian@xxxxxxxxx> wrote: > > >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx] > > >> Sent: Wednesday, January 13, 2016 7:03 PM > > >> > > >> >> Jan, > > >> >> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's > > >> >> patch 1/2 > > >> > and patch 2/2). > > >> >> We have discussed how to hide a device with pci_hide_device() > > >> >> when > > > Device-TLB > > >> > flush is > > >> >> error. > > >> >> > > >> >> Now there are 2 concerns: > > >> >> 1. Hide the PCI device may break the code path. (then the > > >> >> pdev->domain > > >> > is > > >> >> dom_xen) > > >> >> 2. Is the blew logic right? > > >> >> --If Device-TLB flush is timeout, we'll hide the > > >> >> target ATS device > > >> > and crash the > > >> >> domain owning this ATS device. > > >> >> If impacted domain is hardware domain, just throw > > >> >> out a > > >> > warning, instead of > > >> >> crash the hardware domain. > > >> >> The hided Device will be disallowed to be further > > >> >> assigned to > > >> > any domain. > > >> >> > > >> >> Kevin, correct me if I am wrong. > > >> >> > > >> >> > > >> > > > >> > for 2, yes it's the policy we agreed in previous discussion. > > >> > > > >> > for 1, after more thinking I think the concern is real. > > >> > pci_hide_device is used once in early boot-up phase. At that > > >> > time, it's simple to just have right owner configured. However in > > >> > the path of normal device assign/deassign, there are tons of more > > >> > state associated which may be related to the owner. Though we may > > >> > do some special handling in related code paths to have dom_xen > > >> > specially handled, it's way tricky and not easy to maintain. > > >> > > >> I don't buy this without one of you pointing out the actual > > >> difficulties: The domain is being crashed anyway, so there's no > > >> true device de-assignment needed (IOMMU tables will get torn down > > >> no matter what). > > >> > > > > > > Here is one example of a quick glimpse: > > > > > > static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) { > > > [...] > > > > > > ret = domain_context_mapping(pdev->domain, devfn, pdev); > > > if ( ret ) > > > { > > > dprintk(XENLOG_ERR VTDPREFIX, "d%d: context mapping > failed\n", > > > pdev->domain->domain_id); > > > return ret; > > > } > > > > > > If pdev->domain is changed inside due to pci_hide_device, the error > > > message will be inaccurate. > > > > But that's a thing trivial to fix up. You had talked about tricky things... > > > > It's not about how this specific thing can be fixed. I didn't check all the > code. > There could be more examples, and in the future all new code need to be aware > that the majority of IOMMU functions may have pdev->domain changed due to > error. My concern is more that it's not a good design. I think it's natural > to have > a state changed only after all existing references in the call chain have been > completed. Adding a new flag to delay hiding device looks much clearer to me. > It looks we are not on the same page for how to hide a device. Actually I have implemented these two ideas with pci_hide_device() or flag in previous versions. Andrew and Tim, what's your opinion? Quan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |