[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.




> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, January 14, 2016 5:13 PM
> To: Jan Beulich; Xu, Quan
> Cc: andrew.cooper3@xxxxxxxxxx; george.dunlap@xxxxxxxxxxxxx; Wu, Feng;
> Nakajima, Jun; xen-devel@xxxxxxxxxxxxx; keir@xxxxxxx; tim@xxxxxxx
> Subject: RE: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush 
> timeout
> issue.
> 

> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > Sent: Thursday, January 14, 2016 5:00 PM
> >
> > >>> On 14.01.16 at 02:22, <kevin.tian@xxxxxxxxx> wrote:
> > >>  From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> Sent: Wednesday, January 13, 2016 7:03 PM
> > >>
> > >> >> Jan,
> > >> >> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's
> > >> >> patch 1/2
> > >> > and patch 2/2).
> > >> >> We have discussed how to hide a device with pci_hide_device()
> > >> >> when
> > > Device-TLB
> > >> > flush is
> > >> >> error.
> > >> >>
> > >> >> Now there are 2 concerns:
> > >> >>       1. Hide the PCI device may break the code path. (then the
> > >> >> pdev->domain
> > >> > is
> > >> >> dom_xen)
> > >> >>       2. Is the blew logic right?
> > >> >>            --If Device-TLB flush is timeout, we'll hide the
> > >> >> target ATS device
> > >> > and crash the
> > >> >> domain owning this ATS device.
> > >> >>              If impacted domain is hardware domain, just throw
> > >> >> out a
> > >> > warning, instead of
> > >> >> crash the hardware domain.
> > >> >>             The hided Device will be disallowed to be further
> > >> >> assigned to
> > >> > any domain.
> > >> >>
> > >> >> Kevin, correct me if I am wrong.
> > >> >>
> > >> >>
> > >> >
> > >> > for 2, yes it's the policy we agreed in previous discussion.
> > >> >
> > >> > for 1, after more thinking I think the concern is real.
> > >> > pci_hide_device is used once in early boot-up phase. At that
> > >> > time, it's simple to just have right owner configured. However in
> > >> > the path of normal device assign/deassign, there are tons of more
> > >> > state associated which may be related to the owner. Though we may
> > >> > do some special handling in related code paths to have dom_xen
> > >> > specially handled, it's way tricky and not easy to maintain.
> > >>
> > >> I don't buy this without one of you pointing out the actual
> > >> difficulties: The domain is being crashed anyway, so there's no
> > >> true device de-assignment needed (IOMMU tables will get torn down
> > >> no matter what).
> > >>
> > >
> > > Here is one example of a quick glimpse:
> > >
> > > static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) {
> > >   [...]
> > >
> > >     ret = domain_context_mapping(pdev->domain, devfn, pdev);
> > >     if ( ret )
> > >     {
> > >         dprintk(XENLOG_ERR VTDPREFIX, "d%d: context mapping
> failed\n",
> > >                 pdev->domain->domain_id);
> > >         return ret;
> > >     }
> > >
> > > If pdev->domain is changed inside due to pci_hide_device, the error
> > > message will be inaccurate.
> >
> > But that's a thing trivial to fix up. You had talked about tricky things...
> >
> 
> It's not about how this specific thing can be fixed. I didn't check all the 
> code.
> There could be more examples, and in the future all new code need to be aware
> that the majority of IOMMU functions may have pdev->domain changed due to
> error. My concern is more that it's not a good design. I think it's natural 
> to have
> a state changed only after all existing references in the call chain have been
> completed. Adding a new flag to delay hiding device looks much clearer to me.
> 

It looks we are not on the same page for how to hide a device.
Actually I have implemented these two ideas with pci_hide_device() or flag in 
previous versions.

Andrew and Tim, what's your opinion?


Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.