[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.



> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Thursday, January 14, 2016 5:00 PM
> 
> >>> On 14.01.16 at 02:22, <kevin.tian@xxxxxxxxx> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Wednesday, January 13, 2016 7:03 PM
> >>
> >> >> Jan,
> >> >> Patch 2/3 and Patch 3/3 are based on v3 (Actually they are v3's patch 
> >> >> 1/2
> >> > and patch 2/2).
> >> >> We have discussed how to hide a device with pci_hide_device() when
> > Device-TLB
> >> > flush is
> >> >> error.
> >> >>
> >> >> Now there are 2 concerns:
> >> >>       1. Hide the PCI device may break the code path. (then the 
> >> >> pdev->domain
> >> > is
> >> >> dom_xen)
> >> >>       2. Is the blew logic right?
> >> >>            --If Device-TLB flush is timeout, we'll hide the target ATS 
> >> >> device
> >> > and crash the
> >> >> domain owning this ATS device.
> >> >>              If impacted domain is hardware domain, just throw out a
> >> > warning, instead of
> >> >> crash the hardware domain.
> >> >>             The hided Device will be disallowed to be further assigned 
> >> >> to
> >> > any domain.
> >> >>
> >> >> Kevin, correct me if I am wrong.
> >> >>
> >> >>
> >> >
> >> > for 2, yes it's the policy we agreed in previous discussion.
> >> >
> >> > for 1, after more thinking I think the concern is real. pci_hide_device
> >> > is used once in early boot-up phase. At that time, it's simple to just
> >> > have right owner configured. However in the path of normal device
> >> > assign/deassign, there are tons of more state associated which may
> >> > be related to the owner. Though we may do some special handling
> >> > in related code paths to have dom_xen specially handled, it's way
> >> > tricky and not easy to maintain.
> >>
> >> I don't buy this without one of you pointing out the actual
> >> difficulties: The domain is being crashed anyway, so there's no
> >> true device de-assignment needed (IOMMU tables will get torn
> >> down no matter what).
> >>
> >
> > Here is one example of a quick glimpse:
> >
> > static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
> > {
> >     [...]
> >
> >     ret = domain_context_mapping(pdev->domain, devfn, pdev);
> >     if ( ret )
> >     {
> >         dprintk(XENLOG_ERR VTDPREFIX, "d%d: context mapping failed\n",
> >                 pdev->domain->domain_id);
> >         return ret;
> >     }
> >
> > If pdev->domain is changed inside due to pci_hide_device, the error
> > message will be inaccurate.
> 
> But that's a thing trivial to fix up. You had talked about tricky things...
> 

It's not about how this specific thing can be fixed. I didn't check all the 
code. 
There could be more examples, and in the future all new code need to be
aware that the majority of IOMMU functions may have pdev->domain changed
due to error. My concern is more that it's not a good design. I think it's 
natural
to have a state changed only after all existing references in the call 
chain have been completed. Adding a new flag to delay hiding device looks much 
clearer to me.

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.