Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.

> From: Xu, Quan
> Sent: Thursday, January 07, 2016 9:47 PM
> > On January 07, 2016 9:28 PM, <JBeulich@xxxxxxxx> wrote:
> > >>> On 07.01.16 at 02:39, <quan.xu@xxxxxxxxx> wrote:
> > > On January 06, 2016 7:26 PM, <quan.xu@xxxxxxxxx> wrote:
> > >> > I haven't thought through the full logic yet. But it would
> > >> > always be good not to hide the device until the point where all
> > >> > related state has been cleaned up along the chained error-handling path.
> > >> >
> > >
> > > Jan, could you help me double-check it? Thanks.
> >
> > I'm not sure I understand what you want or need, all the more so since
> > I haven't even gotten around to looking at the patches yet.
> >
> Jan,
> Patches 2/3 and 3/3 are based on v3 (they are actually v3's patches 1/2
> and 2/2).
> We have discussed how to hide a device with pci_hide_device() when a
> Device-TLB flush fails.
> Now there are 2 concerns:
>       1. Hiding the PCI device may break the code path (pdev->domain is
>          then dom_xen).
>       2. Is the logic below right?
>            --If a Device-TLB flush times out, we'll hide the target ATS
>              device and crash the domain owning this ATS device.
>              If the impacted domain is the hardware domain, we just throw
>              out a warning instead of crashing the hardware domain.
>              The hidden device will not be allowed to be assigned to any
>              domain again.
> Kevin, correct me if I am wrong.

For 2: yes, it's the policy we agreed on in the previous discussion.
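A minimal sketch of the policy in point 2, assuming hypothetical, simplified types and helpers (`struct toy_domain`, `struct toy_pci_dev`, `dev_iotlb_timeout_policy()`, `can_assign()`) rather than Xen's real ones. It only models the decision: it sets a `hidden` flag instead of calling the real pci_hide_device(), which would move ownership to dom_xen and raises the concern in point 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for Xen's struct domain / struct pci_dev. */
struct toy_domain {
    int id;
    bool is_hardware_domain;
    bool crashed;
};

struct toy_pci_dev {
    struct toy_domain *owner;
    bool hidden;                /* once set, never assignable again */
};

enum timeout_action { ACTION_CRASH_DOMAIN, ACTION_WARN_ONLY };

/* Apply the agreed policy after a Device-TLB flush timeout on an ATS device. */
static enum timeout_action dev_iotlb_timeout_policy(struct toy_pci_dev *pdev)
{
    assert(pdev != NULL && pdev->owner != NULL);

    /* The device can no longer be trusted: refuse any future assignment. */
    pdev->hidden = true;

    if (pdev->owner->is_hardware_domain) {
        /* Do not take down the hardware domain; only warn. */
        fprintf(stderr, "warning: Device-TLB flush timeout, hardware domain %d\n",
                pdev->owner->id);
        return ACTION_WARN_ONLY;
    }

    /* Any other owning domain is crashed. */
    pdev->owner->crashed = true;
    return ACTION_CRASH_DOMAIN;
}

/* Assignment check: hidden devices are refused. */
static bool can_assign(const struct toy_pci_dev *pdev)
{
    return !pdev->hidden;
}
```

The only special case is the hardware domain; every other owner is treated the same way.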

For 1: after more thought, I think the concern is real. pci_hide_device
is used only once, in the early boot phase. At that point it's simple to
just configure the right owner. However, in the normal device
assign/deassign path there is much more associated state, some of which
may depend on the owner. Although we could add special handling for
dom_xen in the related code paths, that is tricky and hard to maintain.

I think the cleaner solution, similar to your earlier version, is to
set a flag and let the existing call chains continue until all required
error handling has completed. Only at that point can we safely invoke
pci_hide_device. If the outermost callers are scattered, we could instead
hide the device lazily: the next time it is assigned to another guest,
the new flag is recognized and the device is hidden then.

Jan, your comments?


Xen-devel mailing list