[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3] IOMMU: make DMA containment of quarantined devices optional

> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: 13 March 2020 08:10
> To: Tian, Kevin <kevin.tian@xxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Andrew Cooper 
> <andrew.cooper3@xxxxxxxxxx>; Paul Durrant
> <paul@xxxxxxx>
> Subject: Re: [PATCH v3] IOMMU: make DMA containment of quarantined devices 
> optional
> On 13.03.2020 04:05, Tian, Kevin wrote:
> >> From: Jan Beulich <jbeulich@xxxxxxxx>
> >> Sent: Tuesday, March 10, 2020 6:31 PM
> >>
> >> On 10.03.2020 06:30, Tian, Kevin wrote:
> >>>> From: Jan Beulich <jbeulich@xxxxxxxx>
> >>>> Sent: Monday, March 9, 2020 7:09 PM
> >>>>
> >>>> Containing still in flight DMA was introduced to work around certain
> >>>> devices / systems hanging hard upon hitting a "not-present" IOMMU fault.
> >>>> Passing through (such) devices (on such systems) is inherently insecure
> >>>> (as guests could easily arrange for IOMMU faults of any kind to occur).
> >>>> Defaulting to a mode where admins may not even become aware of
> >> issues
> >>>> with devices can be considered undesirable. Therefore convert this mode
> >>>> of operation to an optional one, not one enabled by default.
> >>>
> >>> Here is another thought. The whole point of quarantine is to contain
> >>> the device after it is deassigned from untrusted guest.
> >>
> >> I'd question the "untrusted" here. Assigning devices to untrusted
> >> guests is problematic anyway, unless you're the device manufacturer
> >> and device firmware writer, and hence you can guarantee the device
> >> to not offer any backdoors or alike. Therefore I view quarantining
> >
> > Aren't all guests untrusted from hypervisor p.o.v, which is the reason
> > why IOMMU was introduced in the first place?
> "Untrusted" is always meant from the perspective of the host admin.
> > I may overlook the history of quarantine feature. Based on my study
> > of quarantine related changes, looks initially Paul pointed out such
> > problem that a device may have in-fly DMAs to touch dom0 memory
> > after it is deassigned. Then he introduced the quarantine concept by
> > putting a quarantined device into dom_io w/o any valid mapping,
> > with a whitelist approach. You later extended as a default behavior
> > for all PCI devices. Now Paul found some device which cannot tolerate
> > read-fault and then came up this scratch-page idea.
> >
> > Now I wonder why we are doing such explicit quarantine in the first
> > place. Shouldn't we always seek resetting the device when it is deassigned
> > from a guest? 'reset' should cancel/quiescent all in-fly DMAs for most
> > devices if they implement 'reset' correctly.
> And the important word here is "should". Paul and colleagues found
> it may not do so in reality.

Yeah... we have to live with what we've got. Yes, it's buggy as hell but we 
have to do our best to stop it wedging hosts whilst trying to prevent 
scribbling over critical parts of memory.

> > If doing that way, we don't
> > need a quarantine option at all, and then the bogus device in Paul's
> > latest finding could be handled by a standalone option w/o struggling
> > whether 'full' is a right name vs. 'basic'. or we may introduce a reset
> > flag when assigning such device to indicate such special requirement,
> > so a scratch page/dom_io can be linked specifically for such device
> > post reset, assuming it is not a platform-level bug from Paul's response?
> Which would imply host admins to know such properties of their
> devices, and better _without_ first having run into problems.

It is a device-level bug. We could, I guess, have a per-device quirk to say 
whether it should get a context entry pointing at a scratch page or not.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.