[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3] IOMMU: make DMA containment of quarantined devices optional


  • To: "paul@xxxxxxx" <paul@xxxxxxx>, 'Jan Beulich' <jbeulich@xxxxxxxx>
  • From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
  • Date: Tue, 17 Mar 2020 06:20:35 +0000
  • Accept-language: en-US
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Tue, 17 Mar 2020 06:21:30 +0000
  • Dlp-product: dlpe-windows
  • Dlp-reaction: no-action
  • Dlp-version: 11.2.0.6
  • Ironport-sdr: peubvnw+CHfN6WhQ1SgvCJIfzi0Qdf0VdVVznV2g/8+tkbK2yhoMz1yIu3aiFLNq39mHQABIE8 2/U7RpUFr++w==
  • Ironport-sdr: y1BsOTN2SX7QOanIwGeE3YrNWEtWbwlTEv3kmMkrODvIN82UT5FaJdmANsTSdTKR6n8YNEj1nY 0DikphrqvpCw==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHV9gMvA8DkILGx5UOf5iWNvNKXRqhBS9AQ///QHgCABLZlcP//2TwAgAAVKoCABpircA==
  • Thread-topic: [PATCH v3] IOMMU: make DMA containment of quarantined devices optional

> From: Paul Durrant <xadimgnik@xxxxxxxxx>
> Sent: Friday, March 13, 2020 5:26 PM
> 
> > -----Original Message-----
> > From: Jan Beulich <jbeulich@xxxxxxxx>
> > Sent: 13 March 2020 08:10
> > To: Tian, Kevin <kevin.tian@xxxxxxxxx>
> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Andrew Cooper
> <andrew.cooper3@xxxxxxxxxx>; Paul Durrant
> > <paul@xxxxxxx>
> > Subject: Re: [PATCH v3] IOMMU: make DMA containment of quarantined
> devices optional
> >
> > On 13.03.2020 04:05, Tian, Kevin wrote:
> > >> From: Jan Beulich <jbeulich@xxxxxxxx>
> > >> Sent: Tuesday, March 10, 2020 6:31 PM
> > >>
> > >> On 10.03.2020 06:30, Tian, Kevin wrote:
> > >>>> From: Jan Beulich <jbeulich@xxxxxxxx>
> > >>>> Sent: Monday, March 9, 2020 7:09 PM
> > >>>>
> > >>>> Containing still in flight DMA was introduced to work around certain
> > >>>> devices / systems hanging hard upon hitting a "not-present" IOMMU
> fault.
> > >>>> Passing through (such) devices (on such systems) is inherently
> insecure
> > >>>> (as guests could easily arrange for IOMMU faults of any kind to occur).
> > >>>> Defaulting to a mode where admins may not even become aware of
> > >> issues
> > >>>> with devices can be considered undesirable. Therefore convert this
> mode
> > >>>> of operation to an optional one, not one enabled by default.
> > >>>
> > >>> Here is another thought. The whole point of quarantine is to contain
> > >>> the device after it is deassigned from untrusted guest.
> > >>
> > >> I'd question the "untrusted" here. Assigning devices to untrusted
> > >> guests is problematic anyway, unless you're the device manufacturer
> > >> and device firmware writer, and hence you can guarantee the device
> > >> to not offer any backdoors or alike. Therefore I view quarantining
> > >
> > > Aren't all guests untrusted from hypervisor p.o.v, which is the reason
> > > why IOMMU was introduced in the first place?
> >
> > "Untrusted" is always meant from the perspective of the host admin.
> >
> > > I may overlook the history of quarantine feature. Based on my study
> > > of quarantine related changes, looks initially Paul pointed out such
> > > problem that a device may have in-fly DMAs to touch dom0 memory
> > > after it is deassigned. Then he introduced the quarantine concept by
> > > putting a quarantined device into dom_io w/o any valid mapping,
> > > with a whitelist approach. You later extended as a default behavior
> > > for all PCI devices. Now Paul found some device which cannot tolerate
> > > read-fault and then came up this scratch-page idea.
> > >
> > > Now I wonder why we are doing such explicit quarantine in the first
> > > place. Shouldn't we always seek resetting the device when it is deassigned
> > > from a guest? 'reset' should cancel/quiescent all in-fly DMAs for most
> > > devices if they implement 'reset' correctly.
> >
> > And the important word here is "should". Paul and colleagues found
> > it may not do so in reality.
> 
> Yeah... we have to live with what we've got. Yes, it's buggy as hell but we
> have to do our best to stop it wedging hosts whilst trying to prevent
> scribbling over critical parts of memory.

'should' is applied to most devices who can gracefully handle the reset
request, and then for exception we come with ad-hoc band-aid.

> 
> >
> > > If doing that way, we don't
> > > need a quarantine option at all, and then the bogus device in Paul's
> > > latest finding could be handled by a standalone option w/o struggling
> > > whether 'full' is a right name vs. 'basic'. or we may introduce a reset
> > > flag when assigning such device to indicate such special requirement,
> > > so a scratch page/dom_io can be linked specifically for such device
> > > post reset, assuming it is not a platform-level bug from Paul's response?
> >
> > Which would imply host admins to know such properties of their
> > devices, and better _without_ first having run into problems.
> >
> 
> It is a device-level bug. We could, I guess, have a per-device quirk to say
> whether it should get a context entry pointing at a scratch page or not.
> 

If all except me think supporting such device in upstream is necessary, 
per-device quirk at least sounds better to me than enforcing a global 
quarantine policy for all devices.

Thanks
Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.