Re: [Xen-devel] [PATCH v3] IOMMU: make DMA containment of quarantined devices optional



> -----Original Message-----
> From: Tian, Kevin <kevin.tian@xxxxxxxxx>
> Sent: 13 March 2020 03:23
> To: paul@xxxxxxx; 'Jan Beulich' <jbeulich@xxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>
> Subject: RE: [Xen-devel] [PATCH v3] IOMMU: make DMA containment of quarantined devices optional
> 
> > From: Paul Durrant <xadimgnik@xxxxxxxxx>
> > Sent: Wednesday, March 11, 2020 12:05 AM
> >
> [...]
> >
> > >
> > > > However, is it really saying that things will break if any of the
> > > > PTEs has its present bit clear?
> > >
> > > Well, you said that read faults are fatal (to the host). Reads will,
> > > for any address with an unpopulated PTE, result in a fault and hence
> > > by implication be fatal.
> >
> > Oh I see. I thought there was an implication that the IOMMU could not cope
> > with non-present PTEs in some way. Agreed that, when the device is assigned
> > to the guest, then it can arrange (via ballooning) for a non-present entry to
> > be hit by a read transaction, resulting in a lock-up. But dealing with a
> > malicious guest was not the issue at hand... dealing with a buggy device that
> > still tried to DMA after reset and whilst in quarantine was the problem.
> >
> 
> Thinking more on this, I wonder whether the scratch page is sufficient, or
> whether we should support such a device in the first place. Looking at
> 0c35d446:
> --
>     The reason for doing this is that some hardware may continue to re-try
>     DMA (despite FLR) in the event of an error, or even BME being cleared, and
>     will fail to deal with DMA read faults gracefully. Having a scratch page
>     mapped will allow pending DMA reads to complete and thus such buggy
>     hardware will eventually be quiesced.
> --
> 
> 'eventually'... what exactly does it mean?

It means after a period of time that we can only determine empirically.

> How would a user know that a
> device has been quiesced before attempting to re-assign the device
> to another domU or dom0? By guessing?

Yes, a guess, but an educated one.

> Note the exact behavior of such a
> device, after different guest behaviors (hang, kill, bug, etc.), is not
> documented. Who knows whether an in-flight DMA may be triggered when
> the new owner starts to initialize the device again? How much stale
> state remains on such a device which, even if not triggering in-flight
> DMAs, may change the behavior desired by the new owner? e.g. it's
> possible that a control register was configured by the old owner but is
> not touched by the new owner. If it cannot be reset, what's the point of
> supporting assignment of such a bogus device?
> 

Because I'm afraid such hardware is quite ubiquitous and we need to deal with it.

> Therefore I feel any support for such bogus devices should be maintained
> out of tree, instead of in upstream Xen. Thoughts?
> 

I don't see the harm in the code being upstream. There may well be other 
devices with similar issues and it provides an option for an admin to try.

  Paul


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

