
Re: [Xen-devel] [PATCH v3] IOMMU: make DMA containment of quarantined devices optional

On 10.03.2020 04:43, Tian, Kevin wrote:
>> From: Jan Beulich <jbeulich@xxxxxxxx>
>> Sent: Monday, March 9, 2020 7:09 PM
>> I'm happy to take better suggestions to replace the "full" command line
>> option and Kconfig prompt tokens. I don't think though that "fault" and
>> "write-fault" are really suitable there.
> I think we may just allow both r/w access to the scratch page for such a bogus
> device, which may make 'full' more reasonable since we would then fully
> contain in-flight DMAs. I'm not sure about the value of keeping write-fault
> alone for such devices (just because one person observed that his specific
> device only has a problem with read faults).

Well, a fundamental problem I have here is that I still don't know
the _exact_ conditions for the observed hangs. I consider it unlikely
for IOMMU read faults to cause hangs, but for write faults to be
"fine". It would seem more likely to me that e.g. a non-present
context entry might cause issues. If that was the case, we wouldn't
need to handle reads and writes differently; we could instead install
an all-zero top-level page table. And we'd still get all faults that
are supposed to surface. But perhaps Paul did try this back then, and
it turned out to not be an option.
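
To make the distinction concrete, here is a minimal model of the two fault
points, not actual Xen or VT-d code. The structure layouts, the permission
bit positions, the 4-level index shift, and all names (context_entry,
quarantine_with_zero_root, walk_top_level) are assumptions made purely for
illustration. It shows that with a present context entry pointing at a zeroed
root table, every access still faults, just at the page table walk rather
than at the context entry lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PT_ENTRIES 512          /* one 4 KiB table of 64-bit entries */
#define PTE_READ   (1ULL << 0)  /* simplified permission bits (assumed) */
#define PTE_WRITE  (1ULL << 1)

struct page_table { uint64_t entry[PT_ENTRIES]; };

/* Hypothetical, heavily simplified context entry. */
struct context_entry {
    bool present;
    const struct page_table *root;
};

enum walk_result {
    WALK_OK,
    FAULT_CONTEXT_NOT_PRESENT,  /* the suspected trigger for the hangs */
    FAULT_PTE_NOT_PRESENT,      /* ordinary translation fault */
};

/* Quarantine by pointing a *present* context entry at an all-zero root. */
static void quarantine_with_zero_root(struct context_entry *ctx,
                                      struct page_table *root)
{
    assert(ctx && root);
    memset(root, 0, sizeof(*root));
    ctx->root = root;
    ctx->present = true;
}

/* Model only the top-level step of a 4-level walk (index = bits 47:39). */
static enum walk_result walk_top_level(const struct context_entry *ctx,
                                       uint64_t dma_addr, bool is_write)
{
    unsigned int idx = (dma_addr >> 39) & (PT_ENTRIES - 1);
    uint64_t need = is_write ? PTE_WRITE : PTE_READ;

    if (!ctx->present)
        return FAULT_CONTEXT_NOT_PRESENT;
    if (!(ctx->root->entry[idx] & need))
        return FAULT_PTE_NOT_PRESENT;
    return WALK_OK;
}
```

In this model reads and writes are treated identically: both reach the zeroed
root and fault there, so no read/write special casing is needed.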

The choice of letting writes continue to fault was based on (a) this
having been tested to work on the affected system(s) and (b) the fact
that also letting writes go to a scratch page would require a per-device
scratch page (and associated page tables) rather than a system-wide one,
as devices coming from different domains would otherwise be able to
observe data written to memory by "foreign" devices (and hence
domains).
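
The concern in (b) can be sketched with a toy model. None of this is Xen
code: quarantined_dev, dev_dma_write and dev_dma_read are hypothetical names,
and plain memcpy() stands in for DMA through the IOMMU. The point is only
that a writable scratch page shared system-wide becomes a channel between
devices, whereas per-device pages keep the data apart.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

struct scratch_page { uint8_t data[PAGE_SIZE]; };

/* Hypothetical view of a quarantined device. */
struct quarantined_dev {
    int owner_domid;               /* domain the device came from */
    struct scratch_page *scratch;  /* where all of its DMA is redirected */
};

/* Stand-in for a DMA write landing in the device's scratch page. */
static void dev_dma_write(struct quarantined_dev *d, size_t off,
                          const void *buf, size_t len)
{
    assert(off <= PAGE_SIZE && len <= PAGE_SIZE - off);
    memcpy(d->scratch->data + off, buf, len);
}

/* Stand-in for a DMA read served from the device's scratch page. */
static void dev_dma_read(const struct quarantined_dev *d, size_t off,
                         void *buf, size_t len)
{
    assert(off <= PAGE_SIZE && len <= PAGE_SIZE - off);
    memcpy(buf, d->scratch->data + off, len);
}
```

If two devices from different domains both have scratch pointing at the same
page, whatever the first writes the second can read back. Giving each device
its own page avoids that, at the cost of one page plus page tables per
device.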

But this is all guesswork without the firmware writers of affected
systems giving us at least some hints.

> Alternatively, I also thought about whether whitelisting the problematic
> devices through another option (e.g. nofault=b:d:f) could provide more
> value. Conceptually, any IOMMU page table (dom0, dom_io or domU)
> for such a bogus device should not include invalid entries, even when
> quarantine is not specified. However, I'm not sure whether it's worth
> going that far...

Indeed. The question, though, is whether this bad behavior is device
specific (rather than e.g. system dependent). Plus, as per above, the
question also is whether it's really leaf (or intermediate) page table
entry presence that actually matters here. If it were, I agree we
shouldn't have any non-present entries anywhere in the page table trees.
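
If a per-device option along the lines of the quoted nofault=b:d:f were ever
pursued, parsing its value might look roughly like the sketch below. This is
not existing Xen code: the option itself is only a suggestion in this thread,
struct bdf and parse_bdf are hypothetical names, and the colon-separated
hexadecimal form is an assumption taken from the notation in the quote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for a PCI bus:device:function triple. */
struct bdf {
    unsigned int bus, dev, func;
};

/*
 * Parse "b:d:f" (hexadecimal fields) into *out.
 * Returns false on malformed input, trailing characters, or
 * out-of-range fields (bus > 0xff, device > 0x1f, function > 7).
 */
static bool parse_bdf(const char *s, struct bdf *out)
{
    unsigned int b, d, f;
    char trailing;

    assert(s && out);

    /* Exactly three conversions must succeed; a fourth means junk follows. */
    if (sscanf(s, "%x:%x:%x%c", &b, &d, &f, &trailing) != 3)
        return false;
    if (b > 0xff || d > 0x1f || f > 0x7)
        return false;

    out->bus = b;
    out->dev = d;
    out->func = f;
    return true;
}
```

Whether keying such a list on b:d:f is the right granularity depends on the
open question above, i.e. whether the misbehavior is device specific or
system dependent.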

