[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen/arm: p2m: Free the p2m entry after flushing the IOMMU TLBs



Hi Andrew,

On 8/13/19 9:03 AM, Andrew Cooper wrote:
On 12/08/2019 21:27, Julien Grall wrote:
When freeing a p2m entry, all the sub-tree behind it will also be freed.
This may include intermediate page-tables or any l3 entry requiring to
drop a reference (e.g for foreign pages). As soon as pages are freed,
they may be re-used by Xen or another domain. Therefore it is necessary
to flush *all* the TLBs beforehand.

While CPU TLBs will be flushed before freeing the pages, this is not
the case for IOMMU TLBs. This can be solved by moving the IOMMU TLBs
flush earlier in the code.

This wasn't considered as a security issue as device passthrough on Arm
is not security supported.

Signed-off-by: Julien Grall <julien.grall@xxxxxxx>

---

Cc: olekstysh@xxxxxxxxx
Cc: oleksandr_tyshchenko@xxxxxxxx

     I discovered it while looking at the code, so I don't have any
     reproducer of the issue. There is a small windows where page could
     be reallocated to Xen or another domain but still present in the
     IOMMU TLBs.

     This patch only address the case where the flush succeed. In the
     unlikely case where it does not succeed, then we will still free the
     pages. The IOMMU helper will crash domain, but the device may still
     not be quiescent. So there are a potentially issues do DMA on wrong
     things.

     At the moment, none of the Arm IOMMUs drivers (including the IPMMU
     one under review) are return an error here. Note that flush may
     still fail (see timeout), but is ignored. This is not great as it
     means a device may DMA into something that does not belong to the
     domain. So we probably want to return an error here.

     Even if an error is returned, there are still potential issues
     (see above). The fix is not entirely trivial, we would need to keep
     the page around until the a device is quiescent or the IOMMU is
     reset. This mostly likely means until the domain is fully destroyed.

Xen's behaviour with IOMMU timeouts is broken, and definitely unsafe.

We do not (and indeed must not) impose a timeout for TLB flush
operations locally in the core.  IOTLB flush operations are no different.

Well there is a difference between CPU TLB flush and IOTLB flush. For the former, the dsb isb sequence is enough to ensure the completion.

For the latter you need to poll a register to check the completion.


If the IOMMU starts malfunctioning, that is fatal to the whole system,
not just the guest in question.

Whether this is a fatal or not, we need to detect it and make sure it does not do more damage on the systems.


The only viable approach is to drop the artificial timeouts and up the
severity of a malfunction.

I am not sure to understand your suggestion... If you drop the timeout, how do you detect an IOMMU starts malfunctioning?

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.