
Re: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver



On 2024-06-24 3:36 pm, Teddy Astie wrote:
Hello Robin,
Thanks for the thorough review.

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 0af39bbbe3a3..242cefac77c9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -480,6 +480,15 @@ config VIRTIO_IOMMU
         Say Y here if you intend to run this kernel as a guest.
+config XEN_IOMMU
+    bool "Xen IOMMU driver"
+    depends on XEN_DOM0

Clearly this depends on X86 as well.

Well, I don't intend this driver to be X86-only, even though the current
Xen RFC doesn't support ARM (yet). Unless there is a reason not to
allow it?

It's purely practical - even if you drop the asm/iommu.h stuff it would still break ARM DOM0 builds due to HYPERVISOR_iommu_op() only being defined for x86. And it's better to add a dependency here to make it clear what's *currently* supported, than to add dummy code to allow it to build for ARM if that's not actually tested or usable yet.
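
Something like the below would make that explicit (just a sketch; the rest
of the entry stays as in the patch, the point is only the dependency):

config XEN_IOMMU
        bool "Xen IOMMU driver"
        depends on XEN_DOM0 && X86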

+bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+    switch (cap) {
+    case IOMMU_CAP_CACHE_COHERENCY:
+        return true;

Will the PV-IOMMU only ever be exposed on hardware where that really is
always true?


On the hypervisor side, the PV-IOMMU interface always implicitly flushes
the IOMMU hardware on map/unmap operations, so at the end of the
hypercall the cache should always be coherent, IMO.

As Jason already brought up, this is not about TLBs or anything cached by the IOMMU itself, it's about the memory type(s) it can create mappings with. Returning true here says Xen guarantees it can use a cacheable memory type which will let DMA snoop the CPU caches. Furthermore, not explicitly handling IOMMU_CACHE in the map_pages op then also implies that it will *always* do that, so you couldn't actually get an uncached mapping even if you wanted one.
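
For illustration, explicitly honouring the prot flags in map_pages could
look roughly like this (sketch only - the IOMMU_OP_* flag names are made
up, and the struct/field names are just my guesses from the quoted hunks;
I don't know what the PV-IOMMU ABI actually provides, if anything, for
non-cacheable mappings):

static int xen_iommu_map_pages(struct iommu_domain *domain, unsigned long iova,
                               phys_addr_t paddr, size_t pgsize, size_t pgcount,
                               int prot, gfp_t gfp, size_t *mapped)
{
        struct pv_iommu_op op = { /* ... as in the patch ... */ };

        if (prot & IOMMU_READ)
                op.map_pages.flags |= IOMMU_OP_readable;   /* hypothetical flag */
        if (prot & IOMMU_WRITE)
                op.map_pages.flags |= IOMMU_OP_writeable;  /* hypothetical flag */
        /*
         * Only ask for a cacheable (DMA-snooping) mapping when the caller
         * passes IOMMU_CACHE, rather than implying it unconditionally.
         */
        if (prot & IOMMU_CACHE)
                op.map_pages.flags |= IOMMU_OP_cacheable;  /* hypothetical flag */

        /* ... fill in dfn/gfn and loop over HYPERVISOR_iommu_op() as before ... */

        return 0;
}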

+    while (xen_pg_count) {
+        size_t to_unmap = min(xen_pg_count, max_nr_pages);
+
+        //pr_info("Unmapping %lx-%lx\n", dfn, dfn + to_unmap - 1);
+
+        op.unmap_pages.dfn = dfn;
+        op.unmap_pages.nr_pages = to_unmap;
+
+        ret = HYPERVISOR_iommu_op(&op);
+
+        if (ret)
+            pr_warn("Unmap failure (%lx-%lx)\n", dfn, dfn + to_unmap - 1);

But then how
would it ever happen anyway? Unmap is a domain op, so a domain which
doesn't allow unmapping shouldn't offer it in the first place...

Unmap failing should be exceptional, but it is possible, e.g. with
transparent superpages (as the Xen IOMMU drivers use). The Xen drivers
fold suitable contiguous mappings into superpage entries to optimize
memory usage and the IOTLB. However, if you unmap in the middle of a
region covered by a superpage entry, that entry is no longer a valid
superpage entry, and you need to allocate and fill the lower levels,
which can fail if memory is lacking.

OK, so in the worst case you could potentially have a partial unmap failure if the range crosses a superpage boundary and the end part happens to have been folded, and Xen doesn't detect and prepare that allocation until it's already unmapped up to the boundary. If that is so, does the hypercall interface give any information about partial failure, or can an error only be taken to mean that some or all of the given range may or may not have been unmapped now?

In this case I'd argue that you really *do* want to return short, in the
hope of propagating the error back up and letting the caller know the
address space is now messed up before things start blowing up even more
if they keep going and subsequently try to map new pages into
not-actually-unmapped VAs.

While mapping on top of another mapping is OK for us (it just overrides
the previous mapping), I definitely agree that having the address space
messed up is not good.

Oh, indeed, quietly replacing existing PTEs might help paper over errors in this particular instance, but it does then allow *other* cases to go wrong in fun and infuriating ways :)
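
(For reference, by "return short" I just mean breaking out of the loop
quoted above and reporting what was definitely unmapped, something along
these lines - a sketch on top of the patch's own variables, and assuming
dfn/nr_pages are in Xen page units:)

static size_t xen_iommu_unmap_pages(struct iommu_domain *domain,
                                    unsigned long iova, size_t pgsize,
                                    size_t pgcount,
                                    struct iommu_iotlb_gather *gather)
{
        size_t unmapped = 0;
        /* ... set up op, dfn, xen_pg_count, max_nr_pages as in the patch ... */

        while (xen_pg_count) {
                size_t to_unmap = min(xen_pg_count, max_nr_pages);

                op.unmap_pages.dfn = dfn;
                op.unmap_pages.nr_pages = to_unmap;

                ret = HYPERVISOR_iommu_op(&op);
                if (ret) {
                        pr_warn("Unmap failure (%lx-%lx)\n",
                                dfn, dfn + to_unmap - 1);
                        break;  /* report the short count back up */
                }

                unmapped += to_unmap;
                dfn += to_unmap;
                xen_pg_count -= to_unmap;
        }

        return unmapped << XEN_PAGE_SHIFT;
}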

+static struct iommu_domain default_domain = {
+    .ops = &(const struct iommu_domain_ops){
+        .attach_dev = default_domain_attach_dev
+    }
+};

Looks like you could make it a static xen_iommu_domain and just use the
normal attach callback? Either way please name it something less
confusing like xen_iommu_identity_domain - "default" is far too
overloaded round here already...


Yes, although in the future this domain could be either identity or
blocking/paging depending on some upper-level configuration. Should we
have both identity and blocking domains and only set the relevant one
in iommu_ops, or keep this naming?

That's something that can be considered if and when it does happen. For now, if it's going to be pre-mapped as an identity domain, then let's just treat it as such and keep things straightforward.
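
i.e. something like this (sketch; the attach callback name is illustrative,
and the exact hook for a static identity domain depends on which iommu core
baseline you end up on):

static struct iommu_domain xen_iommu_identity_domain = {
        .type = IOMMU_DOMAIN_IDENTITY,
        .ops = &(const struct iommu_domain_ops) {
                .attach_dev = xen_iommu_identity_attach,
        },
};

static const struct iommu_ops xen_iommu_ops = {
        /* ... */
        .identity_domain = &xen_iommu_identity_domain,
};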

+void __exit xen_iommu_fini(void)
+{
+    pr_info("Unregistering Xen IOMMU driver\n");
+
+    iommu_device_unregister(&xen_iommu_device);
+    iommu_device_sysfs_remove(&xen_iommu_device);
+}

This is dead code since the Kconfig is only "bool". Either allow it to
be an actual module (and make sure that works), or drop the pretence
altogether.


OK, I thought this function was actually a requirement even if it is not
a module.

No, quite the opposite - even code which is modular doesn't have to support removal if it doesn't want to.
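
i.e. for a builtin-only driver it's perfectly fine to have just the init
path (sketch):

static int __init xen_iommu_init(void)
{
        /* ... detect PV-IOMMU, iommu_device_sysfs_add(), iommu_device_register() ... */
        return 0;
}
module_init(xen_iommu_init);

/* no __exit/module_exit() counterpart needed while XEN_IOMMU is bool-only */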

Thanks,
Robin.



 

