
Re: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver



Hello Robin,

On 26/06/2024 14:09, Robin Murphy wrote:
> On 2024-06-24 3:36 pm, Teddy Astie wrote:
>> Hello Robin,
>> Thanks for the thorough review.
>>
>>>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>>>> index 0af39bbbe3a3..242cefac77c9 100644
>>>> --- a/drivers/iommu/Kconfig
>>>> +++ b/drivers/iommu/Kconfig
>>>> @@ -480,6 +480,15 @@ config VIRTIO_IOMMU
>>>>          Say Y here if you intend to run this kernel as a guest.
>>>> +config XEN_IOMMU
>>>> +    bool "Xen IOMMU driver"
>>>> +    depends on XEN_DOM0
>>>
>>> Clearly this depends on X86 as well.
>>>
>> Well, I don't intend this driver to be X86-only, even though the
>> current Xen RFC doesn't support ARM (yet). Unless there is a reason
>> not to allow it?
>
> It's purely practical - even if you drop the asm/iommu.h stuff it would
> still break ARM DOM0 builds due to HYPERVISOR_iommu_op() only being
> defined for x86. And it's better to add a dependency here to make it
> clear what's *currently* supported, than to add dummy code to allow it
> to build for ARM if that's not actually tested or usable yet.
>

OK, the interface does exist on the hypervisor side (even though such a
Xen build is not possible yet), so I suppose I just need to add the
relevant hypercall interfaces for arm/arm64 in hypercall{.S,.h}.
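
I expect that to be the usual hypercall boilerplate, along these lines
(a sketch only: it assumes the __HYPERVISOR_iommu_op number from the
RFC is visible to the ARM builds, and that the wrapper takes a single
op pointer as in the x86 code quoted below; the exact prototype should
match whatever the x86 header declares):

/* arch/arm64/xen/hypercall.S (and the 32-bit arm equivalent) */
HYPERCALL1(iommu_op);

/* include/xen/arm/hypercall.h, next to the other wrappers */
int HYPERVISOR_iommu_op(void *op);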

>>>> +bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
>>>> +{
>>>> +    switch (cap) {
>>>> +    case IOMMU_CAP_CACHE_COHERENCY:
>>>> +        return true;
>>>
>>> Will the PV-IOMMU only ever be exposed on hardware where that really is
>>> always true?
>>>
>>
>> On the hypervisor side, the PV-IOMMU interface always implicitly
>> flushes the IOMMU hardware on map/unmap operations, so at the end of
>> the hypercall the cache should always be coherent IMO.
>
> As Jason already brought up, this is not about TLBs or anything cached
> by the IOMMU itself, it's about the memory type(s) it can create
> mappings with. Returning true here says Xen guarantees it can use a
> cacheable memory type which will let DMA snoop the CPU caches.
> Furthermore, not explicitly handling IOMMU_CACHE in the map_pages op
> then also implies that it will *always* do that, so you couldn't
> actually get an uncached mapping even if you wanted one.
>

Yes, this is a point we are currently discussing on the Xen side.
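
Purely to illustrate the point: if the interface does grow a way to ask
for a non-snooping mapping, the driver-side shape would presumably be
something like the following (the field and flag names here are
placeholders, not the RFC's actual ones):

    /* in the map_pages path */
    op.map_pages.flags = 0;
    if (prot & IOMMU_CACHE)
        op.map_pages.flags |= IOMMUOP_map_cacheable; /* placeholder flag */

and xen_iommu_capable() would then report IOMMU_CAP_CACHE_COHERENCY
based on what the hypervisor actually guarantees, rather than
unconditionally.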

>>>> +    while (xen_pg_count) {
>>>> +        size_t to_unmap = min(xen_pg_count, max_nr_pages);
>>>> +
>>>> +        //pr_info("Unmapping %lx-%lx\n", dfn, dfn + to_unmap - 1);
>>>> +
>>>> +        op.unmap_pages.dfn = dfn;
>>>> +        op.unmap_pages.nr_pages = to_unmap;
>>>> +
>>>> +        ret = HYPERVISOR_iommu_op(&op);
>>>> +
>>>> +        if (ret)
>>>> +            pr_warn("Unmap failure (%lx-%lx)\n", dfn, dfn + to_unmap
>>>> - 1);
>>>
>>> But then how would it ever happen anyway? Unmap is a domain op, so a
>>> domain which doesn't allow unmapping shouldn't offer it in the first
>>> place...
>>
>> Unmap failing should be exceptional, but it is possible, e.g. with
>> transparent superpages (which the Xen IOMMU drivers use). The Xen
>> drivers fold suitable contiguous mappings into superpage entries to
>> optimize memory usage and the IOTLB. However, if you unmap in the
>> middle of a region covered by a superpage entry, that entry is no
>> longer a valid superpage entry, and you need to allocate and fill the
>> lower levels, which can fail if memory is lacking.
>
> OK, so in the worst case you could potentially have a partial unmap
> failure if the range crosses a superpage boundary and the end part
> happens to have been folded, and Xen doesn't detect and prepare that
> allocation until it's already unmapped up to the boundary. If that is
> so, does the hypercall interface give any information about partial
> failure, or can any error only be taken to mean that some or all of the
> given range may or may not have been unmapped now?

The hypercall interface has op.unmap_pages.unmapped to indicate the
number of pages actually unmapped, even in case of error. If it is less
than what was initially requested, the loop will retry with the
remaining part and almost certainly fail again. My impression, though,
is that this currently fails silently.
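
Concretely, the loop quoted above could use that field to return short
instead of retrying, roughly like this (a sketch based on the code
above; I'm assuming the op works in Xen page granularity and that
unmap_pages returns bytes, as the IOMMU core expects):

    size_t unmapped = 0;

    while (xen_pg_count) {
        size_t to_unmap = min(xen_pg_count, max_nr_pages);

        op.unmap_pages.dfn = dfn;
        op.unmap_pages.nr_pages = to_unmap;

        ret = HYPERVISOR_iommu_op(&op);

        /* Credit whatever Xen actually tore down, even on error. */
        unmapped += op.unmap_pages.unmapped;

        if (ret) {
            pr_warn("Unmap failure (%lx-%lx)\n",
                    dfn, dfn + to_unmap - 1);
            break; /* return short so the caller can notice */
        }

        dfn += to_unmap;
        xen_pg_count -= to_unmap;
    }

    return unmapped << XEN_PAGE_SHIFT;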

>>> In this case I'd argue that you really *do* want to return short, in the
>>> hope of propagating the error back up and letting the caller know the
>>> address space is now messed up before things start blowing up even more
>>> if they keep going and subsequently try to map new pages into
>>> not-actually-unmapped VAs.
>>
>> While mapping on top of another mapping is OK for us (it's just going
>> to override the previous mapping), I definitely agree that having the
>> address space messed up is not good.
>
> Oh, indeed, quietly replacing existing PTEs might help paper over errors
> in this particular instance, but it does then allow *other* cases to go
> wrong in fun and infuriating ways :)
>

Yes, but my earlier statement was wrong: our hypercall interface
doesn't actually allow mapping on top of an existing mapping (it fails
with -EADDRINUSE), so a failing unmap is still going to cause real
problems.

Should iommu_unmap and related code consider the case where unmapped !=
size and report it appropriately? That way such cases, if they ever
happen, would be reported loudly rather than failing silently.
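
As far as the existing API goes, iommu_unmap() already returns the
number of bytes actually unmapped, so once the driver returns short a
caller can at least detect it, e.g. (illustration only):

    size_t unmapped = iommu_unmap(domain, iova, size);

    if (unmapped != size)
        pr_err("partial unmap at 0x%lx: %zx of %zx bytes\n",
               iova, unmapped, size);

The question is whether the core and its callers should do this
reporting themselves.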

>>>> +static struct iommu_domain default_domain = {
>>>> +    .ops = &(const struct iommu_domain_ops){
>>>> +        .attach_dev = default_domain_attach_dev
>>>> +    }
>>>> +};
>>>
>>> Looks like you could make it a static xen_iommu_domain and just use the
>>> normal attach callback? Either way please name it something less
>>> confusing like xen_iommu_identity_domain - "default" is far too
>>> overloaded round here already...
>>>
>>
>> Yes, although in the future this domain could end up being either
>> identity or blocking/paging depending on some upper-level
>> configuration. Should we have both identity and blocking domains and
>> only set the relevant one in iommu_ops, or keep this naming?
>
> That's something that can be considered if and when it does happen. For
> now, if it's going to be pre-mapped as an identity domain, then let's
> just treat it as such and keep things straightforward.
>

Let's name it xen_iommu_identity_domain.
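
Roughly like this, I assume (a sketch, reusing the compound-literal
style of the current patch):

    static int xen_iommu_identity_attach(struct iommu_domain *domain,
                                         struct device *dev)
    {
        /* Everything is already identity-mapped by Xen, nothing to do. */
        return 0;
    }

    static struct iommu_domain xen_iommu_identity_domain = {
        .type = IOMMU_DOMAIN_IDENTITY,
        .ops = &(const struct iommu_domain_ops) {
            .attach_dev = xen_iommu_identity_attach,
        },
    };

and then point the driver's iommu_ops .identity_domain at it so the
core picks it up as the default.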

>>>> +void __exit xen_iommu_fini(void)
>>>> +{
>>>> +    pr_info("Unregistering Xen IOMMU driver\n");
>>>> +
>>>> +    iommu_device_unregister(&xen_iommu_device);
>>>> +    iommu_device_sysfs_remove(&xen_iommu_device);
>>>> +}
>>>
>>> This is dead code since the Kconfig is only "bool". Either allow it to
>>> be an actual module (and make sure that works), or drop the pretence
>>> altogether.
>>>
>>
>> Ok, I thought this function was actually a requirement even if it is
>> not a module.
>
> No, quite the opposite - even code which is modular doesn't have to
> support removal if it doesn't want to.
>

Ok

> Thanks,
> Robin.

Teddy


Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech




 

