
Re: [XEN RFC PATCH v5 3/5] xen/public: Introduce PV-IOMMU hypercall interface



Hello Stefano,

On 03/02/2025 at 18:47, Stefano Stabellini wrote:
> On Mon, 3 Feb 2025, Teddy Astie wrote:
>> Hello Jason,
>>
>> On 30/01/2025 at 21:17, Jason Andryuk wrote:
>>> Hi Teddy,
>>>
>>> Thanks for working on this.  I'm curious about your plans for this:
>>>
>>> On 2025-01-21 11:13, Teddy Astie wrote:
>>>> +/**
>>>> + * IOMMU_alloc_nested
>>>> + * Create a nested IOMMU context (needs IOMMUCAP_nested).
>>>> + *
>>>> + * This context uses a platform-specific page table from domain
>>>> + * address space specified in pgtable_gfn and use it for nested
>>>> + * translations.
>>>> + *
>>>> + * Explicit flushes needs to be submited with IOMMU_flush_nested on
>>>> + * modification of the nested pagetable to ensure coherency between
>>>> + * IOTLB and nested page table.
>>>> + *
>>>> + * This context can be destroyed using IOMMU_free_context.
>>>> + * This context cannot be modified using map_pages, unmap_pages.
>>>> + */
>>>> +struct pv_iommu_alloc_nested {
>>>> +    /* OUT: allocated IOMMU context number */
>>>> +    uint16_t ctx_no;
>>>> +
>>>> +    /* IN: guest frame number of the nested page table */
>>>> +    uint64_aligned_t pgtable_gfn;
>>>> +
>>>> +    /* IN: nested mode flags */
>>>> +    uint64_aligned_t nested_flags;
>>>> +};
>>>> +typedef struct pv_iommu_alloc_nested pv_iommu_alloc_nested_t;
>>>> +DEFINE_XEN_GUEST_HANDLE(pv_iommu_alloc_nested_t);
>>>
>>> Is this command intended to be used for GVA -> GPA translation?  Would
>>> you need some way to associate with another iommu context for GPA -> HPA
>>> translation?
>>>
>>
>> It's intended to be used for accelerating IOMMU emulation for the guest.
>> So in this case the other GPA->HPA translation is the domain's p2m
>> page-table (or something similar), so that the translations made from
>> this nested context are meaningful from the guest's point of view.
>>
>> The idea is to use the "remote_op" sub-command to let the device model
>> (e.g. QEMU) alter the IOMMU behavior for the affected domain (e.g. by
>> reattaching devices, creating new IOMMU contexts, ...).
>>
>> I think it can also be used for the virtio-iommu pagetable.
>>
>>> Maybe more broadly, what are your goals for enabling PV-IOMMU?  The
>>> examples on your blog post cover a domain restricting device access to
>>> specific portions of the GPA space.  Are you also interested in GVA
>>> -> GPA?  Does VFIO require GVA -> GPA?
>>>
>>
>> The current way of enabling and using PV-IOMMU is with the dedicated
>> Linux IOMMU driver [1] that implements Linux's IOMMU subsystem with this
>> proposed interface.
>> In practice, in the PV case, this replaces xen-swiotlb with dma-iommu
>> and does all DMA through the paravirtualized IOMMU (e.g. creating DMA
>> domains and moving devices into them).
>>
>> Regarding GVA->GPA, this is what this interface provides, and
>> restricting device access to memory is one way of using it. This is a
>> requirement for VFIO (as it is also one for Linux IOMMU), and I managed
>> to make SPDK and DPDK work in Dom0 using VFIO and these patches [2].
>
> Thanks for the explanation, Teddy. It certainly seems that this work is
> moving in the right direction. However, I have a couple of questions on
> this point, as I admit that I have not fully understood it.
>
> Modern IOMMUs support two stages of translation. Using ARM terminology,
> these are referred to as stage2 and stage1. The stage2 translation is
> used by Xen for the domain's GPA->PA (p2m) mapping. The pagetable used
> for stage2 could potentially be shared with the pagetable used by Xen
> for the p2m. Stage1 is typically controlled by the guest for its own
> address translations, mapping guest virtual addresses (GVA) to guest
> physical addresses (GPA).
>
> I can see that this patch series introduces an interface that allows
> Dom0 to modify its own stage2 mappings.
>
> My question is: how do we allow the domain to also set up stage1
> mappings? I would assume that the interface would take the form of a
> hypercall that allows a domain to pass a stage1 pagetable pointer, which
> Xen would then use to configure the IOMMU stage1. However, I do not see
> such a hypercall in your series. I was under the impression that
> GVA-to-GPA translation was left as future work, so I am confused by your
> statement that this patch series already provides it.
>

There are two interfaces for the guest (and the device model) to
configure its GVA-to-GPA mappings: the map/unmap subcommands, and
(depending on hardware support) nested IOMMU contexts, where the
guest/device model provides the GPA of the stage1 pagetable (this is
meant as a way of accelerating IOMMU emulation through QEMU).

Here are the hypercall subcommands for map/unmap, which map and unmap
pages in a "paravirtualized IOMMU context" (making the mapped region
visible to the devices attached to that context through reattach_device).

/**
  * IOMMU_map_pages
  * Map pages on an IOMMU context.
  *
  * pgsize must be supported by pgsize_mask.
  * Fails with -EINVAL if mapping on top of another mapping.
  * Reports the actually mapped page count in the mapped field
  * (regardless of failure).
  */
struct pv_iommu_map_pages {
     /* IN: IOMMU context number */
     uint16_t ctx_no;

     /* IN: Guest frame number */
     uint64_aligned_t gfn;

     /* IN: Device frame number */
     uint64_aligned_t dfn;

     /* IN: Map flags */
     uint32_t map_flags;

     /* IN: Size of pages to map */
     uint32_t pgsize;

     /* IN: Number of pages to map */
     uint32_t nr_pages;

     /* OUT: Number of pages actually mapped */
     uint32_t mapped;
};
typedef struct pv_iommu_map_pages pv_iommu_map_pages_t;
DEFINE_XEN_GUEST_HANDLE(pv_iommu_map_pages_t);

/**
  * IOMMU_unmap_pages
  * Unmap pages on an IOMMU context.
  *
  * pgsize must be supported by pgsize_mask.
  * Reports the actually unmapped page count in the unmapped field
  * (regardless of failure).
  * Fails with -ENOENT when attempting to unmap a page without any mapping.
  */
struct pv_iommu_unmap_pages {
     /* IN: IOMMU context number */
     uint16_t ctx_no;

     /* IN: Device frame number */
     uint64_aligned_t dfn;

     /* IN: Size of pages to unmap */
     uint32_t pgsize;

     /* IN: Number of pages to unmap */
     uint32_t nr_pages;

     /* OUT: Number of pages actually unmapped */
     uint32_t unmapped;
};
typedef struct pv_iommu_unmap_pages pv_iommu_unmap_pages_t;
DEFINE_XEN_GUEST_HANDLE(pv_iommu_unmap_pages_t);
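
To make the intended usage a bit more concrete, here is a minimal,
hypothetical sketch of how a guest driver could drive the map subcommand
(the unmap path is symmetric, using the unmapped field). Only the
structure layout above comes from the proposed header; pv_iommu_do_map()
is an assumed name standing in for whatever hypercall wrapper the guest
OS provides, and is not defined by this series.

/* Hypothetical sketch, not part of the proposed header; assumes the
 * proposed PV-IOMMU public header (for the structures) is included. */
int pv_iommu_do_map(pv_iommu_map_pages_t *op); /* assumed hypercall wrapper */

/* Map nr_pages 4K pages into IOMMU context ctx_no, retrying on partial
 * completion as reported by the "mapped" OUT field. */
static int map_range(uint16_t ctx_no, uint64_t gfn, uint64_t dfn,
                     uint32_t nr_pages, uint32_t map_flags)
{
    while ( nr_pages )
    {
        pv_iommu_map_pages_t op = {
            .ctx_no    = ctx_no,
            .gfn       = gfn,
            .dfn       = dfn,
            .map_flags = map_flags,
            .pgsize    = 1u << 12, /* must be allowed by pgsize_mask */
            .nr_pages  = nr_pages,
        };
        int rc = pv_iommu_do_map(&op);

        /* "mapped" is reported even on failure, so account for the pages
         * that actually went in before deciding what to do next. */
        gfn      += op.mapped;
        dfn      += op.mapped;
        nr_pages -= op.mapped;

        if ( rc )
            return rc; /* e.g. -EINVAL when mapping on top of a mapping */
    }

    return 0;
}

(A real driver would also guard against the no-progress case where the
call returns success with mapped == 0.)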

If the hardware supports it, there is an alternative (still being
drafted) interface that allows the guest to directly provide native
pagetables.

This is exposed through the "_nested" subcommands; there is no
implementation of this feature in this patch series yet.

/**
  * IOMMU_alloc_nested
  * Create a nested IOMMU context (needs IOMMUCAP_nested).
  *
  * This context uses a platform-specific page table from the domain
  * address space, specified in pgtable_gfn, and uses it for nested
  * translations.
  *
  * Explicit flushes need to be submitted with IOMMU_flush_nested on
  * modification of the nested pagetable to ensure coherency between the
  * IOTLB and the nested page table.
  *
  * This context can be destroyed using IOMMU_free_context.
  * This context cannot be modified using map_pages, unmap_pages.
  */
struct pv_iommu_alloc_nested {
     /* OUT: allocated IOMMU context number */
     uint16_t ctx_no;

     /* IN: guest frame number of the nested page table */
     uint64_aligned_t pgtable_gfn;

     /* IN: nested mode flags */
     uint64_aligned_t nested_flags;
};
typedef struct pv_iommu_alloc_nested pv_iommu_alloc_nested_t;
DEFINE_XEN_GUEST_HANDLE(pv_iommu_alloc_nested_t);

/**
  * IOMMU_flush_nested (needs IOMMUCAP_nested)
  * Flush the IOTLB for nested translation.
  */
struct pv_iommu_flush_nested {
     /* TODO */
};
typedef struct pv_iommu_flush_nested pv_iommu_flush_nested_t;
DEFINE_XEN_GUEST_HANDLE(pv_iommu_flush_nested_t);
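
For completeness, here is an equally hedged sketch of how a device model
emulating a vIOMMU might use alloc_nested once this part of the interface
is implemented. pv_iommu_do_alloc_nested() is again an assumed wrapper
name, and the flag value is a placeholder; neither is defined by this
series.

/* Hypothetical sketch, not part of the proposed header. */
int pv_iommu_do_alloc_nested(pv_iommu_alloc_nested_t *op); /* assumed wrapper */

/* Wrap the guest-built stage1 page table (identified by the GFN of its
 * root) in a nested IOMMU context.  Devices are then moved onto the
 * returned context with reattach_device, and IOMMU_flush_nested (whose
 * parameters are still TODO above) is issued whenever the guest edits
 * its stage1 table, so the IOTLB stays coherent with it. */
static int viommu_wrap_stage1(uint64_t stage1_root_gfn, uint16_t *ctx_no)
{
    pv_iommu_alloc_nested_t op = {
        .pgtable_gfn  = stage1_root_gfn, /* GPA of the stage1 root table */
        .nested_flags = 0,               /* no flags assumed in this sketch */
    };
    int rc = pv_iommu_do_alloc_nested(&op);

    if ( rc )
        return rc;

    /* The returned context cannot be modified with map_pages/unmap_pages;
     * it is torn down later with IOMMU_free_context. */
    *ctx_no = op.ctx_no;
    return 0;
}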


>
>
>
>> [1] Originally
>> https://lists.xen.org/archives/html/xen-devel/2024-06/msg01145.html but
>> this patch got quite old and probably doesn't work anymore with this new
>> Xen patch series.
>> I have an updated patch in my xen-pviommu branch
>> https://gitlab.com/xen-project/people/tsnake41/linux/-/commit/125d67a09fa9f66a32f9175641cfccca22dbbdb6
>>
>> [2] You also need to set "vfio_iommu_type1.allow_unsafe_interrupts=1" to
>> make VFIO work for now.

Thanks
Teddy



Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech




 

