Re: [XEN RFC PATCH v5 3/5] xen/public: Introduce PV-IOMMU hypercall interface
Hello Stefano,

On 03/02/2025 at 18:47, Stefano Stabellini wrote:
> On Mon, 3 Feb 2025, Teddy Astie wrote:
>> Hello Jason,
>>
>> On 30/01/2025 at 21:17, Jason Andryuk wrote:
>>> Hi Teddy,
>>>
>>> Thanks for working on this. I'm curious about your plans for this:
>>>
>>> On 2025-01-21 11:13, Teddy Astie wrote:
>>>> +/**
>>>> + * IOMMU_alloc_nested
>>>> + * Create a nested IOMMU context (needs IOMMUCAP_nested).
>>>> + *
>>>> + * This context uses a platform-specific page table from domain address space
>>>> + * specified in pgtable_gfn and use it for nested translations.
>>>> + *
>>>> + * Explicit flushes needs to be submited with IOMMU_flush_nested on
>>>> + * modification of the nested pagetable to ensure coherency between IOTLB and
>>>> + * nested page table.
>>>> + *
>>>> + * This context can be destroyed using IOMMU_free_context.
>>>> + * This context cannot be modified using map_pages, unmap_pages.
>>>> + */
>>>> +struct pv_iommu_alloc_nested {
>>>> +    /* OUT: allocated IOMMU context number */
>>>> +    uint16_t ctx_no;
>>>> +
>>>> +    /* IN: guest frame number of the nested page table */
>>>> +    uint64_aligned_t pgtable_gfn;
>>>> +
>>>> +    /* IN: nested mode flags */
>>>> +    uint64_aligned_t nested_flags;
>>>> +};
>>>> +typedef struct pv_iommu_alloc_nested pv_iommu_alloc_nested_t;
>>>> +DEFINE_XEN_GUEST_HANDLE(pv_iommu_alloc_nested_t);
>>>
>>> Is this command intended to be used for GVA -> GPA translation? Would
>>> you need some way to associate with another iommu context for GPA -> HPA
>>> translation?
>>>
>>
>> It's intended to be used for accelerating IOMMU emulation for the guest.
>> So in this case the other GPA->HPA translation is the domain's p2m
>> page-table (or something similar), such that the translations made from
>> this nested context are meaningful from the guest's point of view.
>>
>> The idea is to use the "remote_op" sub-command to let the device model
>> (e.g. QEMU) alter the IOMMU behavior for the affected domain (e.g. by
>> reattaching devices, making new IOMMU contexts, ...).
>>
>> I think it can also be used for the virtio-iommu pagetable.
>>
>>> Maybe more broadly, what are your goals for enabling PV-IOMMU? The
>>> examples on your blog post cover a domain restricting device access to
>>> specific portions of the GPA space. Are you also interested in GVA
>>> -> GPA? Does VFIO require GVA -> GPA?
>>>
>>
>> The current way of enabling and using PV-IOMMU is with the dedicated
>> Linux IOMMU driver [1] that implements Linux's IOMMU subsystem with this
>> proposed interface.
>> In practice, in the PV case, this replaces xen-swiotlb with dma-iommu
>> and does all DMA through the paravirtualized IOMMU (e.g. creating DMA
>> domains and moving devices to them).
>>
>> Regarding GVA->GPA, this is what this interface provides, and
>> restricting device access to memory is one way of using it. This is a
>> requirement for VFIO (as it is also one for the Linux IOMMU subsystem),
>> and I managed to make SPDK and DPDK work in Dom0 using VFIO and these
>> patches [2].
>
> Thanks for the explanation, Teddy. It certainly seems that this work is
> moving in the right direction. However, I have a couple of questions on
> this point, as I admit that I have not fully understood it.
>
> Modern IOMMUs support two stages of translation. Using ARM terminology,
> these are referred to as stage2 and stage1. The stage2 translation is
> used by Xen for the domain's GPA->PA (p2m) mapping. The pagetable used
> for stage2 could potentially be shared with the pagetable used by Xen
> for the p2m.
> Stage1 is typically controlled by the guest for its own address
> translations, mapping guest virtual addresses (GVA) to guest physical
> addresses (GPA).
>
> I can see that this patch series introduces an interface that allows
> Dom0 to modify its own stage2 mappings.
>
> My question is: how do we allow the domain to also set up stage1
> mappings? I would assume that the interface would take the form of a
> hypercall that allows a domain to pass a stage1 pagetable pointer, which
> Xen would then use to configure the IOMMU stage1. However, I do not see
> such a hypercall in your series. I was under the impression that
> GVA-to-GPA translation was left as future work, so I am confused by your
> statement that this patch series already provides it.
>

There are two interfaces for the guest (and device model) to configure its
GVA-to-GPA mappings: the map/unmap subcommands, and (depending on hardware
support) nested IOMMU contexts, where the guest/device model provides the
GPA of the stage1 pagetable (this is meant to be a way of accelerating
IOMMU emulation through QEMU).

Here are the hypercall subcommands for map/unmap, which let you map and
unmap pages in a "paravirtualized IOMMU context" (making the mapped region
visible to the devices attached to that context through reattach_device).

/**
 * IOMMU_map_pages
 * Map pages on an IOMMU context.
 *
 * pgsize must be supported by pgsize_mask.
 * Fails with -EINVAL if mapping on top of another mapping.
 * Report actually mapped page count in the mapped field (regardless of failure).
 */
struct pv_iommu_map_pages {
    /* IN: IOMMU context number */
    uint16_t ctx_no;

    /* IN: Guest frame number */
    uint64_aligned_t gfn;

    /* IN: Device frame number */
    uint64_aligned_t dfn;

    /* IN: Map flags */
    uint32_t map_flags;

    /* IN: Size of pages to map */
    uint32_t pgsize;

    /* IN: Number of pages to map */
    uint32_t nr_pages;

    /* OUT: Number of pages actually mapped */
    uint32_t mapped;
};
typedef struct pv_iommu_map_pages pv_iommu_map_pages_t;
DEFINE_XEN_GUEST_HANDLE(pv_iommu_map_pages_t);

/**
 * IOMMU_unmap_pages
 * Unmap pages on an IOMMU context.
 *
 * pgsize must be supported by pgsize_mask.
 * Report actually unmapped page count in the unmapped field (regardless of failure).
 * Fails with -ENOENT when attempting to unmap a page without any mapping.
 */
struct pv_iommu_unmap_pages {
    /* IN: IOMMU context number */
    uint16_t ctx_no;

    /* IN: Device frame number */
    uint64_aligned_t dfn;

    /* IN: Size of pages to unmap */
    uint32_t pgsize;

    /* IN: Number of pages to unmap */
    uint32_t nr_pages;

    /* OUT: Number of pages actually unmapped */
    uint32_t unmapped;
};
typedef struct pv_iommu_unmap_pages pv_iommu_unmap_pages_t;
DEFINE_XEN_GUEST_HANDLE(pv_iommu_unmap_pages_t);
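As a purely illustrative sketch of how a guest driver could drive these two
subcommands: the wrapper functions below are placeholders (how the subop is
actually issued is defined elsewhere in the series, not here), and the
structures are the ones shown just above.

#include <stdint.h>

/* Placeholder wrappers for issuing the PV-IOMMU subcommands; the real
 * hypercall plumbing is defined by the series, not sketched here. */
extern long xen_pv_iommu_map_pages(struct pv_iommu_map_pages *op);
extern long xen_pv_iommu_unmap_pages(struct pv_iommu_unmap_pages *op);

static long demo_map_range(uint16_t ctx_no, uint64_t gfn, uint64_t dfn,
                           uint32_t nr_pages, uint32_t map_flags)
{
    struct pv_iommu_map_pages map = {
        .ctx_no    = ctx_no,     /* IOMMU context to map into */
        .gfn       = gfn,        /* first guest frame of the buffer */
        .dfn       = dfn,        /* first device frame (IOVA) it maps to */
        .map_flags = map_flags,  /* access flags for the mapping */
        .pgsize    = 4096,       /* must be a size allowed by pgsize_mask */
        .nr_pages  = nr_pages,
    };
    long rc = xen_pv_iommu_map_pages(&map);

    /* 'mapped' is reported even on failure, so a partially mapped range
     * can be rolled back before returning an error. */
    if ((rc || map.mapped != nr_pages) && map.mapped) {
        struct pv_iommu_unmap_pages unmap = {
            .ctx_no   = ctx_no,
            .dfn      = dfn,
            .pgsize   = 4096,
            .nr_pages = map.mapped,
        };
        xen_pv_iommu_unmap_pages(&unmap);
    }
    return rc ? rc : (map.mapped == nr_pages ? 0 : -1);
}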
If the hardware supports it, there is an alternative (still being drafted)
interface that allows the guest to directly provide native pagetables. This
is exposed through the "_nested" subcommands; there is no implementation of
this feature in this patch series yet.

/**
 * IOMMU_alloc_nested
 * Create a nested IOMMU context (needs IOMMUCAP_nested).
 *
 * This context uses a platform-specific page table from the domain address
 * space specified in pgtable_gfn and uses it for nested translations.
 *
 * Explicit flushes need to be submitted with IOMMU_flush_nested on
 * modification of the nested pagetable to ensure coherency between the IOTLB
 * and the nested page table.
 *
 * This context can be destroyed using IOMMU_free_context.
 * This context cannot be modified using map_pages, unmap_pages.
 */
struct pv_iommu_alloc_nested {
    /* OUT: allocated IOMMU context number */
    uint16_t ctx_no;

    /* IN: guest frame number of the nested page table */
    uint64_aligned_t pgtable_gfn;

    /* IN: nested mode flags */
    uint64_aligned_t nested_flags;
};
typedef struct pv_iommu_alloc_nested pv_iommu_alloc_nested_t;
DEFINE_XEN_GUEST_HANDLE(pv_iommu_alloc_nested_t);

/**
 * IOMMU_flush_nested (needs IOMMUCAP_nested)
 * Flush the IOTLB for nested translation.
 */
struct pv_iommu_flush_nested {
    /* TODO */
};
typedef struct pv_iommu_flush_nested pv_iommu_flush_nested_t;
DEFINE_XEN_GUEST_HANDLE(pv_iommu_flush_nested_t);
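Purely as an illustration of the intended flow (none of this nested support
is implemented in the series yet, and the wrapper below is again only a
placeholder), a device model emulating a vIOMMU could wrap a guest-programmed
stage1 page-table root into a nested context roughly like this:

#include <stdint.h>

/* Placeholder wrapper; the nested interface itself is still a draft. */
extern long xen_pv_iommu_alloc_nested(struct pv_iommu_alloc_nested *op);

/*
 * Hypothetical device-model helper: when the emulated vIOMMU sees the guest
 * program a stage1 page-table root, wrap that table into a nested context
 * that devices can then be reattached to (e.g. via remote_op from QEMU).
 */
static long demo_wrap_guest_stage1(uint64_t stage1_root_gfn, uint16_t *ctx_out)
{
    struct pv_iommu_alloc_nested nested = {
        .pgtable_gfn  = stage1_root_gfn, /* GFN of the guest's stage1 table */
        .nested_flags = 0,               /* platform-specific mode flags */
    };
    long rc = xen_pv_iommu_alloc_nested(&nested);

    if (!rc)
        *ctx_out = nested.ctx_no;        /* OUT: new nested context number */

    /* Any later guest change to that page table must be followed by an
     * IOMMU_flush_nested so the IOTLB stays coherent with it. */
    return rc;
}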
>
>> [1] Originally
>> https://lists.xen.org/archives/html/xen-devel/2024-06/msg01145.html but
>> this patch got quite old and probably doesn't work anymore with this new
>> Xen patch series.
>> I have an updated patch in my xen-pviommu branch
>> https://gitlab.com/xen-project/people/tsnake41/linux/-/commit/125d67a09fa9f66a32f9175641cfccca22dbbdb6
>>
>> [2] You also need to set "vfio_iommu_type1.allow_unsafe_interrupts=1" to
>> make VFIO work for now.

Thanks

Teddy


Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech