Re: [Xen-devel] [RFC] Dom0 PV IOMMU control design (draft A)
On 16/04/14 15:13, Zhang, Xiantao wrote:
> What's the real issue you are trying to resolve? Typically, Dom0's DMA
> memory is allocated in guest-physical contiguous memory, and then exchanged
> to machine-physical contiguous memory if its size exceeds one page. I don't
> see why a bounce buffer is needed in most cases.
> Last year, we found that some legacy hardware drivers may use a static DMA
> buffer instead of a dynamic buffer, and if the static buffer's size exceeds
> one page, it may bring issues due to the lack of a memory exchange
> operation. A patch was cooked and posted for fixing such issues, but it
> looks like it has not been merged so far.
> http://lkml.iu.edu//hypermail/linux/kernel/1212.0/02226.html
> In this case, it indeed needs a bounce buffer, but this is not a typical
> case.

We have found that many NIC vendors (e.g. Cisco enic, Solarflare sfc,
Mellanox mlx4) use buffers larger than a page when configured with jumbo
frames enabled. High performance NICs are typically configured with jumbo
frames, which means a lot of pressure is put on the SWIOTLB region because
the memory copies cannot occur as fast as the NIC hardware itself.

Using memory exchange based on the DMA mask to create bounce regions is
prone to allocation failures if the DMA mask is small (less than 4GB),
because potentially many PV driver domains are trying to use the same low
memory region.

Malcolm

> Xiantao
>> -----Original Message-----
>> From: xen-devel-bounces@xxxxxxxxxxxxx
>> [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Malcolm Crossley
>> Sent: Saturday, April 12, 2014 1:29 AM
>> To: xen-devel
>> Subject: [Xen-devel] [RFC] Dom0 PV IOMMU control design (draft A)
>>
>> Hi,
>>
>> Here is a design for allowing Dom0 PV guests to control the IOMMU. This
>> allows the Dom0 GPFN mapping to be programmed into the IOMMU and avoids
>> using the SWIOTLB bounce buffer technique in the Linux kernel (except for
>> legacy 32-bit DMA IO devices).
>>
>> This feature provides two gains:
>> 1. Improved performance for use cases which relied upon the bounce buffer,
>>    e.g. NIC cards using jumbo frames with linear buffers.
>> 2. Prevention of SWIOTLB bounce buffer region exhaustion, which can cause
>>    unrecoverable Linux kernel driver errors.
>>
>> A PDF version of the document is available here:
>>
>> http://xenbits.xen.org/people/andrewcoop/pv-iommu-control-A.pdf
>>
>> The pandoc markdown format of the document is provided below to allow for
>> easier inline comments:
>>
>> Introduction
>> ============
>>
>> Background
>> ----------
>>
>> Xen PV guests use a Guest Pseudo-physical Frame Number (GPFN) address
>> space which is decoupled from the host Machine Frame Number (MFN) address
>> space. PV guests which interact with hardware need to translate GPFN
>> addresses to MFN addresses because hardware uses the host address space
>> only. PV guest hardware drivers are only aware of the GPFN address space
>> and assume that if GPFN addresses are contiguous then the hardware
>> addresses are contiguous as well. The decoupling between the GPFN and MFN
>> address spaces means that GPFN and MFN addresses may not be contiguous
>> across page boundaries, and thus a buffer allocated in the GPFN address
>> space which spans a page boundary may not be contiguous in the MFN
>> address space.
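As an illustration of the contiguity problem described above, here is a
minimal sketch of the kind of check a PV DMA layer has to perform before
handing a buffer address to hardware. The gpfn_to_mfn() helper and
PAGE_SHIFT value are assumed stand-ins for the guest's real
physical-to-machine lookup and page geometry, not part of this design:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12

    /* Assumed helper: look up the MFN backing a given GPFN, e.g. via the
     * PV physical-to-machine table.  Placeholder only. */
    extern uint64_t gpfn_to_mfn(uint64_t gpfn);

    /*
     * Returns true if a buffer which is contiguous in guest-physical
     * (GPFN) space is also contiguous in machine (MFN) space, i.e. its
     * machine address can be handed to hardware without bouncing.
     */
    static bool buffer_machine_contiguous(uint64_t gpaddr, size_t len)
    {
        uint64_t gpfn = gpaddr >> PAGE_SHIFT;
        uint64_t last = (gpaddr + len - 1) >> PAGE_SHIFT;
        uint64_t mfn  = gpfn_to_mfn(gpfn);

        if (len == 0)
            return true;

        for (; gpfn < last; gpfn++, mfn++) {
            /* Each page boundary crossed must land on the adjacent MFN. */
            if (gpfn_to_mfn(gpfn + 1) != mfn + 1)
                return false;
        }
        return true;    /* single page, or MFNs happen to be adjacent */
    }

When this check fails, the buffer cannot be used for DMA directly, which is
exactly the case the bounce buffer described below exists to handle.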
>> PV hardware drivers cannot tolerate this behaviour and so a special
>> "bounce buffer" region is used to hide this issue from the drivers.
>>
>> A bounce buffer region is a special part of the GPFN address space which
>> has been made to be contiguous in both the GPFN and MFN address spaces.
>> When a driver requests that a buffer which spans a page boundary be made
>> available for hardware to read, core operating system code copies the
>> buffer into a temporarily reserved part of the bounce buffer region and
>> then returns the MFN address of that reserved part back to the driver.
>> The driver then instructs the hardware to read the copy of the buffer in
>> the bounce buffer. Similarly, if the driver requests that a buffer be
>> made available for hardware to write to, a region of the bounce buffer is
>> first reserved, and after the hardware completes writing, the reserved
>> region of the bounce buffer is copied back to the originally allocated
>> buffer.
>>
>> The overhead of memory copies to/from the bounce buffer region is high
>> and damages performance. Furthermore, there is a risk that the fixed size
>> bounce buffer region will become exhausted and it will not be possible to
>> return a hardware address back to the driver. Linux kernel drivers do not
>> tolerate this failure and so the kernel is forced to crash, as an
>> uncorrectable error has occurred.
>>
>> Input/Output Memory Management Units (IOMMUs) allow an inbound address
>> mapping to be created from the I/O bus address space (typically PCI) to
>> the machine frame number address space. IOMMUs typically use a page table
>> mechanism to manage the mappings and can therefore create mappings of
>> page size granularity or larger.
>>
>> Purpose
>> =======
>>
>> Allow Xen Domain 0 PV guests to create/modify/destroy IOMMU mappings for
>> hardware devices that Domain 0 has access to. This enables Domain 0 to
>> program a bus address space mapping which matches its GPFN mapping. Once
>> a 1-1 mapping of GPFN to bus address space is created, a bounce buffer
>> region is not required for the IO devices connected to the IOMMU.
>>
>>
>> Architecture
>> ============
>>
>> A three argument hypercall interface (do_iommu_op), implementing two
>> hypercall subops.
>>
>> Design considerations for hypercall subops
>> -------------------------------------------
>> IOMMU map/unmap operations can be slow and can involve flushing the IOMMU
>> TLB to ensure the IO device uses the updated mappings.
>>
>> The subops have been designed to take an array of operations and a count
>> as parameters. This allows easily implemented hypercall continuations to
>> be used and allows batches of IOMMU operations to be submitted before
>> flushing the IOMMU TLB.
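A sketch of how the three-argument interface described above might be
declared, before the subop sections that follow. The subop numbers, the
flag macro values and the exact prototype are assumptions for illustration;
draft A itself specifies only the subop names, the flag bit positions given
in the tables below, and the argument order (array pointer plus element
count per subop):

    #include <stdint.h>

    /* Subop numbers are assumed values, not specified by draft A. */
    #define IOMMUOP_map_page    1
    #define IOMMUOP_unmap_page  2

    /* Flag bits as defined in the flags table below (bit 0 and bit 1). */
    #define IOMMU_MAP_OP_readable   (1u << 0)
    #define IOMMU_MAP_OP_writeable  (1u << 1)

    /* Matches the structure defined in the subop sections below. */
    struct iommu_map_op {
        uint64_t bfn;       /* [in]  bus address frame number           */
        uint64_t mfn;       /* [in]  machine address frame number       */
        uint32_t flags;     /* [in]  IOMMU_MAP_OP_* flags               */
        int32_t  status;    /* [out] per-op result, 0 on success        */
    };

    /*
     * The three arguments: a subop selector, a pointer to an array of
     * iommu_map_op elements, and the count of elements.  Large batches
     * may be completed via hypercall continuation, with the IOMMU TLB
     * flushed only at completion or continuation points.
     */
    long do_iommu_op(unsigned int subop, struct iommu_map_op *ops,
                     unsigned int count);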
>> IOMMUOP_map_page
>> ----------------
>> First argument, pointer to array of `struct iommu_map_op`
>> Second argument, integer count of `struct iommu_map_op` elements in array
>>
>> This subop will attempt to IOMMU map each element in the `struct
>> iommu_map_op` array and record the mapping status back into the array
>> itself. If a mapping fault occurs then the hypercall will return with
>> -EFAULT.
>>
>> This subop will inspect the MFN address being mapped in each iommu_map_op
>> to ensure it does not belong to the Xen hypervisor itself. If the MFN
>> does belong to the Xen hypervisor, the subop will return -EPERM in the
>> status field for that particular iommu_map_op.
>>
>> The IOMMU TLB will only be flushed when the hypercall completes or a
>> hypercall continuation is created.
>>
>>     struct iommu_map_op {
>>         uint64_t bfn;
>>         uint64_t mfn;
>>         uint32_t flags;
>>         int32_t status;
>>     };
>>
>> ------------------------------------------------------------------------
>> Field      Purpose
>> -----      -------------------------------------------------------------
>> `bfn`      [in] Bus address frame number to be mapped to the specified
>>            mfn below
>>
>> `mfn`      [in] Machine address frame number
>>
>> `flags`    [in] Flags for signalling type of IOMMU mapping to be created
>>
>> `status`   [out] Mapping status of this map operation, 0 indicates
>>            success
>> ------------------------------------------------------------------------
>>
>>
>> Defined bits for flags field
>> ------------------------------------------------------------------------
>> Name                       Bit      Definition
>> ----                       -----    -----------------------------------
>> IOMMU_MAP_OP_readable      0        Create readable IOMMU mapping
>> IOMMU_MAP_OP_writeable     1        Create writeable IOMMU mapping
>> Reserved for future use    2-31     n/a
>> ------------------------------------------------------------------------
>>
>> Additional error codes specific to this hypercall:
>>
>> Error code  Reason
>> ----------  ------------------------------------------------------------
>> EPERM       PV IOMMU mode not enabled or calling domain is not domain 0
>> ------------------------------------------------------------------------
>>
>> IOMMUOP_unmap_page
>> ------------------
>> First argument, pointer to array of `struct iommu_map_op`
>> Second argument, integer count of `struct iommu_map_op` elements in array
>>
>> This subop will attempt to unmap each element in the `struct
>> iommu_map_op` array and record the mapping status back into the array
>> itself. If an unmapping fault occurs then the hypercall stops processing
>> the array and returns with -EFAULT.
>>
>> The IOMMU TLB will only be flushed when the hypercall completes or a
>> hypercall continuation is created.
>>
>>     struct iommu_map_op {
>>         uint64_t bfn;
>>         uint64_t mfn;
>>         uint32_t flags;
>>         int32_t status;
>>     };
>>
>> --------------------------------------------------------------------
>> Field      Purpose
>> -----      ---------------------------------------------------------
>> `bfn`      [in] Bus address frame number to be unmapped
>>
>> `mfn`      [in] This field is ignored for the unmap subop
>>
>> `flags`    [in] This field is ignored for the unmap subop
>>
>> `status`   [out] Mapping status of this unmap operation, 0 indicates
>>            success
>> --------------------------------------------------------------------
>>
>> Additional error codes specific to this hypercall:
>>
>> Error code  Reason
>> ----------  ------------------------------------------------------------
>> EPERM       PV IOMMU mode not enabled or calling domain is not domain 0
>> ------------------------------------------------------------------------
>>
>>
>> Conditions for which PV IOMMU hypercalls succeed
>> ------------------------------------------------
>> All the following conditions are required to be true for PV IOMMU
>> hypercalls to succeed:
>>
>> 1. IOMMU detected and supported by Xen
>> 2. The following Xen IOMMU options are NOT enabled: dom0-passthrough,
>>    dom0-strict
>> 3. Domain 0 is making the hypercall
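Putting the two subops together, a hedged usage sketch from Dom0's point of
view, reusing the declarations from the earlier sketch. HYPERVISOR_iommu_op()
is an assumed guest-side wrapper around the do_iommu_op hypercall and
gpfn_to_mfn() is the assumed P2M lookup from before; neither name is
specified by draft A. Dom0 would program a 1:1 GPFN-to-bus-address mapping
roughly like this:

    #define MAP_BATCH 128   /* arbitrary batch size for illustration */

    /* Assumed guest-side wrapper; the real entry point name is not
     * specified by draft A. */
    extern long HYPERVISOR_iommu_op(unsigned int subop,
                                    struct iommu_map_op *ops,
                                    unsigned int count);

    /* Program a 1:1 GPFN -> bus address mapping for [gpfn, gpfn + count). */
    static int pv_iommu_map_1to1(uint64_t gpfn, uint64_t count)
    {
        struct iommu_map_op ops[MAP_BATCH];

        while (count) {
            unsigned int n =
                count < MAP_BATCH ? (unsigned int)count : MAP_BATCH;

            for (unsigned int i = 0; i < n; i++) {
                ops[i].bfn    = gpfn + i;               /* bus frame == GPFN  */
                ops[i].mfn    = gpfn_to_mfn(gpfn + i);  /* assumed P2M lookup */
                ops[i].flags  = IOMMU_MAP_OP_readable |
                                IOMMU_MAP_OP_writeable;
                ops[i].status = 0;
            }

            if (HYPERVISOR_iommu_op(IOMMUOP_map_page, ops, n) < 0)
                return -1;          /* whole call failed, e.g. -EPERM/-EFAULT */

            for (unsigned int i = 0; i < n; i++)
                if (ops[i].status)
                    return -1;      /* per-op failure, e.g. -EPERM for a
                                       Xen-owned MFN */
            gpfn  += n;
            count -= n;
        }
        return 0;
    }

Unmapping would follow the same pattern with IOMMUOP_unmap_page, filling in
only the bfn field of each element.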
>> Security Implications of allowing Domain 0 IOMMU control
>> ========================================================
>>
>> Xen currently allows IO devices attached to Domain 0 to have direct
>> access to all of the MFN address space (except Xen hypervisor memory
>> regions), provided the Xen IOMMU option dom0-strict is not enabled.
>>
>> The PV IOMMU feature provides the same level of access to the MFN address
>> space, and the feature is not enabled when the Xen IOMMU option
>> dom0-strict is enabled. Therefore security is not affected by the PV
>> IOMMU feature.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel