Re: [Xen-devel] Linux grant map/unmap improvement proposal (Draft B)
On Mon, 13 Oct 2014, David Vrabel wrote:
> Grant mapping in the Linux kernel has a number of problems:
>
> * Grant mapping from userspace is broken for many real world use
>   cases.
>
> * Netback does not handle sending packets to network storage provided
>   by a VM on the same host.
>
> * Using blkback with network-based storage is unsafe.
>
> * Performance is poor, particularly with userspace grants.
>
> A PDF version of this document is available from:
>
> http://xenbits.xen.org/people/dvrabel/grant-improvements-B.pdf
>
>
> Userspace grant maps
> --------------------
>
> Certain types of system calls using foreign mappings require
> translating the virtual address to a page using `get_user_pages()` or
> `get_user_pages_fast()`. These system calls include direct I/O and
> asynchronous I/O (AIO).
>
> In the native case this translation is done by walking the userspace
> page tables and looking up the PFN in the L1 entry. PFN to page is
> then trivial.
>
> For a PV guest this L1 entry contains an MFN and this first needs to
> be translated into a PFN. For a normal frame this is a simple lookup
> in the M2P. For foreign pages, the gntdev driver maintains an
> additional hash of foreign MFNs to local PFNs called the m2p_override.
>
> The m2p_override table has a fundamental design flaw.
>
> A domain may grant a frame multiple times, using a different grant
> reference each time. The backend maps each grant reference to a
> separate page. The 1-to-many MFN-to-page mapping cannot be
> represented in the 1-to-1 m2p_override table and I/O to or from these
> mappings cannot get the correct page.
>
> Transmitting foreign pages to guests
> ------------------------------------
>
> Netback when sending pages to the guest uses a grant copy operation to
> copy the data into the frames granted by the guest. This grant copy
> requires either a local GFN _or_ a grant reference; it is not possible
> to grant copy to/from a foreign mapping.
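
The m2p_override flaw described under "Userspace grant maps" above can
be illustrated with a small standalone sketch. This is not the kernel's
implementation: `toy_m2p_override`, `toy_add()` and `toy_lookup()` are
hypothetical names, and a flat array stands in for the real hash. The
only property it is meant to show is that a table keyed by MFN alone can
return just one local PFN, so a frame granted twice loses one of its
mappings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the m2p_override: one local PFN per MFN. */
#define TOY_SLOTS   16u
#define TOY_INVALID (~0ul)

struct toy_m2p_override {
    unsigned long mfn[TOY_SLOTS];
    unsigned long pfn[TOY_SLOTS];
    size_t used;
};

void toy_init(struct toy_m2p_override *t)
{
    t->used = 0;
}

/*
 * Record mfn -> pfn. The table is keyed by MFN only, so a second
 * mapping of the same MFN replaces the first one (assumed policy;
 * keeping the first entry instead would lose the second mapping).
 */
void toy_add(struct toy_m2p_override *t, unsigned long mfn,
             unsigned long pfn)
{
    for (size_t i = 0; i < t->used; i++) {
        if (t->mfn[i] == mfn) {
            t->pfn[i] = pfn;
            return;
        }
    }
    assert(t->used < TOY_SLOTS);
    t->mfn[t->used] = mfn;
    t->pfn[t->used] = pfn;
    t->used++;
}

/* Return the single PFN recorded for mfn, or TOY_INVALID if none. */
unsigned long toy_lookup(const struct toy_m2p_override *t,
                         unsigned long mfn)
{
    for (size_t i = 0; i < t->used; i++) {
        if (t->mfn[i] == mfn)
            return t->pfn[i];
    }
    return TOY_INVALID;
}

/*
 * The same frame is granted twice and mapped at two local pages.
 * Returns 1 if the page of the first mapping can no longer be found.
 */
int toy_demo_lost_mapping(void)
{
    struct toy_m2p_override t;

    toy_init(&t);
    toy_add(&t, 0x1234, 100);  /* grant ref A, mapped at local PFN 100 */
    toy_add(&t, 0x1234, 200);  /* grant ref B, same MFN, local PFN 200 */
    return toy_lookup(&t, 0x1234) != 100;
}
```

Whichever replacement policy is chosen, I/O through the mapping whose
entry was dropped resolves to the wrong page.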
>
> In order to support VM to VM traffic, netback stores the grant
> reference for the sender VM in the socket buffer structure which may
> then be used by the receiving netback for the grant copy.
>
> Packets with foreign pages from other sources cannot be successfully
> copied, since netback does not know the grant reference. One such
> configuration is a VM providing an iSCSI or other network-based
> storage that presents a block device in the backend that is then used
> by another VM on the same host.
>
> Blkback and network storage
> ---------------------------
>
> Blkback unmaps the foreign pages in an I/O request when the request is
> completed. If networked storage is used it is possible for requests
> to be completed while the skbs referring to those pages are still
> queued for transmit (e.g., because a retransmission was queued while
> the response to the original packet was in flight).
>
> When the network driver attempts to send the packet with the unmapped
> page it may:
>
> - Fault while trying to access the unmapped page.
>
> - Transmit from a frame that is no longer granted (potentially
>   transmitting sensitive guest or Xen data).
>
> The fault does not occur with userspace storage backends since gntdev
> replaces the foreign mapping with one to a local scratch page. It
> uses GNTTABOP_unmap_and_replace which atomically replaces the foreign
> mapping with another (source) mapping. However, this cannot be used
> with batched operations since it clears the source mapping and it does
> not protect against transmitting from a non-granted frame.

This is a very good summary of the issues we are currently having with
Xen support in Linux. As such, I think I should add one that is missing
from the list, but is good to keep in mind. I should point out that I am
not asking you to do anything about it at the moment.

dma_ops.unmap_page and dma_ops.unmap_sg only pass dma addresses as arguments
----------------------------------------------------------------------------

The Linux dma_map_ops API includes a number of functions that receive
only the dma address of the dma request as an argument, not the struct
page or the physical address, for example unmap_page and unmap_sg.

For Xen PV guests the dma address is a machine address. If the machine
address corresponds to a foreign page (granted to the current domain),
there is no easy way for us to retrieve the corresponding struct page or
guest physical address (other than the m2p_override with all its
problems).

This is a serious limitation, in particular if we need to do any
operations on the memory region at the time one of these functions is
called:

- on x86 fortunately we don't need to do anything;
- on ARM, if the device is not dma coherent, we might have to issue
  cache maintenance operations.

> Design
> ======
>
> Map onto ballooned pages only
> -----------------------------
>
> Grant maps will only be permitted with ballooned pages.
>
> The original p2m entry for these pages will always be INVALID_MFN and
> thus the original MFN does not need to be saved on map and restored on
> unmap.
>
> Grant map/unmap will no longer need to use or clobber `page->index`.
> This allows a workaround in netback to clear `page->pfmemalloc` to be
> removed (`index` and `pfmemalloc` are part of the same union).
>
>
> Safe grant unmap
> ----------------
>
> Grant references will only be unmapped when they are no longer in use,
> i.e., when the page reference count is one.
>
>     int gnttab_unmap_refs_async(struct gnttab_unmap_grant_ref *unmap_ops,
>                                 struct gnttab_unmap_grant_ref *kunmap_ops,
>                                 struct page **pages, unsigned int count,
>                                 void (*done)(void *data), void *data);
>
> The `gnttab_unmap_refs_async()` function will unmap the grant
> references using the supplied unmap operations and call `done(data)`.
> The grant unmap will only be done once all pages are no longer in use.
>
> It shall run synchronously on the first attempt (this is expected to
> be the most common case). If any page is in use, it shall queue the
> unmap request to be tried at a later time.
>
> Only the blkback and gntdev devices need to use asynchronous unmaps.
>
>
> Userspace address to page translation
> -------------------------------------
>
> The m2p_override table shall be removed.
>
> Each VMA (struct vm_area_struct) shall contain an additional pointer
> to an optional array of pages. This array shall be sized to cover the
> full extent of the VMA.
>
> The gntdev driver populates this array with the relevant pages for the
> foreign mappings as they are mapped. It shall also clear them when
> unmapping. The gntdev driver must ensure it properly splits the page
> array when the VMA itself is split.
>
> Since the m2p lookup will not return a local PFN, the native
> get_user_pages_fast() call will fail. Prior to attempting to fault in
> the pages, get_user_pages() can simply look up the pages in the VMA's
> page array.
>
> `page->private` will no longer need to be set to the MFN.
>
> This is similar to the approach used in the classic kernel.

It is worth pointing out that if/when non-dma-coherent devices start
appearing in x86-land, this solution won't suffice.

> Identifying foreign pages
> -------------------------
>
> A new page flag is introduced: PG_foreign. This will alias PG_pinned
> so it does not require an additional bit.
>
> If PG_foreign is set then `page->private` contains the grant reference
> and domid for this foreign page. This information can only be packed
> into an unsigned long on 64-bit platforms. 32-bit platforms will have
> to allocate an additional structure to store the domid and gref.
>
> The aliasing of PG_foreign and PG_pinned is safe because:
>
> - Page table pages will never be foreign.
> - Foreign pages shall have `p2m[P] & FOREIGN_FRAME_BIT`.
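
The packing of domid and gref into `page->private` described above can
be sketched as follows. This is only an illustration under stated
assumptions: the bit layout (gref in the low 32 bits, domid in bits
32..47) and the `toy_*` names are hypothetical and not taken from the
proposal, and `uint64_t` stands in for a 64-bit `unsigned long`. It
relies on Xen's domid_t being 16 bits wide and grant_ref_t being 32 bits
wide, which is why the pair needs 48 bits and cannot fit in a 32-bit
`unsigned long`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t toy_domid_t;      /* same width as Xen's domid_t */
typedef uint32_t toy_grant_ref_t;  /* same width as Xen's grant_ref_t */

/* Pack domid and gref into one 64-bit word (assumed layout). */
uint64_t toy_pack_foreign(toy_domid_t domid, toy_grant_ref_t gref)
{
    return ((uint64_t)domid << 32) | (uint64_t)gref;
}

/* Recover the domid from bits 32..47. */
toy_domid_t toy_foreign_domid(uint64_t priv)
{
    return (toy_domid_t)(priv >> 32);
}

/* Recover the grant reference from the low 32 bits. */
toy_grant_ref_t toy_foreign_gref(uint64_t priv)
{
    return (toy_grant_ref_t)(priv & 0xffffffffu);
}

/* Round-trip check with the largest values of both fields. */
void toy_selfcheck(void)
{
    uint64_t priv = toy_pack_foreign(0xffff, 0xffffffffu);

    assert(toy_foreign_domid(priv) == 0xffff);
    assert(toy_foreign_gref(priv) == 0xffffffffu);
}
```

On a 32-bit platform the same two fields would have to live in a
separately allocated structure, as the text above notes.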
>
> The use of the private field is safe because:
>
> - The page is allocated by the balloon driver and thus it owns the
>   private field.
>
> - The other fields in the union (ptl, slab_cache, and first_page) will
>   not be used because the page is not used in a page table, slab or
>   compound page.
>
> Netback can thus:
>
> 1. Test PG_foreign.
> 2. Verify that the page is foreign via the p2m.
> 3. Extract the domid and gref from page->private.
>
> The PG_foreign test is not strictly necessary as the p2m lookup is
> sufficient, but it should be quicker for non-foreign pages.
>
>
> Userspace grant performance
> ---------------------------
>
> Since the m2p_override table will be removed, the gntdev device may
> easily batch the grant map and unmap hypercalls that update the kernel
> mappings.
>
> The use of the scratch pages on unmap will be unnecessary and can be
> removed.
>
> Other improvements that may be considered are:
>
> - Batch the userspace and kernel map and unmap.
>
> - Lazily map grants into userspace on faults. For applications that
>   do not access the foreign frames by the userspace mappings (such as
>   block backends using direct I/O) this would avoid a set of maps and
>   unmaps. This lazy mode would have to be requested by the userspace
>   program (since faulting many pages would be much more expensive than
>   a single batched map).
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel