Xen project Mailing List

Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

Hi,

At 01:41 +0000 on 11 Dec (1418258504), Tian, Kevin wrote:
> > From: Tim Deegan [mailto:tim@xxxxxxx]
> > It is Xen's job to isolate VMs from each other. As part of that, Xen
> > uses the MMU, nested paging, and IOMMUs to control access to RAM. Any
> > software component that can pass a raw MFN to hardware breaks that
> > isolation, because Xen has no way of controlling what that component
> > can do (including taking over the hypervisor). This is why I am
> > afraid when developers ask for GFN->MFN translation functions.
>
> When I agree Xen's job absolutely, the isolation is also required in different
> layers, regarding to who controls the resource and where the virtualization
> happens. For example talking about I/O virtualization, Dom0 or driver domain
> needs to isolate among backend drivers to avoid one backend interfering
> with another. Xen doesn't know such violation, since it only knows it's Dom0
> wants to access a VM's page.

I'm going to write a second reply to this mail in a bit, to talk about this kind of system-level design. In this email I'll just talk about the practical aspects of interfaces and address spaces and IOMMUs.

> btw curious of how worse exposing GFN->MFN translation compared to
> allowing mapping other VM's GFN? If exposing GFN->MFN is under the
> same permission control as mapping, would it avoid your worry here?

I'm afraid not. There's nothing worrying per se in a backend knowing the MFNs of the pages -- the worry is that the backend can pass the MFNs to hardware. If the check happens only at lookup time, then XenGT can (either through a bug or a security breach) just pass _any_ MFN to the GPU for DMA.

But even without considering the security aspects, this model has bugs that may be impossible for XenGT itself to even detect. E.g.:

1. Guest asks its virtual GPU to DMA to a frame of memory;
2. XenGT looks up the GFN->MFN mapping;
3. Guest balloons out the page;
4. Xen allocates the page to a different guest;
5. XenGT passes the MFN to the GPU, which DMAs to it.

Whereas if stage 2 is a _mapping_ operation, Xen can refcount the underlying memory and make sure it doesn't get reallocated until XenGT is finished with it.

> > When the backend component gets a GFN from the guest, it wants an
> > address that it can give to the GPU for DMA that will map the right
> > memory. That address must be mapped in the IOMMU tables that the GPU
> > will be using, which means the IOMMU tables of the backend domain,
> > IIUC[1]. So the hypercall it needs is not "give me the MFN that matches
> > this GFN" but "please map this GFN into my IOMMU tables".
>
> Here "please map this GFN into my IOMMU tables" actually breaks the
> IOMMU isolation. IOMMU is designed for serving DMA requests issued
> by an exclusive VM, so IOMMU page table can restrict that VM's attempts
> strictly.
>
> To map multiple VM's GFNs into one IOMMU table, the 1st thing is to
> avoid GFN conflictions to make it functional. We thought about this approach
> previously, e.g. by reserving highest 3 bits of GFN as VMID, so one IOMMU
> page table can be used to combine multi-VM's page table together. However
> doing so have two limitations:
>
> a) it still requires write-protect guest GPU page table, and maintain a shadow
> GPU page table by translate from real GFN to pseudo GFN (plus VMID), which
> doesn't save any engineering effort in the device model part

Yes -- since there's only one IOMMU context for the whole GPU, the XenGT backend still has to audit all GPU commands to maintain isolation between clients.

> b) it breaks the designed isolation intrinsic of IOMMU. In such case, IOMMU
> can't isolate multiple VMs by itself, since a DMA request can target any
> pseudo GFN if valid in the page table. We have to rely on the audit in the
> backend component in Dom0 to ensure the isolation.

Yep.

> c) this introduces tricky logic in IOMMU driver to handle such non-standard
> multiplexed page table style.
>
> w/o a SR-IOV implementation (so each VF has its own IOMMU page table),
> I don't see using IOMMU can help isolation here.

If I've understood your argument correctly, it basically comes down to "It would be extra work for no benefit, because XenGT still has to do all the work of isolating GPU clients from each other". It's true that XenGT still has to isolate its clients, but there are other benefits. The main one, from my point of view as a Xen maintainer, is that it allows Xen to constrain XenGT itself, in the case where bugs or security breaches mean that XenGT tries to access memory it shouldn't. More about that in my other reply. I'll talk about the rest below.

> yes, this is a good feedback we didn't think about before. So far the reason
> why XenGT can work is because we use default IOMMU setting which set
> up a 1:1 r/w mapping for all possible RAM, so when GPU hits a MFN thru
> shadow GPU page table, IOMMU is essentially bypassed. However like
> you said, if IOMMU page table is restricted to dom0's memory, or is not
> 1:1 identity mapping, XenGT will be broken.
>
> However I don't see a good solution for this, except using multiplexed
> IOMMU page table aforementioned, which however doesn't look like
> a sane design to me.

Right. AIUI you're talking about having a component, maybe in Xen, that automatically makes a merged IOMMU table that contains multiple VMs' p2m tables all at once. I think that we can do something simpler than that which will have the same effect and also avoid race conditions like the one I mentioned at the top of the email.

[First some hopefully-helpful diagrams to explain my thinking. I'll borrow 'BFN' from Malcolm's discussion of IOMMUs to describe the addresses that devices issue their DMAs in.

Here's how the translations work for an HVM guest using HAP:

    CPU       <- Code supplied by the guest
     |
    (VA)
     |
    MMU       <- Pagetables supplied by the guest
     |
    (GFN)
     |
    HAP       <- Guest's P2M, supplied by Xen
     |
    (MFN)
     |
    RAM

Here's how it looks for a GPU operation using XenGT:

    GPU       <- Code supplied by Guest, audited by XenGT
     |
    (GPU VA)
     |
    GPU-MMU   <- GTTs supplied by XenGT (by shadowing guest ones)
     |
    (GPU BFN)
     |
    IOMMU     <- XenGT backend dom's P2M (for PVH/HVM) or IOMMU tables (for PV)
     |
    (MFN)
     |
    RAM

OK, on we go...]

Somewhere in the existing XenGT code, XenGT has a guest GFN in its hand and makes a lookup hypercall to find the MFN. It puts that MFN into the GTTs that it passes to the GPU. But an MFN is not actually what it needs here -- it needs a GPU BFN, which the IOMMU will then turn into an MFN for it.

If we replace that lookup with a _map_ hypercall, either with Xen choosing the BFN (as happens in the PV grant map operation) or with the guest choosing an unused address (as happens in the HVM/PVH grant map operation), then:

- the only extra code in XenGT itself is that you need to unmap when you change the GTT;
- Xen can track and control exactly which MFNs XenGT/the GPU can access;
- running XenGT in a driver domain or PVH dom0 ought to work; and
- we fix the race condition I described above.

The default policy I'm suggesting is that the XenGT backend domain should be marked IS_PRIV_FOR (or similar) over the XenGT client VMs, which will need a small extension in Xen since at the moment struct domain has only one "target" field.

BTW, this is the exact analogue of how all other backend and toolstack operations work -- they request access from Xen to specific pages and they relinquish it when they are done. In particular:

> for mapping and accessing other guest's memory, I don't think we
> need any new interface atop existing ones. Just similar to other backend
> drivers, we can leverage the same permission control.

I don't think that's right -- other backend drivers use the grant table mechanism, where the guest explicitly grants access to only the memory it needs. AIUI you're not suggesting that you'll use that for XenGT! :)

Right - I hope that made some sense. I'll go get another cup of coffee and start on that other reply...

Cheers,

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
