Re: [Xen-devel] More questions about Xen memory layout/usage, access to guest memory
On 8/17/2019 7:04 AM, Andrew Cooper wrote:
>> Similarly, to what extent does the dom0 (or other such
>> privileged domain) utilize "foreign memory maps" to reach into another
>> guest's memory? I understand that this is necessary when creating a
>> guest, for live migration, and for QEMU to emulate stuff for HVM guests;
>> but for PVH, is it ever necessary for Xen or the dom0 to "forcibly"
>> access a guest's memory?
> I'm not sure what you mean by forcibly. Dom0 has the ability to do so,
> if it chooses. There is no "force" about it.
>
> Debuggers and/or Introspection are other reasons why dom0 might choose to
> map guest RAM, but I think you've covered the common cases.
>
>> (I ask because the research project I'm working on is seeking to protect
>> guests from a compromised hypervisor and dom0, so I need to limit
>> outside access to a guest's memory to explicitly shared pages that the
>> guest will treat as untrusted - not storing any secrets there, vetting
>> input as necessary, etc.)
> Sorry to come along with roadblocks, but how on earth do you intend to
> prevent a compromised Xen from accessing guest memory? A compromised
> Xen can do almost anything it likes, and without recourse. This is
> ultimately why technologies such as Intel SGX or AMD Secure Encrypted VM
> are coming along, because only the hardware itself is in a position to
> isolate an untrusted hypervisor/kernel from guest data.
>
> For dom0, that's perhaps easier. You could reference count the number
> of foreign mappings into the domain as it is created, and refuse to
> unpause the guest's vcpus until the foreign map count has dropped to 0.

We're using a technique where privileged system software (in this case, the hypervisor) is compiled to a virtual instruction set (based on LLVM IR) that limits its access to hardware features and its view of available memory.
These limitations can be enforced in a variety of ways, but the main techniques we're employing are software fault isolation (i.e., memory loads and stores in privileged code are instrumented with checks to ensure they aren't accessing forbidden regions) and mediation of page table updates. For the latter, privileged software is modified to make page table updates through a virtual instruction set interface, very similar to how Xen PV guests make page table updates through hypercalls, which gives Xen the opportunity to ensure mappings aren't made to protected regions.

Our technique is based on that used by the "Virtual Ghost" project (see https://dl.acm.org/citation.cfm?id=2541986 for the paper; direct PDF link: http://sva.cs.illinois.edu/pubs/VirtualGhost-ASPLOS-2014.pdf), which does something similar to protect applications from a compromised operating system kernel without relying on something like a hypervisor operating at a higher privilege level. We're looking to extend that approach to hypervisors to protect guest VMs from a compromised hypervisor.

>> Again, this mostly boils down to: under what circumstances, if ever,
>> does Xen ever "force" access to any part of a guest's memory?
>> (Particularly for PV(H). Clearly that must happen for HVM since, by
>> definition, the guest is unaware there's a hypervisor controlling its
>> world and emulating hardware behavior, and thus is in no position to
>> cooperatively/voluntarily give the hypervisor and dom0 access to its
>> memory.)
> There are cases for all guest types where Xen will need to emulate
> instructions. Xen will access guest memory in order to perform
> architecturally correct actions, which generally starts with reading the
> instruction under %rip.
>
> For PV guests, this is almost entirely restricted to guest-kernel
> operations which are privileged in nature. Access to MSRs, writes to
> pagetables, etc.
>
> For HVM and PVH guests, while PVH means "HVM without Qemu", it doesn't
> mean a complete absence of emulation. The Local APIC is emulated by Xen
> in most cases, as a bare minimum, but for example, the LMSW instruction
> on AMD hardware doesn't have any intercept decoding to help the
> hypervisor out when a guest uses the instruction.
>
> ~Andrew

I've found a number of files in the Xen source tree which seem to be related to instruction/x86 platform emulation:

arch/x86/x86_emulate.c
arch/x86/hvm/emulate.c
arch/x86/hvm/vmx/realmode.c
arch/x86/hvm/svm/emulate.c
arch/x86/pv/emulate.c
arch/x86/pv/emul-priv-op.c
arch/x86/x86_emulate/x86_emulate.c

The last of these, in particular, looks especially hairy (it seems to support emulation of essentially the entire x86 instruction set through a quite impressive edifice of switch statements).

How does all of this fit into the big picture of how Xen virtualizes the different types of VMs (PV/HVM/PVH)?

My impression (from reading the original "Xen and the Art of Virtualization" SOSP '03 paper that describes the basic architecture) had been that PV guests, in particular, used hypercalls in place of all privileged operations that the guest kernel would otherwise need to execute in ring 0; and that all other (unprivileged) operations could execute natively on the CPU without requiring emulation. From what you're saying (and what I'm seeing in the source code), though, it sounds like in reality things are a bit fuzzier - that there are some operations that Xen traps and emulates instead of explicitly paravirtualizing.

Likewise, the Xen design described in the SOSP paper discussed guest I/O as something that's fully paravirtualized, taking place not through emulation of either memory-mapped or port I/O but rather through ring buffers shared between the guest and dom0 via grant tables. I was a bit confused to find I/O emulation code under arch/x86/pv (see e.g.
arch/x86/pv/emul-priv-op.c) that seems to be talking about "ports" and the like. Is this another example of things being fuzzier in reality than in the "theoretical" PV design? What devices, if any, are emulated rather than paravirtualized for a PV guest? I know that for PVH, you mentioned that the Local APIC is (at a minimum) emulated, along with some special instructions; is that true for classic PV as well?

For HVM, obviously anything that can't be virtualized natively by the hardware needs to be emulated by Xen/QEMU (since the guest kernel isn't expected to be cooperative to issue PV hypercalls instead); but I would expect emulation to be limited to the relatively small subset of the ISA that VMX/SVM can't natively virtualize. Yet I see that x86_emulate.c supports emulating just about everything. Under what circumstances does Xen actually need to put all that emulation code to use?

I'm also wondering just how much of this is Xen's responsibility vs. QEMU's. I understand that when QEMU is used on its own (i.e., not with Xen), it uses dynamic binary recompilation to handle the parts of the ISA that can't be virtualized natively in lower-privilege modes. Does Xen only use QEMU for emulating off-CPU devices (interrupt controller, non-paravirtualized disk/network/graphics/etc.), or does it ever employ any of QEMU's x86 emulation support in addition to Xen's own emulation code?

Is there any particular place in the code where I can go to get a comprehensive "list" (or other such summary) of which parts of the ISA and off-CPU system are emulated for each respective guest type (PV, HVM, and PVH)? I realize that the difference between HVM and PVH is more of a continuum than a line; what I'm especially interested in is, what's the *bare minimum* of emulation required for a PVH guest that's using as much paravirtualization as possible?
(That's the setting I'm looking to target for my research on protecting guests from a compromised hypervisor, since I'm trying to minimize the scope of interactions between the guest and hypervisor/dom0 that our virtual instruction set layer needs to mediate.)

On a somewhat related note, I also have a question about a particular piece of code in arch/x86/pv/emul-priv-op.c, namely the function io_emul_stub_setup(). It looks like it is, at runtime, crafting a function that switches to the guest register context, emulates a particular I/O operation, then switches back to the host register context. This caught our attention while we were implementing Control Flow Integrity (CFI) instrumentation for Xen (which is necessary for us to enforce the software fault isolation (SFI) instrumentation that provides our memory protections). Why does Xen use dynamically-generated code here? Is it just for implementation convenience (i.e., to improve the generalizability of the code)?

Thanks again for all your time and effort spent answering my questions. I know I'm throwing a lot of unusual questions out there - this back-and-forth has been very helpful for me in figuring out *what* questions I need to be asking in the first place to understand what's feasible to do in the Xen architecture and how I might go about doing it. :-)

Thanks,

Ethan Johnson

--
Ethan J. Johnson
Computer Science PhD student, Systems group, University of Rochester
ejohns48@xxxxxxxxxxxxxxxx
ethanjohnson@xxxxxxx
PGP public key available from public directory or on request

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
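
For reference, the two enforcement mechanisms described earlier in this message (checks on loads and stores in privileged code, and mediated page table updates) can be sketched roughly as below. This is only an illustrative sketch under stated assumptions: every name, constant, and address range here is hypothetical and is not taken from Virtual Ghost, SVA, or Xen, and it pretends there is a single flat address space with one protected region, which a real system would not have.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical protected region [PROTECTED_START, PROTECTED_END).
 * Assumption: one flat address space; a real design would track
 * virtual and physical ranges separately. */
#define PROTECTED_START 0x100000000ULL
#define PROTECTED_END   0x180000000ULL

/* Hypothetical x86-64-style PTE layout, for illustration only. */
#define PAGE_SIZE     4096ULL
#define PTE_PRESENT   0x1ULL
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL

/* True if [addr, addr + len) overlaps the protected region. */
static bool touches_protected(uint64_t addr, uint64_t len)
{
    if (len == 0)
        return false;
    uint64_t last = addr + (len - 1);
    if (last < addr)            /* range wraps around: reject it */
        return true;
    return addr < PROTECTED_END && last >= PROTECTED_START;
}

/* SFI-style check that instrumentation would place in front of each
 * load or store in privileged code. */
static bool sfi_access_ok(uint64_t addr, uint64_t len)
{
    return !touches_protected(addr, len);
}

/* Mediated page table update: the write only happens if the new entry
 * does not map a frame inside the protected region. */
static bool mediated_pte_write(uint64_t *pte_slot, uint64_t new_pte)
{
    if (new_pte & PTE_PRESENT) {
        uint64_t frame = new_pte & PTE_ADDR_MASK;
        if (touches_protected(frame, PAGE_SIZE))
            return false;       /* refuse the mapping, leave slot as-is */
    }
    *pte_slot = new_pte;
    return true;
}
```

In this sketch the load/store check alone would not be enough, since privileged code could otherwise remap a protected frame at an unchecked address; that is why the page table write goes through the same range test.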