Re: [Xen-devel] More questions about Xen memory layout/usage, access to guest memory
On 16/08/2019 20:51, Johnson, Ethan wrote:
> Hi all,
>
> I have some follow-up questions about Xen's usage and layout of memory,
> building on the ones I asked here a few weeks ago (which were quite
> helpfully answered: see
> https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg01513.html
> for reference). For context on why I'm asking these questions, I'm using
> Xen as a research platform for enforcing novel memory protection schemes
> on hypervisors and guests.
>
> 1. Xen itself lives in the memory region from (on x86-64) 0xffff 8000
> 0000 0000 - 0xffff 8777 ffff ffff, regardless of whether it's in PV mode
> or HVM/PVH. Clearly, in PV mode a separate set of page tables (i.e. CR3
> root pointer) must be used for each guest.

More than that.  Each vCPU.

PV guests manage their own pagetables, and have a vCR3 which the guest
kernel controls, and which we must honour.

For 64bit PV guests, each time a new L4 pagetable is created, Xen sets up
its own 16 slots appropriately.  As a result, Xen itself is able to
function on every pagetable hierarchy the PV guest creates.  See
init_xen_l4_slots(), which does this (a simplified sketch appears a little
further down).

For 32bit PV guests, things are a tad more complicated.  Each vCR3 is
actually a PAE quad of pagetable entries.  Because Xen is still operating
in 64bit mode with 4-level paging, we enforce that guests allocate a full
4k page for the pagetable (rather than the 32 bytes it would normally be).
In Xen, we allocate what is called a monitor table, which is per-vcpu (set
up with all the correct details for Xen), and we rewrite slot 0 each time
the vCPU changes vCR3.

Not related to this question, but important for future answers: all
pagetables are at a minimum per-domain, because we have per-domain
mappings to simplify certain tasks.  Contained within these are various
structures, including the hypercall compatibility translation area.  This
per-domain restriction could in principle be lifted if we altered the way
Xen lays out its memory.

> Is that also true of the host
> (non-extended, i.e. CR3 in VMX root mode) page tables when an HVM/PVH
> guest is running?

Historical context is important to answer this question.

When the first HVM support came along, there was no EPT or NPT in
hardware.  Hypervisors were required to virtualise the guest's pagetable
structure, which is called Shadow Paging in Xen.  The shadow pagetables
themselves are organised per-domain so as to form a single coherent guest
physical address space, but a CPU operating in non-root mode still needs
the real CR3 pointing at the shadow of whichever logical vCPU's CR3 is
being virtualised.

In practice, we still allocate a monitor pagetable per vcpu for HVM
guests, even with HAP support.  I can't think of any restrictions which
would prevent us from doing this differently.

> Or is the dom0 page table left in place, assuming the
> dom0 is PV, when an HVM/PVH guest is running, since extended paging is
> now being used to provide the guest's view of memory? Does this change
> if the dom0 is PVH?

Here is some (prototype) documentation prepared since your last round of
questions:

https://andrewcoop-xen.readthedocs.io/en/docs-devel/admin-guide/introduction.html

Dom0 is just a VM, like every other domU in the system.  There is nothing
special about how it is virtualised.  Dom0 defaults to having full
permissions, so it can successfully issue a whole range of more
interesting hypercalls, but you could easily create dom1, set the is_priv
boolean in Xen, and give dom1 all the same permissions that dom0 has, if
you wished.
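Returning to the 64bit PV case above: here is a minimal standalone sketch
of the idea behind init_xen_l4_slots(), i.e. copying Xen's reserved L4
slots from a reference pagetable into every new guest L4.  The types are
simplified stand-ins rather than Xen's real definitions, and the real
function additionally wires up the linear and per-domain slots.

    /* Simplified illustration: not the real init_xen_l4_slots(). */
    #include <stdint.h>

    #define ROOT_PAGETABLE_FIRST_XEN_SLOT  256  /* slot covering 0xffff800000000000 */
    #define ROOT_PAGETABLE_XEN_SLOTS        16  /* the 16 slots mentioned above     */

    typedef uint64_t l4_pgentry_t;

    /* Copy Xen's reserved slots from the reference ("idle") L4 into a
     * freshly created guest L4, so that Xen's mappings are present in
     * every pagetable hierarchy the PV guest builds. */
    void sketch_init_xen_l4_slots(l4_pgentry_t *new_l4,
                                  const l4_pgentry_t *idle_l4)
    {
        for ( unsigned int i = 0; i < ROOT_PAGETABLE_XEN_SLOTS; i++ )
            new_l4[ROOT_PAGETABLE_FIRST_XEN_SLOT + i] =
                idle_l4[ROOT_PAGETABLE_FIRST_XEN_SLOT + i];
    }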
> Or, to ask this from another angle: is there ever anything *but* Xen
> living in the host-virtual address space when an HVM/PVH guest is
> active?

No, depending on how you classify Xen's directmap in this context.

> And is the answer to this different depending on whether the
> HVM/PVH guest is a domU vs. a PVH dom0?

Dom0 vs domU has no relevance to the question.

> 2. Do the mappings in Xen's slice of the host-virtual address space
> differ at all between the host page tables corresponding to different
> guests?

No (ish).

Xen has a mostly flat address space, so most of the mappings are the
same.  There is a per-domain mapping slot which is common to each vcpu in
a domain but different across domains, a self-linear map for easy
modification of the PTEs of the current pagetable hierarchy, and a
shadow-linear map for easy modification of the shadow PTEs (a hierarchy in
which Xen is not mapped at all).

> If the mappings are in fact the same, does Xen therefore share
> lower-level page table pages between the page tables corresponding to
> different guests?

We have many different L4s (the monitor tables, and every L4 a PV guest
has allocated) which can run Xen.  Most parts of Xen's address space
converge at L3 (the M2P, the directmap, Xen
text/data/bss/fixmap/vmap/heaps/misc) and are common to all contexts.  The
per-domain mappings also converge at L3, and are shared between vcpus of
the same guest but not across guests.

One aspect I haven't really covered is XPTI, the Meltdown mitigation for
PV guests.  Here, we have a per-CPU private pagetable which ends up being
a merge of most of the guest's L4, but with some pre-constructed
CPU-private pagetable hierarchy to hide the majority of data in the Xen
region.

> Is any of this different for PV vs. HVM/PVH?

PV guests control their parts of their address space, and can do largely
whatever they choose.  For HVM and PVH, there is nothing in the lower
canonical half, but there is an extended directmap (which in practice only
makes a difference on a >5TB machine).

> 3. Under what circumstances, and for what purposes, does Xen use its
> ability to access guest memory through its direct map of host-physical
> memory?

That is a very broad question, and currently has the unfortunate answer
of "whenever speculation goes awry in an attacker's favour".  There are
steps under way to reduce the usage of the directmap so we can run
without it, and prevent this kind of leakage.

As for when Xen would normally access guest memory, the most common
answer is for hypercall parameters, which mostly use a virtual-address
based ABI.  Also, any time we need to emulate an instruction, we need to
read a fair amount of guest state, starting with the instruction under
%rip.

> Similarly, to what extent does the dom0 (or other such
> privileged domain) utilize "foreign memory maps" to reach into another
> guest's memory? I understand that this is necessary when creating a
> guest, for live migration, and for QEMU to emulate stuff for HVM guests;
> but for PVH, is it ever necessary for Xen or the dom0 to "forcibly"
> access a guest's memory?

I'm not sure what you mean by forcibly.  Dom0 has the ability to map
guest memory if it chooses; there is no "force" about it.

Debuggers and/or introspection are other reasons why dom0 might choose to
map guest RAM, but I think you've covered the common cases.
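For reference, here is a minimal sketch (error handling trimmed; the domid
and gfn are placeholders) of how a dom0 process maps a single guest frame
read-only through the stable libxenforeignmemory interface, which is what
toolstack components and debuggers build on:

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <xenforeignmemory.h>

    int main(void)
    {
        uint32_t domid = 1;        /* placeholder: target domain      */
        xen_pfn_t gfn  = 0x1000;   /* placeholder: guest frame to map */

        xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
        if ( !fmem )
            return 1;

        /* Map one guest frame read-only into our address space. */
        void *page = xenforeignmemory_map(fmem, domid, PROT_READ, 1, &gfn, NULL);
        if ( page )
        {
            printf("first byte of gfn %#lx: %02x\n",
                   (unsigned long)gfn, *(unsigned char *)page);
            xenforeignmemory_unmap(fmem, page, 1);
        }

        xenforeignmemory_close(fmem);
        return 0;
    }

Every such mapping is set up by a hypercall into Xen, which is where the
foreign-map reference counting mentioned below would naturally live.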
> (I ask because the research project I'm working on is seeking to protect
> guests from a compromised hypervisor and dom0, so I need to limit
> outside access to a guest's memory to explicitly shared pages that the
> guest will treat as untrusted - not storing any secrets there, vetting
> input as necessary, etc.)

Sorry to come along with roadblocks, but how on earth do you intend to
prevent a compromised Xen from accessing guest memory?  A compromised Xen
can do almost anything it likes, without recourse.  This is ultimately why
technologies such as Intel SGX or AMD Secure Encrypted Virtualization are
coming along, because only the hardware itself is in a position to isolate
an untrusted hypervisor/kernel from guest data.

For dom0, that's perhaps easier.  You could reference count the number of
foreign mappings into the domain as it is created, and refuse to unpause
the guest's vcpus until the foreign map count has dropped to 0.

> 4. What facilities/processes does Xen provide for PV(H) guests to
> explicitly/voluntarily share memory pages with Xen and other domains
> (dom0, etc.)? From what I can gather from the documentation, it sounds
> like "grant tables" are involved in this - is that how a PV-aware guest
> is expected to set up shared memory regions for communication with other
> domains (ring buffers, etc.)?

Yes.  Grant tables are Xen's mechanism for the coordinated setup of
shared memory between two consenting domains (see the sketch at the end
of this mail).

> Does a PV(H) guest need to voluntarily
> establish all external access to its pages, or is there ever a situation
> where it's the other way around - where Xen itself establishes/defines a
> region as shared and the guest is responsible for treating it accordingly?

During domain construction, two grants/events are set up automatically.
One is for the xenstore ring, and one is for the console ring.  The latter
exists so the guest can get debugging output from very early code, while
both are, in practice, done like this because the guest has no a priori
way to establish the grants/events itself.

For all other shared interfaces, the guests are expected to negotiate
which grants/events/rings/details to use via xenstore.

> Again, this mostly boils down to: under what circumstances, if ever,
> does Xen ever "force" access to any part of a guest's memory?
> (Particularly for PV(H). Clearly that must happen for HVM since, by
> definition, the guest is unaware there's a hypervisor controlling its
> world and emulating hardware behavior, and thus is in no position to
> cooperatively/voluntarily give the hypervisor and dom0 access to its
> memory.)

There are cases for all guest types where Xen will need to emulate
instructions.  Xen will access guest memory in order to perform
architecturally correct actions, which generally starts with reading the
instruction under %rip.

For PV guests, this is almost entirely restricted to guest-kernel
operations which are privileged in nature: access to MSRs, writes to
pagetables, etc.

For HVM and PVH guests, while PVH means "HVM without Qemu", that doesn't
mean a complete absence of emulation.  The Local APIC is emulated by Xen
in most cases as a bare minimum, and, for example, the LMSW instruction on
AMD hardware doesn't have any intercept decoding to help the hypervisor
out when a guest uses the instruction.

~Andrew
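P.S. As promised, a standalone sketch of the guest's side of offering one
of its frames to a peer domain, following the version-1 grant table layout
from Xen's public grant_table.h.  The structure and flag values match the
public header; how the shared table was mapped (GNTTABOP_setup_table) and
the choice of write barrier are simplified assumptions, and the resulting
grant reference would then be advertised to the peer via xenstore.

    #include <stdint.h>

    /* From xen/include/public/grant_table.h (v1 layout). */
    #define GTF_permit_access  (1U << 0)
    #define GTF_readonly       (1U << 2)

    struct grant_entry_v1 {
        uint16_t flags;   /* GTF_* - must be written last    */
        uint16_t domid;   /* domain allowed to map the frame */
        uint32_t frame;   /* guest frame number being shared */
    };

    /* Fill in one grant entry.  'entry' points into the grant table
     * already shared with Xen; its index (the grant reference) is what
     * gets handed to the peer via xenstore. */
    void grant_frame(volatile struct grant_entry_v1 *entry,
                     uint16_t peer_domid, uint32_t gfn, int readonly)
    {
        entry->domid = peer_domid;
        entry->frame = gfn;
        __sync_synchronize();   /* details must be visible before the flags */
        entry->flags = GTF_permit_access | (readonly ? GTF_readonly : 0);
    }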