Re: [Xen-devel] More questions about Xen memory layout/usage, access to guest memory
On 16/08/2019 20:51, Johnson, Ethan wrote:
> Hi all,
>
> I have some follow-up questions about Xen's usage and layout of memory,
> building on the ones I asked here a few weeks ago (which were quite
> helpfully answered: see
> https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg01513.html
> for reference). For context on why I'm asking these questions, I'm using
> Xen as a research platform for enforcing novel memory protection schemes
> on hypervisors and guests.
>
> 1. Xen itself lives in the memory region from (on x86-64) 0xffff 8000
> 0000 0000 - 0xffff 8777 ffff ffff, regardless of whether it's in PV mode
> or HVM/PVH. Clearly, in PV mode a separate set of page tables (i.e. CR3
> root pointer) must be used for each guest.

More than that.  Each vCPU.

PV guests manage their own pagetables, and have a vCR3 which the guest
kernel controls, and which we must honour.

For 64bit PV guests, each time a new L4 pagetable is created, Xen sets up
its own 16 slots appropriately.  As a result, Xen itself is able to
function on every pagetable hierarchy the PV guest creates.  See
init_xen_l4_slots(), which does this (a simplified sketch appears a little
further down).

For 32bit PV guests, things are a tad more complicated.  Each vCR3 is
actually a PAE quad of pagetable entries.  Because Xen is still operating
in 64bit mode with 4-level paging, we enforce that guests allocate a full
4k page for the pagetable (rather than the 32 bytes it would normally be).
In Xen, we allocate what is called a monitor table, which is per-vcpu (set
up with all the correct details for Xen), and we rewrite slot 0 each time
the vCPU changes vCR3.

Not related to this question, but important for future answers: all
pagetables are at a minimum per-domain, because we have per-domain
mappings to simplify certain tasks.  Contained within these are various
structures, including the hypercall compatibility translation area.  This
per-domain restriction could in principle be lifted if we altered the way
Xen lays out its memory.

> Is that also true of the host
> (non-extended, i.e. CR3 in VMX root mode) page tables when an HVM/PVH
> guest is running?

Historical context is important to answer this question.

When the first HVM support came along, there was no EPT or NPT in
hardware.  Hypervisors were required to virtualise the guest's pagetable
structure, which is called Shadow Paging in Xen.  The shadow pagetables
themselves are organised per-domain so as to form a single coherent guest
physical address space, but a CPU operating in non-root mode still needs
the real CR3 pointing at the shadow of whichever logical vCPU's CR3 is
being virtualised.

In practice, we still allocate a monitor pagetable per vcpu for HVM
guests, even with HAP support.  I can't think of any restrictions which
would prevent us from doing this differently.

> Or is the dom0 page table left in place, assuming the
> dom0 is PV, when an HVM/PVH guest is running, since extended paging is
> now being used to provide the guest's view of memory? Does this change
> if the dom0 is PVH?

Here is some (prototype) documentation prepared since your last round of
questions:

https://andrewcoop-xen.readthedocs.io/en/docs-devel/admin-guide/introduction.html

Dom0 is just a VM, like every other domU in the system.  There is nothing
special about how it is virtualised.  Dom0 defaults to having full
permissions, so it can successfully issue a whole range of more
interesting hypercalls, but you could easily create dom1, set the is_priv
boolean in Xen, and give dom1 all the same permissions that dom0 has, if
you wished.
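Returning to the 64bit PV case above: here is a minimal standalone sketch
of the idea behind init_xen_l4_slots(), i.e. copying Xen's reserved L4
slots from a reference pagetable into every new guest L4.  The types are
simplified stand-ins rather than Xen's real definitions, and the real
function additionally wires up the linear and per-domain slots.

    /* Simplified illustration: not the real init_xen_l4_slots(). */
    #include <stdint.h>

    #define ROOT_PAGETABLE_FIRST_XEN_SLOT  256  /* slot covering 0xffff800000000000 */
    #define ROOT_PAGETABLE_XEN_SLOTS        16  /* the 16 slots mentioned above     */

    typedef uint64_t l4_pgentry_t;

    /* Copy Xen's reserved slots from the reference ("idle") L4 into a
     * freshly created guest L4, so that Xen's mappings are present in
     * every pagetable hierarchy the PV guest builds. */
    void sketch_init_xen_l4_slots(l4_pgentry_t *new_l4,
                                  const l4_pgentry_t *idle_l4)
    {
        for ( unsigned int i = 0; i < ROOT_PAGETABLE_XEN_SLOTS; i++ )
            new_l4[ROOT_PAGETABLE_FIRST_XEN_SLOT + i] =
                idle_l4[ROOT_PAGETABLE_FIRST_XEN_SLOT + i];
    }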
> Or, to ask this from another angle: is there ever anything *but* Xen
> living in the host-virtual address space when an HVM/PVH guest is
> active?

No, depending on how you classify Xen's directmap in this context.

> And is the answer to this different depending on whether the
> HVM/PVH guest is a domU vs. a PVH dom0?

Dom0 vs domU has no relevance to the question.

> 2. Do the mappings in Xen's slice of the host-virtual address space
> differ at all between the host page tables corresponding to different
> guests?

No (ish).

Xen has a mostly flat address space, so most of the mappings are the
same.  There is a per-domain mapping slot which is common to each vcpu in
a domain but different across domains, a self-linear map for easy
modification of the PTEs of the current pagetable hierarchy, and a
shadow-linear map for easy modification of the shadow PTEs (a hierarchy in
which Xen is not mapped at all).

> If the mappings are in fact the same, does Xen therefore share
> lower-level page table pages between the page tables corresponding to
> different guests?

We have many different L4s (the monitor tables, and every L4 a PV guest
has allocated) which can run Xen.  Most parts of Xen's address space
converge at L3 (the M2P, the directmap, Xen
text/data/bss/fixmap/vmap/heaps/misc) and are common to all contexts.  The
per-domain mappings also converge at L3, and are shared between vcpus of
the same guest but not across guests.

One aspect I haven't really covered is XPTI, the Meltdown mitigation for
PV guests.  Here, we have a per-CPU private pagetable which ends up being
a merge of most of the guest's L4, but with some pre-constructed
CPU-private pagetable hierarchy to hide the majority of data in the Xen
region.

> Is any of this different for PV vs. HVM/PVH?

PV guests control their parts of their address space, and can do largely
whatever they choose.  For HVM and PVH, there is nothing in the lower
canonical half, but there is an extended directmap (which in practice only
makes a difference on a >5TB machine).

> 3. Under what circumstances, and for what purposes, does Xen use its
> ability to access guest memory through its direct map of host-physical
> memory?

That is a very broad question, and currently has the unfortunate answer
of "whenever speculation goes awry in an attacker's favour".  There are
steps under way to reduce the usage of the directmap so we can run
without it, and prevent this kind of leakage.

As for when Xen would normally access guest memory, the most common
answer is for hypercall parameters, which mostly use a virtual-address
based ABI.  Also, any time we need to emulate an instruction, we need to
read a fair amount of guest state, starting with the instruction under
%rip.

> Similarly, to what extent does the dom0 (or other such
> privileged domain) utilize "foreign memory maps" to reach into another
> guest's memory? I understand that this is necessary when creating a
> guest, for live migration, and for QEMU to emulate stuff for HVM guests;
> but for PVH, is it ever necessary for Xen or the dom0 to "forcibly"
> access a guest's memory?

I'm not sure what you mean by forcibly.  Dom0 has the ability to map
guest memory if it chooses; there is no "force" about it.

Debuggers and/or introspection are other reasons why dom0 might choose to
map guest RAM, but I think you've covered the common cases.
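For reference, here is a minimal sketch (error handling trimmed; the domid
and gfn are placeholders) of how a dom0 process maps a single guest frame
read-only through the stable libxenforeignmemory interface, which is what
toolstack components and debuggers build on:

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <xenforeignmemory.h>

    int main(void)
    {
        uint32_t domid = 1;        /* placeholder: target domain      */
        xen_pfn_t gfn  = 0x1000;   /* placeholder: guest frame to map */

        xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
        if ( !fmem )
            return 1;

        /* Map one guest frame read-only into our address space. */
        void *page = xenforeignmemory_map(fmem, domid, PROT_READ, 1, &gfn, NULL);
        if ( page )
        {
            printf("first byte of gfn %#lx: %02x\n",
                   (unsigned long)gfn, *(unsigned char *)page);
            xenforeignmemory_unmap(fmem, page, 1);
        }

        xenforeignmemory_close(fmem);
        return 0;
    }

Every such mapping is set up by a hypercall into Xen, which is where the
foreign-map reference counting mentioned below would naturally live.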
> (I ask because the research project I'm working on is seeking to protect
> guests from a compromised hypervisor and dom0, so I need to limit
> outside access to a guest's memory to explicitly shared pages that the
> guest will treat as untrusted - not storing any secrets there, vetting
> input as necessary, etc.)

Sorry to come along with roadblocks, but how on earth do you intend to
prevent a compromised Xen from accessing guest memory?  A compromised Xen
can do almost anything it likes, without recourse.  This is ultimately why
technologies such as Intel SGX or AMD Secure Encrypted Virtualization are
coming along, because only the hardware itself is in a position to isolate
an untrusted hypervisor/kernel from guest data.

For dom0, that's perhaps easier.  You could reference count the number of
foreign mappings into the domain as it is created, and refuse to unpause
the guest's vcpus until the foreign map count has dropped to 0.

> 4. What facilities/processes does Xen provide for PV(H) guests to
> explicitly/voluntarily share memory pages with Xen and other domains
> (dom0, etc.)? From what I can gather from the documentation, it sounds
> like "grant tables" are involved in this - is that how a PV-aware guest
> is expected to set up shared memory regions for communication with other
> domains (ring buffers, etc.)?

Yes.  Grant tables are Xen's mechanism for the coordinated setup of
shared memory between two consenting domains (see the sketch at the end
of this mail).

> Does a PV(H) guest need to voluntarily
> establish all external access to its pages, or is there ever a situation
> where it's the other way around - where Xen itself establishes/defines a
> region as shared and the guest is responsible for treating it accordingly?

During domain construction, two grants/events are set up automatically.
One is for the xenstore ring, and one is for the console ring.  The latter
exists so the guest can get debugging output from very early code, while
both are, in practice, done like this because the guest has no a priori
way to establish the grants/events itself.

For all other shared interfaces, the guests are expected to negotiate
which grants/events/rings/details to use via xenstore.

> Again, this mostly boils down to: under what circumstances, if ever,
> does Xen ever "force" access to any part of a guest's memory?
> (Particularly for PV(H). Clearly that must happen for HVM since, by
> definition, the guest is unaware there's a hypervisor controlling its
> world and emulating hardware behavior, and thus is in no position to
> cooperatively/voluntarily give the hypervisor and dom0 access to its
> memory.)

There are cases for all guest types where Xen will need to emulate
instructions.  Xen will access guest memory in order to perform
architecturally correct actions, which generally starts with reading the
instruction under %rip.

For PV guests, this is almost entirely restricted to guest-kernel
operations which are privileged in nature: access to MSRs, writes to
pagetables, etc.

For HVM and PVH guests, while PVH means "HVM without Qemu", that doesn't
mean a complete absence of emulation.  The Local APIC is emulated by Xen
in most cases as a bare minimum, and, for example, the LMSW instruction on
AMD hardware doesn't have any intercept decoding to help the hypervisor
out when a guest uses the instruction.

~Andrew
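P.S. As promised, a standalone sketch of the guest's side of offering one
of its frames to a peer domain, following the version-1 grant table layout
from Xen's public grant_table.h.  The structure and flag values match the
public header; how the shared table was mapped (GNTTABOP_setup_table) and
the choice of write barrier are simplified assumptions, and the resulting
grant reference would then be advertised to the peer via xenstore.

    #include <stdint.h>

    /* From xen/include/public/grant_table.h (v1 layout). */
    #define GTF_permit_access  (1U << 0)
    #define GTF_readonly       (1U << 2)

    struct grant_entry_v1 {
        uint16_t flags;   /* GTF_* - must be written last    */
        uint16_t domid;   /* domain allowed to map the frame */
        uint32_t frame;   /* guest frame number being shared */
    };

    /* Fill in one grant entry.  'entry' points into the grant table
     * already shared with Xen; its index (the grant reference) is what
     * gets handed to the peer via xenstore. */
    void grant_frame(volatile struct grant_entry_v1 *entry,
                     uint16_t peer_domid, uint32_t gfn, int readonly)
    {
        entry->domid = peer_domid;
        entry->frame = gfn;
        __sync_synchronize();   /* details must be visible before the flags */
        entry->flags = GTF_permit_access | (readonly ? GTF_readonly : 0);
    }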