Re: [Xen-devel] Xen Summit 2019 Design Session - Nested Virtualization
On Thu, Aug 08, 2019 at 08:53:36PM -0400, Rich Persaud wrote:
> Session notes attached in markdown and PDF format, please revise as needed.
>
> Rich
>
> # Nested Virtualization Design Session
>
> Xen Design and Developer Summit, [11 July 2019](https://design-sessions.xenproject.org/uid/discussion/disc_1NVcnOZyDZM1LpQbIsJm/view)
>
> **Related Presentations**
>
> - (2019) Jürgen Groß, [Support of PV devices in nested Xen](https://youtube.com/watch?v=HA_teA6hV7c)
> - (2019) Christopher Clark and Kelli Little, [The Xen-Blanket](https://youtube.com/watch?v=i5w9sF9VerE)
> - (2018) Ian Pratt, [Hypervisor Security: Lessons Learned](https://youtube.com/watch?v=bNVe2y34dnM) (uXen)
> - (2018) David Weston, [Windows: Hardening with Hardware](https://youtube.com/watch?v=8V0wcqS22vc) (Credential Guard)
>
> **Use Cases**
>
> - Xen on Xen; some work was done for the Shim (Meltdown mitigation).
> - Xen on another hypervisor; involves teaching Xen how to use enlightenments from other hypervisors.
> - Qubes runs Xen on AWS bare-metal instances that use Nitro+KVM, mostly works.

It isn't AWS, it's standard KVM (with qemu etc). Use case is testing. Mostly works.

> - Windows Credential Guard (Hyper-V on Xen)
> - Bromium Type-2 uXen in Windows and Linux guests on Xen
>
> **Issues**
>
> 1. Need to be careful with features, e.g. ballooning down memory.
> 2. Dom0 is exposed to things that it should not see.
> 3. Nested virtualization works when both L0 and L1 agree, e.g. Xen on Xen. When replacing Xen with another hypervisor, it all falls apart.

In my experience, running Xen on another hypervisor (KVM, VirtualBox) mostly works. What is broken is running non-Xen within Xen.

> 4. Need more audit checks for what the VM can read or write, i.e. guest requirements.
> 5. Virtual vmentry and vmexit emulation is "leaky" and doesn't cope well.
> 6. Context switching bug fixed a while ago: it doesn't understand AFR(?) loading, or whether it should do it or leave it alone.
> 7. Missing instructions to virtualize vmexit.
> 8. Enlightened EPT shootdown is easy once the other features work.
>
> **Dependent on CPUID and MSR work**
>
> 1. Auditing of changes. Can then fix virtual vmentry and vmexit, one bit at a time. Once all features are covered, it should work fine.
> 2. hwcaps: needed to tell the guest about the security state of the hardware.
> 3. Reporting CPU topology representation to guests, which is blocking core-scheduling work (presented by Juergen at Xen Summit).
> 4. Andrew is working on the prerequisites for the policy.
>
> **Validation of Nested Virtualization**
>
> 1. First priority is correctness.
> 2. Second priority is performance.
> 3. There is a unit testing prototype which exercises the vmxon and vmxoff instructions.
> 4. Depends on regression testing, which depends upon (a) formal security support, (b) approval of the Xen security team.
> 5. Other hypervisors must be tested with Xen.
>
> **Guest State**
>
> Nesting requires a merge of L1 and L0 state.
>
> 1. The AMD interface is much easier: it has "clean bits": if any bit is clear, that state must be resynced. Guest state is kept separately. (See the sketch below.)
> 2. Intel guest state is kept in an opaque blob in memory, with special instructions to access it. The memory layout in RAM is unknown, behavior changes with microcode updates, and there are 150 pages of relevant Intel manuals.
> 3. Bromium does some fun stuff to track guest state in software, poisoning RAM and then inspecting it, which is still faster than Intel's hardware-based VMCS shadowing. L1 hypervisor (Type-2 uXen): https://github.com/openxt/uxen
> 4. Viridian emulates the AMD way, i.e. Microsoft has Intel bits formatted in an AMD-like structure, then L0 translates the AMD structure into Intel's opaque blob.
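The clean-bits point above can be shown with a minimal sketch. This is not Xen's nestedsvm code: struct toy_vmcb, resync_vvmcb and the per-group fields are invented for illustration, and only the clean-bit positions follow the VMCB_CLEAN groups in AMD's APM.

```c
/*
 * Sketch only: the AMD "clean bits" idea described in the notes above.
 * Bit positions follow the VMCB_CLEAN groups in AMD's APM; the structure
 * and helper are simplified stand-ins, not Xen's nestedsvm implementation.
 */
#include <stdint.h>

#define VMCB_CLEAN_INTERCEPTS  (1u << 0)   /* intercept vectors, TSC offset */
#define VMCB_CLEAN_ASID        (1u << 2)
#define VMCB_CLEAN_CR          (1u << 5)   /* CR0/CR3/CR4, EFER */

/* Toy VMCB with one field per group, just to show the flow. */
struct toy_vmcb {
    uint64_t intercepts;
    uint32_t asid;
    uint64_t cr3;
    uint32_t clean_bits;   /* written by the L1 hypervisor */
};

/*
 * On a nested VMRUN, L0 merges the L1-provided VMCB (vvmcb) into the VMCB
 * it actually runs (hvmcb).  A set clean bit means "this group is unchanged
 * since the last run, skip it"; a clear bit forces a resync, which is the
 * "if any bit is clear, must resync" rule from the notes.
 */
void resync_vvmcb(struct toy_vmcb *hvmcb, const struct toy_vmcb *vvmcb)
{
    uint32_t clean = vvmcb->clean_bits;

    if (!(clean & VMCB_CLEAN_INTERCEPTS))
        hvmcb->intercepts = vvmcb->intercepts;  /* L0 also adds its own intercepts */
    if (!(clean & VMCB_CLEAN_ASID))
        hvmcb->asid = vvmcb->asid;              /* remapped by L0 in practice */
    if (!(clean & VMCB_CLEAN_CR))
        hvmcb->cr3 = vvmcb->cr3;                /* subject to L0's audit checks */
}
```

There is no Intel equivalent of this in software-visible memory: the VMCS is the opaque blob from item 2, which is why hardware VMCS shadowing or software tracking (item 3) is needed instead.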
> **Secure Variable Storage**
>
> 1. Need an agreed, sane way for multiple hypervisors to handle it, e.g. a pair of ioports, translation from VMX, guest handles the interrupts via a standard ioport interception to a secondary emulator: tiny.
> 2. Easy case: ioports + a memory page for data.
> 3. Citrix XenServer has a closed-source implementation (varstored?).
>
> **Interface for nested PV devices**
>
> PV driver support currently involves grants and interrupts.
>
> Requirements:
>
> 1. Should Xen's ABI include the hypercall nesting level?
> 2. Each layer of nesting must apply access control decisions to the operation invoked by its guest.
> 3. Brownfield: if Xen and other L1 hypervisors must be compatible with existing Xen bare-metal deployments, the L0 hypervisor must continue to support grants, events and xenstore.
> 4. Greenfield: if the L0 hypervisor can be optimized for nesting, then PV driver mechanisms other than grants, events and xenstore could be considered.
>
> Live migration with PCI graphics (has been implemented on AWS):
>
> - need to make it look the same, regardless of nesting level
> - 1 or more interrupts
> - 1 or more shared pages of RAM
> - share xenstore
> - virtio guest physical address space DMA: done right
> - **need to get rid of domid** as the endpoint identifier
>
> Access Control:
>
> - Marek: use virtio?
> - David: do whatever you like in L1
> - Juergen: new "nested hypercall", to pass downwards an opaque payload (sketched below)
> - David: how does access control work with that approach?
>
> Christopher: the xenblanket RFC series implements support for one level of nesting. Its implementation below the hypercall interface demonstrates the access control logic that is required at each nesting level.
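As a strawman for the "nested hypercall" idea above: one possible shape is an op that carries an opaque payload plus a target nesting level, with every layer applying its own access control before handling or forwarding it, which is exactly where David's question bites. Everything here (struct nested_op, do_nested_op, the toy policy) is invented for illustration and is not an existing Xen interface.

```c
/*
 * Hypothetical "nested hypercall" sketch (not an existing Xen interface):
 * an L(n) guest asks its immediate hypervisor to pass an opaque payload
 * towards L0, and every layer on the way applies its own access control.
 */
#include <stdint.h>
#include <stdio.h>

struct nested_op {
    uint16_t target_level;   /* 0 = handle here, 1 = forward one layer down, ... */
    uint32_t payload_len;    /* bytes of payload, opaque to intermediate layers */
    uint64_t payload_gfn;    /* guest frame holding the payload */
};

/* Stand-ins for real per-layer logic, just so the sketch is complete. */
static int policy_allows_nesting(int domid) { return domid != 0; /* toy rule */ }

static int handle_locally(const struct nested_op *op)
{
    printf("handling %u-byte payload at gfn 0x%llx here\n",
           (unsigned)op->payload_len, (unsigned long long)op->payload_gfn);
    return 0;
}

static int forward_down(const struct nested_op *op)
{
    printf("forwarding, %u more level(s) to go\n", (unsigned)op->target_level);
    return 0;   /* in reality: re-issue the hypercall to the next layer down */
}

/* Control flow an intermediate hypervisor might use for such an op. */
static int do_nested_op(int caller_domid, struct nested_op *op)
{
    if (!policy_allows_nesting(caller_domid))   /* per-layer access control */
        return -1;                              /* the open question above */
    if (op->target_level == 0)
        return handle_locally(op);
    op->target_level--;                         /* strip one nesting level */
    return forward_down(op);
}

int main(void)
{
    struct nested_op op = { .target_level = 1, .payload_len = 64,
                            .payload_gfn = 0x1234 };
    return do_nested_op(5, &op);
}
```

By contrast, the xenblanket series mentioned above keeps the existing hypercall interface and performs the required per-level access control below it.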
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Attachment: signature.asc