
Xen Winter Meetup 2025 design session notes: Nested Virt



Description: this session will focus on discussing the current state and 
key challenges of Nested Virtualization in Xen.

---

I'm going to link to this mailing list message from the related GitLab 
epic:
https://gitlab.com/groups/xen-project/-/epics/25

References: George Dunlap's two-part talk at the previous Xen Summit:
- https://www.youtube.com/watch?v=8jKGYY1Bi_o
- https://www.youtube.com/watch?v=3MxWvVTmY1s&t=1564s

Andrew Cooper reminded us of the nested virtualization challenges.

What is needed to make nested virt work again?

Andrew:
Xen does have a nested virt implementation from 2009/2010. It has been 
bitrotting since then and was not production quality even on day one.
Intel took care to virtualize everything relevant => confusing aspects 
that are not documented well enough.
AMD took a simpler route, but things don't quite work right.
VMX and SVM are separate pieces of work.
Interrupt shadows: disabling interrupts works differently on VMX and SVM.
It is important to reduce the scope of the problem.
Both Intel and AMD dropped support for 32-bit virtualization.
A bunch of features can be dropped to limit the scope.
They still need to be trapped, but Xen can report them as not implemented.
What is needed depends on the L2 guests: Windows with VBS expects 
different features.
Some non-nested features needed for VBS are missing and need to be 
implemented first, before the nested ones.
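
The "still trap, but report not implemented" idea can be sketched as a 
dispatch table. This is only an illustration in Python, not Xen code: 
the operation names and the handle_trap() helper are hypothetical.

```python
def handle_trap(operation, handlers):
    """Dispatch a trapped operation, or report it as not implemented.

    `handlers` maps hypothetical operation names to callables. Anything
    outside the reduced scope is still trapped (we get called), but the
    caller is told the feature is not implemented instead of being
    allowed to proceed unchecked.
    """
    handler = handlers.get(operation)
    if handler is None:
        return ("not_implemented", operation)
    return ("ok", handler())
```

The point of the sketch is that dropping a feature from the scope does 
not mean ignoring it: the trap path stays, only the handler is missing.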

VM configuration is hard to change at run time because the 
configuration set is static.
Xen's model assumes one single configuration set describing what it 
expects a guest to run.

First task: change Xen's implementation to have one configuration per 
VM instead of a global one, meaning a VM can have a different 
configuration from other VMs.
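
As a rough illustration of per-VM configuration replacing a global one, 
here is a minimal Python sketch. The feature names, HOST_FEATURES and 
make_domain_config() are assumptions made up for the example and do not 
correspond to Xen's actual data structures.

```python
from dataclasses import dataclass

# Hypothetical set of features the host can offer (illustrative names).
HOST_FEATURES = frozenset({"ept", "vpid", "apicv"})


@dataclass(frozen=True)
class DomainConfig:
    """Feature set owned by one VM, instead of one global set."""
    features: frozenset

    def allows(self, feature):
        return feature in self.features


def make_domain_config(requested, host_features=HOST_FEATURES):
    """Clamp a VM's requested features to what the host offers."""
    return DomainConfig(frozenset(requested) & host_features)
```

With this shape, two VMs on the same host can end up with different 
feature sets, which is what an L1 hypervisor guest would need.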

HW only has root and non-root mode (strictly x86)

Nested virt needs to implement the L2 guest in the non-root mode of L0.
Xen usually has one VMCS/VMCB per vCPU
L1 HV will have one VMCS/VMCB per L2 vCPU

A VMCS is a bunch of configuration: some of it is exposed to guests, the 
rest controls guest behavior.
The VMCS for an L2 guest, called VMCS02, is merged from Xen's info and 
from L1's info.
Drop the host state provided by the L1 guest and use Xen's host state.
Features can be mutually exclusive.
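
A very simplified sketch of that merge, in Python for readability. It 
assumes the usual naming where VMCS01 is what Xen uses to run L1 and 
VMCS12 is what L1 wrote for L2. The Vmcs class, its fields and the 
MUTUALLY_EXCLUSIVE list are hypothetical; real merging is done field by 
field and is considerably more subtle.

```python
from dataclasses import dataclass


@dataclass
class Vmcs:
    guest_state: dict
    host_state: dict
    intercepts: frozenset            # events that cause a VM exit
    features: frozenset = frozenset()


# Hypothetical pairs of features that cannot be enabled together.
MUTUALLY_EXCLUSIVE = [("feature_a", "feature_b")]


def merge_vmcs02(vmcs01, vmcs12):
    """Build the VMCS that L0 actually runs the L2 guest with."""
    features = vmcs01.features | vmcs12.features
    for a, b in MUTUALLY_EXCLUSIVE:
        if a in features and b in features:
            raise ValueError(f"{a} and {b} are mutually exclusive")
    return Vmcs(
        guest_state=dict(vmcs12.guest_state),  # L2 state as set up by L1
        host_state=dict(vmcs01.host_state),    # always Xen's, never L1's
        intercepts=vmcs01.intercepts | vmcs12.intercepts,
        features=features,
    )
```

Taking the union of intercepts means an event exits to L0 if either Xen 
or L1 asked for it. Xen then has to decide who the exit is for.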

The L2 guest will trap to Xen (L0), and Xen then needs to know whether 
the VMEXIT is for itself, for L1, or for both.
On a virtual VM entry, Xen needs to merge the VMCS from the exit and 
merge in the info about the host part of L1.
If it is implemented correctly, it can scale to any depth (L>3 guests).
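
The routing decision for an L2 exit can be sketched as follows. This is 
a toy model in Python: the exit reasons are plain strings and 
route_l2_exit() is a hypothetical helper, not how Xen represents this.

```python
from enum import Flag, auto


class Target(Flag):
    L0 = auto()   # Xen handles the exit itself
    L1 = auto()   # Xen reflects the exit to the L1 hypervisor


def route_l2_exit(reason, l0_intercepts, l1_intercepts):
    """Decide who must see an exit taken while L2 was running."""
    target = Target(0)
    if reason in l0_intercepts:
        target |= Target.L0
    if reason in l1_intercepts:
        target |= Target.L1
    return target
```

For example, an exit that only L1 asked for yields Target.L1, and one 
that both levels care about yields Target.L0 | Target.L1.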

Alain Tchana: VMCS shadowing, is it needed?
Andrew: It's complicated, it's a giant security hole since you can audit 
guest state

VMCS (Intel): opaque memory that needs special instructions 
(VMREAD/VMWRITE) to read or write.
VMCB (AMD): a page of memory you can read and write directly.
It is easier on AMD to copy large amounts of state in and out.
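
To make the difference concrete, here is a toy Python model of the two 
access styles. OpaqueVmcs, copy_vmcs() and copy_vmcb() are hypothetical 
names for illustration only. The 4096-byte size reflects the VMCB being 
a single 4 KiB page.

```python
PAGE_SIZE = 4096


class OpaqueVmcs:
    """Intel-style: contents are reachable only through accessors."""

    def __init__(self):
        self._fields = {}

    def vmread(self, field):
        return self._fields.get(field, 0)

    def vmwrite(self, field, value):
        self._fields[field] = value


def copy_vmcs(src, dst, fields):
    """Copy field by field; returns the number of fields copied."""
    for field in fields:
        dst.vmwrite(field, src.vmread(field))
    return len(fields)


def copy_vmcb(src):
    """AMD-style: the whole page is plain memory, copied in one go."""
    assert len(src) == PAGE_SIZE
    return bytearray(src)
```

In the VMCS case the cost grows with the number of fields touched, 
whereas the VMCB case is a single bulk copy.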

Yann: What is the current state of the Xen implementation?
Andrew: There is some, with known security issues and unknown ones.

Marek: If you run KVM in a Xen guest, you get an instant crash.

Yann: What are the plans to fix it?
Andrew: The L1 VMCS configuration can be completely different from the 
one Xen will use, so Xen needs to be modified to support multiple 
guest configurations.

See the paper "The Turtles Project" for nested virtualization with VMCS 
merging.

Need to store VMCS/VMCB state somewhere (easy with VMCB since it's just 
a mapped page)

A bit of work from Andrew and Roger is needed before it can be worked 
on by multiple people in parallel.

Next course of action:

- Wait for Andrew and Roger to fix MSR configuration from the toolstack. 
They're halfway through. According to them, that's sadly not a task we 
can really parallelize.
- When it is ready, more people can participate by implementing the 
missing features one by one, with unit tests. There will be a suggested 
order for the things that need to be implemented. We can't predict when 
features will intersect with existing bugs.


A big thanks to Damien Thenot and Benjamin Reis for all the note taking, 
and of course to Andrew Cooper for most of the explaining.


Samuel Verschelde | Vates XCP-ng Lead Maintainer / Release Manager / Technical 
Product Manager

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



 

