Xen project Mailing List

[Xen-devel] Re: [PATCH 00/13] Nested Virtualization: Overview

To: "Dong, Eddie" <eddie.dong@xxxxxxxxx>

From: Christoph Egger <Christoph.Egger@xxxxxxx>

Date: Tue, 7 Sep 2010 17:49:22 +0200

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Tim Deegan <Tim.Deegan@xxxxxxxxxx>, "He, Qing" <qing.he@xxxxxxxxx>

Delivery-date: Tue, 07 Sep 2010 08:52:44 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thank you very much for giving me the valuable information I am asking
for, finally.

> >> Freely (at any time) preload/post-store of VMCS fields is very hard
> >> because VMX only provide access to current VMCS (because it is CPU
> >> register), while SVM may be able to access all VMCBs given that it
> >> is in memory. I can't say it is impossible (one can solve it with
> >> further limitation/complexity), however enforcing those conversion
> >> simply put trickies to VMX code to limiting the time of
> >> pre-load/post-load, and the additional cost of VMCS access.
> >
> > When the VCPU switches between host and guest mode then you need to
> > save the l1 guest state, restore the l2 guest state and vice versa.
>
> No. Switching from L1 to L2 guest, it is a simple restore of L2 guest state
> (VMCLEAR/VMPTRLD). VMX doesn't save L1 state,

So the function hooks prepare4vmentry is pretty cheap and the 'hostsave'
function hook is empty.

> while SVM does require the save of L1 state per my understanding.

SVM does not per architecture. It is the implementation I have taken
over from KVM. It will be replaced with an implementation that only
requires a VMSAVE to save the L1 state. But that won't change the patch
series fundamentally as this change touches SVM code only.

> Intel process can hold VMCS in processor for performance.
>
> Switching from L2 to L1, we need to convert (or store) some VMCS
> information from physical to virtual VMCS.

That is what can be put into the prepare4vmexit function hook.

> But it is limited and only covers the "guest state" and exit information.
> Load of L1 state may be as simple as VMPTRLD (of course it may modify some
> VMCS field upon different situation).

That is what can be put into the 'hostrestore' function hook.

> > This requires a lot of accesses from/to the vmcb/vmcs unless you have
> > a lazy switching technique, do you ?
>
> Yes, the Intel hardware already did lazy switching thru VMPTRLD.
> And the access of VMCS is depending on the L1 guest modification, only
> dirty VMCS fields needs to be updated.

Sounds like you need a 'shadow vmcs' that holds the l2 guest state.

> In the majority case, the VM exit from L2 guest will be handled by root VMM
> directly.

Same on SVM. The root VMM handles everything the L1 guest does not
intercept, plus some special intercepts such as interrupts, NMIs, and
page faults/nested page faults.

> One example is external interrupt, which doesn't need to access
> rest of VMCS fields except the reason, but the wrapper makes the access a
> must, which I can't agree.

Which conversion function do you mean by 'wrapper' here? Why does it
require additional VMCS field accesses? Can you explain this in detail,
please?

> >> Another portion of so called common code are actually SVM code only.
> >> Here are part of them:
> >>
> >> +
> >> +static enum nestedhvm_vmexits
> >> +nestedhvm_vmexit_msr(unsigned long *msr_bitmap, uint32_t msr, bool_t write)
> >> +{
> >> +    bool_t enabled;
> >> +    unsigned long *msr_bit = NULL;
> >> +
> >> +    /*
> >> +     * See AMD64 Programmers Manual, Vol 2, Section 15.10
> >> +     * (MSR-Bitmap Address).
> >> +     */
> >> +    if ( msr <= 0x1fff )
> >> +        msr_bit = msr_bitmap + 0x0000 / BYTES_PER_LONG;
> >> +    else if ( (msr >= 0xc0000000) && (msr <= 0xc0001fff) )
> >> +        msr_bit = msr_bitmap + 0x0800 / BYTES_PER_LONG;
> >> +    else if ( (msr >= 0xc0010000) && (msr <= 0xc0011fff) )
> >> +        msr_bit = msr_bitmap + 0x1000 / BYTES_PER_LONG;
> >
> > Why does above code snippet not work on Intel CPUs ?
>
> It is said even "Intel processor" is following AMD64 manual, isn't it?
> msr_bitmap in Intel doesn't have a fixed bitmap position, rather it scan
> the entire table to decide which MSR to automatically save/restore for
> performance given that we only put single digital MSRs for that purpose.
> BTW, Intel does implement MSRs large than 0x1fff such as 0x107D0.
> Preassuming their usage model for now is risky.

Ok, I will think about this.
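
[Editorial illustration, not part of the original message.] To make the
MSR discussion above concrete, here is a minimal sketch of the SVM-style
permission-map lookup that the quoted snippet begins. It assumes the
layout described in the AMD64 manual: three MSR ranges at byte offsets
0x0000, 0x0800 and 0x1000, with two bits per MSR (read, then write). The
helper names msrpm_range_base, msr_intercepted and msr_set_intercept are
hypothetical, the map is addressed as bytes rather than with the patch's
unsigned long arithmetic, and treating an MSR outside the three ranges
as always intercepted is a choice made for this sketch. An MSR such as
0x107D0 then falls outside every range, which is the case Eddie raises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSRPM_SIZE 0x2000 /* 8 KB MSR permission map (assumed SVM layout) */

/* Byte offset of the map region covering msr, or -1 if no region does. */
static long msrpm_range_base(uint32_t msr)
{
    if (msr <= 0x1fffu)
        return 0x0000;
    if (msr >= 0xc0000000u && msr <= 0xc0001fffu)
        return 0x0800;
    if (msr >= 0xc0010000u && msr <= 0xc0011fffu)
        return 0x1000;
    return -1;
}

/* Hypothetical helper: is a read (write == false) or write intercepted? */
static bool msr_intercepted(const uint8_t *msrpm, uint32_t msr, bool write)
{
    long base = msrpm_range_base(msr);
    uint32_t bit;

    assert(msrpm != NULL);
    if (base < 0)
        return true; /* sketch policy: uncovered MSRs always intercept */

    /* Two bits per MSR: even bit = read, odd bit = write. */
    bit = (msr & 0x1fffu) * 2 + (write ? 1u : 0u);
    return (msrpm[base + bit / 8] >> (bit % 8)) & 1u;
}

/* Hypothetical helper: request an intercept; -1 if the MSR is uncovered. */
static int msr_set_intercept(uint8_t *msrpm, uint32_t msr, bool write)
{
    long base = msrpm_range_base(msr);
    uint32_t bit;

    assert(msrpm != NULL);
    if (base < 0)
        return -1;

    bit = (msr & 0x1fffu) * 2 + (write ? 1u : 0u);
    msrpm[base + bit / 8] |= (uint8_t)(1u << (bit % 8));
    return 0;
}
```

With a zeroed map, calling msr_set_intercept(map, 0xc0000080, true)
marks only writes to that MSR as intercepted; reads stay unintercepted.
The fixed offsets are the vendor-specific part, so code of this shape
would sit behind whatever per-vendor hook the series settles on.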
> >> +/* Virtual GIF */
> >> +int
> >> +nestedsvm_vcpu_clgi(struct vcpu *v)
> >> +{
> >> +    if (!nestedhvm_enabled(v->domain)) {
> >> +        hvm_inject_exception(TRAP_invalid_op, 0, 0);
> >> +        return -1;
> >> +    }
> >> +
> >> +    if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> >> +        return 0;
> >> +
> >> +    /* clear gif flag */
> >> +    vcpu_nestedhvm(v).nh_gif = 0;
> >> +    local_event_delivery_disable(); /* mask events for PV drivers */
> >> +    return 0;
> >> +}
> >> +
> >> +int
> >> +nestedsvm_vcpu_stgi(struct vcpu *v)
> >> +{
> >> +    if (!nestedhvm_enabled(v->domain)) {
> >> +        hvm_inject_exception(TRAP_invalid_op, 0, 0);
> >> +        return -1;
> >> +    }
> >> +
> >> +    /* Always set the GIF to make hvm_interrupt_blocked work. */
> >> +    vcpu_nestedhvm(v).nh_gif = 1;
> >> +
> >> +    if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> >> +        return 0;
> >> +
> >> +    local_event_delivery_enable(); /* unmask events for PV drivers */
> >> +    return 0;
> >> +}
> >> +
> >
> > The reason to leave this in the generic code is what Keir stated out
> > as feedback to the second patch series:
> > http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00280.html
> > (What was patch #10 there is patch #8 in latest patch series).
>
> While, I didn't read in this way. Even the function itself is wrapped, they
> should go to SVM tree if not go together with the caller.
>
> Anyway, to me given that nested SVM & VMX is still on the very beginning of
> development, I can only say yes to those wrappers that have clear
> understanding to both side.

Good to know. See my offer below.

> I would rather leave those uncertain wrappers to future, once the basic
> shape of nested virtualization is good and stable enough, i.e. lightweight
> wrapper. We have plenty of performance work ahead such as virtual VTd
> support, enhanced PV driver for nested etc. Excessive wrapper is simply a
> burden to nested VMX developers for those future features.
>
> Qing will post his patch today or tomorrow for your reference if you want.
Thanks. May I take code from there and add it to my patch series?

> C++ abstracts better than C, but C++ never replaces C in places like Linux.

It's not necessary. When you truly understand the concept behind
interface and implementation you can do that in any language you know.

> BTW, can you add SR-IOV test into your patch test system to avoid
> regression?

Yes.

> Thx, Eddie

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
