
[Xen-devel] RE: [PATCH 00/13] Nested Virtualization: Overview



>> I am not convinced that a wrapper for wrapping's sake, which just makes
>> it C-style as you said, should be taken just because somebody has
>> called it out.
> 
> I wouldn't call it a wrapper. I call it the implementation of an
> abstraction. 


Well, that is a terminology difference; abstraction for abstraction's sake is 
just as bad as a wrapper for wrapping's sake. One good example is the SVM-on-VMX 
abstraction you made months ago.
Extending SVM knowledge to VMX and limiting the flexibility of the VMX 
implementation (in case you won't implement the VMX code) is not a practical 
solution IMO. Nested VMX will touch the existing code more comprehensively.

>> Freely (at any time) preload/post-store of VMCS fields is very hard
>> because VMX only provide access to current VMCS (because it is CPU
>> register), while SVM may be able to access all VMCBs given that it
>> is in memory. I can't say it is impossible (one can solve it with
>> further limitation/complexity), however enforcing those conversion
>> simply put trickies to VMX code to limiting the time of
>> pre-load/post-load, and the additional cost of VMCS access. 
>> 
> 
> When the VCPU switches between host and guest mode then you need to
> save the l1 guest state, restore the l2 guest state and vice versa.

No. Switching from the L1 to the L2 guest is a simple restore of the L2 guest 
state (VMCLEAR/VMPTRLD). VMX doesn't save the L1 state, while SVM does require 
saving the L1 state, per my understanding. Intel processors can hold the VMCS 
in the processor for performance.

Switching from L2 to L1, we need to convert (or store) some VMCS information 
from the physical to the virtual VMCS. But this is limited and covers only the 
"guest state" and exit information. Loading the L1 state may be as simple as a 
VMPTRLD (of course it may modify some VMCS fields depending on the situation).

This is another good example of how risky it is to extend SVM knowledge to VMX.
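To make the contrast concrete, here is a toy sketch of the VMX-style switch described above. The struct layout, field names, and helper functions are all hypothetical stand-ins for illustration, not Xen's actual code or the real VMCS layout:

```c
#include <assert.h>

/* Toy model of a VMCS.  All names here are hypothetical, for
 * illustration only; a real VMCS is an opaque, CPU-managed region. */
struct toy_vmcs {
    unsigned long guest_rip;     /* part of the "guest state" area */
    unsigned long exit_reason;   /* part of the exit-information area */
    unsigned long host_rip;      /* host state: never part of exit info */
};

static struct toy_vmcs *current_vmcs;  /* models the current-VMCS pointer */

/* VMPTRLD analogue: simply make a VMCS current.  No explicit save of the
 * outgoing state is needed, which is the point made about VMX above. */
static void toy_vmptrld(struct toy_vmcs *vmcs)
{
    current_vmcs = vmcs;
}

/* L2 -> L1 switch: copy only guest state and exit information from the
 * physical VMCS into the virtual VMCS, then load the L1 VMCS. */
static void toy_switch_l2_to_l1(struct toy_vmcs *phys_l2,
                                struct toy_vmcs *virt_l2,
                                struct toy_vmcs *l1)
{
    virt_l2->guest_rip = phys_l2->guest_rip;
    virt_l2->exit_reason = phys_l2->exit_reason;
    /* host_rip deliberately not copied: host state is not exit info. */
    toy_vmptrld(l1);
}
```

The sketch only models the argument that the L2-to-L1 direction needs a limited physical-to-virtual copy, while loading L1 is a single pointer switch.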

> 
> This requires a lot of accesses from/to the vmcb/vmcs  unless you have
> a lazy switching technique, do you ?

Yes, Intel hardware already does lazy switching through VMPTRLD.
And access to the VMCS depends on the L1 guest's modifications; only dirty 
VMCS fields need to be updated. 

In the majority of cases, a VM exit from the L2 guest will be handled by the 
root VMM directly. One example is an external interrupt, which doesn't need to 
access any VMCS fields other than the exit reason, but the wrapper makes the 
access mandatory, which I can't agree with.
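The dirty-field argument can be sketched as follows. The bitmap layout, field count, and all names are made up for illustration; the point is only that an unconditional sync wrapper would pay for every field on every exit:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration of dirty-field tracking: only fields the L1
 * guest actually wrote are flushed to the physical VMCS, and a
 * lightweight exit (e.g. external interrupt) reads only the exit reason. */
#define TOY_NFIELDS 64

static unsigned long virt_fields[TOY_NFIELDS];  /* L1's virtual VMCS */
static unsigned long phys_fields[TOY_NFIELDS];  /* "physical" VMCS */
static uint64_t dirty_mask;                     /* one bit per field */
static unsigned flush_count;                    /* writes performed */

/* L1 guest writes a virtual VMCS field: record it as dirty. */
static void toy_vmwrite_virtual(unsigned idx, unsigned long val)
{
    virt_fields[idx] = val;
    dirty_mask |= 1ull << idx;
}

/* Flush only the dirty fields.  A wrapper that unconditionally syncs
 * everything would perform TOY_NFIELDS writes here instead. */
static void toy_flush_dirty(void)
{
    for (unsigned i = 0; i < TOY_NFIELDS; i++) {
        if (dirty_mask & (1ull << i)) {
            phys_fields[i] = virt_fields[i];
            flush_count++;
        }
    }
    dirty_mask = 0;
}
```

With two fields written by L1, only two writes reach the physical VMCS, which is the cost model the paragraph above is arguing for.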

> 
> 
>> 
>> Another portion of so called common code are actually SVM code only.
>> Here are part of them: 
>> 
>> 
>> 
>> +
>> +static enum nestedhvm_vmexits
>> +nestedhvm_vmexit_msr(unsigned long *msr_bitmap, uint32_t msr, bool_t write)
>> +{
>> +    bool_t enabled;
>> +    unsigned long *msr_bit = NULL;
>> +
>> +    /*
>> +     * See AMD64 Programmers Manual, Vol 2, Section 15.10
>> +     * (MSR-Bitmap Address).
>> +     */
>> +    if ( msr <= 0x1fff )
>> +            msr_bit = msr_bitmap + 0x0000 / BYTES_PER_LONG;
>> +    else if ( (msr >= 0xc0000000) && (msr <= 0xc0001fff) )
>> +            msr_bit = msr_bitmap + 0x0800 / BYTES_PER_LONG;
>> +    else if ( (msr >= 0xc0010000) && (msr <= 0xc0011fff) )
>> +            msr_bit = msr_bitmap + 0x1000 / BYTES_PER_LONG;
>> 
> 
> Why does above code snippet not work on Intel CPUs ?

So this says even the "Intel processor" follows the AMD64 manual, doesn't it?
The msr_bitmap on Intel doesn't have fixed bitmap positions; rather, we scan 
the entire table to decide which MSRs to automatically save/restore for 
performance, given that we only put a single-digit number of MSRs there for 
that purpose.
BTW, Intel does implement MSRs larger than 0x1fff, such as 0x107D0. Presuming 
their usage model for now is risky.
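For reference, the quoted snippet maps each AMD64 MSR range to a fixed region of the MSR-permission bitmap. A standalone sketch of that lookup follows; the helper name and the byte-offset return convention are mine, and the quoted snippet itself returns a pointer in `unsigned long` units instead:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of an MSR's region within the AMD64 MSR bitmap, per the
 * three ranges in the quoted snippet: three 2 KB regions covering three
 * disjoint MSR ranges.  Returns -1 for MSRs outside those ranges, which
 * is exactly where an MSR like Intel's 0x107D0 would fall. */
static long toy_msrpm_region_offset(uint32_t msr)
{
    if (msr <= 0x1fff)
        return 0x0000;
    if (msr >= 0xc0000000 && msr <= 0xc0001fff)
        return 0x0800;
    if (msr >= 0xc0010000 && msr <= 0xc0011fff)
        return 0x1000;
    return -1;  /* not covered by the AMD64 bitmap layout */
}
```

This is what makes the snippet SVM-specific: the fixed three-region layout is an AMD64 property, while Intel's MSR bitmap is organized differently and covers different MSR ranges.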

> 
> 
>> 
>> +/* Virtual GIF */
>> +int
>> +nestedsvm_vcpu_clgi(struct vcpu *v)
>> +{
>> +    if (!nestedhvm_enabled(v->domain)) {
>> +            hvm_inject_exception(TRAP_invalid_op, 0, 0);
>> +            return -1;
>> +    }
>> +
>> +    if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
>> +            return 0;
>> +
>> +    /* clear gif flag */
>> +    vcpu_nestedhvm(v).nh_gif = 0;
>> +    local_event_delivery_disable(); /* mask events for PV drivers */
>> +    return 0;
>> +}
>> +
>> +int
>> +nestedsvm_vcpu_stgi(struct vcpu *v)
>> +{
>> +    if (!nestedhvm_enabled(v->domain)) {
>> +            hvm_inject_exception(TRAP_invalid_op, 0, 0);
>> +            return -1;
>> +    }
>> +
>> +    /* Always set the GIF to make hvm_interrupt_blocked work. */
>> +    vcpu_nestedhvm(v).nh_gif = 1;
>> +
>> +    if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
>> +            return 0;
>> +
>> +    local_event_delivery_enable(); /* unmask events for PV drivers */
>> +    return 0;
>> +}
>> +
> 
> The reason to leave this in the generic code is what Keir stated out
> as feedback to the second patch series:
> http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00280.html
> (What was patch #10 there is  patch #8 in latest patch series).
> 

Well, I didn't read it that way. Even if the function itself is wrapped, these 
should go into the SVM tree if they don't go together with the caller.

Anyway, given that nested SVM & VMX are still at the very beginning of 
development, I can only say yes to those wrappers that are clearly understood 
on both sides. I would rather leave the uncertain wrappers for the future, once 
the basic shape of nested virtualization is good and stable enough, i.e. 
lightweight wrappers. We have plenty of performance work ahead, such as virtual 
VT-d support, enhanced PV drivers for nested, etc. Excessive wrappers are 
simply a burden to nested VMX developers for those future features. 

Qing will post his patch today or tomorrow for your reference if you want.

C++ abstracts better than C, but C++ has never replaced C in places like Linux. 

BTW, can you add an SR-IOV test to your patch test system to avoid regressions?

Thx, Eddie

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
