
[Xen-devel] RE: [PATCH 00/13] Nested Virtualization: Overview



>> I am not convinced that a wrapper for wrapping's sake, which just makes
>> it C-style as you said, should be taken just because somebody has
>> called it out.
> 
> I wouldn't call it a wrapper. I call it the implementation of an
> abstraction. 


Well, that is a terminology difference; abstraction for abstraction's sake is 
just as bad as a wrapper for wrapping's sake. One good example is the SVM-on-VMX 
abstraction you made months ago.
Extending SVM knowledge to VMX and limiting the flexibility of the VMX 
implementation (in case you won't implement the VMX code) is not a practical 
solution IMO. Nested VMX will touch the existing code more comprehensively.

>> Freely (at any time) preload/post-store of VMCS fields is very hard
>> because VMX only provide access to current VMCS (because it is CPU
>> register), while SVM may be able to access all VMCBs given that it
>> is in memory. I can't say it is impossible (one can solve it with
>> further limitation/complexity), however enforcing those conversion
>> simply put trickies to VMX code to limiting the time of
>> pre-load/post-load, and the additional cost of VMCS access. 
>> 
> 
> When the VCPU switches between host and guest mode then you need to
> save the l1 guest state, restore the l2 guest state and vice versa.

No. Switching from the L1 to the L2 guest is a simple restore of the L2 guest 
state (VMCLEAR/VMPTRLD). VMX doesn't save the L1 state, while SVM does require 
saving the L1 state, per my understanding. Intel processors can hold the VMCS 
in the processor for performance.

Switching from L2 to L1, we need to convert (or store) some VMCS information 
from the physical to the virtual VMCS. But this is limited and covers only the 
"guest state" and exit information. Loading the L1 state may be as simple as a 
VMPTRLD (of course it may modify some VMCS fields depending on the situation).

This is another good example of how risky it is to extend SVM knowledge to VMX.
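To make the contrast concrete, here is a toy sketch of the VMX-style switch described above. The struct layout, field names, and helper functions are all hypothetical stand-ins for illustration, not Xen's actual code or the real VMCS layout:

```c
#include <assert.h>

/* Toy model of a VMCS.  All names here are hypothetical, for
 * illustration only; a real VMCS is an opaque, CPU-managed region. */
struct toy_vmcs {
    unsigned long guest_rip;     /* part of the "guest state" area */
    unsigned long exit_reason;   /* part of the exit-information area */
    unsigned long host_rip;      /* host state: never part of exit info */
};

static struct toy_vmcs *current_vmcs;  /* models the current-VMCS pointer */

/* VMPTRLD analogue: simply make a VMCS current.  No explicit save of the
 * outgoing state is needed, which is the point made about VMX above. */
static void toy_vmptrld(struct toy_vmcs *vmcs)
{
    current_vmcs = vmcs;
}

/* L2 -> L1 switch: copy only guest state and exit information from the
 * physical VMCS into the virtual VMCS, then load the L1 VMCS. */
static void toy_switch_l2_to_l1(struct toy_vmcs *phys_l2,
                                struct toy_vmcs *virt_l2,
                                struct toy_vmcs *l1)
{
    virt_l2->guest_rip = phys_l2->guest_rip;
    virt_l2->exit_reason = phys_l2->exit_reason;
    /* host_rip deliberately not copied: host state is not exit info. */
    toy_vmptrld(l1);
}
```

The sketch only models the argument that the L2-to-L1 direction needs a limited physical-to-virtual copy, while loading L1 is a single pointer switch.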

> 
> This requires a lot of accesses from/to the vmcb/vmcs  unless you have
> a lazy switching technique, do you ?

Yes, Intel hardware already does lazy switching through VMPTRLD.
And access to the VMCS depends on the L1 guest's modifications; only dirty 
VMCS fields need to be updated. 

In the majority of cases, a VM exit from the L2 guest will be handled by the 
root VMM directly. One example is an external interrupt, which doesn't need to 
access any VMCS fields other than the exit reason, but the wrapper makes the 
access mandatory, which I can't agree with.
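The dirty-field argument can be sketched as follows. The bitmap layout, field count, and all names are made up for illustration; the point is only that an unconditional sync wrapper would pay for every field on every exit:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration of dirty-field tracking: only fields the L1
 * guest actually wrote are flushed to the physical VMCS, and a
 * lightweight exit (e.g. external interrupt) reads only the exit reason. */
#define TOY_NFIELDS 64

static unsigned long virt_fields[TOY_NFIELDS];  /* L1's virtual VMCS */
static unsigned long phys_fields[TOY_NFIELDS];  /* "physical" VMCS */
static uint64_t dirty_mask;                     /* one bit per field */
static unsigned flush_count;                    /* writes performed */

/* L1 guest writes a virtual VMCS field: record it as dirty. */
static void toy_vmwrite_virtual(unsigned idx, unsigned long val)
{
    virt_fields[idx] = val;
    dirty_mask |= 1ull << idx;
}

/* Flush only the dirty fields.  A wrapper that unconditionally syncs
 * everything would perform TOY_NFIELDS writes here instead. */
static void toy_flush_dirty(void)
{
    for (unsigned i = 0; i < TOY_NFIELDS; i++) {
        if (dirty_mask & (1ull << i)) {
            phys_fields[i] = virt_fields[i];
            flush_count++;
        }
    }
    dirty_mask = 0;
}
```

With two fields written by L1, only two writes reach the physical VMCS, which is the cost model the paragraph above is arguing for.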

> 
> 
>> 
>> Another portion of so called common code are actually SVM code only.
>> Here are part of them: 
>> 
>> 
>> 
>> +
>> +static enum nestedhvm_vmexits
>> +nestedhvm_vmexit_msr(unsigned long *msr_bitmap, uint32_t msr, bool_t write)
>> +{
>> +    bool_t enabled;
>> +    unsigned long *msr_bit = NULL;
>> +
>> +    /*
>> +     * See AMD64 Programmers Manual, Vol 2, Section 15.10
>> +     * (MSR-Bitmap Address).
>> +     */
>> +    if ( msr <= 0x1fff )
>> +            msr_bit = msr_bitmap + 0x0000 / BYTES_PER_LONG;
>> +    else if ( (msr >= 0xc0000000) && (msr <= 0xc0001fff) )
>> +            msr_bit = msr_bitmap + 0x0800 / BYTES_PER_LONG;
>> +    else if ( (msr >= 0xc0010000) && (msr <= 0xc0011fff) )
>> +            msr_bit = msr_bitmap + 0x1000 / BYTES_PER_LONG;
>> 
> 
> Why does above code snippet not work on Intel CPUs ?

So this says even the "Intel processor" follows the AMD64 manual, doesn't it?
The msr_bitmap on Intel doesn't have fixed bitmap positions; rather, we scan 
the entire table to decide which MSRs to automatically save/restore for 
performance, given that we only put a single-digit number of MSRs there for 
that purpose.
BTW, Intel does implement MSRs larger than 0x1fff, such as 0x107D0. Presuming 
their usage model for now is risky.
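For reference, the quoted snippet maps each AMD64 MSR range to a fixed region of the MSR-permission bitmap. A standalone sketch of that lookup follows; the helper name and the byte-offset return convention are mine, and the quoted snippet itself returns a pointer in `unsigned long` units instead:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of an MSR's region within the AMD64 MSR bitmap, per the
 * three ranges in the quoted snippet: three 2 KB regions covering three
 * disjoint MSR ranges.  Returns -1 for MSRs outside those ranges, which
 * is exactly where an MSR like Intel's 0x107D0 would fall. */
static long toy_msrpm_region_offset(uint32_t msr)
{
    if (msr <= 0x1fff)
        return 0x0000;
    if (msr >= 0xc0000000 && msr <= 0xc0001fff)
        return 0x0800;
    if (msr >= 0xc0010000 && msr <= 0xc0011fff)
        return 0x1000;
    return -1;  /* not covered by the AMD64 bitmap layout */
}
```

This is what makes the snippet SVM-specific: the fixed three-region layout is an AMD64 property, while Intel's MSR bitmap is organized differently and covers different MSR ranges.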

> 
> 
>> 
>> +/* Virtual GIF */
>> +int
>> +nestedsvm_vcpu_clgi(struct vcpu *v)
>> +{
>> +    if (!nestedhvm_enabled(v->domain)) {
>> +            hvm_inject_exception(TRAP_invalid_op, 0, 0);
>> +            return -1;
>> +    }
>> +
>> +    if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
>> +            return 0;
>> +
>> +    /* clear gif flag */
>> +    vcpu_nestedhvm(v).nh_gif = 0;
>> +    local_event_delivery_disable(); /* mask events for PV drivers */
>> +    return 0;
>> +}
>> +
>> +int
>> +nestedsvm_vcpu_stgi(struct vcpu *v)
>> +{
>> +    if (!nestedhvm_enabled(v->domain)) {
>> +            hvm_inject_exception(TRAP_invalid_op, 0, 0);
>> +            return -1;
>> +    }
>> +
>> +    /* Always set the GIF to make hvm_interrupt_blocked work. */
>> +    vcpu_nestedhvm(v).nh_gif = 1;
>> +
>> +    if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
>> +            return 0;
>> +
>> +    local_event_delivery_enable(); /* unmask events for PV drivers */
>> +    return 0;
>> +}
>> +
> 
> The reason to leave this in the generic code is what Keir stated out
> as feedback to the second patch series:
> http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00280.html
> (What was patch #10 there is  patch #8 in latest patch series).
> 

Well, I didn't read it that way. Even if the function itself is wrapped, these 
should go into the SVM tree if they don't go together with the caller.

Anyway, given that nested SVM & VMX are still at the very beginning of 
development, I can only say yes to those wrappers that are clearly understood 
on both sides. I would rather leave the uncertain wrappers for the future, once 
the basic shape of nested virtualization is good and stable enough, i.e. 
lightweight wrappers. We have plenty of performance work ahead, such as virtual 
VT-d support, enhanced PV drivers for nested, etc. Excessive wrappers are 
simply a burden to nested VMX developers for those future features. 

Qing will post his patch today or tomorrow for your reference if you want.

C++ abstracts better than C, but C++ has never replaced C in places like Linux. 

BTW, can you add an SR-IOV test to your patch test system to avoid regressions?

Thx, Eddie

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
