[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 3/3] Add limited support of VMware's hyper-call



>>> On 01.09.14 at 17:33, <dslutz@xxxxxxxxxxx> wrote:
> This is both a more complete support then in currently provided by
> QEMU and/or KVM and less.  The missing part requires QEMU changes
> and has been left out until the QEMU patches are accepted upstream.
> 
> VMware's hyper-call is also known as VMware Backdoor I/O Port.
> 
> Note: this support does not depend on vmware_hw being non-zero.
> 
> Summary is that VMware treats "IN EAX, DX" (or "OUT DX, EAX"; or
> "inl %dx, %eax" in AT&T syntax ) to port 0x5658 specially.  Note:
> since many operations return data in EAX, "OUT DX, EAX" does not
> work for them on VMware, I did not support the "OUT DX, EAX", but it
> would not be hard to add.
> 
> Also this instruction is allowed to be used from ring 3.  To
> support this the vmexit for GP needs to be enabled.  I have not
> fully tested that nested HVM is doing the right thing for this.
> 
> An open source example of using this is:
> 
> http://open-vm-tools.sourceforge.net/
> 
> Which only uses "IN EAX, DX".  Also
> 
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=disp
> layKC&externalId=1009458
> 
> Lists just "inl (%%dx)" (I assume this is AT&T syntax and is the
> same as "inl %dx, %eax").
> 
> The support included is enough to allow VMware tools to install in a
> HVM domU and provide guestinfo support.  guestinfo support is
> provide by what is known as VMware RPC support.  This guestinfo
> support is provided via libxc.  libxl support has not be written.
> 
> Note: VMware RPC support is only available on HVM domU.
> 
> This interface is an extension of __HYPERVISOR_HVM_op.  It was
> picked because xc_get_hvm_param() also uses it and VMware guest
> info is a lot like a hvm param.
> 
> The HVMOP_get_vmport_guest_info is used by two libxc functions,
> xc_get_vmport_guest_info and xc_fetch_vmport_guest_info.
> xc_fetch_vmport_guest_info is designed to be used to fetch all
> currently set guestinfo values.
> 
> To save on hypervisor heap memory, the guestinfo support in done in
> two sizes, normal and jumbo.  Normal is used to handle up to 128
> byte values and jumbo is used to handle up to 4096 byte values.
> 
> Since all this is work is done when the guest is doing a single
> instruction; it was designed to not use the hypervisor heap to
> allocate the memory at this time.  Instead a few are allocated at
> the create domain time and during the xen's hyper-call to get or set
> them.  This was picked in that if a tool stack is using the VMware
> guest info support, it should be using either of both of the get and
> set.  And so in this case the guest should only see an out of memory
> error when the compile max amount of hypervisor heap memory is in
> use.
> 
> Doing it this way does lead to a lot of pointer use and many
> sub structures.
> 
> If the domU is running VMware tools, then the "build version" of
> the tools is also available via xc_get_HVM_param().  This also
> enables the use of new triggers that will use the VMware hyper-call
> to do some limited control of the domU.  The most useful are
> poweroff and reboot.  Since a guest process needs to be running
> for these to work, a tool stack should check that the build version
> is non zero before assuming these will work.
> 
> The 2 hvm param's HVM_PARAM_VMPORT_BUILD_NUMBER_TIME and
> HVM_PARAM_VMPORT_BUILD_NUMBER_VALUE are how "build version" is
> accessed.  These 2 params are only allowed to be set to zero.  The
> HVM_PARAM_VMPORT_BUILD_NUMBER_TIME can be used to track the last
> time the VMware tools in the guest responded.  One such use would
> be the health of the tools in the guest.  The hvm param
> HVM_PARAM_VMPORT_RESET_TIME controls how often to request them in
> seconds minus 1.  The minus 1 is to handle to 0 case.  I.E. the
> fastest that can be selected is every second.  The default is 4
> times a minute.
> 
> The VMware RPC support includes the notion of channels that are
> opened, active and closed.  All RPC messages sent via a channel
> starts with normal ASCII text.  The message some times does include
> binary data.
> 
> Currently there are 2 protocols defined for VMware RPC.  They
> determine the direction for data flow, domU to tool stack or
> tool stack to domU.
> 
> There is no provided interrupt for VMware RPC.
> 
> The VMware's hyper-call state is included in live migration and
> save/restore.  Because the max size of the VMware guestinfo is
> large, then data is compressed and expanded in the
> vmport_save_domain_ctxt and vmport_load_domain_ctxt.
> 
> For a debug=y build there is a new command line option
> vmport_debug=.  It enabled output to the console of various
> stages of handling the "IN EAX, DX" instruction.  Most uses
> are the summary ones that show complete RPC actions.
> 
> Signed-off-by: Don Slutz <dslutz@xxxxxxxxxxx>
> ---
>  tools/libxc/xc_domain.c                      |  115 +++
>  tools/libxc/xenctrl.h                        |   24 +
>  tools/misc/xen-hvmctx.c                      |  229 ++++
>  tools/xentrace/formats                       |    8 +
>  xen/arch/x86/domctl.c                        |   34 +
>  xen/arch/x86/hvm/Makefile                    |    1 +
>  xen/arch/x86/hvm/hvm.c                       |  480 ++++++++-
>  xen/arch/x86/hvm/io.c                        |    1 +
>  xen/arch/x86/hvm/svm/emulate.c               |    3 +
>  xen/arch/x86/hvm/svm/svm.c                   |   79 ++
>  xen/arch/x86/hvm/svm/vmcb.c                  |    1 +
>  xen/arch/x86/hvm/vmport/Makefile             |    1 +
>  xen/arch/x86/hvm/vmport/includeCheck.h       |   17 +
>  xen/arch/x86/hvm/vmport/vmport.c             | 1436 
> ++++++++++++++++++++++++++
>  xen/arch/x86/hvm/vmx/vmcs.c                  |    1 +
>  xen/arch/x86/hvm/vmx/vmx.c                   |   90 ++
>  xen/arch/x86/hvm/vmx/vvmx.c                  |   14 +
>  xen/include/asm-x86/hvm/domain.h             |    4 +
>  xen/include/asm-x86/hvm/svm/emulate.h        |    1 +
>  xen/include/asm-x86/hvm/trace.h              |   31 +
>  xen/include/asm-x86/hvm/vmport.h             |   90 ++
>  xen/include/public/arch-x86/hvm/save.h       |   39 +-
>  xen/include/public/arch-x86/hvm/vmporttype.h |  105 ++
>  xen/include/public/domctl.h                  |    3 +
>  xen/include/public/hvm/hvm_op.h              |   18 +
>  xen/include/public/hvm/params.h              |    5 +-
>  xen/include/public/trace.h                   |    7 +
>  27 files changed, 2834 insertions(+), 3 deletions(-)
>  create mode 100644 xen/arch/x86/hvm/vmport/Makefile
>  create mode 100644 xen/arch/x86/hvm/vmport/includeCheck.h
>  create mode 100644 xen/arch/x86/hvm/vmport/vmport.c
>  create mode 100644 xen/include/asm-x86/hvm/vmport.h
>  create mode 100644 xen/include/public/arch-x86/hvm/vmporttype.h

As I think was said on v1 already - this should be split into smaller
pieces if at all possible. I'm therefore not going to do a full review
at this time, just give a couple of remarks.

> @@ -579,6 +580,39 @@ long arch_do_domctl(
>          }
>          break;
>  
> +        case XEN_DOMCTL_SENDTRIGGER_VTPOWER:
> +        {
> +            ret = -EINVAL;
> +            if ( is_hvm_domain(d) )
> +            {
> +                ret = 0;
> +                vmport_ctrl_send(&d->arch.hvm_domain, "OS_Halt");
> +            }
> +        }
> +        break;
> +
> +        case XEN_DOMCTL_SENDTRIGGER_VTREBOOT:
> +        {
> +            ret = -EINVAL;
> +            if ( is_hvm_domain(d) )
> +            {
> +                ret = 0;
> +                vmport_ctrl_send(&d->arch.hvm_domain, "OS_Reboot");
> +            }
> +        }
> +        break;
> +
> +        case XEN_DOMCTL_SENDTRIGGER_VTPING:
> +        {
> +            ret = -EINVAL;
> +            if ( is_hvm_domain(d) )
> +            {
> +                ret = 0;
> +                vmport_ctrl_send(&d->arch.hvm_domain, "ping");
> +            }
> +        }
> +        break;

Rather than adding three new domctls, wouldn't a single VMware
one with suitable sub-operations do?

> @@ -1480,6 +1481,7 @@ int hvm_domain_initialise(struct domain *d)
>      INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
>      spin_lock_init(&d->arch.hvm_domain.irq_lock);
>      spin_lock_init(&d->arch.hvm_domain.uc_lock);
> +    spin_lock_init(&d->arch.hvm_domain.vmport_lock);
>  
>      INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
>      spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
> @@ -1492,11 +1494,36 @@ int hvm_domain_initialise(struct domain *d)
>  
>      d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
>      d->arch.hvm_domain.io_handler = xmalloc(struct hvm_io_handler);
> +    d->arch.hvm_domain.vmport_data = xzalloc(struct vmport_state);
>      rc = -ENOMEM;
> -    if ( !d->arch.hvm_domain.params || !d->arch.hvm_domain.io_handler )
> +    if ( !d->arch.hvm_domain.params || !d->arch.hvm_domain.io_handler ||
> +         !d->arch.hvm_domain.vmport_data )
>          goto fail1;
>      d->arch.hvm_domain.io_handler->num_slot = 0;
>  
> +    /*
> +     * Any value is fine here. In fact a random number may better.
> +     * It is used to help validate that a both sides are talking
> +     * about the same channel.
> +     */
> +    d->arch.hvm_domain.vmport_data->open_cookie = 435;
> +
> +    d->arch.hvm_domain.vmport_data->used_guestinfo = 10;
> +    for (rc = 0;
> +         rc < d->arch.hvm_domain.vmport_data->used_guestinfo;
> +         rc++)
> +        d->arch.hvm_domain.vmport_data->guestinfo[rc] =
> +            xzalloc(vmport_guestinfo_t);
> +
> +    d->arch.hvm_domain.vmport_data->used_guestinfo_jumbo = 2;
> +    for (rc = 0;
> +         rc < d->arch.hvm_domain.vmport_data->used_guestinfo_jumbo;
> +         rc++)
> +        d->arch.hvm_domain.vmport_data->guestinfo_jumbo[rc] =
> +            xzalloc(vmport_guestinfo_jumbo_t);
> +
> +    vmport_flush(&d->arch.hvm_domain);
> +

All this would very likely better go into a separate function placed in
vmport.c. Same for many of the helper functions further down.
Please realize that this file is already huge, so any new addition
having a reasonable other home would be nice to be kept out of
this file.

> @@ -1520,6 +1548,7 @@ int hvm_domain_initialise(struct domain *d)
>  
>      register_portio_handler(d, 0xe9, 1, hvm_print_line);
>      register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
> +    register_portio_handler(d, VMPORT_PORT, 4, vmport_ioport);

So from a cursory look I wasn't able to spot any code making sure
that port doesn't get used for e.g. a passed through device.

And in any event I'm rather uncomfortable about this getting
enabled unconditionally, also due to (but not limited to this) the
up front allocation of various memory blocks that may end up
never being needed. The main issue I see with this approach is
that guests could end up being misguided into thinking they're
running on VMware when in fact they have code in place to
optimize when running on Xen. We had such detection ordering
issues with Linux checking for both Hyper-V and Xen.

> @@ -1532,6 +1561,17 @@ int hvm_domain_initialise(struct domain *d)
>      stdvga_deinit(d);
>      vioapic_deinit(d);
>   fail1:
> +    if ( d->arch.hvm_domain.vmport_data )
> +    {
> +        struct vmport_state *vs = d->arch.hvm_domain.vmport_data;
> +        int idx;
> +
> +        for (idx = 0; idx < vs->used_guestinfo; idx++)

You'll have to go through and fix coding style issues.

> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -42,6 +42,7 @@
>  #include <asm/hvm/vlapic.h>
>  #include <asm/hvm/trace.h>
>  #include <asm/hvm/emulate.h>
> +#include <asm/hvm/vmport.h>

This again being the only change to this file - why?

> @@ -106,6 +107,7 @@ MAKE_INSTR(VMSAVE, 3, 0x0f, 0x01, 0xdb);
>  MAKE_INSTR(STGI,   3, 0x0f, 0x01, 0xdc);
>  MAKE_INSTR(CLGI,   3, 0x0f, 0x01, 0xdd);
>  MAKE_INSTR(INVLPGA,3, 0x0f, 0x01, 0xdf);
> +MAKE_INSTR(IN,     1, 0xed);

This name is ambiguous.

> +static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs, struct vcpu 
> *v)
> +{
> +    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
> +    unsigned long inst_len = __get_instruction_length(v, INSTR_IN);
> +
> +    regs->error_code = vmcb->exitinfo1;
> +    if ( hvm_long_mode_enabled(v) )
> +        HVMTRACE_LONG_C4D(TRAP_GP, TRAP_gp_fault, inst_len,
> +                          regs->error_code,
> +                          TRC_PAR_LONG(vmcb->exitinfo2));
> +    else
> +        HVMTRACE_C4D(TRAP_GP, TRAP_gp_fault, inst_len,
> +                     regs->error_code, vmcb->exitinfo2);
> +
> +    if ( inst_len <= 1 && (regs->rdx & 0xffff) == VMPORT_PORT &&

What would inst_len being zero mean here?

> +         vmcb->exitinfo2 == 0 && regs->error_code == 0 )
> +    {
> +        uint32_t magic = regs->rax;
> +
> +        if ( magic == VMPORT_MAGIC )

This ought to be folded with the previous if(), at once reducing
code redundancy further down in the function.

> +        {
> +            unsigned char bytes[1] = { 0 };

Pointless initializer (and it being an array looks suspicious too).

> @@ -2565,6 +2567,82 @@ static void vmx_idtv_reinject(unsigned long idtv_info)
>      }
>  }
>  
> +void do_gp_fault(struct cpu_user_regs *regs, struct vcpu *v)

A VMX-specific function shouldn't be named this way. Plus I can't see
why it's not static (in which case the name may actually be okay). And
then a lot of the code further down looks rather similar to SVM's -
please avoid such duplication.

> @@ -2675,6 +2753,15 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>                   && vector != TRAP_nmi 
>                   && vector != TRAP_machine_check ) 
>              {
> +                if ( vector == TRAP_gp_fault )
> +                {
> +                    VMPORT_DBG_LOG(VMPORT_LOG_REALMODE_GP,
> +                                   "realmode gp: ip=%"PRIx64" ax=%"PRIx64
> +                                   " bx=%"PRIx64" cx=%"PRIx64" dx=%"PRIx64
> +                                   " si=%"PRIx64" di=%"PRIx64,
> +                                   regs->rip, regs->rax, regs->rbx, 
> regs->rcx,
> +                                   regs->rdx, regs->rsi, regs->rdi);
> +                }

Is this really useful (no matter that it'd guarded by a !NDEBUG-only
command line option? Speaking of which - why can't you just define
a new DBG_LEVEL_VMPORT and use the existing HVM_DBG_LOG()?

> @@ -2182,6 +2183,19 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>              if ( v->fpu_dirtied )
>                  nvcpu->nv_vmexit_pending = 1;
>          }
> +        else if ( vector == TRAP_gp_fault )
> +        {
> +#ifndef NDEBUG
> +            struct cpu_user_regs *ur = guest_cpu_user_regs();
> +#endif
> +            VMPORT_DBG_LOG(VMPORT_LOG_VGP_UNKNOWN,
> +                           "Unexpected gp: ip=%"PRIx64" ax=%"PRIx64
> +                           " bx=%"PRIx64" cx=%"PRIx64" dx=%"PRIx64
> +                           " si=%"PRIx64" di=%"PRIx64,
> +                           ur->rip, ur->rax, ur->rbx, ur->rcx, ur->rdx,
> +                           ur->rsi, ur->rdi);

The #endif above ought to go here for the code to look consistent.

> +#ifndef NDEBUG
> +#define VMPORT_LOG_RPC             (1 << 0)
> +#define VMPORT_LOG_RECV_STATUS     (1 << 1)
> +#define VMPORT_LOG_SKIP_SEND       (1 << 2)
> +#define VMPORT_LOG_SEND            (1 << 3)
> +#define VMPORT_LOG_SEND_SIZE_ALL   (1 << 4)
> +#define VMPORT_LOG_SEND_SIZE       (1 << 5)
> +#define VMPORT_LOG_RECV_SIZE_ALL   (1 << 6)
> +#define VMPORT_LOG_RECV_SIZE       (1 << 7)
> +#define VMPORT_LOG_CLOSE           (1 << 8)
> +#define VMPORT_LOG_OPEN            (1 << 9)
> +#define VMPORT_LOG_FLUSH           (1 << 10)
> +#define VMPORT_LOG_TRACE           (1 << 11)
> +#define VMPORT_LOG_PING            (1 << 12)
> +#define VMPORT_LOG_SWEEP           (1 << 13)
> +#define VMPORT_LOG_BUILD           (1 << 14)
> +
> +#define VMPORT_LOG_ERROR           (1 << 16)
> +
> +#define VMPORT_LOG_INFO_GET        (1 << 17)
> +#define VMPORT_LOG_INFO_SET        (1 << 18)
> +
> +#define VMPORT_LOG_SKIP_MESSAGE    (1 << 19)
> +
> +#define VMPORT_LOG_GP_UNKNOWN      (1 << 20)
> +#define VMPORT_LOG_GP_NOT_VMWARE   (1 << 21)
> +#define VMPORT_LOG_GP_FAIL_RD_INST (1 << 22)
> +
> +#define VMPORT_LOG_GP_VMWARE_AFTER (1 << 23)
> +
> +#define VMPORT_LOG_VGP_UNKNOWN     (1 << 24)
> +#define VMPORT_LOG_REALMODE_GP     (1 << 27)
> +
> +#define VMPORT_LOG_VMWARE_AFTER    (1 << 28)

Ah, here we go (as to using HVM_DBG_LOG()): Isn't this _way_ too
fine grained?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.