
Re: [Xen-devel] [PATCH v3 04/16] hypervisor part of add vmware_port to xl.cfg



On Mon, Sep 8, 2014 at 2:15 PM, Don Slutz <dslutz@xxxxxxxxxxx> wrote:
> Limited support of VMware's hyper-call.
>
> This is both more complete than the support currently provided by
> QEMU and/or KVM, and less complete.  The missing part requires QEMU changes
> and has been left out until the QEMU patches are accepted upstream.
>
> VMware's hyper-call is also known as VMware Backdoor I/O Port.
>
> Note: this support does not depend on vmware_hw being non-zero.
>
> In summary, VMware treats "IN EAX, DX" (or "OUT DX, EAX"; or
> "inl %dx, %eax" in AT&T syntax) to port 0x5658 specially.  Note:
> since many operations return data in EAX, "OUT DX, EAX" does not
> work for them, neither on VMware nor with this code.
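The point about OUT follows from the register-merging step after the command dispatch in the quoted vmport_ioport(): for IN the reply in EAX is written back to the guest (only partially for the 1- and 2-byte forms), while for OUT the saved RAX is restored, so any reply is lost. A minimal standalone sketch of that step; enum io_dir and vmport_result_rax() are hypothetical names. It uses 64-bit masks so that 8- and 16-bit results leave the rest of RAX untouched, as hardware does, while a 32-bit result is zero-extended.

```c
#include <stdint.h>

/* Direction of the trapped port access (names are illustrative). */
enum io_dir { IO_IN, IO_OUT };

/*
 * Value of RAX the guest sees after a backdoor access, given RAX before
 * the access (saved_rax) and the reply the handler computed (reply_rax).
 */
static uint64_t vmport_result_rax(enum io_dir dir, unsigned int bytes,
                                  uint64_t saved_rax, uint64_t reply_rax)
{
    if (dir == IO_OUT)
        return saved_rax;  /* OUT: the reply in EAX is thrown away */

    switch (bytes) {
    case 1:  /* IN AL, DX: only AL changes */
        return (saved_rax & ~UINT64_C(0xff)) | (reply_rax & 0xff);
    case 2:  /* IN AX, DX: only AX changes */
        return (saved_rax & ~UINT64_C(0xffff)) | (reply_rax & 0xffff);
    default: /* IN EAX, DX: 32-bit result, zero-extended into RAX */
        return (uint32_t)reply_rax;
    }
}
```

For example, a GETVERSION reply of 6 reaches the guest through IN EAX, DX, but through OUT DX, EAX the guest simply gets its original EAX (the magic value) back.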
>
> Also, this instruction may be used from ring 3.  To
> support this, the vmexit for #GP needs to be enabled.  I have not
> fully tested that nested HVM does the right thing here.
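With the #GP intercept enabled, every #GP in the guest exits to the hypervisor, so the handler needs a cheap filter before it decodes anything. A sketch of such a filter, modelled on the condition in the quoted vmport_gp_check(); struct gp_exit and its fields are hypothetical stand-ins for the real VMCS/VMCB state, the magic value 0x564D5868 is assumed from open-vm-tools' backdoor_def.h, and the opcode decoding assumes 32- or 64-bit code.

```c
#include <stdbool.h>
#include <stdint.h>

#define BDOOR_MAGIC 0x564D5868u  /* "VMXh"; assumed, as in open-vm-tools */
#define BDOOR_PORT  0x5658u      /* "VX" */

/* Hypothetical summary of a #GP intercept; field names are illustrative. */
struct gp_exit {
    unsigned long inst_len;  /* length of the faulting instruction */
    uint64_t rax, rdx;       /* guest registers at the fault */
    uint32_t error_code;     /* #GP error code (0 for an I/O permission fault) */
    bool event_pending;      /* #GP raised while delivering another event */
};

/* Register-only checks: is this #GP possibly a backdoor access? */
static bool gp_may_be_backdoor(const struct gp_exit *e)
{
    return e->inst_len >= 1 && e->inst_len <= 2 &&
           (uint16_t)e->rdx == BDOOR_PORT &&
           !e->event_pending && e->error_code == 0 &&
           (uint32_t)e->rax == BDOOR_MAGIC;
}

/* Decode fetched opcode bytes: the IN operand width, or 0 if not IN ..., DX. */
static unsigned int in_dx_width(const unsigned char *b, unsigned long len)
{
    if (len == 1 && b[0] == 0xEC)
        return 1;                                   /* IN AL, DX  */
    if (len == 1 && b[0] == 0xED)
        return 4;                                   /* IN EAX, DX */
    if (len == 2 && b[0] == 0x66 && b[1] == 0xED)
        return 2;                                   /* IN AX, DX  */
    return 0;
}
```

Only when gp_may_be_backdoor() passes is it worth fetching the instruction bytes and calling in_dx_width(); everything else should be reflected back to the guest as the original #GP.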
>
> An open source example of using this is:
>
> http://open-vm-tools.sourceforge.net/
>
> Which only uses "IN EAX, DX".  Also
>
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458
>
> Lists just "inl (%%dx)" (I assume this is AT&T syntax and is the
> same as "inl %dx, %eax").
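A guest-side sketch of issuing the call from C with GCC/Clang inline assembly, using the AT&T form discussed above. The register layout (magic in EAX, command in the low 16 bits of ECX, port in DX, a command-specific argument in EBX) follows the handler quoted below and open-vm-tools; the magic value is assumed from open-vm-tools' backdoor_def.h, and struct bdoor_regs, bdoor_setup() and bdoor_in() are hypothetical names.

```c
#include <stdint.h>

#define BDOOR_MAGIC 0x564D5868u  /* "VMXh"; assumed, as in open-vm-tools */
#define BDOOR_PORT  0x5658u      /* "VX" */

struct bdoor_regs {
    uint32_t eax, ebx, ecx, edx, esi, edi;
};

/* Inputs for one call: magic in EAX, argument in EBX, command in CX, port in DX. */
static void bdoor_setup(struct bdoor_regs *r, uint16_t cmd, uint32_t arg)
{
    r->eax = BDOOR_MAGIC;
    r->ebx = arg;
    r->ecx = cmd;
    r->edx = BDOOR_PORT;
    r->esi = r->edi = 0;
}

#if defined(__i386__) || defined(__x86_64__)
/*
 * "inl %dx, %eax" (Intel syntax: IN EAX, DX).  All six registers are
 * in/out operands because the hypervisor may return data in any of them.
 * Only call this inside a guest whose hypervisor handles port 0x5658;
 * elsewhere a ring-3 caller typically takes #GP (SIGSEGV on Linux).
 */
static __attribute__((unused)) void bdoor_in(struct bdoor_regs *r)
{
    __asm__ __volatile__("inl %%dx, %%eax"
                         : "+a"(r->eax), "+b"(r->ebx), "+c"(r->ecx),
                           "+d"(r->edx), "+S"(r->esi), "+D"(r->edi));
}
#endif
```

Per the quoted handler, calling bdoor_setup() with the BDOOR_CMD_GETVERSION command and then bdoor_in() should come back with EBX == BDOOR_MAGIC and EAX == 6. Outside such a guest, bdoor_in() must not be called at all.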
>
> The support included is enough to allow VMware tools to install in a
> HVM domU.
>
> For a debug=y build there is a new command-line option,
> vmport_debug=.  It enables console output at various
> stages of handling the "IN EAX, DX" instruction.
>
> Signed-off-by: Don Slutz <dslutz@xxxxxxxxxxx>

The patch title should be something like "xen: Implement VMWare
hypercall (magic port)".

> diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
> index 37a1ece..2446eb7 100644
> --- a/xen/arch/x86/hvm/svm/emulate.c
> +++ b/xen/arch/x86/hvm/svm/emulate.c
> @@ -50,7 +50,7 @@ static unsigned int is_prefix(u8 opc)
>      return 0;
>  }
>
> -static unsigned long svm_rip2pointer(struct vcpu *v)
> +unsigned long svm_rip2pointer(struct vcpu *v)

I think we tend to use '_to_' instead of '2', so this should be
svm_rip_to_pointer().
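For reference, the helper being renamed turns the guest's CS:RIP into the linear address of the current instruction, which is presumably where the inst_addr argument of vmport_gp_check() comes from. A hypothetical standalone model of that computation (rip_to_linear() and its flag are illustrative, not Xen's API): in 64-bit code the CS base is treated as zero, otherwise base plus EIP wraps at 4 GiB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linear address of the instruction at CS:RIP (illustrative model). */
static uint64_t rip_to_linear(uint64_t cs_base, uint64_t rip,
                              bool long_mode_64bit_cs)
{
    if (long_mode_64bit_cs)
        return rip;                    /* 64-bit code: CS base ignored */
    return (uint32_t)(cs_base + rip);  /* legacy/compat: 32-bit wrap */
}
```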

> @@ -152,7 +161,9 @@ static int fetch(struct vcpu *v, u8 *buf, unsigned long addr, int len)
>  }
>
>  int __get_instruction_length_from_list(struct vcpu *v,
> -        const enum instruction_index *list, unsigned int list_count)
> +                                       const enum instruction_index *list,
> +                                       unsigned int list_count,
> +                                       bool_t err_rpt)

"err_rpt" reads to me as "error report", but what's functionally distinct
about this parameter is whether you deliver the #GP or not, not whether it
does a printk.  I'd call it "inject_gp" or "gp_on_mismatch" or something.


> diff --git a/xen/arch/x86/hvm/vmware/vmport.c b/xen/arch/x86/hvm/vmware/vmport.c
> new file mode 100644
> index 0000000..a6fd95c
> --- /dev/null
> +++ b/xen/arch/x86/hvm/vmware/vmport.c
> @@ -0,0 +1,311 @@
> +/*
> + * HVM VMPORT emulation
> + *
> + * Copyright (C) 2012 Verizon Corporation
> + *
> + * This file is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License Version 2 (GPLv2)
> + * as published by the Free Software Foundation.
> + *
> + * This file is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details. <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/config.h>
> +#include <xen/lib.h>
> +#include <asm/hvm/hvm.h>
> +#include <asm/hvm/support.h>
> +#include <asm/hvm/vmport.h>
> +
> +#include "backdoor_def.h"
> +#include "guest_msg_def.h"
> +
> +#ifndef NDEBUG
> +unsigned int opt_vmport_debug __read_mostly;
> +integer_param("vmport_debug", opt_vmport_debug);
> +#endif
> +
> +/* More VMware defines */
> +
> +#define VMWARE_GUI_AUTO_GRAB              0x001
> +#define VMWARE_GUI_AUTO_UNGRAB            0x002
> +#define VMWARE_GUI_AUTO_SCROLL            0x004
> +#define VMWARE_GUI_AUTO_RAISE             0x008
> +#define VMWARE_GUI_EXCHANGE_SELECTIONS    0x010
> +#define VMWARE_GUI_WARP_CURSOR_ON_UNGRAB  0x020
> +#define VMWARE_GUI_FULL_SCREEN            0x040
> +
> +#define VMWARE_GUI_TO_FULL_SCREEN         0x080
> +#define VMWARE_GUI_TO_WINDOW              0x100
> +
> +#define VMWARE_GUI_AUTO_RAISE_DISABLED    0x200
> +
> +#define VMWARE_GUI_SYNC_TIME              0x400
> +
> +/* When set, toolboxes should not show the cursor options page. */
> +#define VMWARE_DISABLE_CURSOR_OPTIONS     0x800
> +
> +inline uint16_t get_low_bits(uint32_t bits)
> +{
> +    return bits & 0xffff;
> +}
> +
> +void vmport_register(struct domain *d)
> +{
> +    register_portio_handler(d, BDOOR_PORT, 4, vmport_ioport);
> +}
> +
> +int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    uint32_t cmd = get_low_bits(regs->rcx);
> +    uint32_t magic = regs->rax;
> +    int rc = X86EMUL_OKAY;
> +
> +    if ( magic == BDOOR_MAGIC )
> +    {
> +        uint64_t saved_rax = regs->rax;
> +        uint64_t value;
> +
> +        VMPORT_DBG_LOG(VMPORT_LOG_TRACE,
> +                       "VMware trace dir=%d bytes=%u ip=%"PRIx64" cmd=%d ax=%"
> +                       PRIx64" bx=%"PRIx64" cx=%"PRIx64" dx=%"PRIx64" si=%"
> +                       PRIx64" di=%"PRIx64"\n", dir, bytes,
> +                       regs->rip, cmd, regs->rax, regs->rbx, regs->rcx,
> +                       regs->rdx, regs->rsi, regs->rdi);
> +        switch ( cmd )
> +        {
> +        case BDOOR_CMD_GETMHZ:
> +            /* ... */
> +            regs->rbx = BDOOR_MAGIC;
> +            regs->rax = current->domain->arch.tsc_khz / 1000;
> +            break;
> +        case BDOOR_CMD_GETVERSION:
> +            /* ... */
> +            regs->rbx = BDOOR_MAGIC;
> +            /* VERSION_MAGIC */
> +            regs->rax = 6;
> +            /* Claim we are an ESX. VMX_TYPE_SCALABLE_SERVER */
> +            regs->rcx = 2;
> +            break;
> +        case BDOOR_CMD_GETHWVERSION:
> +            /* ... */
> +            regs->rbx = BDOOR_MAGIC;
> +            /* vmware_hw */
> +            regs->rax = 0;
> +            if ( is_hvm_vcpu(current) )
> +            {
> +                struct hvm_domain *hd = &current->domain->arch.hvm_domain;
> +
> +                regs->rax = hd->params[HVM_PARAM_VMWARE_HW];
> +            }
> +            if ( !regs->rax )
> +                regs->rax = 4;  /* Act like version 4 */
> +            break;
> +        case BDOOR_CMD_GETHZ:
> +            value = current->domain->arch.tsc_khz * 1000;
> +            /* apic-frequency (bus speed) */
> +            regs->rcx = (uint32_t)(1000000000ULL / APIC_BUS_CYCLE_NS);
> +            /* High part of tsc-frequency */
> +            regs->rbx = (uint32_t)(value >> 32);
> +            /* Low part of tsc-frequency */
> +            regs->rax = value;
> +            break;
> +        case BDOOR_CMD_GETTIME:
> +            value = get_localtime_us(current->domain);
> +            /* hostUsecs */
> +            regs->rbx = (uint32_t)(value % 1000000UL);
> +            /* hostSecs */
> +            regs->rax = value / 1000000ULL;
> +            /* maxTimeLag */
> +            regs->rcx = 0;
> +            break;
> +        case BDOOR_CMD_GETTIMEFULL:
> +            value = get_localtime_us(current->domain);
> +            /* ... */
> +            regs->rax = BDOOR_MAGIC;
> +            /* hostUsecs */
> +            regs->rbx = (uint32_t)(value % 1000000UL);
> +            /* High part of hostSecs */
> +            regs->rsi = (uint32_t)((value / 1000000ULL) >> 32);
> +            /* Low part of hostSecs */
> +            regs->rdx = (uint32_t)(value / 1000000ULL);
> +            /* maxTimeLag */
> +            regs->rcx = 0;
> +            break;
> +        case BDOOR_CMD_GETGUIOPTIONS:
> +            regs->rax = VMWARE_GUI_AUTO_GRAB | VMWARE_GUI_AUTO_UNGRAB |
> +                VMWARE_GUI_AUTO_RAISE_DISABLED | VMWARE_GUI_SYNC_TIME |
> +                VMWARE_DISABLE_CURSOR_OPTIONS;
> +            break;
> +        case BDOOR_CMD_SETGUIOPTIONS:
> +            regs->rax = 0x0;
> +            break;
> +        default:
> +            VMPORT_DBG_LOG(VMPORT_LOG_ERROR,
> +                           "VMware bytes=%d dir=%d cmd=%d",
> +                           bytes, dir, cmd);
> +            break;
> +        }
> +        VMPORT_DBG_LOG(VMPORT_LOG_VMWARE_AFTER,
> +                       "VMware after ip=%"PRIx64" cmd=%d ax=%"PRIx64" bx=%"
> +                       PRIx64" cx=%"PRIx64" dx=%"PRIx64" si=%"PRIx64" di=%"
> +                       PRIx64"\n",
> +                       regs->rip, cmd, regs->rax, regs->rbx, regs->rcx,
> +                       regs->rdx, regs->rsi, regs->rdi);
> +        if ( dir == IOREQ_READ )
> +        {
> +            switch ( bytes )
> +            {
> +            case 1:
> +                regs->rax = (saved_rax & 0xffffff00) | (regs->rax & 0xff);
> +                break;
> +            case 2:
> +                regs->rax = (saved_rax & 0xffff0000) | get_low_bits(regs->rax);
> +                break;
> +            case 4:
> +                regs->rax = (uint32_t)regs->rax;
> +                break;
> +            }
> +            *val = regs->rax;
> +        }
> +        else
> +            regs->rax = saved_rax;
> +    }
> +    else
> +    {
> +        rc = X86EMUL_UNHANDLEABLE;
> +        VMPORT_DBG_LOG(VMPORT_LOG_ERROR,
> +                       "Not VMware %x vs %x; ip=%"PRIx64" ax=%"PRIx64
> +                       " bx=%"PRIx64" cx=%"PRIx64" dx=%"PRIx64" si=%"PRIx64
> +                       " di=%"PRIx64"",
> +                       magic, BDOOR_MAGIC, regs->rip, regs->rax, regs->rbx,
> +                       regs->rcx, regs->rdx, regs->rsi, regs->rdi);
> +    }
> +
> +    return rc;
> +}
> +
> +int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
> +                    unsigned long inst_len, unsigned long inst_addr,
> +                    unsigned long ei)
> +{
> +    if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_VMWARE_PORT] )
> +        return 10;
> +
> +    if ( inst_len && inst_len <= 2 && get_low_bits(regs->rdx) == BDOOR_PORT &&
> +         ei == 0 && regs->error_code == 0 &&
> +         (uint32_t)regs->rax == BDOOR_MAGIC )
> +    {
> +        int i = 0;
> +        uint32_t val;
> +        uint32_t byte_cnt = 4;
> +        unsigned char bytes[2];
> +        unsigned int fetch_len;
> +        int frc;
> +        int rc;
> +
> +        /*
> +         * Fetch up to the next page break; we'll fetch from the
> +         * next page later if we have to.
> +         */
> +        fetch_len = min_t(unsigned int, inst_len,
> +                          PAGE_SIZE - (inst_addr  & ~PAGE_MASK));
> +        frc = hvm_fetch_from_guest_virt_nofault(bytes, inst_addr, fetch_len,
> +                                                PFEC_page_present);
> +        if ( frc != HVMCOPY_okay )
> +        {
> +            gdprintk(XENLOG_WARNING,
> +                     "Bad instruction fetch at %#lx (frc=%d il=%lu fl=%u)\n",
> +                     (unsigned long) inst_addr, frc, inst_len, fetch_len);
> +            return 11;
> +        }

And what happens if the instruction happens to span a page boundary,
and the second half is paged out?  Won't this (unexpectedly) cause a
#GP, instead of a #PF?
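One way to handle this is to split the fetch at the page boundary and, if the second read fails, report a #PF at the start of the second page rather than falling through to #GP. A self-contained toy model of that logic; struct toy_guest, toy_read_page() and fetch_insn() are hypothetical and stand in for hvm_fetch_from_guest_virt_nofault() and the guest's real page tables. The test case is the two-byte "66 ED" (IN AX, DX) straddling the boundary at 0x1000.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Toy guest memory: a flat byte array plus a per-page "present" flag. */
struct toy_guest {
    unsigned char *mem;
    const int *present;
    size_t npages;
};

enum fetch_result { FETCH_OK, FETCH_PAGE_FAULT };

/* Copy len bytes that all lie within one page; fail if that page is absent. */
static int toy_read_page(const struct toy_guest *g, unsigned char *dst,
                         uint64_t addr, size_t len)
{
    uint64_t pfn = addr / TOY_PAGE_SIZE;

    if (pfn >= g->npages || !g->present[pfn])
        return -1;
    memcpy(dst, g->mem + addr, len);
    return 0;
}

/*
 * Fetch an instruction that may straddle a page boundary.  On failure,
 * *fault_addr is the address to report via #PF (i.e. the CR2 value).
 */
static enum fetch_result fetch_insn(const struct toy_guest *g,
                                    unsigned char *buf, uint64_t addr,
                                    size_t len, uint64_t *fault_addr)
{
    size_t first = TOY_PAGE_SIZE - (size_t)(addr % TOY_PAGE_SIZE);

    if (first > len)
        first = len;
    if (toy_read_page(g, buf, addr, first)) {
        *fault_addr = addr;
        return FETCH_PAGE_FAULT;
    }
    if (first < len &&
        toy_read_page(g, buf + first, addr + first, len - first)) {
        *fault_addr = addr + first;   /* start of the second page */
        return FETCH_PAGE_FAULT;
    }
    return FETCH_OK;
}
```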

> @@ -2565,6 +2567,50 @@ static void vmx_idtv_reinject(unsigned long idtv_info)
>      }
>  }
>
> +static unsigned long vmx_rip2pointer(struct cpu_user_regs *regs,
> +                                     struct vcpu *v)

vmx_rip_to_pointer(), please.

> diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
> index dee6d68..77ff539 100644
> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -153,7 +153,8 @@
>
>  /* Params for VMware */
>  #define HVM_PARAM_VMWARE_HW                 35
> +#define HVM_PARAM_VMWARE_PORT               36

So why is the CPUID control separate from the one that makes the port available?
Are we expecting a guest to have one without the other?

If so, and the first one really only enables the CPUID leaf, then it
should be called "VMWARE_CPUID" or something.
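To make the distinction concrete: a guest can look for a hypervisor through CPUID leaf 0x40000000, whose EBX, ECX and EDX spell out a 12-byte vendor signature ("VMwareVMware" on VMware, "XenVMMXenVMM" on Xen), independently of whether the backdoor port answers. A minimal decoding sketch, assuming the vmware_hw CPUID leaf reports the same signature VMware does; cpuid_signature() and is_vmware_signature() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Unpack EBX, ECX, EDX (little-endian byte order) into a 12-char string. */
static void cpuid_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
                            char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };

    for (int r = 0; r < 3; r++)
        for (int i = 0; i < 4; i++)
            out[r * 4 + i] = (char)((regs[r] >> (8 * i)) & 0xff);
    out[12] = '\0';
}

static bool is_vmware_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char sig[13];

    cpuid_signature(ebx, ecx, edx, sig);
    return strcmp(sig, "VMwareVMware") == 0;
}
```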

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

