[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH for-4.5 v6 04/16] xen: Add vmware_port support



On 09/24/14 12:01, George Dunlap wrote:
On 09/20/2014 07:07 PM, Don Slutz wrote:
This includes adding is_vmware_port_enabled

This is a new domain_create() flag, DOMCRF_vmware_port.  It is
passed to domctl as XEN_DOMCTL_CDF_vmware_port.

This enables limited support of VMware's hyper-call.

This is both a more complete support then in currently provided by
QEMU and/or KVM and less.  The missing part requires QEMU changes
and has been left out until the QEMU patches are accepted upstream.

VMware's hyper-call is also known as VMware Backdoor I/O Port.

Note: this support does not depend on vmware_hw being non-zero.

Summary is that VMware treats "in (%dx),%eax" (or "out %eax,(%dx)")
to port 0x5658 specially.  Note: since many operations return data
in EAX, "in (%dx),%eax" is the one to use.  The other lengths like
"in (%dx),%al" will still do things, only AL part of EAX will be
changed.  For "out %eax,(%dx)" of all lengths, EAX will remain
unchanged.

Also this instruction is allowed to be used from ring 3.  To
support this the vmexit for GP needs to be enabled.  I have not
fully tested that nested HVM is doing the right thing for this.

An open source example of using this is:

http://open-vm-tools.sourceforge.net/

Which only uses "inl (%dx)".  Also

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458

The support included is enough to allow VMware tools to install in a
HVM domU.

For a debug=y build there is a new command line option
vmport_debug=.  It enabled output to the console of various
stages of handling the "in (%dx)" instruction.

Signed-off-by: Don Slutz <dslutz@xxxxxxxxxxx>

[snip]

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 7b1dfe6..e2e4aad 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -510,6 +510,8 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
      d->arch.hvm_domain.mem_sharing_enabled = 0;
        d->arch.s3_integrity = !!(domcr_flags & DOMCRF_s3_integrity);
+    d->arch.hvm_domain.is_vmware_port_enabled =
+        (domcr_flags & DOMCRF_vmware_port);

Should this be "!!(domcr..."?


I do not think it is needed, but happy to change to that.

diff --git a/xen/arch/x86/hvm/vmware/vmport.c b/xen/arch/x86/hvm/vmware/vmport.c
new file mode 100644
index 0000000..811c303
--- /dev/null
+++ b/xen/arch/x86/hvm/vmware/vmport.c
@@ -0,0 +1,326 @@
+/*
+ * HVM VMPORT emulation
+ *
+ * Copyright (C) 2012 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+#include <xen/lib.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/vmport.h>
+
+#include "backdoor_def.h"
+#include "guest_msg_def.h"
+
+#define MAX_INST_LEN 15
+
+#ifndef NDEBUG
+unsigned int opt_vmport_debug __read_mostly;
+integer_param("vmport_debug", opt_vmport_debug);
+#endif
+
+/* More VMware defines */
+
+#define VMWARE_GUI_AUTO_GRAB              0x001
+#define VMWARE_GUI_AUTO_UNGRAB            0x002
+#define VMWARE_GUI_AUTO_SCROLL            0x004
+#define VMWARE_GUI_AUTO_RAISE             0x008
+#define VMWARE_GUI_EXCHANGE_SELECTIONS    0x010
+#define VMWARE_GUI_WARP_CURSOR_ON_UNGRAB  0x020
+#define VMWARE_GUI_FULL_SCREEN            0x040
+
+#define VMWARE_GUI_TO_FULL_SCREEN         0x080
+#define VMWARE_GUI_TO_WINDOW              0x100
+
+#define VMWARE_GUI_AUTO_RAISE_DISABLED    0x200
+
+#define VMWARE_GUI_SYNC_TIME              0x400
+
+/* When set, toolboxes should not show the cursor options page. */
+#define VMWARE_DISABLE_CURSOR_OPTIONS     0x800
+
+void vmport_register(struct domain *d)
+{
+    register_portio_handler(d, BDOOR_PORT, 4, vmport_ioport);
+}
+
+int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    uint32_t cmd = regs->rcx & 0xffff;
+    uint32_t magic = regs->rax;
+    int rc = X86EMUL_OKAY;
+
+    if ( magic == BDOOR_MAGIC )
+    {
+        uint64_t saved_rax = regs->rax;
+        uint64_t value;
+
+        VMPORT_DBG_LOG(VMPORT_LOG_TRACE,
+ "VMware trace dir=%d bytes=%u ip=%"PRIx64" cmd=%d ax=%" + PRIx64" bx=%"PRIx64" cx=%"PRIx64" dx=%"PRIx64" si=%"
+                       PRIx64" di=%"PRIx64"\n", dir, bytes,
+                       regs->rip, cmd, regs->rax, regs->rbx, regs->rcx,
+                       regs->rdx, regs->rsi, regs->rdi);
+        switch ( cmd )
+        {
+        case BDOOR_CMD_GETMHZ:
+            /* ... */
+            regs->rbx = BDOOR_MAGIC;
+            regs->rax = current->domain->arch.tsc_khz / 1000;
+            break;
+        case BDOOR_CMD_GETVERSION:
+            /* ... */
+            regs->rbx = BDOOR_MAGIC;
+            /* VERSION_MAGIC */
+            regs->rax = 6;
+            /* Claim we are an ESX. VMX_TYPE_SCALABLE_SERVER */
+            regs->rcx = 2;
+            break;
+        case BDOOR_CMD_GETSCREENSIZE:
+            /* We have no screen size */
+            regs->rax = 0;
+            break;
+        case BDOOR_CMD_GETHWVERSION:
+            /* vmware_hw */
+            regs->rax = 0;
+            if ( is_hvm_vcpu(current) )
+            {
+ struct hvm_domain *hd = &current->domain->arch.hvm_domain;
+
+                regs->rax = hd->params[HVM_PARAM_VMWARE_HW];
+            }
+            if ( !regs->rax )
+                regs->rax = 4;  /* Act like version 4 */
+            break;
+        case BDOOR_CMD_GETHZ:
+            value = current->domain->arch.tsc_khz * 1000;
+            /* apic-frequency (bus speed) */
+            regs->rcx = (uint32_t)(1000000000ULL / APIC_BUS_CYCLE_NS);
+            /* High part of tsc-frequency */
+            regs->rbx = (uint32_t)(value >> 32);
+            /* Low part of tsc-frequency */
+            regs->rax = value;

Either the comment or the code here is wrong -- this is clearly not the lower 32 bits, at least on 64-bit guests. :-)


Opps, it should have included the (uint32_t) cast also.  Will fix.

If the code is right -- that is, if a 32-bit guest find this truncated automatically, but a 64-bit guest find all 64 bits here (and thus not have to reconstruct it) -- you should make the comment more informative; for example: /* On 32-bit systems this will be the lower 32 bits. 64-bit systems can just use the full value from rax. */

(Word-wrapped, of course.)

Hmm -- looks like regs->rax will be clipped to 32 bits for a 4-byte IO read? In which case the comment here should reflect this, but you have the same basic issue for BDOOR_CMD_GETTIMEFUL regs->rdx (which will not be clipped, I don't think).


It will also be "adjusted" for 2 or 1 byte IO.  rdx does not get clipped
later, but is clipped to 32bits (see below).

+            break;
+        case BDOOR_CMD_GETTIME:
+            value = get_localtime_us(current->domain);
+            /* hostUsecs */
+            regs->rbx = (uint32_t)(value % 1000000UL);
+            /* hostSecs */
+            regs->rax = value / 1000000ULL;
+            /* maxTimeLag */
+            regs->rcx = 0;
+            break;
+        case BDOOR_CMD_GETTIMEFULL:
+            value = get_localtime_us(current->domain);
+            /* ... */
+            regs->rax = BDOOR_MAGIC;
+            /* hostUsecs */
+            regs->rbx = (uint32_t)(value % 1000000UL);
+            /* High part of hostSecs */
+            regs->rsi = (uint32_t)((value / 1000000ULL) >> 32);
+            /* Low part of hostSecs */
+            regs->rdx = (uint32_t)(value / 1000000ULL);

Same here.


But the (uint32_t) does make it just 32bits.
+            /* maxTimeLag */
+            regs->rcx = 0;
+            break;
+        case BDOOR_CMD_GETGUIOPTIONS:
+            regs->rax = VMWARE_GUI_AUTO_GRAB | VMWARE_GUI_AUTO_UNGRAB |
+                VMWARE_GUI_AUTO_RAISE_DISABLED | VMWARE_GUI_SYNC_TIME |
+                VMWARE_DISABLE_CURSOR_OPTIONS;
+            break;
+        case BDOOR_CMD_SETGUIOPTIONS:
+            regs->rax = 0x0;
+            break;
+        default:
+            VMPORT_DBG_LOG(VMPORT_LOG_ERROR,
+                           "VMware bytes=%d dir=%d cmd=%d",
+                           bytes, dir, cmd);
+            break;
+        }
+        VMPORT_DBG_LOG(VMPORT_LOG_VMWARE_AFTER,
+ "VMware after ip=%"PRIx64" cmd=%d ax=%"PRIx64" bx=%" + PRIx64" cx=%"PRIx64" dx=%"PRIx64" si=%"PRIx64" di=%"
+                       PRIx64"\n",
+                       regs->rip, cmd, regs->rax, regs->rbx, regs->rcx,
+                       regs->rdx, regs->rsi, regs->rdi);
+        if ( dir == IOREQ_READ )
+        {
+            switch ( bytes )
+            {
+            case 1:
+ regs->rax = (saved_rax & 0xffffff00) | (regs->rax & 0xff);
+                break;
+            case 2:
+ regs->rax = (saved_rax & 0xffff0000) | (regs->rax & 0xffff);
+                break;
+            case 4:
+                regs->rax = (uint32_t)regs->rax;
+                break;
+            }
+            *val = regs->rax;
+        }
+        else
+            regs->rax = saved_rax;
+    }
+    else
+    {
+        rc = X86EMUL_UNHANDLEABLE;
+        VMPORT_DBG_LOG(VMPORT_LOG_ERROR,
+                       "Not VMware %x vs %x; ip=%"PRIx64" ax=%"PRIx64
+ " bx=%"PRIx64" cx=%"PRIx64" dx=%"PRIx64" si=%"PRIx64
+                       " di=%"PRIx64"",
+ magic, BDOOR_MAGIC, regs->rip, regs->rax, regs->rbx,
+                       regs->rcx, regs->rdx, regs->rsi, regs->rdi);
+    }
+
+    return rc;
+}
+
+int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
+                    unsigned long *inst_len, unsigned long inst_addr,
+                    unsigned long ei1, unsigned long ei2)

I've wondered, in another e-mail, whether it would make more sense to try to re-use the normal Xen emulation code, instead of duplicating its IO instruction decoding stuff here.


Since this is only called on a #GP, I do not what to attempt to emulate
the instruction by the normal way.  In fact the normal Xen emulation
should say "do a #GP", not do the VMware hypercall.

I will say that I had not looked into getting the normal Xen emulation
"fixed" for this case.  In all my testing, I have not see this issue.


With the patch:

Subject: [Xen-devel] [PATCH 5/6] x86/hvm: Forced Emulation Prefix for debug
        builds of Xen
Message-ID: <1411484611-31027-6-git-send-email-andrew.cooper3@xxxxxxxxxx>


I need to look into this.




I think I probably wouldn't make that a blocker for acceptance, though. However...


Thanks.

+{
+ ASSERT(v->domain->arch.hvm_domain.is_vmware_port_enabled);

At the moment I think this ASSERT is misplaced; there are no checks for this anywhere in the handler path to this point. If at any time in the future (or for any other reason) #GP exiting gets enabled (for example, if the introspection stuff wants to be notifed on #GPs), you'll end up going through this path whether or not is_vmware_port_enabled is true.

I think you should instead "if(!is_vmware_port_enabled) return" here. That would effectively isolate these new changes from being able to introduce security issues for VMs which don't enable vmware_port, making it less risky to accept as-is.


Ok, it was a return in an older version.  Happy to switch back.

[snip]

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index fc1f882..6fe9389 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1102,6 +1102,8 @@ static int construct_vmcs(struct vcpu *v)
        v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
                | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
+              | (v->domain->arch.hvm_domain.is_vmware_port_enabled ?
+                 (1U << TRAP_gp_fault) : 0)
                | (1U << TRAP_no_device);

Probably not mandatory, but it might be nice to pull this bitmap logic into a function, so that you don't have the code duplication (here and in vmx_update_guest_cr()).


Ok, Will keep it in mind.

   -Don Slutz

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.