[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC][PATCH v2 1/1] Add IOREQ_TYPE_VMWARE_PORT



On 10/02/14 16:33, Andrew Cooper wrote:
On 02/10/14 19:36, Don Slutz wrote:
Signed-off-by: Don Slutz <dslutz@xxxxxxxxxxx>
---
v2:
     Fixup usage of hvmtrace_io_assist().
     VMware only changes the 32bit part of the register.
        Added vmware_ioreq_t

  xen/arch/x86/hvm/emulate.c        | 72 +++++++++++++++++++++++++++++++++++++++
  xen/arch/x86/hvm/io.c             | 19 +++++++++++
  xen/arch/x86/hvm/vmware/vmport.c  | 24 ++++++++++++-
  xen/include/asm-x86/hvm/emulate.h |  2 ++
  xen/include/asm-x86/hvm/vcpu.h    |  1 +
  xen/include/public/hvm/ioreq.h    | 19 +++++++++++
  6 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index c0f47d2..215f049 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -50,6 +50,78 @@ static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
      trace_var(event, 0/*!cycles*/, size, buffer);
  }
+int hvmemul_do_vmport(uint16_t addr, int size, int dir,
+                      struct cpu_user_regs *regs)
+{
+    struct vcpu *curr = current;
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+    vmware_ioreq_t p = {
+        .type = IOREQ_TYPE_VMWARE_PORT,
+        .addr = addr,
+        .size = size,
+        .dir = dir,
+        .eax = regs->rax,
+        .ebx = regs->rbx,
+        .ecx = regs->rcx,
+        .edx = regs->rdx,
+        .esi = regs->rsi,
+        .edi = regs->rdi,
+    };
+    ioreq_t *pp = (ioreq_t *)&p;
+    ioreq_t op;
Eww.

Just because the C type system lets you abuse it like this doesn't mean
it is a clever idea to.  Please refer to c/s 15a9f34d1b as an example of
the kinds of bugs it causes.

This is a direct result of:


Subject:        Re: [PATCH 1/1] xen-hvm.c: Add support for Xen access to vmport
Date:   Tue, 30 Sep 2014 11:35:44 +0100
From:   Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
To:     Don Slutz <dslutz@xxxxxxxxxxx>
CC: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, qemu-devel@xxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, Alexander Graf <agraf@xxxxxxx>, Andreas Färber <afaerber@xxxxxxx>, Anthony Liguori <aliguori@xxxxxxxxxx>, Marcel Apfelbaum <marcel.a@xxxxxxxxxx>, Markus Armbruster <armbru@xxxxxxxxxx>, Michael S. Tsirkin <mst@xxxxxxxxxx>



On Mon, 29 Sep 2014, Don Slutz wrote:
On 09/29/14 06:25, Stefano Stabellini wrote:
> On Mon, 29 Sep 2014, Stefano Stabellini wrote:
> > On Fri, 26 Sep 2014, Don Slutz wrote:
> > > This adds synchronisation of the vcpu registers
> > > between Xen and QEMU.

...

> > > +            CPUX86State *env;
> > > +            ioreq_t fake_req = {
> > > +                .type = IOREQ_TYPE_PIO,
> > > +                .addr = (uint16_t)req->size,
> > > +                .size = 4,
> > > +                .dir = IOREQ_READ,
> > > +                .df = 0,
> > > +                .data_is_ptr = 0,
> > > +            };
> Why do you need a fake req?

To transport the 6 VCPU registers (only 32bits of them) that vmport.c
needs to do it's work.

> Couldn't Xen send the real req instead?

Yes, but then a 2nd exchange between QEMU and Xen would be needed
to fetch the 6 VCPU registers.  The ways I know of to fetch the VCPU registers
from Xen, all need many cycles to do their work and return
a lot of data that is not needed.

The other option that I have considered was to extend the ioreq_t type
to have room for these registers, but that reduces by almost half the
maximum number of VCPUs that are supported (They all live on 1 page).

Urgh. Now that I understand the patch better is think it's horrible, no
offense :-)

Why don't you add another new ioreq type to send out the vcpu state?
Something like IOREQ_TYPE_VCPU_SYNC_REGS? You could send it to QEMU
before IOREQ_TYPE_VMWARE_PORT. Actually this solution looks very imilar
to Alex's suggestion.

...


And the ASSERTs below are the attempt to prevent bugs from being added.

Sigh.  Too much with both XEN and QEMU in hard freeze.  I may
have a way to avoid the cast.

+
+    ASSERT(sizeof(p) == sizeof(op));
+    ASSERT(offsetof(ioreq_t, type) == offsetof(vmware_ioreq_t, type));
+    ASSERT(offsetof(ioreq_t, vp_eport) == offsetof(vmware_ioreq_t, vp_eport));
These can be evaluated by the compiler.  They should be BUILD_BUG_ON()s
rather than ASSERT()s.

I looked for this, but did not find it.  Will change.

+
+    switch ( vio->io_state )
+    {
+    case HVMIO_none:
+        break;
+    case HVMIO_completed:
+        vio->io_state = HVMIO_none;
+        goto finish_access_vmport;
+    case HVMIO_dispatched:
+        /* May have to wait for previous cycle of a multi-write to complete. */
+    default:
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    if ( hvm_io_pending(curr) )
+    {
+        gdprintk(XENLOG_WARNING, "WARNING: io already pending?\n");
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    vio->io_state = HVMIO_handle_vmport_awaiting_completion;
+    vio->io_size = size;
+
+    /* If there is no backing DM, just ignore accesses */
+    if ( !hvm_has_dm(curr->domain) )
+    {
+        vio->io_state = HVMIO_none;
+        vio->io_data = ~0ul;
+    }
+    else
+    {
+        if ( !hvm_send_assist_req(pp) )
+            vio->io_state = HVMIO_none;
+        return X86EMUL_RETRY;
+    }
+
+ finish_access_vmport:
+    memset(&op, 0, sizeof(op));
+    op.dir = dir;
+    op.addr = (uint16_t)regs->rdx;
+    op.data_is_ptr = 0;
+    op.data = vio->io_data;
Most of this can be achieved with struct initialisation similar to 'p'
above.

Ok, will make it a block.   I did it here (and will continue here) because
this is an error path that should not be taken.

+    hvmtrace_io_assist(0, &op);
+
+    memcpy(&regs->rax, &vio->io_data, vio->io_size);
This is going to need some justification.  Clobbering the registers like
this looks bogus.

Copied code from hvmemul_do_io().  Not sure how to justify it.  I have
some ideas on why it is good.


+
+    return X86EMUL_OKAY;
+}
+
  static int hvmemul_do_io(
      int is_mmio, paddr_t addr, unsigned long *reps, int size,
      paddr_t ram_gpa, int dir, int df, void *p_data)
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 68fb890..a8bf324 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -197,6 +197,25 @@ void hvm_io_assist(ioreq_t *p)
          else
              memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
          break;
+    case HVMIO_handle_vmport_awaiting_completion:
+    {
+        struct cpu_user_regs *regs = guest_cpu_user_regs();
+        vmware_ioreq_t *vp = (vmware_ioreq_t *)p;
+
+        ASSERT(sizeof(*p) == sizeof(*vp));
+        ASSERT(offsetof(ioreq_t, type) == offsetof(vmware_ioreq_t, type));
+        ASSERT(offsetof(ioreq_t, vp_eport) == offsetof(vmware_ioreq_t, 
vp_eport));
+
+        /* Always zero extension for eax */
+        regs->rax = vp->eax;
+        /* Only change the 32bit part of the register */
+        regs->rbx = (regs->rbx & 0xffffffff00000000ull) | vp->ebx;
regs->_e?? allows you to directly access the lower half, avoiding these
bit manipulations.

Also did not know of these (besides being bad names...)

Will change to use them.

   -Don Slutz

~Andrew

+        regs->rcx = (regs->rcx & 0xffffffff00000000ull) | vp->ecx;
+        regs->rdx = (regs->rdx & 0xffffffff00000000ull) | vp->edx;
+        regs->rsi = (regs->rsi & 0xffffffff00000000ull) | vp->esi;
+        regs->rdi = (regs->rdi & 0xffffffff00000000ull) | vp->edi;
+    }
+        break;
      default:
          break;
      }
diff --git a/xen/arch/x86/hvm/vmware/vmport.c b/xen/arch/x86/hvm/vmware/vmport.c
index 962ee32..0649106 100644
--- a/xen/arch/x86/hvm/vmware/vmport.c
+++ b/xen/arch/x86/hvm/vmware/vmport.c
@@ -147,9 +147,31 @@ int vmport_ioport(int dir, uint32_t port, uint32_t bytes, 
uint32_t *val)
              regs->rax = 0x0;
              break;
          default:
+        {   /* Let backing DM handle */
+            int rc;
+
              HVMTRACE_ND(VMPORT_UNKNOWN, 0, 1/*cycles*/, 6,
-                        (bytes << 8) + dir, cmd, regs->rbx,
+                        (bytes << 8) | dir, cmd, regs->rbx,
                          regs->rcx, regs->rsi, regs->rdi);
+            rc = hvmemul_do_vmport(BDOOR_PORT, bytes, dir, regs);
+            switch (rc)
+            {
+            case X86EMUL_OKAY:
+                break;
+            case X86EMUL_RETRY:
+            {
+                struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+
+                if ( vio->io_state != HVMIO_handle_vmport_awaiting_completion )
+                    return 0;
+                break;
+            }
+            default:
+                gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
+                domain_crash(current->domain);
+                break;
+            }
+        }
              break;
          }
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index 5411302..7d18435 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -53,6 +53,8 @@ struct segment_register *hvmemul_get_seg_reg(
  int hvmemul_do_pio(
      unsigned long port, unsigned long *reps, int size,
      paddr_t ram_gpa, int dir, int df, void *p_data);
+int hvmemul_do_vmport(uint16_t addr, int size, int dir,
+                     struct cpu_user_regs *regs);
void hvm_dump_emulation_state(const char *prefix,
                                struct hvm_emulate_ctxt *hvmemul_ctxt);
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 01e0665..1e63d7f 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -36,6 +36,7 @@ enum hvm_io_state {
      HVMIO_awaiting_completion,
      HVMIO_handle_mmio_awaiting_completion,
      HVMIO_handle_pio_awaiting_completion,
+    HVMIO_handle_vmport_awaiting_completion,
      HVMIO_completed
  };
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index 5b5fedf..7d5ba58 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -35,6 +35,7 @@
  #define IOREQ_TYPE_PIO          0 /* pio */
  #define IOREQ_TYPE_COPY         1 /* mmio ops */
  #define IOREQ_TYPE_PCI_CONFIG   2
+#define IOREQ_TYPE_VMWARE_PORT  3
  #define IOREQ_TYPE_TIMEOFFSET   7
  #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
@@ -48,6 +49,8 @@
   *
   * 63....48|47..40|39..35|34..32|31........0
   * SEGMENT |BUS   |DEV   |FN    |OFFSET
+ *
+ * For I/O type IOREQ_TYPE_VMWARE_PORT use the vmware_ioreq.
   */
  struct ioreq {
      uint64_t addr;          /* physical address */
@@ -66,6 +69,22 @@ struct ioreq {
  };
  typedef struct ioreq ioreq_t;
+struct vmware_ioreq {
+    uint32_t esi;
+    uint32_t edi;
+    uint32_t eax;
+    uint32_t ebx;
+    uint32_t ecx;
+    uint32_t edx;
+    uint32_t vp_eport;      /* evtchn for notifications to/from device model */
+    uint16_t addr;
+    uint8_t  state:4;
+    uint8_t  dir:1;         /* 1=read, 0=write */
+    uint8_t  size:3;
+    uint8_t  type;          /* I/O type */
+};
+typedef struct vmware_ioreq vmware_ioreq_t;
+
  struct shared_iopage {
      struct ioreq vcpu_ioreq[1];
  };



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.