[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 3/3] Add limited support of VMware's hyper-call



On 09/03/14 04:25, Jan Beulich wrote:
On 03.09.14 at 02:55, <dslutz@xxxxxxxxxxx> wrote:
On 09/02/14 04:16, Jan Beulich wrote:
As I think was said on v1 already - this should be split into smaller
pieces if at all possible. I'm therefore not going to do a full review
at this time, just give a couple of remarks.
The last time (v1's) split was much worse (and so I think that "split
into smaller"
was not said). I could check be most were "combine this patch with that
patch";
and more comment message.

Having been over this code many times in the last month, a possible
quick split (which would not delay v3 too long) would be:

1) Add simple vmport commands like BDOOR_CMD_GETVERSION
2) Add the hypervisor side of VMware guest info.
3) Add the hvm params that come from VMware guest info.
4) Add the hyper calls for libxc to handle VMware guest info.
One thing I'd certainly like to see split out is the save/restore
code. And the second - general - consideration would be to
split hypervisor from tools side changes as much as possible.

I will take that into account.

And should I add the unit tests also (as optional)?
No idea - that's a tools side thing iirc, and hence a question to the
tools maintainers.

Will see if I get a response before v3 code changes have been tested
and are ready to post.



And in any event I'm rather uncomfortable about this getting
enabled unconditionally, also due to (but not limited to this) the
up front allocation of various memory blocks that may end up
never being needed. The main issue I see with this approach is
that guests could end up being misguided into thinking they're
running on VMware when in fact they have code in place to
optimize when running on Xen. We had such detection ordering
issues with Linux checking for both Hyper-V and Xen.

Ok, I do feel strongly that this would need to be independent on
vmware_hw "version". And so I will add a new xl.cfg item to control this.

QEMU on 5/19/2014 added a way to turn it's version off:

+@item vmport=on|off
+Enables emulation of VMWare IO port, for vmmouse etc. (enabled by default)

So should I name it vmware_vmport or just vmport ( I lean to just vmport )?
and unlike QEMU I am fine with the default of off.
I'd prefer the former as being more explicit (vmport alone doesn't
really provide a connection to VMware), but this again is something
to discuss with the tools maintainers.


Ok.

@@ -1532,6 +1561,17 @@ int hvm_domain_initialise(struct domain *d)
       stdvga_deinit(d);
       vioapic_deinit(d);
    fail1:
+    if ( d->arch.hvm_domain.vmport_data )
+    {
+        struct vmport_state *vs = d->arch.hvm_domain.vmport_data;
+        int idx;
+
+        for (idx = 0; idx < vs->used_guestinfo; idx++)
You'll have to go through and fix coding style issues.
Do to the many coding styles I have used in the last 20 years, I can have
style "blindness". I do not find a style issue here based on CODING_STYLE.
All hints are welcome.
Just look at the difference between you if() and your for() - the
latter is lacking blanks inside the parentheses. And that difference
appeared to be a pretty consistent thing throughout the code I
looked a little more closely at.

like I said I do not see the difference until pointed out.  Thank you
for this.  Will fix all my "for"(s) to this style.


@@ -106,6 +107,7 @@ MAKE_INSTR(VMSAVE, 3, 0x0f, 0x01, 0xdb);
   MAKE_INSTR(STGI,   3, 0x0f, 0x01, 0xdc);
   MAKE_INSTR(CLGI,   3, 0x0f, 0x01, 0xdd);
   MAKE_INSTR(INVLPGA,3, 0x0f, 0x01, 0xdf);
+MAKE_INSTR(IN,     1, 0xed);
This name is ambiguous.
There are 4 opcodes and 6 instructions. Not at all sure how to name
the one I need. Here is the info about the IN instruction.
I'd name it INL_DX, but one of the secondary problems you
may need to solve is that unlike for other opcodes you do not
want the operand size override ignored here.

Ok, I will go with INL_DX.  So far I have not found any code that has the
operand size prefix.  I expect that since all code is in 32 or 64 bit
segments and it is expected that 32 bits are used, so the form "IN AX,DX"
(which would have the operand size prefix) is not used.

In any case, I will do some checking on what I get for this case.  I will
agree that handling of the operand size prefix is not clear.  And may
not be needed to be solved at this time.


+static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs, struct vcpu
*v)
+{
+    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
+    unsigned long inst_len = __get_instruction_length(v, INSTR_IN);
+
+    regs->error_code = vmcb->exitinfo1;
+    if ( hvm_long_mode_enabled(v) )
+        HVMTRACE_LONG_C4D(TRAP_GP, TRAP_gp_fault, inst_len,
+                          regs->error_code,
+                          TRC_PAR_LONG(vmcb->exitinfo2));
+    else
+        HVMTRACE_C4D(TRAP_GP, TRAP_gp_fault, inst_len,
+                     regs->error_code, vmcb->exitinfo2);
+
+    if ( inst_len <= 1 && (regs->rdx & 0xffff) == VMPORT_PORT &&
What would inst_len being zero mean here?
It use to mean that AMD was missing the feature:


(XEN) [2014-08-14 20:17:28] - Next-RIP Saved on #VMEXIT

and so I needed to look at the opcode.
__get_instruction_length_from_list() returns using the value resulting
from that feature only when the length is non-zero.

Now that I am using __get_instruction_length() I think it could be changed
to a 1. However I can only test on an AMD server that does have this
feature. Will switch to == 1 and hope for the best.

As far as I can tell __get_instruction_length() does not insure that the
opcode
list passed is found, just that it might be.
Right, and zero is an indication that it wasn't found. Also I just
noticed there's a gdprintk() in that event, which for all other
cases the function gets used for is quite fine. #GP, however,
can result from various instructions, and hence you don't want
a message printed each time it wasn't due to "inl %dx, %eax".


Will look into how to fix this.


+         vmcb->exitinfo2 == 0 && regs->error_code == 0 )
+    {
+        uint32_t magic = regs->rax;
+
+        if ( magic == VMPORT_MAGIC )
This ought to be folded with the previous if(), at once reducing
code redundancy further down in the function.
OK. But that does make the debug log message more complex to code.
Maybe, albeit (as hinted at before) I question the use of these
anyway.


Ok.

+        {
+            unsigned char bytes[1] = { 0 };
Pointless initializer (and it being an array looks suspicious too).
Will drop the initializer. hvm_fetch_from_guest_virt_nofault() does take
an array.
So it is better to say "&tmp, regs->rip, 1" then "bytes, regs->rip, 1" ?
Yes, I think so.
Fine, will change to that.

Also by dropping the initializer, you are stating that
hvm_fetch_from_guest_virt_nofault() does initialize it's output on error (
because the debug log output does include bytes[0]).
Existing debug log output, or one you add? It's a mistake in any case,
I'm just trying to understand whether a fix might be one order
independent of your patch.

It was in code I added.

   Speaking of which - why can't you just define
a new DBG_LEVEL_VMPORT and use the existing HVM_DBG_LOG()?
I did define 23 bits to control various part of the debug. A lot of this
is because there are many times you go through the code. Basicly
a string is 1st passed a length, and the 4 bytes at a time until the
length is handled, followed by state changes...
Which again raises the question of the value of all this debugging
output. Whether or not a niche feature, I don't think it should be
significantly more verbose than the base code we have.

I will spend some time to make this have less options.

    -Don Slutz

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.