 
	
| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG
 The real CMPXCHG hvmemul_cmpxchg() implementation works as expected as
far the race conditions go, and returning RETRY for failed writebacks
seems to work without issue for regular Xen emulation.
However, when the patch is used with introspection, I've had a BCCode:
101 BSOD and rare (but several) occasions when the guest becomes
unresponsive (can't close Firefox or have the Windows start menu show up
when clicking the "Start" button, but the guest is otherwise running).
This I believe is due to the do { } while ( rc == X86EMUL_RETRY ); loop
in hvm_emulate_one_vm_event(): I am basically now looping behing the
guest's back until the CMPXCHG succeeds, which can theoretically be a
very long time to execute a CMPXCHG instruction, and most likely not
what the guest OS is expecting.
The alternative (and the current default) is to do nothing on
X86EMUL_RETRY and just allow the guest to re-enter in the same place,
which should trigger the same page fault vm_event, which can hopefully
now be able to emulate the current instruction. However, in the best
case scenario, this just complicates the above loop since the current
instruction will still be unable to complete until emulation succeeds
but this time with VMEXITs. And in the worst case scenario (which is
what happens in my tests) this adds an additional factor of
unpredictability, since the guest quickly BSODs (or rarely just plain
hangs).
I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a
failed CMPXCHG should happen just once, with the proper registers and ZF
set. The guest surely expects neither that the instruction resume until
it succeeds, nor that some hidden loop goes on for an undeterminate
ammount of time until a CMPXCHG succeeds.
The picture is further complicated by the two-part handling of
instructions in x86_emulate(). For example, for CMPXCHG we have:
case X86EMUL_OPC(0x0f, 0xb0): case X86EMUL_OPC(0x0f, 0xb1): /* cmpxchg */
    /* Save real source value, then compare EAX against destination. */
    src.orig_val = src.val;
    src.val = _regs.r(ax);
    /* cmp: %%eax - dst ==> dst and src swapped for macro invocation */
    emulate_2op_SrcV("cmp", dst, src, _regs.eflags);
    if ( _regs.eflags & X86_EFLAGS_ZF )
    {
        /* Success: write back to memory. */
        dst.val = src.orig_val;
    }
    else
    {
        /* Failure: write the value we saw to EAX. */
        dst.type = OP_REG;
        dst.reg  = (unsigned long *)&_regs.r(ax);
    }
    break;
This is the only part that sets the proper registers and ZF, and it's
only the comparison. If this succeeds, x86_emulate() currently assumes
that the writeback part cannot fail. But then the writeback part is:
case OP_MEM:
    if ( !(d & Mov) && (dst.orig_val == dst.val) &&
         !ctxt->force_writeback )
         /* nothing to do */;
    else if ( lock_prefix )
    {
        fail_if(!ops->cmpxchg);
        rc = ops->cmpxchg(
            dst.mem.seg, dst.mem.off, &dst.orig_val,
             &dst.val, dst.bytes, ctxt);
    }
    else
    {
        fail_if(!ops->write);
        rc = ops->write(dst.mem.seg, dst.mem.off,
                        !state->simd_size ? &dst.val : (void *)mmvalp,
                        dst.bytes, ctxt);
        if ( sfence )
            asm volatile ( "sfence" ::: "memory" );
    }
    if ( rc != 0 )
        goto done;
default:
    break;
I now see that this is why using a spinlock only around the writeback
part did not solve the issue: both the compare part and the writeback
part need to be part of the same atomic operation - lock needs to be
aquired before the cmp / ZF part.
Opinions and suggestions are appreciated. If I'm not mistaken, it looks
like the smp_lock design is the better solution to the problem.
Thanks,
Razvan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
 
 
 | 
|  | Lists.xenproject.org is hosted with RackSpace, monitoring our |