
Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths



>>> On 02.11.17 at 20:46, <igor.druzhinin@xxxxxxxxxx> wrote:
>> Any ideas about the root cause of the fault and suggestions on how to
>> reproduce it would be welcome. Does this crash really have something
>> to do with PML? I doubt it, because the original environment can
>> hardly be called PML-heavy.

Well, PML-heaviness doesn't matter. It's the mere fact that PML
is enabled on the vCPU being destroyed.
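
To make that concrete - a condensed sketch (not verbatim source, details
omitted) of the PML teardown path: disabling PML has to update the exec
controls of a vCPU that isn't running anywhere, and hence goes through a
vmx_vmcs_enter() / vmx_vmcs_exit() section on the destroying pCPU:

void vmx_vcpu_disable_pml(struct vcpu *v)
{
    if ( !vmx_vcpu_pml_enabled(v) )
        return;

    vmx_vmcs_enter(v);            /* load v's VMCS on this pCPU */

    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
              v->arch.hvm_vmx.secondary_exec_control);

    vmx_vmcs_exit(v);             /* vmclear; nothing is loaded back when
                                     current is the idle vCPU */
}

That enter/exit pair is what leaves the pCPU's VMCS pointer cleared while
per_cpu(curr_vcpu) still references the lazily switched-out vCPU.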

> So we finally have a complete understanding of what's going on:
> 
> Some vCPU has just migrated to another pCPU and we switched to idle, but
> per_cpu(curr_vcpu) on the current pCPU still points to it - this is how
> the current logic works. While we're in idle we issue vcpu_destroy() for
> some other domain, which eventually calls vmx_vcpu_disable_pml() and
> trashes the VMCS pointer on the current pCPU. At this moment we get a
> TLB flush IPI from that same vCPU, which is now context switching on
> another pCPU - it appears to clean the TLB after itself. That vCPU has
> already been marked is_running=1 by the scheduler. In the IPI handler we
> enter __sync_local_execstate() and try to call vmx_ctxt_switch_from()
> for the migrated vCPU, which is supposed to call vmcs_reload() but
> doesn't, because is_running==1. The next VMWRITE crashes the hypervisor.
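
For reference, the check being skipped here is roughly of this shape
(condensed from the description above, not verbatim source):

static void vmx_ctxt_switch_from(struct vcpu *v)
{
    /* ... */

    /*
     * Reload the VMCS only when the vCPU isn't running elsewhere. In the
     * scenario above pCPU2 has already set is_running, so the reload is
     * skipped even though this pCPU's VMCS pointer was just cleared by
     * the PML teardown.
     */
    if ( !v->is_running )
        vmcs_reload(v);

    /* Subsequent VMWRITEs (e.g. from vmx_fpu_leave()) then run without a
       current VMCS and fault. */
    vmx_fpu_leave(v);
    /* ... */
}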
> 
> So the state transition diagram might look like:
> pCPU1: vCPUx -> migrate to pCPU2 -> idle -> RCU callbacks ->

I'm not really clear about who/what is "idle" here: pCPU1,
pCPU2, or yet something else? If vCPUx migrated to pCPU2,
wouldn't it be put back into runnable state right away, and
hence pCPU2 can't be idle at this point? Yet for pCPU1 I don't
think its idleness would matter much, i.e. the situation could
also arise without it becoming idle, afaics. pCPU1 reaching any point
where softirqs are processed would suffice.

> vcpu_destroy() -> vmx_vcpu_disable_pml() -> vmcs_clear()
> pCPU2: context switch into vCPUx -> is_running = 1 -> TLB flush
> pCPU1: IPI handler -> context switch out of vCPUx -> VMWRITE -> CRASH!
> 
> We can basically just fix the condition around the vmcs_reload() call,
> but I'm not completely sure that's the right way to do it - I don't
> think leaving per_cpu(curr_vcpu) pointing to a migrated vCPU is a good
> idea (maybe we need to clear it). What are your thoughts?

per_cpu(curr_vcpu) can only validly be written inside
__context_switch(), hence the only way to achieve this would be to
force __context_switch() to be called earlier than (as currently) from
the TLB flush IPI handler, perhaps as in the (untested!) patch below.
Two questions then remain:
- Should we perhaps rather do this in an arch-independent way
  (i.e. ahead of the call to vcpu_destroy() in common code)?
- This deals with only a special case of the more general "TLB
  flush behind the back of a vmx_vmcs_enter() /
  vmx_vmcs_exit() section" - does this need dealing with in a
  more general way? Here I'm thinking of introducing a
  FLUSH_STATE flag to be passed to flush_mask() instead of
  the current flush_tlb_mask() in context_switch() and
  sync_vcpu_execstate(). This could at the same time be used
  for a small performance optimization: At least for HAP vCPU-s
  I don't think we really need the TLB part of the flushes here.
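
To make the FLUSH_STATE idea a little more concrete, a rough sketch
(flag name, value, and call sites purely illustrative, not existing
code):

/* Illustrative only - a new flag alongside the existing FLUSH_* ones. */
#define FLUSH_STATE 0x1000

/* context_switch() / sync_vcpu_execstate() would then use */
flush_mask(&dirty_mask, FLUSH_STATE | FLUSH_TLB);
/* ... or just FLUSH_STATE for HAP vCPU-s, dropping the TLB part, */

/* while the flush IPI handler would sync lazy vCPU state only when
   explicitly asked to, instead of doing so unconditionally: */
if ( flags & FLUSH_STATE )
    __sync_local_execstate();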

Jan

--- unstable.orig/xen/arch/x86/domain.c
+++ unstable/xen/arch/x86/domain.c
@@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
 
 void vcpu_destroy(struct vcpu *v)
 {
+    /*
+     * Flush all state for this vCPU before fully tearing it down. This is
+     * particularly important for HVM ones on VMX, so that this flushing of
+     * state won't happen from the TLB flush IPI handler behind the back of
+     * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
+     */
+    sync_vcpu_execstate(v);
+
     xfree(v->arch.vm_event);
     v->arch.vm_event = NULL;
 

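For context, sync_vcpu_execstate() currently reads roughly like this
(from memory, abridged); this is how the hunk above arranges for the
state flush - and hence __context_switch() - to have happened before
vmx_vcpu_disable_pml()'s vmx_vmcs_enter() / vmx_vmcs_exit() section runs:

void sync_vcpu_execstate(struct vcpu *v)
{
    /* Sync synchronously when v's state is still dirty on this pCPU. */
    if ( cpumask_test_cpu(smp_processor_id(), v->vcpu_dirty_cpumask) )
        sync_local_execstate();

    /* Other pCPUs sync from the flush IPI handler. */
    flush_tlb_mask(v->vcpu_dirty_cpumask);
}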

