[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 1/2] VMX: fix VMCS race on context-switch paths
On Wed, 2017-02-15 at 06:03 -0700, Jan Beulich wrote: > > > > On 15.02.17 at 12:55, <JBeulich@xxxxxxxx> wrote: > > > > > On 15.02.17 at 12:48, <sergey.dyasli@xxxxxxxxxx> wrote: > > > > > > On Wed, 2017-02-15 at 04:39 -0700, Jan Beulich wrote: > > > > > > > On 15.02.17 at 11:27, <sergey.dyasli@xxxxxxxxxx> wrote: > > > > > > > > > > (XEN) [ 1408.075638] Xen call trace: > > > > > (XEN) [ 1408.079322] [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50 > > > > > (XEN) [ 1408.086303] [<ffff82d08016c58d>] > > > > > context_switch+0x85d/0xeb0 > > > > > (XEN) [ 1408.093380] [<ffff82d08012fb8e>] > > > > > schedule.c#schedule+0x46e/0x7d0 > > > > > (XEN) [ 1408.100942] [<ffff82d080164305>] reprogram_timer+0x75/0xe0 > > > > > (XEN) [ 1408.107925] [<ffff82d080136400>] > > > > timer.c#timer_softirq_action+0x90/0x210 > > > > > (XEN) [ 1408.116263] [<ffff82d08013311c>] > > > > > softirq.c#__do_softirq+0x5c/0x90 > > > > > (XEN) [ 1408.123921] [<ffff82d080167d35>] > > > > > domain.c#idle_loop+0x25/0x60 > > > > > > > > Taking your later reply into account - were you able to figure out > > > > what other party held onto the VMCS being waited for here? > > > > > > Unfortunately, no. It was unclear from debug logs. But judging from > > > the following vmx_do_resume() code: > > > > > > if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() ) > > > { > > > if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) ) > > > vmx_load_vmcs(v); > > > } > > > > > > If both of the above conditions are true then vmx_vmcs_reload() will > > > probably hang. > > > > I don't follow (reload should run before this, not after), but I must > > be missing something more general anyway, as I'm seeing the code > > above being needed despite the reload additions. > > I think I've understood part of it over lunch: Surprisingly enough > vmx_ctxt_switch_to() doesn't re-establish the VMCS, so it needs > to be done here. Which I think means we don't need the new > hook at all, as that way the state is no different between going > through ->to() or bypassing it. > > What I continue to not understand is why vmcs_pa would ever > not match current_vmcs when active_cpu is smp_processor_id(). > So far I thought both are always updated together. Looking > further ... This is exactly what will happen should the 3.1 occur: 1. HVM vCPU#1 --> idle vCPU context_switch 2. softirq --> vmx_vmcs_enter() + vmx_vmcs_exit() for a remote vCPU [scenario with PML] This will switch current_vmcs to a remote one. has_hvm_container_vcpu(current) will be false and vmcs will not be reloaded. 3.1. idle vCPU --> HVM vCPU#1 (same) context_switch vmx_do_resume v->arch.hvm_vmx.active_cpu == smp_processor_id() v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) 3.2 idle vCPU --> HVM vCPU#2 (different) __context_switch() vmwrite BUG() This is the original BUG() scenario which my patch addresses. -- Thanks, Sergey _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |