Re: [Xen-devel] [PATCH V2 2/2] x86/altp2m: Fixed domain crash with INVALID_ALTP2M EPTP index
>>> On 25.06.18 at 14:12, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
> On 06/22/2018 07:55 PM, Razvan Cojocaru wrote:
>> On 06/22/2018 06:28 PM, Jan Beulich wrote:
>>>>>> On 13.06.18 at 10:52, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>> @@ -3592,7 +3592,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>>>>                  }
>>>>              }
>>>>  
>>>> -            if ( idx != vcpu_altp2m(v).p2midx )
>>>> +            if ( idx != INVALID_ALTP2M && idx != vcpu_altp2m(v).p2midx )
>>>>              {
>>>>                  BUG_ON(idx >= MAX_ALTP2M);
>>>
>>> In the code immediately ahead of this there is an INVALID_ALTP2M check
>>> already (in the else branch). If the __vmread() can legitimately produce
>>> this value, why would the domain be crashed when getting back
>>> INVALID_ALTP2M in the other case? I think the correctness of your change
>>> can only be judged once both code paths behave consistently.
>>
>> You're right, I had somehow convinced myself that this is a #VE-specific
>> problem, but it looks like a generic altp2m problem. I'll simulate the
>> other branch in the code and see what it does with my small test
>> application.
>
> After a bit of debugging, the issue explained in full seems to be this
> (it indeed appears to be #VE-specific, as initially assumed): client
> application calls xc_altp2m_set_domain_state(xci, domid, 1), followed by
> xc_altp2m_set_vcpu_enable_notify() (with a suitable gfn), followed by
> xc_altp2m_set_domain_state(xci, domid, 0).
>
> This causes Xen to go through the following steps:
>
> 1. altp2m_vcpu_initialise() (calls altp2m_vcpu_reset()).
> 2. HVMOP_altp2m_vcpu_enable_notify -> vmx_vcpu_update_vmfunc_ve().
> 3. altp2m_vcpu_destroy() (calls altp2m_vcpu_reset() and (indirectly)
> vmx_vcpu_update_eptp()).
> 4. Still part of the altp2m_vcpu_destroy() workflow,
> altp2m_vcpu_update_vmfunc_ve(v) gets called.
>
> At step 2, vmx_vcpu_update_vmfunc_ve() modifies
> v->arch.hvm_vmx.secondary_exec_control (from 0x1054eb to 0x1474eb -
> which has the SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS bit set).
>
> At step 3, altp2m_vcpu_reset() sets av->p2midx = INVALID_ALTP2M, then
> vmx_vcpu_update_eptp() sees that SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS
> is set, and as a consequence calls __vmwrite(EPTP_INDEX,
> vcpu_altp2m(v).p2midx).
>
> Now, at step 4 the SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS bit should now
> become 0, because altp2m_vcpu_reset() has set veinfo_gfn to INVALID_GFN.
> But _sometimes_, what happens is that _between_ steps 3 and 4 a
> vmx_vmexit_handler() occurs, which __vmread()s EPTP_INDEX (on the logic
> branch I've tried to fix), compares it to MAX_ALTP2M and then proceeds
> to BUG_ON(), bringing the hypervisor down.

Thanks for the detailed analysis. With that, I wonder whether it is
reasonable for a VM exit to occur in parallel with the processing of
altp2m_vcpu_destroy(). Shouldn't a domain (or vCPU) undergoing such a
mode change be paused? I also remain unconvinced that a similar race is
entirely impossible in the non-#VE case.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel