Re: [Xen-devel] [PATCH] x86/vm_event: fix race between vmx_vmexit_handler() and vm_event_resume()
>>> On 27.04.17 at 10:34, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 27/04/2017 09:01, Jan Beulich wrote:
>>>>> On 27.04.17 at 09:22, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>> The introspection agent can reply to a vm_event faster than
>>> vmx_vmexit_handler() can complete in some cases, where it is then
>>> not safe for vm_event_set_registers() to modify v->arch.user_regs.
>> This needs more explanation: I cannot see the connection between
>> vmx_vmexit_handler() completing and (un)safety of modification of
>> v->arch.user_regs. The latter is unsafe as long as the vCPU hasn't
>> gone through __context_switch(), and the former doesn't call that
>> function directly or indirectly (i.e. I think it is more than just
>> vmx_vmexit_handler() which needs to be completed by the time
>> register modification becomes safe to do).
>
> The test scenario here was to step over an int3 breakpoint by setting
> rip += 1. This is a very quick reply and tended to complete before the
> vcpu triggering the introspection event had properly paused and been
> descheduled.
>
> If the reply occurs before __context_switch() happens,
> __context_switch() clobbers the reply by overwriting v->arch.user_regs
> from the stack. If the reply occurs after __context_switch(), but the
> pcpu has gone idle and keeps v in context, v->arch.user_regs is not
> restored to the stack, and the update is similarly missed.

This second case is not properly described, I think: v won't be kept in
context once we've gone through __context_switch(). If the pCPU goes
idle, __context_switch() simply will be deferred until another vCPU
gets scheduled onto this pCPU, and be avoided altogether if it's the
original vCPU that gets scheduled back onto this pCPU. But I do get the
point. I think much if not all of the above (suitably adjusted) needs
to go into the commit message.

> (There is a very narrow race condition where both the reply and
> __context_switch() are updating v->arch.user_regs, and who knows what
> will happen, given our memcpy implementation.)

Right, but with the proper serialization of events done now this isn't
a problem anymore either, aiui.

>>> The patch additionally removes the sync_vcpu_execstate(v) call from
>>> vm_event_resume(), which is no longer necessary, which removes the
>>> associated broadcast TLB flush (read: performance improvement).
>> Depending on the better explanation above, it may or may not be
>> further necessary to also better explain the "no longer necessary"
>> part here.
>
> As I understand, it was a previous attempt to fix this bug.
>
> There is nothing (now) in vm_event_resume() which does anything other
> than new updates into v->arch.vm_event and drop the pause reference.
> All updates are applied in context, in the return-to-guest path, which
> is race free.
>
> There is no need for the IPI, or to force the target vcpu out of
> context if the pcpu is currently idle.

I agree, and (as indicated) the better explanation earlier on is
probably sufficient then.
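To illustrate the arrangement described above, a minimal sketch of the
deferred-update scheme follows. The field and function names here are
made up for illustration; this is not the actual patch, only the shape
of "record the reply now, apply it from the vCPU's own
return-to-guest path":

    /*
     * Rough sketch only -- the names gprs, set_gprs and
     * vm_event_apply_gprs() are illustrative, not the real ones.
     */
    void vm_event_set_registers(struct vcpu *v, vm_event_response_t *rsp)
    {
        /*
         * Only record the requested values; v->arch.user_regs may still
         * be live on the target pCPU's stack at this point, so it must
         * not be written from here.
         */
        v->arch.vm_event->gprs = rsp->data.regs.x86;
        v->arch.vm_event->set_gprs = true;
    }

    /*
     * Runs on the target vCPU's own return-to-guest path (v == current),
     * where the register state is guaranteed to be in v->arch.user_regs.
     */
    void vm_event_apply_gprs(struct vcpu *v)
    {
        if ( !v->arch.vm_event || !v->arch.vm_event->set_gprs )
            return;

        v->arch.user_regs.rip    = v->arch.vm_event->gprs.rip;
        v->arch.user_regs.rsp    = v->arch.vm_event->gprs.rsp;
        v->arch.user_regs.rflags = v->arch.vm_event->gprs.rflags;
        /* (remaining registers omitted for brevity) */

        v->arch.vm_event->set_gprs = false;
    }

With something along these lines, __context_switch() can no longer
clobber the reply: nothing touches v->arch.user_regs until the vCPU
itself is back in context, which is also why the sync_vcpu_execstate()
call and its broadcast TLB flush become unnecessary.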
>>> --- a/xen/common/vm_event.c
>>> +++ b/xen/common/vm_event.c
>>> @@ -357,6 +357,13 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
>>>  {
>>>      vm_event_response_t rsp;
>>>
>>> +    /*
>>> +     * vm_event_resume() runs either from XEN_DOMCTL_VM_EVENT_OP_*, or
>>> +     * EVTCHN_send from the introspection consumer. Both contexts are
>>> +     * guaranteed not to be the subject of vm_event responses.
>>> +     */
>>> +    ASSERT(d != current->domain);
>> What is this meant to guard against?
>
> Documentation, as much as anything else. It took a long time to
> untangle the contexts involved here; I'm not convinced the logic is
> safe to run in context, and it certainly doesn't need to.

Well, as said - I think it is too strict now: You only need the vCPU
to not be current, afaict, and it really matters here which of the two
"current"-s you actually mean (and it looks to me as if you mean the
other one, guaranteeing register state to no longer be on the stack).
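For concreteness, the looser check being suggested would be per-vCPU
rather than per-domain. A sketch only, assuming the response loop in
vm_event_resume() validates rsp.vcpu_id roughly as it does today:

    /* Sketch: inside vm_event_resume()'s per-response loop. */
    struct vcpu *v;

    if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
        continue;
    v = d->vcpu[rsp.vcpu_id];

    /*
     * The monitored vCPU must not be the one this code is running on;
     * which "current" is meant (the scheduled vCPU, or the one whose
     * state was lazily left loaded) determines whether v's register
     * state can still be live on a stack somewhere.
     */
    ASSERT(v != current);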
Jan