[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
On 19.09.2022 14:25, Tamas K Lengyel wrote: > On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@xxxxxxxx> wrote: >> >> On 16.09.2022 23:35, Boris Ostrovsky wrote: >>> >>> On 9/16/22 8:52 AM, Jan Beulich wrote: >>>> On 15.09.2022 16:01, Tamas K Lengyel wrote: >>>>> While experimenting with the vPMU subsystem an ASSERT failure was >>>>> observed in vmx_find_msr because the vcpu_runnable state was true. >>>>> >>>>> The root cause of the bug appears to be the fact that the vPMU subsystem >>>>> doesn't save its state on context_switch. For the further reply below - is this actually true? What is the vpmu_switch_from() (resolving to vpmu_save()) doing then early in context_switch()? >>>>> The vpmu_load function will attempt >>>>> to gather the PMU state if its still loaded two different ways: >>>>> 1. if the current pcpu is not where the vcpu ran before doing a >>>>> remote save >>>>> 2. if the current pcpu had another vcpu active before doing a local >>>>> save >>>>> >>>>> However, in case the prev vcpu is being rescheduled on another pcpu its >>>>> state >>>>> has already changed and vcpu_runnable is returning true, thus #2 will >>>>> trip the >>>>> ASSERT. The only way to avoid this race condition is to make sure the >>>>> prev vcpu is paused while being checked and its context saved. Once the >>>>> prev >>>>> vcpu is resumed and does #1 it will find its state already saved. >>>> While I consider this explanation plausible, I'm worried: >>>> >>>>> --- a/xen/arch/x86/cpu/vpmu.c >>>>> +++ b/xen/arch/x86/cpu/vpmu.c >>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest) >>>>> vpmu = vcpu_vpmu(prev); >>>>> >>>>> /* Someone ran here before us */ >>>>> + vcpu_pause(prev); >>>>> vpmu_save_force(prev); >>>>> vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); >>>>> + vcpu_unpause(prev); >>>>> >>>>> vpmu = vcpu_vpmu(v); >>>>> } >>>> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu >>>> to actually be de-scheduled. Even with IRQs on this is already a >>>> relatively heavy operation (also including its impact on the remote >>>> side). Additionally the function is called from context_switch(), and >>>> I'm unsure of the usability of vcpu_pause() on such a path. In >>>> particular: Is there a risk of two CPUs doing this mutually to one >>>> another? If so, is deadlocking excluded? >>>> >>>> Hence at the very least I think the description wants extending, to >>>> discuss the safety of the change. >>>> >>>> Boris - any chance you could comment here? Iirc that's code you did >>>> introduce. >>> >>> >>> Is the assertion in vmx_find_msr() really needs to be for runnable vcpu or >>> can it be a check on whether vcpu is actually running (e.g. >>> RUNSTATE_running)? >> >> You cannot safely check for "running", as "runnable" may transition >> to/from "running" behind your back. > > The more I look at this the more I think the only sensible solution is > to have the vPMU state be saved on vmexit for all vCPUs. Do you really mean vmexit? It would suffice if state was reliably saved during context-switch-out, wouldn't it? At that point the vCPU can't be resumed on another pCPU, yet. > That way all > this having to figure out where and when a context needs saving during > scheduling goes away. Yes, it adds a bit of overhead for cases where > the vCPU will resume on the same pCPU and that context saved could > have been skipped, If you really mean vmexit, then I'm inclined to say that's more than just "a bit of overhead". I'd agree if you really meant context-switch-out, but as said further up it looks to me as if that was already occurring. Apparently I'm overlooking something crucial ... Jan > but it makes it so that the vCPU can be resumed on > any pCPU without having to have evidently fragile checks that may > potentially lead to deadlocks (TBH I don't know if that's a real > concern at the moment because the current setup is very hard to reason > about). We can still keep track if the context needs reloading from > the saved context and skip that if we know the state is still active. > Any objection to that change in light of these issues? > > Tamas
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |