On 9/20/22 10:54 AM, Jan Beulich wrote:
> On 20.09.2022 16:26, Boris Ostrovsky wrote:
>> On 9/20/22 4:01 AM, Jan Beulich wrote:
>>> On 20.09.2022 00:42, Boris Ostrovsky wrote:
>>>> It is saving vpmu data from current pcpu's MSRs for a remote vcpu so @v
>>>> in vmx_find_msr() is not @current:
>>>>
>>>> vpmu_load()
>>>> ...
>>>> prev = per_cpu(last_vcpu, pcpu);
>>>> vpmu_save_force(prev)
>>>> core2_vpmu_save()
>>>> __core2_vpmu_save()
>>>> vmx_read_guest_msr()
>>>> vmx_find_msr()
>>>>
>>>>
>>>> The call to vmx_find_msr() was introduced by 755087eb9b10c. I wonder though whether
>>>> this call is needed when code path above is executed (i.e. when we are saving
>>>> remove vcpu)
>>> How could it not be needed? We need to obtain the guest value. The
>>> thing I don't understand is why this forced saving is necessary,
>>> when context_switch() unconditionally calls vpmu_switch_from().
>>
>> IIRC the logic is:
>>
>> 1. vcpuA runs on pcpu0
>> 2. vcpuA is de-scheduled and is selected to run on pcpu1. It has not yet called vpmu_load() from pcpu1
> The calling of vpmu_load() shouldn't matter here. What does matter is
> that vpmu_save() was necessarily called already.
I thought we don't always save MSRs on context switch because we optimize for case when the same vcpu gets scheduled again. But I am not sure I see this now that I am looking at the code.
I see context_switch calling vpmu_save_from before __context_switch, but if that did actually save the vPMU state it would have reset VPMU_CONTEXT_LOADED, so by the time vpmu_load calls vpmu_save_force it would have just bailed before hitting the ASSERT failure in vmx_find_msrs. The context must still be loaded at that point (I'm trying to verify right now by adding some printks).
OK, Boris was correct above, MSRs are not saved on context switch automatically because of that optimization. VPMU_CONTEXT_SAVE isn't set, so the only thing vpmu_switch_from does is set global control to 0 in case it was a PV vCPU (see core2_vpmu_save checks for both VPMU_CONTEXT_SAVE and VPMU_CONTEXT_LOADED) and vpmu_switch_from doesn't set VPMU_CONTEXT_SAVE. So for HVM vCPUs it does nothing, that's why we still see the context still loaded when we get to vpmu_load.
It would be a simple enough change to make vpmu_switch_from always save the context and then we could get rid of vpmu_load trying to do it later and getting into that ASSERT failure. Thoughts?
Tamas