Xen project Mailing List

Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down

On Sat, 27 Feb 2021, Julien Grall wrote: > (+ Dario and George) > > Hi Stefano, > > I have added Dario and George to get some inputs from the scheduling part. > > On 27/02/2021 01:58, Stefano Stabellini wrote: > > On Fri, 26 Feb 2021, Julien Grall wrote: > > > From: Julien Grall <jgrall@xxxxxxxxxx> > > > > > > A vCPU can get scheduled as soon as _VPF_down is cleared. As there is > > > currently not ordering guarantee in arch_set_info_guest(), it may be > > > possible that flag can be observed cleared before the new values of vCPU > > > registers are observed. > > > > > > Add an smp_mb() before the flag is cleared to prevent re-ordering. > > > > > > Signed-off-by: Julien Grall <jgrall@xxxxxxxxxx> > > > > > > --- > > > > > > Barriers should work in pair. However, I am not entirely sure whether to > > > put the other half. Maybe at the beginning of context_switch_to()? > > > > It should be right after VGCF_online is set or cleared, right? > > vcpu_guest_context_t is variable allocated on the heap just for the purpose of > this call. So an ordering with VGFC_online is not going to do anything. > > > So it > > would be: > > > > xen/arch/arm/domctl.c:arch_get_info_guest > > xen/arch/arm/vpsci.c:do_common_cpu_on > > > > But I think it is impossible that either of them get called at the same > > time as arch_set_info_guest, which makes me wonder if we actually need > > the barrier... > arch_get_info_guest() is called without the domain lock held and I can't see > any other lock that could prevent it to be called in // of > arch_set_info_guest(). > > So you could technically get corrupted information from > XEN_DOMCTL_getvcpucontext. For this case, we would want a smp_wmb() before > writing to v->is_initialised. The corresponding read barrier would be in > vcpu_pause() -> vcpu_sleep_sync() -> sync_vcpu_execstate(). > > But this is not the issue I was originally trying to solve. Currently, > do_common_cpu_on() will roughly do: > > 1) domain_lock(d) > > 2) v->arch.sctlr = ... > v->arch.ttbr0 = ... > > 3) clear_bit(_VFP_down, &v->pause_flags); > > 4) domain_unlock(d) > > 5) vcpu_wake(v); > > If we had only one pCPU on the system, then we would only wake the vCPU in > step 5. We would be fine in this situation. But that's not the interesting > case. > > If you add a second pCPU in the story, it may be possible to have vcpu_wake() > happening in // (see more below). As there is no memory barrier, step 3 may be > observed before 2. So, assuming the vcpu is runnable, we could start to > schedule a vCPU before any update to the registers (step 2) are observed. > > This means that when context_switch_to() is called, we may end up to restore > some old values. > > Now the question is can vcpu_wake() be called in // from another pCPU? AFAICT, > it would be only called if a given flag in v->pause_flags is cleared (e.g. > _VFP_blocked). But can we rely on that? > > Even if we can rely on it, v->pause_flags has other flags in it. I couldn't > rule out that _VPF_down cannot be set at the same time as the other _VPF_*. > > Therefore, I think a barrier is necessary to ensure the ordering. > > Do you agree with this analysis? Yes, I think this makes sense. The corresponding barrier in the scheduling code would have to be after reading _VPF_down and before reading v->arch.sctlr, etc. > > > The issues described here is also quite theoritical because there are > > > hundreds of instructions executed between the time a vCPU is seen > > > runnable and scheduled. But better be safe than sorry :). > > > --- > > > xen/arch/arm/domain.c | 7 +++++++ > > > 1 file changed, 7 insertions(+) > > > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > > > index bdd3d3e5b5d5..2b705e66be81 100644 > > > --- a/xen/arch/arm/domain.c > > > +++ b/xen/arch/arm/domain.c > > > @@ -914,7 +914,14 @@ int arch_set_info_guest( > > > v->is_initialised = 1; > > > if ( ctxt->flags & VGCF_online ) > > > + { > > > + /* > > > + * The vCPU can be scheduled as soon as _VPF_down is cleared. > > > + * So clear the bit *after* the context was loaded. > > > + */ > > > + smp_mb(); > > From the discussion above, I would move this barrier before v->is_initialised. > So we also take care of the issue with arch_get_info_guest(). > > This barrier also can be reduced to a smp_wmb() as we only expect an ordering > between writes. > > The barrier would be paired with the barrier in: > - sync_vcpu_execstate() in the case of arch_get_vcpu_info_guest(). > - context_switch_to() in the case of scheduling (The exact barrier is TDB). OK, this makes sense, but why before: v->is_initialised = 1; instead of right after it? It is just v->pause_flags we care about, right? > > > clear_bit(_VPF_down, &v->pause_flags); > > > + } > > > else > > > set_bit(_VPF_down, &v->pause_flags); > > >

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.