|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down
On Sat, 27 Feb 2021, Julien Grall wrote:
> (+ Dario and George)
>
> Hi Stefano,
>
> I have added Dario and George to get some inputs from the scheduling part.
>
> On 27/02/2021 01:58, Stefano Stabellini wrote:
> > On Fri, 26 Feb 2021, Julien Grall wrote:
> > > From: Julien Grall <jgrall@xxxxxxxxxx>
> > >
> > > A vCPU can get scheduled as soon as _VPF_down is cleared. As there is
> > > currently not ordering guarantee in arch_set_info_guest(), it may be
> > > possible that flag can be observed cleared before the new values of vCPU
> > > registers are observed.
> > >
> > > Add an smp_mb() before the flag is cleared to prevent re-ordering.
> > >
> > > Signed-off-by: Julien Grall <jgrall@xxxxxxxxxx>
> > >
> > > ---
> > >
> > > Barriers should work in pair. However, I am not entirely sure whether to
> > > put the other half. Maybe at the beginning of context_switch_to()?
> >
> > It should be right after VGCF_online is set or cleared, right?
>
> vcpu_guest_context_t is variable allocated on the heap just for the purpose of
> this call. So an ordering with VGFC_online is not going to do anything.
>
> > So it
> > would be:
> >
> > xen/arch/arm/domctl.c:arch_get_info_guest
> > xen/arch/arm/vpsci.c:do_common_cpu_on
> >
> > But I think it is impossible that either of them get called at the same
> > time as arch_set_info_guest, which makes me wonder if we actually need
> > the barrier...
> arch_get_info_guest() is called without the domain lock held and I can't see
> any other lock that could prevent it to be called in // of
> arch_set_info_guest().
>
> So you could technically get corrupted information from
> XEN_DOMCTL_getvcpucontext. For this case, we would want a smp_wmb() before
> writing to v->is_initialised. The corresponding read barrier would be in
> vcpu_pause() -> vcpu_sleep_sync() -> sync_vcpu_execstate().
>
> But this is not the issue I was originally trying to solve. Currently,
> do_common_cpu_on() will roughly do:
>
> 1) domain_lock(d)
>
> 2) v->arch.sctlr = ...
> v->arch.ttbr0 = ...
>
> 3) clear_bit(_VFP_down, &v->pause_flags);
>
> 4) domain_unlock(d)
>
> 5) vcpu_wake(v);
>
> If we had only one pCPU on the system, then we would only wake the vCPU in
> step 5. We would be fine in this situation. But that's not the interesting
> case.
>
> If you add a second pCPU in the story, it may be possible to have vcpu_wake()
> happening in // (see more below). As there is no memory barrier, step 3 may be
> observed before 2. So, assuming the vcpu is runnable, we could start to
> schedule a vCPU before any update to the registers (step 2) are observed.
>
> This means that when context_switch_to() is called, we may end up to restore
> some old values.
>
> Now the question is can vcpu_wake() be called in // from another pCPU? AFAICT,
> it would be only called if a given flag in v->pause_flags is cleared (e.g.
> _VFP_blocked). But can we rely on that?
>
> Even if we can rely on it, v->pause_flags has other flags in it. I couldn't
> rule out that _VPF_down cannot be set at the same time as the other _VPF_*.
>
> Therefore, I think a barrier is necessary to ensure the ordering.
>
> Do you agree with this analysis?
Yes, I think this makes sense. The corresponding barrier in the
scheduling code would have to be after reading _VPF_down and before
reading v->arch.sctlr, etc.
> > > The issues described here is also quite theoritical because there are
> > > hundreds of instructions executed between the time a vCPU is seen
> > > runnable and scheduled. But better be safe than sorry :).
> > > ---
> > > xen/arch/arm/domain.c | 7 +++++++
> > > 1 file changed, 7 insertions(+)
> > >
> > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > > index bdd3d3e5b5d5..2b705e66be81 100644
> > > --- a/xen/arch/arm/domain.c
> > > +++ b/xen/arch/arm/domain.c
> > > @@ -914,7 +914,14 @@ int arch_set_info_guest(
> > > v->is_initialised = 1;
> > > if ( ctxt->flags & VGCF_online )
> > > + {
> > > + /*
> > > + * The vCPU can be scheduled as soon as _VPF_down is cleared.
> > > + * So clear the bit *after* the context was loaded.
> > > + */
> > > + smp_mb();
>
> From the discussion above, I would move this barrier before v->is_initialised.
> So we also take care of the issue with arch_get_info_guest().
>
> This barrier also can be reduced to a smp_wmb() as we only expect an ordering
> between writes.
>
> The barrier would be paired with the barrier in:
> - sync_vcpu_execstate() in the case of arch_get_vcpu_info_guest().
> - context_switch_to() in the case of scheduling (The exact barrier is TDB).
OK, this makes sense, but why before:
v->is_initialised = 1;
instead of right after it? It is just v->pause_flags we care about,
right?
> > > clear_bit(_VPF_down, &v->pause_flags);
> > > + }
> > > else
> > > set_bit(_VPF_down, &v->pause_flags);
> > >
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |