[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 02/18] x86/domain: limit window where curr_vcpu != current on context switch



On Wed, Jan 08, 2025 at 04:26:46PM +0000, Alejandro Vallejo wrote:
> This is a net gain even without ASI. Having "current" hold the previous vCPU 
> on
> __context_switch() makes it _a lot_ easier to follow the lazy switch path.
> 
> On Wed Jan 8, 2025 at 2:26 PM GMT, Roger Pau Monne wrote:
> > On x86 Xen will perform lazy context switches to the idle vCPU, where the
> > previously running vCPU context is not overwritten, and only current is 
> > updated
> > to point to the idle vCPU.  The state is then disjunct between current and
> > curr_vcpu: current points to the idle vCPU, while curr_vcpu points to the 
> > vCPU
> > whose context is loaded on the pCPU.
> >
> > While on that lazy context switched state, certain calls (like
> > map_domain_page()) will trigger a full synchronization of the pCPU state by
> > forcing a context switch.  Note however how calling any of such functions
> > inside the context switch code itself is very likely to trigger an infinite
> > recursion loop.
> >
> > Attempt to limit the window where curr_vcpu != current in the context switch
> > code, as to prevent and infinite recursion loop around 
> > sync_local_execstate().
> >
> > This is required for using map_domain_page() in the vCPU context switch 
> > code,
> > otherwise using map_domain_page() in that context ends up in a recursive
> > sync_local_execstate() loop:
> >
> > map_domain_page() -> sync_local_execstate() -> map_domain_page() -> ...
> 
> More generally, it's worth mentioning that we want to establish an invariant
> between a per-cpu variable (curr_vcpu) and the currently running page tables.
> That way it can be used as discriminant to know which are the currently active
> per-vCPU mappings.

You kind of already do this by checking curr_vcpu, as with this
changes there's still a window where the vCPU is lazy context
switched, and hence current != curr_vcpu (and curr_vcpu should signal
what page-tables are loaded).

The main point apart from more accurate signaling of the loaded
page-tables is to avoid infinite recursion if sync_local_execstate()
is called inside the context switch path.

> That's essential for implementing FPU hiding as proposed here:
> 
>   
> https://lore.kernel.org/xen-devel/20241105143310.28301-1-alejandro.vallejo@xxxxxxxxx/
> 
> A shorter form of that should probably be mentioned also...
> 
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > Changes since v1:
> >  - New in this version.
> > ---
> >  xen/arch/x86/domain.c | 58 +++++++++++++++++++++++++++++++++++--------
> >  xen/arch/x86/traps.c  |  2 --
> >  2 files changed, 48 insertions(+), 12 deletions(-)
> >
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index 78a13e6812c9..1f680bf176ee 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -1982,16 +1982,16 @@ static void load_default_gdt(unsigned int cpu)
> >      per_cpu(full_gdt_loaded, cpu) = false;
> >  }
> >  
> > -static void __context_switch(void)
> > +static void __context_switch(struct vcpu *n)
> >  {
> >      struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
> >      unsigned int          cpu = smp_processor_id();
> >      struct vcpu          *p = per_cpu(curr_vcpu, cpu);
> > -    struct vcpu          *n = current;
> >      struct domain        *pd = p->domain, *nd = n->domain;
> >  
> >      ASSERT(p != n);
> >      ASSERT(!vcpu_cpu_dirty(n));
> > +    ASSERT(p == current);
> >  
> >      if ( !is_idle_domain(pd) )
> >      {
> > @@ -2036,6 +2036,18 @@ static void __context_switch(void)
> >  
> >      write_ptbase(n);
> >  
> > +    /*
> > +     * It's relevant to set both current and curr_vcpu back-to-back, to 
> > avoid a
> > +     * window where calls to mapcache_current_vcpu() during the context 
> > switch
> > +     * could trigger a recursive loop.
> > +     *
> > +     * Do the current switch immediately after switching to the new guest
> > +     * page-tables, so that current is (almost) always in sync with the
> > +     * currently loaded page-tables.
> > +     */
> > +    set_current(n);
> > +    per_cpu(curr_vcpu, cpu) = n;
> 
> ... here. So we're not tempted to move these 2 far off from write_ptbase().

I think the "Do the current switch immediately after switching to the
new guest page-tables" sentence already signals that it's important to
keep the setting of current and curr_vcpu as close to the
write_ptbase() call as possible, but I'm open to suggestions for
better wording.

> > +
> >  #ifdef CONFIG_PV
> >      /* Prefetch the VMCB if we expect to use it later in the context 
> > switch */
> >      if ( using_svm() && is_pv_64bit_domain(nd) && !is_idle_domain(nd) )
> > @@ -2048,8 +2060,6 @@ static void __context_switch(void)
> >      if ( pd != nd )
> >          cpumask_clear_cpu(cpu, pd->dirty_cpumask);
> >      write_atomic(&p->dirty_cpu, VCPU_CPU_CLEAN);
> > -
> > -    per_cpu(curr_vcpu, cpu) = n;
> >  }
> >  
> >  void context_switch(struct vcpu *prev, struct vcpu *next)
> > @@ -2081,16 +2091,36 @@ void context_switch(struct vcpu *prev, struct vcpu 
> > *next)
> >  
> >      local_irq_disable();
> >  
> > -    set_current(next);
> > -
> >      if ( (per_cpu(curr_vcpu, cpu) == next) ||
> >           (is_idle_domain(nextd) && cpu_online(cpu)) )
> >      {
> > +        /*
> > +         * Lazy context switch to the idle vCPU, set current == idle.  Full
> > +         * context switch happens if/when sync_local_execstate() is called.
> > +         */
> > +        set_current(next);
> >          local_irq_enable();
> >      }
> >      else
> >      {
> > -        __context_switch();
> > +        /*
> > +         * curr_vcpu will always point to the currently loaded vCPU 
> > context, as
> 
> nit: s/will always point/always points/ ? It's an inconditional invariant,
> after all.

Sure.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.