[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 02/18] x86/domain: limit window where curr_vcpu != current on context switch
On Wed, Jan 08, 2025 at 04:26:46PM +0000, Alejandro Vallejo wrote: > This is a net gain even without ASI. Having "current" hold the previous vCPU > on > __context_switch() makes it _a lot_ easier to follow the lazy switch path. > > On Wed Jan 8, 2025 at 2:26 PM GMT, Roger Pau Monne wrote: > > On x86 Xen will perform lazy context switches to the idle vCPU, where the > > previously running vCPU context is not overwritten, and only current is > > updated > > to point to the idle vCPU. The state is then disjunct between current and > > curr_vcpu: current points to the idle vCPU, while curr_vcpu points to the > > vCPU > > whose context is loaded on the pCPU. > > > > While on that lazy context switched state, certain calls (like > > map_domain_page()) will trigger a full synchronization of the pCPU state by > > forcing a context switch. Note however how calling any of such functions > > inside the context switch code itself is very likely to trigger an infinite > > recursion loop. > > > > Attempt to limit the window where curr_vcpu != current in the context switch > > code, as to prevent and infinite recursion loop around > > sync_local_execstate(). > > > > This is required for using map_domain_page() in the vCPU context switch > > code, > > otherwise using map_domain_page() in that context ends up in a recursive > > sync_local_execstate() loop: > > > > map_domain_page() -> sync_local_execstate() -> map_domain_page() -> ... > > More generally, it's worth mentioning that we want to establish an invariant > between a per-cpu variable (curr_vcpu) and the currently running page tables. > That way it can be used as discriminant to know which are the currently active > per-vCPU mappings. You kind of already do this by checking curr_vcpu, as with this changes there's still a window where the vCPU is lazy context switched, and hence current != curr_vcpu (and curr_vcpu should signal what page-tables are loaded). The main point apart from more accurate signaling of the loaded page-tables is to avoid infinite recursion if sync_local_execstate() is called inside the context switch path. > That's essential for implementing FPU hiding as proposed here: > > > https://lore.kernel.org/xen-devel/20241105143310.28301-1-alejandro.vallejo@xxxxxxxxx/ > > A shorter form of that should probably be mentioned also... > > > > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > --- > > Changes since v1: > > - New in this version. > > --- > > xen/arch/x86/domain.c | 58 +++++++++++++++++++++++++++++++++++-------- > > xen/arch/x86/traps.c | 2 -- > > 2 files changed, 48 insertions(+), 12 deletions(-) > > > > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c > > index 78a13e6812c9..1f680bf176ee 100644 > > --- a/xen/arch/x86/domain.c > > +++ b/xen/arch/x86/domain.c > > @@ -1982,16 +1982,16 @@ static void load_default_gdt(unsigned int cpu) > > per_cpu(full_gdt_loaded, cpu) = false; > > } > > > > -static void __context_switch(void) > > +static void __context_switch(struct vcpu *n) > > { > > struct cpu_user_regs *stack_regs = guest_cpu_user_regs(); > > unsigned int cpu = smp_processor_id(); > > struct vcpu *p = per_cpu(curr_vcpu, cpu); > > - struct vcpu *n = current; > > struct domain *pd = p->domain, *nd = n->domain; > > > > ASSERT(p != n); > > ASSERT(!vcpu_cpu_dirty(n)); > > + ASSERT(p == current); > > > > if ( !is_idle_domain(pd) ) > > { > > @@ -2036,6 +2036,18 @@ static void __context_switch(void) > > > > write_ptbase(n); > > > > + /* > > + * It's relevant to set both current and curr_vcpu back-to-back, to > > avoid a > > + * window where calls to mapcache_current_vcpu() during the context > > switch > > + * could trigger a recursive loop. > > + * > > + * Do the current switch immediately after switching to the new guest > > + * page-tables, so that current is (almost) always in sync with the > > + * currently loaded page-tables. > > + */ > > + set_current(n); > > + per_cpu(curr_vcpu, cpu) = n; > > ... here. So we're not tempted to move these 2 far off from write_ptbase(). I think the "Do the current switch immediately after switching to the new guest page-tables" sentence already signals that it's important to keep the setting of current and curr_vcpu as close to the write_ptbase() call as possible, but I'm open to suggestions for better wording. > > + > > #ifdef CONFIG_PV > > /* Prefetch the VMCB if we expect to use it later in the context > > switch */ > > if ( using_svm() && is_pv_64bit_domain(nd) && !is_idle_domain(nd) ) > > @@ -2048,8 +2060,6 @@ static void __context_switch(void) > > if ( pd != nd ) > > cpumask_clear_cpu(cpu, pd->dirty_cpumask); > > write_atomic(&p->dirty_cpu, VCPU_CPU_CLEAN); > > - > > - per_cpu(curr_vcpu, cpu) = n; > > } > > > > void context_switch(struct vcpu *prev, struct vcpu *next) > > @@ -2081,16 +2091,36 @@ void context_switch(struct vcpu *prev, struct vcpu > > *next) > > > > local_irq_disable(); > > > > - set_current(next); > > - > > if ( (per_cpu(curr_vcpu, cpu) == next) || > > (is_idle_domain(nextd) && cpu_online(cpu)) ) > > { > > + /* > > + * Lazy context switch to the idle vCPU, set current == idle. Full > > + * context switch happens if/when sync_local_execstate() is called. > > + */ > > + set_current(next); > > local_irq_enable(); > > } > > else > > { > > - __context_switch(); > > + /* > > + * curr_vcpu will always point to the currently loaded vCPU > > context, as > > nit: s/will always point/always points/ ? It's an inconditional invariant, > after all. Sure. Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |