Re: [PATCH v2 02/18] x86/domain: limit window where curr_vcpu != current on context switch
This is a net gain even without ASI. Having "current" hold the previous
vCPU on __context_switch() makes it _a lot_ easier to follow the lazy
switch path.

On Wed Jan 8, 2025 at 2:26 PM GMT, Roger Pau Monne wrote:
> On x86 Xen will perform lazy context switches to the idle vCPU, where
> the previously running vCPU context is not overwritten, and only
> current is updated to point to the idle vCPU. The state is then
> disjoint between current and curr_vcpu: current points to the idle
> vCPU, while curr_vcpu points to the vCPU whose context is loaded on
> the pCPU.
>
> While in that lazily context switched state, certain calls (like
> map_domain_page()) will trigger a full synchronization of the pCPU
> state by forcing a context switch. Note however that calling any such
> function inside the context switch code itself is very likely to
> trigger an infinite recursion loop.
>
> Attempt to limit the window where curr_vcpu != current in the context
> switch code, so as to prevent an infinite recursion loop around
> sync_local_execstate().
>
> This is required for using map_domain_page() in the vCPU context
> switch code, as otherwise using map_domain_page() in that context ends
> up in a recursive sync_local_execstate() loop:
>
> map_domain_page() -> sync_local_execstate() -> map_domain_page() -> ...

More generally, it's worth mentioning that we want to establish an
invariant between a per-cpu variable (curr_vcpu) and the currently
running page tables, so that it can be used as a discriminant to know
which per-vCPU mappings are currently active. That's essential for
implementing FPU hiding as proposed here:

  https://lore.kernel.org/xen-devel/20241105143310.28301-1-alejandro.vallejo@xxxxxxxxx/

A shorter form of that should probably be mentioned also...

> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Changes since v1:
>  - New in this version.
> ---
>  xen/arch/x86/domain.c | 58 +++++++++++++++++++++++++++++++++++--------
>  xen/arch/x86/traps.c  |  2 --
>  2 files changed, 48 insertions(+), 12 deletions(-)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 78a13e6812c9..1f680bf176ee 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1982,16 +1982,16 @@ static void load_default_gdt(unsigned int cpu)
>      per_cpu(full_gdt_loaded, cpu) = false;
>  }
>
> -static void __context_switch(void)
> +static void __context_switch(struct vcpu *n)
>  {
>      struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
>      unsigned int cpu = smp_processor_id();
>      struct vcpu *p = per_cpu(curr_vcpu, cpu);
> -    struct vcpu *n = current;
>      struct domain *pd = p->domain, *nd = n->domain;
>
>      ASSERT(p != n);
>      ASSERT(!vcpu_cpu_dirty(n));
> +    ASSERT(p == current);
>
>      if ( !is_idle_domain(pd) )
>      {
> @@ -2036,6 +2036,18 @@ static void __context_switch(void)
>
>      write_ptbase(n);
>
> +    /*
> +     * It's relevant to set both current and curr_vcpu back-to-back, to
> +     * avoid a window where calls to mapcache_current_vcpu() during the
> +     * context switch could trigger a recursive loop.
> +     *
> +     * Do the current switch immediately after switching to the new guest
> +     * page-tables, so that current is (almost) always in sync with the
> +     * currently loaded page-tables.
> +     */
> +    set_current(n);
> +    per_cpu(curr_vcpu, cpu) = n;

... here. So we're not tempted to move these 2 far off from
write_ptbase().
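To make the recursion hazard concrete, here's a simplified sketch of the
map_domain_page() path (illustrative only; map_from_mapcache() is a
made-up helper standing in for the real mapcache lookup):

    void *map_domain_page(mfn_t mfn)
    {
        struct vcpu *v = current;

        /*
         * Lazy-switched pCPU: current says idle, but curr_vcpu still
         * names the vCPU whose state is loaded.  Pull in the full
         * context before touching the mapcache.
         */
        if ( unlikely(this_cpu(curr_vcpu) != v) )
            sync_local_execstate();

        /*
         * If sync_local_execstate() -> __context_switch() needs a
         * mapping itself while the two still disagree, this check
         * fires again and we recurse forever.  Keeping set_current()
         * and the curr_vcpu update back-to-back closes that window.
         */
        return map_from_mapcache(v, mfn); /* made-up helper */
    }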
> +
>  #ifdef CONFIG_PV
>      /* Prefetch the VMCB if we expect to use it later in the context switch */
>      if ( using_svm() && is_pv_64bit_domain(nd) && !is_idle_domain(nd) )
> @@ -2048,8 +2060,6 @@ static void __context_switch(void)
>      if ( pd != nd )
>          cpumask_clear_cpu(cpu, pd->dirty_cpumask);
>      write_atomic(&p->dirty_cpu, VCPU_CPU_CLEAN);
> -
> -    per_cpu(curr_vcpu, cpu) = n;
>  }
>
>  void context_switch(struct vcpu *prev, struct vcpu *next)
> @@ -2081,16 +2091,36 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>
>      local_irq_disable();
>
> -    set_current(next);
> -
>      if ( (per_cpu(curr_vcpu, cpu) == next) ||
>           (is_idle_domain(nextd) && cpu_online(cpu)) )
>      {
> +        /*
> +         * Lazy context switch to the idle vCPU, set current == idle.  Full
> +         * context switch happens if/when sync_local_execstate() is called.
> +         */
> +        set_current(next);
>          local_irq_enable();
>      }
>      else
>      {
> -        __context_switch();
> +        /*
> +         * curr_vcpu will always point to the currently loaded vCPU context, as

nit: s/will always point/always points/ ? It's an unconditional
invariant, after all.

> +         * it's not updated when doing a lazy switch to the idle vCPU.
> +         */
> +        struct vcpu *prev_ctx = per_cpu(curr_vcpu, cpu);
> +
> +        if ( prev_ctx != current )
> +        {
> +            /*
> +             * Doing a full context switch to a non-idle vCPU from a lazy
> +             * context switched state.  Adjust current to point to the
> +             * currently loaded vCPU context.
> +             */
> +            ASSERT(current == idle_vcpu[cpu]);
> +            ASSERT(!is_idle_vcpu(next));
> +            set_current(prev_ctx);
> +        }
> +        __context_switch(next);
>
>          /* Re-enable interrupts before restoring state which may fault. */
>          local_irq_enable();
> @@ -2156,15 +2186,23 @@ int __sync_local_execstate(void)
>  {
>      unsigned long flags;
>      int switch_required;
> +    unsigned int cpu = smp_processor_id();
> +    struct vcpu *p;
>
>      local_irq_save(flags);
>
> -    switch_required = (this_cpu(curr_vcpu) != current);
> +    p = per_cpu(curr_vcpu, cpu);
> +    switch_required = (p != current);
>
>      if ( switch_required )
>      {
> -        ASSERT(current == idle_vcpu[smp_processor_id()]);
> -        __context_switch();
> +        ASSERT(current == idle_vcpu[cpu]);
> +        /*
> +         * Restore current to the previously running vCPU, __context_switch()
> +         * will update current together with curr_vcpu.
> +         */
> +        set_current(p);
> +        __context_switch(idle_vcpu[cpu]);
>      }
>
>      local_irq_restore(flags);
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 87b30ce4df2a..487b8c5a78c5 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -2232,8 +2232,6 @@ void __init trap_init(void)
>
>  void activate_debugregs(const struct vcpu *curr)
>  {
> -    ASSERT(curr == current);
> -
>      write_debugreg(0, curr->arch.dr[0]);
>      write_debugreg(1, curr->arch.dr[1]);
>      write_debugreg(2, curr->arch.dr[2]);
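To restate the invariant from above in code form, here's a minimal
sketch (names and field access simplified, not proposing this exact
code) of the discriminant we'd then be able to assert everywhere
outside the tiny window inside __context_switch():

    static void assert_mappings_discriminant(void)
    {
        const struct vcpu *v = this_cpu(curr_vcpu);

        /*
         * curr_vcpu identifies the vCPU whose page tables (and hence
         * per-vCPU mappings) are loaded.  v->arch.cr3 stands in for
         * whatever write_ptbase() loaded for v.
         */
        ASSERT(read_cr3() == v->arch.cr3);
    }

That's exactly the property the FPU hiding series wants to key its
mapping decisions off of.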
Cheers,
Alejandro