Re: [PATCH v2 02/18] x86/domain: limit window where curr_vcpu != current on context switch
On 08.01.2025 15:26, Roger Pau Monne wrote:
> On x86 Xen will perform lazy context switches to the idle vCPU, where the
> previously running vCPU context is not overwritten, and only current is
> updated to point to the idle vCPU. The state is then disjoint between
> current and curr_vcpu: current points to the idle vCPU, while curr_vcpu
> points to the vCPU whose context is loaded on the pCPU.
>
> While in that lazy context switched state, certain calls (like
> map_domain_page()) will trigger a full synchronization of the pCPU state
> by forcing a context switch. Note however how calling any such function
> inside the context switch code itself is very likely to trigger an
> infinite recursion loop.
>
> Attempt to limit the window where curr_vcpu != current in the context
> switch code, so as to prevent an infinite recursion loop around
> sync_local_execstate().
>
> This is required for using map_domain_page() in the vCPU context switch
> code; otherwise using map_domain_page() in that context ends up in a
> recursive sync_local_execstate() loop:

Question is whether it's a good idea in the first place to start using
map_domain_page() from the context switch path. Surely there are possible
alternatives.

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1982,16 +1982,16 @@ static void load_default_gdt(unsigned int cpu)
>      per_cpu(full_gdt_loaded, cpu) = false;
>  }
>  
> -static void __context_switch(void)
> +static void __context_switch(struct vcpu *n)
>  {
>      struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
>      unsigned int cpu = smp_processor_id();
>      struct vcpu *p = per_cpu(curr_vcpu, cpu);
> -    struct vcpu *n = current;
>      struct domain *pd = p->domain, *nd = n->domain;
>  
>      ASSERT(p != n);
>      ASSERT(!vcpu_cpu_dirty(n));
> +    ASSERT(p == current);
>  
>      if ( !is_idle_domain(pd) )
>      {
> @@ -2036,6 +2036,18 @@ static void __context_switch(void)
>  
>      write_ptbase(n);
>  
> +    /*
> +     * It's relevant to set both current and curr_vcpu back-to-back, to
> +     * avoid a window where calls to mapcache_current_vcpu() during the
> +     * context switch could trigger a recursive loop.
> +     *
> +     * Do the current switch immediately after switching to the new guest
> +     * page-tables, so that current is (almost) always in sync with the
> +     * currently loaded page-tables.
> +     */
> +    set_current(n);
> +    per_cpu(curr_vcpu, cpu) = n;

The latter paragraph of the comment states something that so far wasn't
intended, and imo also shouldn't be going forward. It's curr_vcpu which
wants to be in sync with the loaded page tables. (Whether pulling ahead its
updating is okay is a separate question. All of these actions used to be
very carefully placed the way they are. Which isn't to say that I can
exclude things having gone stale ...)

And yes, that has always meant that mapcache_current_vcpu()'s condition for
calling sync_local_execstate() was building upon the fact that it won't be
called from context switching contexts. Did you consider updating that
condition (evaluating curr_vcpu) instead?
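To make that concrete, here's a minimal, untested sketch of what such an
adjusted condition might look like, loosely based on the current shape of
mapcache_current_vcpu() in xen/arch/x86/domain_page.c. The extra curr_vcpu
clause is an assumption about one possible alternative, not code from the
patch under review, and it presumes curr_vcpu gets updated right after
write_ptbase(), as the hunk above already arranges:

    static inline struct vcpu *mapcache_current_vcpu(void)
    {
        /* In the common case we use the mapcache of the running vCPU. */
        struct vcpu *v = current;

        /*
         * A PV vCPU with a NULL guest_table is running on the idle page
         * tables, normally as the result of a lazy switch to the idle
         * vCPU.
         */
        if ( unlikely(pagetable_is_null(v->arch.guest_table)) &&
             is_pv_vcpu(v) )
        {
            /*
             * Hypothetical extra clause: in the stable lazy-idle state
             * curr_vcpu still points at the previous (non-idle) vCPU, so
             * the sync happens as before. Once __context_switch() has
             * updated curr_vcpu the clause evaluates to false, and a
             * map_domain_page() call made from within the context switch
             * path no longer recurses through sync_local_execstate().
             */
            if ( (v = idle_vcpu[smp_processor_id()]) == current &&
                 this_cpu(curr_vcpu) != v )
                sync_local_execstate();
        }

        return v;
    }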
> @@ -2048,8 +2060,6 @@ static void __context_switch(void)
>      if ( pd != nd )
>          cpumask_clear_cpu(cpu, pd->dirty_cpumask);
>      write_atomic(&p->dirty_cpu, VCPU_CPU_CLEAN);
> -
> -    per_cpu(curr_vcpu, cpu) = n;
>  }
>  
>  void context_switch(struct vcpu *prev, struct vcpu *next)
> @@ -2081,16 +2091,36 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>  
>      local_irq_disable();
>  
> -    set_current(next);
> -
>      if ( (per_cpu(curr_vcpu, cpu) == next) ||
>          (is_idle_domain(nextd) && cpu_online(cpu)) )
>      {
> +        /*
> +         * Lazy context switch to the idle vCPU, set current == idle. Full
> +         * context switch happens if/when sync_local_execstate() is
> +         * called.
> +         */
> +        set_current(next);
>          local_irq_enable();

The comment is misleading as far as the first half of the if() condition
goes: No further switching is going to happen in that case, aiui.

>      }
>      else
>      {
> -        __context_switch();
> +        /*
> +         * curr_vcpu will always point to the currently loaded vCPU
> +         * context, as it's not updated when doing a lazy switch to the
> +         * idle vCPU.
> +         */
> +        struct vcpu *prev_ctx = per_cpu(curr_vcpu, cpu);
> +
> +        if ( prev_ctx != current )
> +        {
> +            /*
> +             * Doing a full context switch to a non-idle vCPU from a lazy
> +             * context switched state. Adjust current to point to the
> +             * currently loaded vCPU context.
> +             */
> +            ASSERT(current == idle_vcpu[cpu]);
> +            ASSERT(!is_idle_vcpu(next));
> +            set_current(prev_ctx);

This feels wrong, as in "current" then not representing what it should
represent, for a certain time window. I may be dense, but neither comment
nor description clarifies to me why this might be needed. I can see that
it's needed to please the ASSERT() you add to __context_switch(), yet then
I might ask why that assertion is put there.

> +        }
> +        __context_switch(next);
>  
>          /* Re-enable interrupts before restoring state which may fault. */
>          local_irq_enable();
> @@ -2156,15 +2186,23 @@ int __sync_local_execstate(void)
>  {
>      unsigned long flags;
>      int switch_required;
> +    unsigned int cpu = smp_processor_id();
> +    struct vcpu *p;
>  
>      local_irq_save(flags);
>  
> -    switch_required = (this_cpu(curr_vcpu) != current);
> +    p = per_cpu(curr_vcpu, cpu);
> +    switch_required = (p != current);
>  
>      if ( switch_required )
>      {
> -        ASSERT(current == idle_vcpu[smp_processor_id()]);
> -        __context_switch();
> +        ASSERT(current == idle_vcpu[cpu]);
> +        /*
> +         * Restore current to the previously running vCPU,
> +         * __context_switch() will update current together with curr_vcpu.
> +         */
> +        set_current(p);

Similarly here.

> +        __context_switch(idle_vcpu[cpu]);
>      }
>  
>      local_irq_restore(flags);

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -2232,8 +2232,6 @@ void __init trap_init(void)
>  
>  void activate_debugregs(const struct vcpu *curr)
>  {
> -    ASSERT(curr == current);
> -
>      write_debugreg(0, curr->arch.dr[0]);
>      write_debugreg(1, curr->arch.dr[1]);
>      write_debugreg(2, curr->arch.dr[2]);

Why would this assertion go away? If it suddenly triggers, the parameter
name would now end up being wrong.

Jan