Re: [Xen-devel] [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool
On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
> And this time, do it right. In fact, a similar change was
> attempted in 93be8285a79c6 ("cpupools: update domU's node-affinity
> on the cpupool_unassign_cpu() path"). But that was buggy, and got
> reverted with 8395b67ab0b8a86.
>
> However, even though reverting was the right thing to do, it
> remains true that:
>  - calling the function is better done in the cpupool cpu removal
>    code, even if just for symmetry with the cpupool cpu adding path;
>  - it is not necessary to call it during cpu teardown (for suspend
>    or shutdown), as we either are going down and will never come
>    back up (shutdown) or, when coming up, we want everything to be
>    as it was before the teardown process started, and so we would
>    just undo any update made during that process;
>  - calling it from the teardown path is not only unnecessary, but
>    it can trigger an ASSERT() if, during the process, we happen to
>    remove the last online pcpu of a domain's node affinity:
>
> (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> ... ... ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0801055b9>] domain_update_node_affinity+0x113/0x240
> (XEN)    [<ffff82d08012e676>] cpu_disable_scheduler+0x334/0x3f2
> (XEN)    [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
> (XEN)    [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN)    [<ffff82d080130ad9>] stopmachine_action+0x70/0x99
> (XEN)    [<ffff82d08013274f>] do_tasklet_work+0x78/0xab
> (XEN)    [<ffff82d080132a85>] do_tasklet+0x5e/0x8a
> (XEN)    [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 12:
> (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
> (XEN) ****************************************
>
> Therefore, for all these reasons, move the call from
> cpu_disable_scheduler() to cpupool_unassign_cpu_helper().
>
> While there, add some sanity checking (in the latter function), and
> make sure that scanning the domain list is done with domlist_read_lock
> held, at least when the system is 'live'.
>
> I re-tested the scenario described here:
> http://permalink.gmane.org/gmane.comp.emulators.xen.devel/235310
>
> which is what led to the revert of 93be8285a79c6, and that is
> working ok after this commit.
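
Just to make the failure mode above concrete, here is a minimal standalone sketch (plain userspace C, bitmasks standing in for cpumasks, all values invented for illustration -- this is not the Xen code itself):

/*
 * Toy model: a domain whose node affinity spans only pcpus 2-3, on a
 * box with pcpus 0-3 online.  Teardown takes the pcpus offline one by
 * one; if we recompute the affinity mask at every step (as the
 * pre-patch cpu_disable_scheduler() effectively did), the intersection
 * goes empty as soon as the last pcpu of the node affinity disappears,
 * which is exactly the condition the hypervisor ASSERT()s against.
 */
#include <assert.h>
#include <stdio.h>

int main(void)
{
    unsigned long online        = 0x0F; /* pcpus 0-3 online           */
    unsigned long node_affinity = 0x0C; /* domain affine to pcpus 2-3 */

    for (int cpu = 3; cpu >= 0; cpu--)
    {
        online &= ~(1UL << cpu);            /* teardown: cpu goes away */

        unsigned long dom_cpumask = online & node_affinity;
        printf("cpu%d offline -> dom_cpumask = %#lx\n", cpu, dom_cpumask);

        /* Stand-in for ASSERT(!cpumask_empty(dom_cpumask)): it fires
         * once both pcpu 3 and pcpu 2 have been taken down. */
        assert(dom_cpumask != 0);
    }
    return 0;
}

With the call moved out of the teardown path and into cpupool_unassign_cpu_helper(), the recomputation only happens when a cpu actually leaves a pool, so the mask cannot end up empty merely because the whole machine is going down for suspend or shutdown.
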
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Acked-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>

> ---
> Cc: Juergen Gross <jgross@xxxxxxxx>
> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> ---
>  xen/common/cpupool.c  |   18 ++++++++++++++++++
>  xen/common/schedule.c |    7 ++++++-
>  2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 563864d..79bcb21 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -297,12 +297,25 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
>  static long cpupool_unassign_cpu_helper(void *info)
>  {
>      int cpu = cpupool_moving_cpu;
> +    struct cpupool *c = info;
> +    struct domain *d;
>      long ret;
>
>      cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
>                      cpupool_cpu_moving->cpupool_id, cpu);
>
>      spin_lock(&cpupool_lock);
> +    if ( c != cpupool_cpu_moving )
> +    {
> +        ret = -EBUSY;
> +        goto out;
> +    }
> +
> +    /*
> +     * We need this for scanning the domain list, both in
> +     * cpu_disable_scheduler(), and at the bottom of this function.
> +     */
> +    rcu_read_lock(&domlist_read_lock);
>      ret = cpu_disable_scheduler(cpu);
>      cpumask_set_cpu(cpu, &cpupool_free_cpus);
>      if ( !ret )
> @@ -319,6 +332,11 @@ static long cpupool_unassign_cpu_helper(void *info)
>          cpupool_cpu_moving = NULL;
>      }
>
> +    for_each_domain_in_cpupool(d, c)
> +    {
> +        domain_update_node_affinity(d);
> +    }
> +    rcu_read_unlock(&domlist_read_lock);
>  out:
>      spin_unlock(&cpupool_lock);
>      cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index eac8804..a1840c9 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -652,6 +652,12 @@ int cpu_disable_scheduler(unsigned int cpu)
>      if ( c == NULL )
>          return ret;
>
> +    /*
> +     * We'd need the domain RCU lock, but:
> +     *  - when we are called from cpupool code, it's acquired there already;
> +     *  - when we are called for CPU teardown, we're in stop-machine context,
> +     *    so that's not a problem.
> +     */
>      for_each_domain_in_cpupool ( d, c )
>      {
>          for_each_vcpu ( d, v )
> @@ -741,7 +747,6 @@ int cpu_disable_scheduler(unsigned int cpu)
>                  ret = -EAGAIN;
>              }
>          }
> -        domain_update_node_affinity(d);
>      }
>
>      return ret;
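
To spell out the locking comment in the hunks above: a userspace toy of the shape cpupool_unassign_cpu_helper() now has, with a pthread rwlock standing in for rcu_read_lock(&domlist_read_lock) and all structures and names invented for illustration. The point is simply that the whole domain-list walk, including the per-domain affinity updates, sits inside one read-side critical section:

/* Toy illustration, not Xen code: hold a read-side lock around the
 * entire domain-list scan so the list cannot change underneath us. */
#include <pthread.h>
#include <stdio.h>

struct domain { int id; struct domain *next; };

static pthread_rwlock_t domlist_lock = PTHREAD_RWLOCK_INITIALIZER;
static struct domain dom2 = { 2, NULL };
static struct domain dom1 = { 1, &dom2 };
static struct domain *domain_list = &dom1;

/* Stand-in for domain_update_node_affinity(). */
static void update_node_affinity(struct domain *d)
{
    printf("recompute node affinity of dom%d\n", d->id);
}

static void unassign_cpu_helper(void)
{
    pthread_rwlock_rdlock(&domlist_lock);    /* rcu_read_lock()   */
    /* cpu_disable_scheduler() would run here, under the same lock. */
    for (struct domain *d = domain_list; d != NULL; d = d->next)
        update_node_affinity(d);
    pthread_rwlock_unlock(&domlist_lock);    /* rcu_read_unlock() */
}

int main(void)
{
    unassign_cpu_helper();
    return 0;
}

The CPU teardown path, by contrast, runs in stop-machine context with every other CPU parked, so it gets the same stability guarantee without taking the lock -- which is what the new comment in cpu_disable_scheduler() records.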