[Xen-devel] [PATCH] xen: make sure the node-affinity is always updated
in particular when a domain moves from a cpupool to another, and when the
list of cpus in a cpupool changes.

Not doing that may result in the domain's node-affinity mask (as retrieved
by csched_balance_cpumask(..CSCHED_BALANCE_NODE_AFFINITY..) ) and online
mask (as retrieved by cpupool_scheduler_cpumask() ) having an empty
intersection.

This in turn means that, in _csched_cpu_pick(), we think it is legitimate
to do a node-affinity load balancing step and, when running this:

    ...
    /* Pick an online CPU from the proper affinity mask */
    csched_balance_cpumask(vc, balance_step, &cpus);
    cpumask_and(&cpus, &cpus, online);
    ...

we end up with an empty cpumask (in cpus). At this point, in the following
code:

    ....
    /* If present, prefer vc's current processor */
    cpu = cpumask_test_cpu(vc->processor, &cpus)
            ? vc->processor
            : cpumask_cycle(vc->processor, &cpus);
    ....

an ASSERT (from inside cpumask_cycle() ) triggers like this:

(XEN) Xen call trace:
(XEN)    [<ffff82d08011b124>] _csched_cpu_pick+0x1d2/0x652
(XEN)    [<ffff82d08011b5b2>] csched_cpu_pick+0xe/0x10
(XEN)    [<ffff82d0801232de>] vcpu_migrate+0x167/0x31e
(XEN)    [<ffff82d0801238cc>] cpu_disable_scheduler+0x1c8/0x287
(XEN)    [<ffff82d080101b3f>] cpupool_unassign_cpu_helper+0x20/0xb4
(XEN)    [<ffff82d08010544f>] continue_hypercall_tasklet_handler+0x4a/0xb1
(XEN)    [<ffff82d080127793>] do_tasklet_work+0x78/0xab
(XEN)    [<ffff82d080127a70>] do_tasklet+0x5f/0x8b
(XEN)    [<ffff82d080158985>] idle_loop+0x57/0x5e
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion 'cpu < nr_cpu_ids' failed at
/home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481

It is, for example, sufficient to have a domain with node-affinity to NUMA
node 1 running; issuing a `xl cpupool-numa-split' then makes the above
happen. That is because, by default, all the existing domains remain
assigned to the first cpupool, and it now (after the cpupool-numa-split)
only includes NUMA node 0.

This change prevents that by making sure that the node-affinity mask is,
when necessary, reset and automatically recomputed in
domain_update_node_affinity(), and that this function is invoked every
time it is required. In fact, it makes much more sense to reset the
node-affinity when it becomes inconsistent with either the domain's
vcpu-affinity or the cpumap of the domain's cpupool, rather than leaving
it untouched, as was done before.

Also, updating the domain's node-affinity on the cpupool_unassign_cpu()
path, as already happens on the cpupool_assign_cpu_locked() path, was
something we were missing anyway, even independently from the ASSERT
triggering.
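[Editorial illustration, not part of the patch: a minimal standalone sketch
of the failure mode described above. Cpumasks are modelled as plain 64-bit
words, and mask_cycle()/NR_CPUS are made-up stand-ins for the real
cpumask_cycle()/nr_cpu_ids; the mask values assume a stale node-affinity on
NUMA node 1 and a cpupool that, after the split, only spans node 0.]

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NR_CPUS 8

    /* Stand-in for cpumask_cycle(): next set bit after 'start' (wrapping),
     * or NR_CPUS if the mask is empty. */
    static unsigned int mask_cycle(unsigned int start, uint64_t mask)
    {
        unsigned int i;

        for ( i = 1; i <= NR_CPUS; i++ )
        {
            unsigned int cpu = (start + i) % NR_CPUS;

            if ( mask & (1ULL << cpu) )
                return cpu;
        }
        return NR_CPUS;
    }

    int main(void)
    {
        uint64_t node_affinity = 0xf0; /* stale node-affinity: CPUs 4-7 (node 1) */
        uint64_t online        = 0x0f; /* cpupool after numa-split: CPUs 0-3     */
        unsigned int processor = 5;    /* vc->processor                          */
        uint64_t cpus;
        unsigned int cpu;

        /* Same steps as the _csched_cpu_pick() snippets quoted above. */
        cpus = node_affinity & online;                     /* empty! */
        cpu = (cpus & (1ULL << processor)) ? processor
                                           : mask_cycle(processor, cpus);

        printf("picked cpu = %u (NR_CPUS = %d)\n", cpu, NR_CPUS);
        assert(cpu < NR_CPUS);   /* fires, just like the panic in the trace */
        return 0;
    }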
Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Cc: Jan Beulich <JBeulich@xxxxxxxx>
Cc: Keir Fraser <keir@xxxxxxx>
Cc: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
---
 xen/common/cpupool.c |    8 ++++++++
 xen/common/domain.c  |   40 +++++++++++++++++++++++-----------------
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 2164a9f..23e461d 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -355,6 +355,14 @@ int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
     atomic_inc(&c->refcnt);
     cpupool_cpu_moving = c;
     cpumask_clear_cpu(cpu, c->cpu_valid);
+
+    rcu_read_lock(&domlist_read_lock);
+    for_each_domain_in_cpupool(d, c)
+    {
+        domain_update_node_affinity(d);
+    }
+    rcu_read_unlock(&domlist_read_lock);
+
     spin_unlock(&cpupool_lock);
 
     work_cpu = smp_processor_id();
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 5999779..f708b3c 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -352,7 +352,6 @@ void domain_update_node_affinity(struct domain *d)
     cpumask_var_t cpumask;
     cpumask_var_t online_affinity;
     const cpumask_t *online;
-    nodemask_t nodemask = NODE_MASK_NONE;
     struct vcpu *v;
     unsigned int node;
 
@@ -374,28 +373,35 @@ void domain_update_node_affinity(struct domain *d)
         cpumask_or(cpumask, cpumask, online_affinity);
     }
 
-    if ( d->auto_node_affinity )
-    {
-        /* Node-affinity is automaically computed from all vcpu-affinities */
-        for_each_online_node ( node )
-            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
-                node_set(node, nodemask);
-
-        d->node_affinity = nodemask;
-    }
-    else
+    if ( !d->auto_node_affinity )
     {
         /* Node-affinity is provided by someone else, just filter out cpus
          * that are either offline or not in the affinity of any vcpus. */
-        nodemask = d->node_affinity;
         for_each_node_mask ( node, d->node_affinity )
             if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
-                node_clear(node, nodemask);//d->node_affinity);
+                node_clear(node, d->node_affinity);
+
+        /*
+         * If we end up with an empty node-affinity, e.g., because the user
+         * specified an incompatible vcpu-affinity, or moved the domain to
+         * a different cpupool, reset the node-affinity and re-calculate it
+         * (in the body of the if() below).
+         *
+         * This is necessary to prevent the schedulers from attempting
+         * node-affinity load balancing on empty cpumasks, with (potentially)
+         * nasty effects (increased overhead or even crash!).
+         */
+        if ( nodes_empty(d->node_affinity) )
+            d->auto_node_affinity = 1;
+    }
 
-        /* Avoid loosing track of node-affinity because of a bad
-         * vcpu-affinity has been specified. */
-        if ( !nodes_empty(nodemask) )
-            d->node_affinity = nodemask;
+    if ( d->auto_node_affinity )
+    {
+        /* Node-affinity is automatically computed from all vcpu-affinities */
+        nodes_clear(d->node_affinity);
+        for_each_online_node ( node )
+            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
+                node_set(node, d->node_affinity);
     }
 
     sched_set_node_affinity(d, &d->node_affinity);
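[Again purely an editorial illustration, not Xen code: the sketch below, with
made-up names and sizes (two nodes, eight CPUs, a hard-coded node_to_cpus[]
table), mimics what the reworked domain_update_node_affinity() now does:
filter a user-provided node-affinity against the cpus that are actually
usable and, if that leaves it empty, fall back to 'auto' mode and recompute
it from scratch, so the schedulers never see an empty mask.]

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NR_NODES 2

    /* Made-up topology: node 0 = CPUs 0-3, node 1 = CPUs 4-7. */
    static const uint64_t node_to_cpus[NR_NODES] = { 0x0f, 0xf0 };

    struct dom {
        bool auto_node_affinity;
        uint8_t node_affinity;     /* one bit per node */
    };

    /* cpumask: union of the online vcpu-affinities, as computed earlier in
     * domain_update_node_affinity(). */
    static void update_node_affinity(struct dom *d, uint64_t cpumask)
    {
        unsigned int node;

        if ( !d->auto_node_affinity )
        {
            /* Filter out nodes that no longer have any usable cpu. */
            for ( node = 0; node < NR_NODES; node++ )
                if ( !(node_to_cpus[node] & cpumask) )
                    d->node_affinity &= ~(1u << node);

            /* Empty result (e.g. after `xl cpupool-numa-split'): go auto. */
            if ( !d->node_affinity )
                d->auto_node_affinity = true;
        }

        if ( d->auto_node_affinity )
        {
            /* Recompute the node-affinity from the vcpu-affinities. */
            d->node_affinity = 0;
            for ( node = 0; node < NR_NODES; node++ )
                if ( node_to_cpus[node] & cpumask )
                    d->node_affinity |= 1u << node;
        }
    }

    int main(void)
    {
        /* Domain with node-affinity to node 1; its cpupool now only spans
         * node 0, as after the cpupool-numa-split scenario above. */
        struct dom d = { .auto_node_affinity = false, .node_affinity = 0x2 };

        update_node_affinity(&d, 0x0f /* online cpus of the pool */);
        printf("auto = %d, node_affinity = %#x\n",
               d.auto_node_affinity, d.node_affinity);
        /* Prints: auto = 1, node_affinity = 0x1 -- never left empty. */
        return 0;
    }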