Re: [Xen-devel] [PATCH 03 of 10 v2] xen: sched_credit: let the scheduler know about node-affinity
On 19/12/12 19:07, Dario Faggioli wrote:

 static inline int
-__csched_vcpu_is_migrateable(struct vcpu *vc, int dest_cpu)
+__csched_vcpu_is_migrateable(struct vcpu *vc, int dest_cpu, cpumask_t *mask)
 {
     /*
      * Don't pick up work that's in the peer's scheduling tail or hot on
-     * peer PCPU. Only pick up work that's allowed to run on our CPU.
+     * peer PCPU. Only pick up work that prefers and/or is allowed to run
+     * on our CPU.
      */
     return !vc->is_running &&
            !__csched_vcpu_is_cache_hot(vc) &&
-           cpumask_test_cpu(dest_cpu, vc->cpu_affinity);
+           cpumask_test_cpu(dest_cpu, mask);
+}
+
+static inline int
+__csched_vcpu_should_migrate(int cpu, cpumask_t *mask, cpumask_t *idlers)
+{
+    /*
+     * Consent to migration if cpu is one of the idlers in the VCPU's
+     * affinity mask. In fact, if that is not the case, it just means it
+     * was some other CPU that was tickled and should hence come and pick
+     * VCPU up. Migrating it to cpu would only make things worse.
+     */
+    return cpumask_test_cpu(cpu, idlers) && cpumask_test_cpu(cpu, mask);
 }

I don't get what this function is for.  The only time you call it is in
csched_runq_steal(), immediately after calling
__csched_vcpu_is_migrateable().  But is_migrateable() has already checked
cpumask_test_cpu(cpu, mask).  So why do we need to check it again?

We could just replace this with cpumask_test_cpu(cpu, prv->idlers).  But
that clause is going to be either true or false for every single
iteration of all the loops, including the loops in csched_load_balance().
Wouldn't it make more sense to check it once in csched_load_balance(),
rather than doing all those nested loops?  (A sketch of what I mean
follows the csched_load_balance() hunk below.)

And in any case, looking at the caller of csched_load_balance(), it
explicitly says to steal work if the next thing on the runqueue of cpu
has a priority of TS_OVER.  That was chosen for a reason -- if you want
to change that, you should change it there at the top (and make a
justification for doing so), not deeply nested in a function like this.

Or am I completely missing something?

 static struct csched_vcpu *
-csched_runq_steal(int peer_cpu, int cpu, int pri)
+csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
 {
     const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu);
+    struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, peer_cpu));
     const struct vcpu * const peer_vcpu = curr_on_cpu(peer_cpu);
     struct csched_vcpu *speer;
     struct list_head *iter;
@@ -1265,11 +1395,24 @@ csched_runq_steal(int peer_cpu, int cpu,
         if ( speer->pri <= pri )
             break;

-        /* Is this VCPU is runnable on our PCPU? */
+        /* Is this VCPU runnable on our PCPU? */
         vc = speer->vcpu;
         BUG_ON( is_idle_vcpu(vc) );

-        if (__csched_vcpu_is_migrateable(vc, cpu))
+        /*
+         * Retrieve the correct mask for this balance_step or, if we're
+         * dealing with node-affinity and the vcpu has no node affinity
+         * at all, just skip this vcpu. That is needed if we want to
+         * check if we have any node-affine work to steal first (wrt
+         * any vcpu-affine work).
+         */
+        if ( csched_balance_cpumask(vc, balance_step,
+                                    &scratch_balance_mask) )
+            continue;

Again, I think for clarity the best thing to do here is:

  if ( balance_step == NODE
       && cpumask_full(speer->sdom->node_affinity_cpumask) )
      continue;

  csched_balance_cpumask();
  /* Etc. */
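Roughly something along these lines -- a sketch only, using
csched_balance_cpumask(), scratch_balance_mask and
__csched_vcpu_is_migrateable() as they appear in the hunk above; the
exact spelling of the step constant (CSCHED_BALANCE_NODE_AFFINITY here)
is my assumption about how the rest of the series names it:

    /*
     * Inside csched_runq_steal()'s runqueue walk, after
     * "vc = speer->vcpu;".  On the node-affinity step, a vcpu whose
     * node-affinity mask is full has no real node preference, so skip
     * it here and let the vcpu-affinity step consider it later.
     */
    if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
         && cpumask_full(speer->sdom->node_affinity_cpumask) )
        continue;

    /* The mask for this step can then be built unconditionally... */
    csched_balance_cpumask(vc, balance_step, &scratch_balance_mask);

    /* ...and the migrateability check stays exactly as in the patch. */
    if ( __csched_vcpu_is_migrateable(vc, cpu, &scratch_balance_mask) )
    {
        /* Dequeue speer and hand it back, as the code already does. */
    }

Spelled out like that, csched_balance_cpumask() would no longer need to
signal the "no node-affinity at all" case through its return value.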
@@ -1295,7 +1438,8 @@ csched_load_balance(struct csched_privat
     struct csched_vcpu *speer;
     cpumask_t workers;
     cpumask_t *online;
-    int peer_cpu;
+    int peer_cpu, peer_node, bstep;
+    int node = cpu_to_node(cpu);

     BUG_ON( cpu != snext->vcpu->processor );
     online = cpupool_scheduler_cpumask(per_cpu(cpupool, cpu));
@@ -1312,42 +1456,68 @@ csched_load_balance(struct csched_privat
         SCHED_STAT_CRANK(load_balance_other);

     /*
-     * Peek at non-idling CPUs in the system, starting with our
-     * immediate neighbour.
+     * Let's look around for work to steal, taking both vcpu-affinity
+     * and node-affinity into account. More specifically, we check all
+     * the non-idle CPUs' runq, looking for:
+     *  1. any node-affine work to steal first,
+     *  2. if not finding anything, any vcpu-affine work to steal.
      */
-    cpumask_andnot(&workers, online, prv->idlers);
-    cpumask_clear_cpu(cpu, &workers);
-    peer_cpu = cpu;
+    for_each_csched_balance_step( bstep )
+    {
+        /*
+         * We peek at the non-idling CPUs in a node-wise fashion. In fact,
+         * it is more likely that we find some node-affine work on our same
+         * node, not to mention that migrating vcpus within the same node
+         * could well expected to be cheaper than across-nodes (memory
+         * stays local, there might be some node-wide cache[s], etc.).
+         */
+        peer_node = node;
+        do
+        {
+            /* Find out what the !idle are in this node */
+            cpumask_andnot(&workers, online, prv->idlers);
+            cpumask_and(&workers, &workers, &node_to_cpumask(peer_node));
+            cpumask_clear_cpu(cpu, &workers);

-    while ( !cpumask_empty(&workers) )
-    {
-        peer_cpu = cpumask_cycle(peer_cpu, &workers);
-        cpumask_clear_cpu(peer_cpu, &workers);
+            if ( cpumask_empty(&workers) )
+                goto next_node;

-        /*
-         * Get ahold of the scheduler lock for this peer CPU.
-         *
-         * Note: We don't spin on this lock but simply try it. Spinning could
-         * cause a deadlock if the peer CPU is also load balancing and trying
-         * to lock this CPU.
-         */
-        if ( !pcpu_schedule_trylock(peer_cpu) )
-        {
-            SCHED_STAT_CRANK(steal_trylock_failed);
-            continue;
-        }
+            peer_cpu = cpumask_first(&workers);
+            do
+            {
+                /*
+                 * Get ahold of the scheduler lock for this peer CPU.
+                 *
+                 * Note: We don't spin on this lock but simply try it. Spinning
+                 * could cause a deadlock if the peer CPU is also load
+                 * balancing and trying to lock this CPU.
+                 */
+                if ( !pcpu_schedule_trylock(peer_cpu) )
+                {
+                    SCHED_STAT_CRANK(steal_trylock_failed);
+                    peer_cpu = cpumask_cycle(peer_cpu, &workers);
+                    continue;
+                }

-        /*
-         * Any work over there to steal?
-         */
-        speer = cpumask_test_cpu(peer_cpu, online) ?
-            csched_runq_steal(peer_cpu, cpu, snext->pri) : NULL;
-        pcpu_schedule_unlock(peer_cpu);
-        if ( speer != NULL )
-        {
-            *stolen = 1;
-            return speer;
-        }
+                /* Any work over there to steal? */
+                speer = cpumask_test_cpu(peer_cpu, online) ?
+                    csched_runq_steal(peer_cpu, cpu, snext->pri, bstep) : NULL;
+                pcpu_schedule_unlock(peer_cpu);
+
+                /* As soon as one vcpu is found, balancing ends */
+                if ( speer != NULL )
+                {
+                    *stolen = 1;
+                    return speer;
+                }
+
+                peer_cpu = cpumask_cycle(peer_cpu, &workers);
+
+            } while( peer_cpu != cpumask_first(&workers) );
+
+ next_node:
+            peer_node = cycle_node(peer_node, node_online_map);
+        } while( peer_node != node );
     }
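And to make the earlier prv->idlers point concrete, here is roughly what
checking it once would look like -- again only a sketch, which assumes
csched_load_balance() keeps an existing "out:" exit path that hands snext
back, so treat the label (and the placement) as an assumption rather
than a tested change:

    /*
     * Near the top of csched_load_balance(), before any of the
     * balancing loops.  This test does not depend on anything the
     * loops iterate over, so doing it once here replaces the per-vcpu
     * __csched_vcpu_should_migrate() check.
     */
    if ( !cpumask_test_cpu(cpu, prv->idlers) )
    {
        /*
         * Some other (idle) CPU was tickled for the pending work and
         * will come and pick it up itself; stealing it onto this CPU
         * would only make things worse, so just keep snext.
         */
        goto out;   /* assumes the function's existing exit path */
    }

Whether that is the behaviour actually intended is, of course, exactly
the question above.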
These changes all look right.  But then, I'm a bit tired, so I'll give it
another once-over tomorrow. :-)

 [To be continued]