[Xen-devel] [PATCH 1 of 6 v2] xen: sched_credit: improve picking up the idle CPU for a VCPU
In _csched_cpu_pick() we try to select the best possible CPU for running a VCPU, considering the characteristics of the underlying hardware (i.e., how many threads, cores and sockets there are, and how busy they are). What we want is "the idle execution vehicle with the most idling neighbours in its grouping".

To achieve this, we select a CPU from the VCPU's affinity, giving preference to its current processor if possible, as the basis for the comparison with all the other CPUs. The problem is that, to discount the VCPU itself when computing this "idleness" (in an attempt to be fair wrt its current processor), we arbitrarily and unconditionally consider that selected CPU as idle, even when it is not. For instance:

 1. the CPU is not the one where the VCPU is running (perhaps because the affinity was changed);
 2. the CPU is where the VCPU is running, but it has other VCPUs in its runq, so it won't go idle even if the VCPU in question goes.

This is exemplified in the trace below:

] 3.466115364 x|------|------| d10v1 22005(2:2:5) 3 [ a 1 8 ]
... ... ...
3.466122856 x|------|------| d10v1 runstate_change d10v1 running->offline
3.466123046 x|------|------| d?v? runstate_change d32767v0 runnable->running
... ... ...
] 3.466126887 x|------|------| d32767v0 28004(2:8:4) 3 [ a 1 8 ]

The 22005(...) line (the first line) means _csched_cpu_pick() was called on VCPU 1 of domain 10 while it was running on CPU 0, and it chose CPU 8, which is busy ('|'), even though there are plenty of idle CPUs. That is because, as a consequence of the VCPU affinity being changed, CPU 8 was chosen as the basis for the comparison and was therefore considered idle (its bit gets unconditionally set in the bitmask representing the idle CPUs). The 28004(...) line means the VCPU is woken up and queued on CPU 8's runq, where it has to wait for a context switch or a migration before it can execute.
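To make the effect of the unconditional "idle" marking concrete, here is a small toy model in C of how the guessed CPU ends up in the idlers mask, with the old rule and with the rule the patch below adopts. Everything in it is hypothetical and is not Xen code: struct toy_pcpu, pick_idlers(), prefer_nxt() and the fixed topology (16 CPUs, 2 threads per core, held in a 64-bit mask) are assumptions made for illustration. prefer_nxt() only mirrors the non-smt_power_savings branch of the comparison in _csched_cpu_pick(), and migrate_factor is taken as a plain parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT Xen's real data structures: 16 CPUs, 2 threads per core. */
#define TOY_NR_CPUS   16
#define TOY_IDLE_VCPU (-1)

struct toy_pcpu {
    int curr;       /* id of the running vcpu, or TOY_IDLE_VCPU */
    int nr_queued;  /* non-idle vcpus waiting in this CPU's runq */
};

/* Portable population count: number of set bits in m. */
static int count_bits(uint64_t m)
{
    int n = 0;

    while ( m )
    {
        m &= m - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Bitmask of the two hyperthreads sharing cpu's core (toy topology). */
static uint64_t toy_core_mask(int cpu)
{
    return UINT64_C(3) << (cpu & ~1);
}

/*
 * Idlers used for the comparison.
 * Old rule (patched == false): always add the guessed cpu.
 * Patched rule: add it only if vc is running there and nothing else is
 * queued, i.e. cpu really goes idle once vc leaves it.
 */
static uint64_t pick_idlers(uint64_t idle_mask, const struct toy_pcpu *pcpu,
                            int cpu, int vc, bool patched)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    if ( !patched || (pcpu[cpu].curr == vc && pcpu[cpu].nr_queued == 0) )
        idle_mask |= UINT64_C(1) << cpu;
    return idle_mask;
}

/* Mirrors the spreading branch: move to nxt if its core is idler enough. */
static bool prefer_nxt(uint64_t idlers, int cpu, int nxt, int migrate_factor)
{
    int nr_idlers_cpu = count_bits(idlers & toy_core_mask(cpu));
    int nr_idlers_nxt = count_bits(idlers & toy_core_mask(nxt));

    return nr_idlers_cpu * migrate_factor < nr_idlers_nxt;
}
```

With CPUs 0 and 8 busy (the VCPU runs on CPU 0, CPU 8 runs something else) and CPU 8 as the guess, the old rule counts two idlers in CPU 8's core, so a fully idle core such as CPUs 2 and 3 does not win with migrate_factor 1 and the busy CPU 8 is kept. With the patched rule CPU 8's core counts only one idler, and moving to CPU 2 is preferred.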
This change fixes things by only considering the "guessed" CPU idle if the VCPU in question is both running there and is its only runnable VCPU.

While at it, rename the two variables (within _csched_cpu_pick()) counting the number of idlers for `cpu' and `nxt' to `nr_idlers_cpu' and `nr_idlers_nxt', which makes their purpose more evident than the current `weight_*' names.

Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -59,6 +59,8 @@
 #define CSCHED_VCPU(_vcpu) ((struct csched_vcpu *) (_vcpu)->sched_priv)
 #define CSCHED_DOM(_dom) ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu) (&(CSCHED_PCPU(_cpu)->runq))
+/* Is the first element of _cpu's runq its idle vcpu? */
+#define IS_RUNQ_IDLE(_cpu) (is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))
 
 
 /*
@@ -479,9 +481,14 @@ static int
      * distinct cores first and guarantees we don't do something stupid
      * like run two VCPUs on co-hyperthreads while there are idle cores
      * or sockets.
+     *
+     * Notice that, when computing the "idleness" of cpu, we may want to
+     * discount vc. That is, iff vc is the currently running and the only
+     * runnable vcpu on cpu, we add cpu to the idlers.
      */
     cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
-    cpumask_set_cpu(cpu, &idlers);
+    if ( current_on_cpu(cpu) == vc && IS_RUNQ_IDLE(cpu) )
+        cpumask_set_cpu(cpu, &idlers);
     cpumask_and(&cpus, &cpus, &idlers);
     cpumask_clear_cpu(cpu, &cpus);
 
@@ -489,7 +496,7 @@ static int
     {
         cpumask_t cpu_idlers;
         cpumask_t nxt_idlers;
-        int nxt, weight_cpu, weight_nxt;
+        int nxt, nr_idlers_cpu, nr_idlers_nxt;
         int migrate_factor;
 
         nxt = cpumask_cycle(cpu, &cpus);
@@ -513,12 +520,12 @@ static int
             cpumask_and(&nxt_idlers, &idlers, per_cpu(cpu_core_mask, nxt));
         }
 
-        weight_cpu = cpumask_weight(&cpu_idlers);
-        weight_nxt = cpumask_weight(&nxt_idlers);
+        nr_idlers_cpu = cpumask_weight(&cpu_idlers);
+        nr_idlers_nxt = cpumask_weight(&nxt_idlers);
         /* smt_power_savings: consolidate work rather than spreading it */
         if ( sched_smt_power_savings ?
-             weight_cpu > weight_nxt :
-             weight_cpu * migrate_factor < weight_nxt )
+             nr_idlers_cpu > nr_idlers_nxt :
+             nr_idlers_cpu * migrate_factor < nr_idlers_nxt )
         {
             cpumask_and(&nxt_idlers, &cpus, &nxt_idlers);
             spc = CSCHED_PCPU(nxt);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -396,6 +396,9 @@ extern struct vcpu *idle_vcpu[NR_CPUS];
 #define is_idle_domain(d) ((d)->domain_id == DOMID_IDLE)
 #define is_idle_vcpu(v) (is_idle_domain((v)->domain))
 
+#define current_on_cpu(_c) \
+  ( (per_cpu(schedule_data, _c).curr) )
+
 #define DOMAIN_DESTROYED (1<<31) /* assumes atomic_t is >= 32 bits */
 #define put_domain(_d) \
   if ( atomic_dec_and_test(&(_d)->refcnt) ) domain_destroy(_d)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel