SEDF: avoid gathering vCPU-s on pCPU0 The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was incompatible with the SEDF scheduler: Any vCPU using VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux guests) ends up on pCPU0 after that call. Obviously, running all PV guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those guests rather sooner than later. So the main thing that was clearly wrong (and bogus from the beginning) was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced by a construct that prefers to put back the vCPU on the pCPU that it got launched on. However, there's one more glitch: When reducing the affinity of a vCPU temporarily, and then widening it again to a set that includes the pCPU that the vCPU was last running on, the generic scheduler code would not force a migration of that vCPU, and hence it would forever stay on the pCPU it last ran on. Since that can again create a load imbalance, the SEDF scheduler wants a migration to happen regardless of it being apparently unnecessary. Of course, an alternative to checking for SEDF explicitly in vcpu_set_affinity() would be to introduce a flags field in struct scheduler, and have SEDF set a "always-migrate-on-affinity-change" flag. Signed-off-by: Jan Beulich --- a/xen/common/sched_sedf.c +++ b/xen/common/sched_sedf.c @@ -397,7 +397,8 @@ static int sedf_pick_cpu(const struct sc online = cpupool_scheduler_cpumask(v->domain->cpupool); cpumask_and(&online_affinity, v->cpu_affinity, online); - return cpumask_first(&online_affinity); + return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1, + &online_affinity); } /* --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -605,7 +605,8 @@ int vcpu_set_affinity(struct vcpu *v, co vcpu_schedule_lock_irq(v); cpumask_copy(v->cpu_affinity, affinity); - if ( !cpumask_test_cpu(v->processor, v->cpu_affinity) ) + if ( VCPU2OP(v)->sched_id == XEN_SCHEDULER_SEDF || + !cpumask_test_cpu(v->processor, v->cpu_affinity) ) set_bit(_VPF_migrating, &v->pause_flags); vcpu_schedule_unlock_irq(v);