
Re: [Xen-devel] [PATCH] SEDF: avoid gathering vCPU-s on pCPU0

>>> On 04.03.13 at 12:22, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
> On Fri, Mar 1, 2013 at 3:35 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>> The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
>> incompatible with the SEDF scheduler: Any vCPU using
>> VCPUOP_stop_periodic_timer (e.g. any vCPU of halfway modern PV Linux
>> guests) ends up on pCPU0 after that call. Obviously, running all PV
>> guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
>> guests rather sooner than later.
>> So the main thing that was clearly wrong (and bogus from the beginning)
>> was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
>> by a construct that prefers to put back the vCPU on the pCPU that it
>> got launched on.
>> However, there's one more glitch: When reducing the affinity of a vCPU
>> temporarily, and then widening it again to a set that includes the pCPU
>> that the vCPU was last running on, the generic scheduler code would not
>> force a migration of that vCPU, and hence it would forever stay on the
>> pCPU it last ran on. Since that can again create a load imbalance, the
>> SEDF scheduler wants a migration to happen regardless of it being
>> apparently unnecessary.
> I'm not quite understanding what this is about -- why is this
> necessary for sedf but not for the other schedulers?

The problem with sedf is that it doesn't do any balancing, and
never moves a vCPU to a different pCPU unless the affinity
mask changes in a way that makes this necessary (in which
case it's the generic scheduler code that invokes the relocation).

So when a vCPU narrows its affinity (in the extreme to a mask
with just one bit set) and then widens it again, it would
nevertheless remain on the pCPU it was formerly restricted to.
When (perhaps much later) a second and then a third vCPU
do the same, they may all end up running on the same pCPU,
i.e. we'd get back to the problem addressed by the first half of
the fix here.
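
The first half of the fix can be modelled in a small standalone
sketch (hypothetical names throughout; a plain unsigned long bitmask
stands in for Xen's cpumask_t, and the real sedf_pick_cpu() of course
differs in detail):

```c
#include <limits.h>

/* Toy model of a vCPU for placement purposes. */
struct vcpu_model {
    int processor;          /* pCPU the vCPU last ran on */
    unsigned long affinity; /* bit i set => pCPU i is allowed */
};

/* Old behaviour (cpumask_first): always take the lowest-numbered
 * allowed pCPU, which funnels every such vCPU onto pCPU0 whenever
 * bit 0 is in the mask. */
static int pick_cpu_first(const struct vcpu_model *v)
{
    for (int i = 0; i < (int)(sizeof(unsigned long) * CHAR_BIT); i++)
        if (v->affinity & (1UL << i))
            return i;
    return -1; /* empty mask */
}

/* Fixed behaviour: prefer the pCPU the vCPU is already associated
 * with, and only fall back to another allowed pCPU if that one has
 * been masked out. */
static int pick_cpu_prefer_current(const struct vcpu_model *v)
{
    if (v->affinity & (1UL << v->processor))
        return v->processor;
    return pick_cpu_first(v);
}
```

With an affinity of {0,1,2,3} a vCPU on pCPU3 stays on pCPU3 under the
fixed variant, instead of being pulled back to pCPU0 every time.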

While you might argue that affinity changes ought to be done
in a "sensible" way, I think you'll still recognize that there is
behavior here that one couldn't control without explicitly
adding an intermediate step when doing this for DomU-s. And
you should also keep in mind that there are certain things for
which Dom0 needs to change its vCPU affinities (pv-ops doesn't
do that, which is why certain things there just can't work);
this is how I noticed in the first place that the second half of
the fix is necessary.
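
The second half can be illustrated with a toy simulation (entirely
hypothetical code, not Xen's): three vCPUs each pin themselves to
pCPU0 and later widen their affinity again. Without a forced
migration, the generic rule ("don't move if the current pCPU is still
allowed") leaves all of them clustered on pCPU0.

```c
#define NCPUS 4

static int load[NCPUS]; /* how many vCPUs currently sit on each pCPU */

/* Toy placement: if migration isn't forced and the current pCPU is
 * still in the mask, stay put (the generic scheduler's rule).
 * Otherwise move to the least-loaded allowed pCPU. */
static int place(int cur, unsigned long affinity, int force_migrate)
{
    int best = -1;

    if (!force_migrate && (affinity & (1UL << cur)))
        return cur;
    for (int i = 0; i < NCPUS; i++)
        if ((affinity & (1UL << i)) && (best < 0 || load[i] < load[best]))
            best = i;
    return best;
}
```

Running this with three vCPUs on pCPU0 and an affinity widened to
{0,1,2,3} shows the point: with force_migrate == 0 every vCPU stays on
pCPU0; with force_migrate == 1 the cluster breaks up.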


Xen-devel mailing list
