
Re: [Xen-devel] [PATCH 03 of 10 v2] xen: sched_credit: let the scheduler know about node-affinity



On 20/12/12 18:18, Dario Faggioli wrote:
On Thu, Dec 20, 2012 at 5:48 PM, George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:
And in any case, looking at the caller of csched_load_balance(), it
explicitly says to steal work if the next thing on the runqueue of cpu has a
priority of TS_OVER.  That was chosen for a reason -- if you want to change
that, you should change it there at the top (and make a justification for
doing so), not deeply nested in a function like this.

Or am I completely missing something?
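
For reference, the spot in question in csched_schedule() looks roughly like this (quoted from memory, so the exact arguments of csched_load_balance() may differ slightly):

    /* Only go stealing if the best local candidate has already run
     * through its credits, i.e. its priority is TS_OVER or worse. */
    if ( snext->pri > CSCHED_PRI_TS_OVER )
        __runq_remove(snext);
    else
        snext = csched_load_balance(prv, cpu, snext, &ret.migrated);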

No, you're right. While trying to solve a nasty issue I was seeing, I overlooked
that I was changing the underlying logic in the process... Thanks!

What I want to avoid is the following: a vcpu wakes up on the busy pcpu Y and,
as a consequence, the idle pcpu X is tickled. Then, for some unrelated reason,
pcpu Z reschedules and, since it is about to go idle too, it looks around for
any vcpu to steal, finds one in Y's runqueue and grabs it. Afterwards, when X
gets the IPI and schedules, it does not find anyone to run and goes back to idling.

Now, suppose the vcpu has X, but *not* Z, in its node-affinity (while it has a
full vcpu-affinity, i.e., it can run everywhere). In this case, a vcpu that
could have run on a pcpu within its node-affinity ends up executing outside of
it. That happens because the NODE_BALANCE_STEP of csched_load_balance(), when
Z runs it, won't find anything suitable to steal (provided there isn't any
vcpu waiting in any runqueue with node-affinity to Z), while the
CPU_BALANCE_STEP will find our vcpu. :-(
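
To make the two steps concrete, here is a simplified sketch of the stealing loop this patch puts in csched_load_balance() (not the literal patch code: for_each_peer_cpu() is just a placeholder for the real peer iteration, and the csched_runq_steal() arguments are abbreviated):

    for ( step = NODE_BALANCE_STEP; step <= CPU_BALANCE_STEP; step++ )
    {
        for_each_peer_cpu ( peer_cpu )  /* placeholder for the real iteration */
        {
            /* NODE step: only steal vcpus with node-affinity to 'cpu';
             * CPU step: steal any runnable vcpu allowed on 'cpu'. */
            speer = csched_runq_steal(peer_cpu, cpu, snext->pri, step);
            if ( speer != NULL )
                return speer;
        }
    }
    return snext;  /* nothing suitable anywhere: keep (or go) idle */

In the scenario above, when Z runs this, the NODE step finds nothing and the CPU step grabs the vcpu that X was actually tickled for.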

So, what I wanted was something that could tell me whether the pcpu that is
stealing work is the one that has actually been tickled to do so. I was using
the pcpu's idleness as a (cheap and easy to check) indication of that, but I
now see that this has side effects I did not want to cause in the first place.
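
Just to illustrate the intent (purely a hypothetical sketch, not what the patch does): one could record, at tickle time, which pcpu was poked, and have the stealing path check that instead of inferring it from idleness. Here prv->tickled is an invented per-scheduler cpumask, named only for the sake of the example:

    /* Hypothetical: in __runq_tickle(), remember the pcpu we IPI. */
    cpumask_set_cpu(cpu_to_tickle, prv->tickled);

    /* Hypothetical: in csched_load_balance(), consume the flag. */
    if ( !cpumask_test_and_clear_cpu(cpu, prv->tickled) )
    {
        /* We are not the pcpu that was woken for this work, so be more
         * conservative (e.g., only steal within node-affinity). */
    }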

Sorry for that, I probably spent so much time buried, as you were saying, in
the various nested loops and calls that I lost the context a little bit! :-P

OK, that makes sense -- I figured it was something like that. Don't feel too bad about missing that connection -- we're all fairly blind to our own code, and I only caught it because I was trying to figure out what was going on. That's why we do patch review. :-)

Honestly, the whole "steal work" idea seemed a bit backwards to begin with, but now that we're not just dealing with "possible" and "not possible", but with "better" and "worse", the work-stealing method of load balancing sort of falls down.

It does make sense to do the load-balancing work on idle cpus rather than already-busy cpus; but I wonder if what should happen instead is that before idling, a pcpu chooses a "busy" pcpu and does a global load balancing for it -- i.e., pcpu 1 will look at pcpu 5's runqueue, and consider moving away the vcpus on the runqueue not just to itself but to any available cpu.

That way, in your example, Z might wake up, look at Y's runqueue, and say, "This would probably run well on X -- I'll migrate it there."
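
To give that a rough shape (entirely hypothetical; all the helper names below are invented for illustration):

    /* Run by a pcpu which is about to go idle: pick a busy peer and try
     * to place each of its queued vcpus on the best idle pcpu, preferring
     * pcpus in the vcpu's node-affinity. */
    busy = pick_most_loaded_cpu();                  /* invented helper */
    for_each_vcpu_on_runq ( vc, busy )              /* invented iterator */
    {
        dest = pick_idle_cpu_in_node_affinity(vc);  /* invented helper */
        if ( dest < 0 )
            dest = pick_any_idle_cpu(vc);           /* invented helper */
        if ( dest >= 0 )
            migrate_vcpu_to(vc, dest);              /* invented helper */
    }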

But that's kind of a half-baked idea at this point.

Ok, I think the problem I was describing is real, and I've seen it happening
and causing performance degradation. However, as a good solution is going to
be more complex than I thought, I'd better repost without this function and
deal with it in a future separate patch (after having figured out the best way
of doing so). Is that fine with you?

Yes, that's fine.  Thanks, Dario.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

