Re: [Xen-devel] [PATCH 03 of 10 v2] xen: sched_credit: let the scheduler know about node-affinity
On 20/12/12 18:18, Dario Faggioli wrote:
> On Thu, Dec 20, 2012 at 5:48 PM, George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:
>> And in any case, looking at the caller of csched_load_balance(), it
>> explicitly says to steal work if the next thing on the runqueue of
>> cpu has a priority of TS_OVER. That was chosen for a reason -- if
>> you want to change that, you should change it there at the top (and
>> make a justification for doing so), not deeply nested in a function
>> like this. Or am I completely missing something?
>
> No, you're right. While trying to solve a nasty issue I was seeing, I
> overlooked that I was changing the underlying logic at that point...
> Thanks!
>
> What I want to avoid is the following: a vcpu wakes up on the busy
> pcpu Y. As a consequence, the idle pcpu X is tickled. Then, for some
> unrelated reason, pcpu Z reschedules and, as it is about to go idle
> too, it looks around for a vcpu to steal, finds one in Y's runqueue,
> and grabs it. Afterward, when X gets the IPI and schedules, it simply
> does not find anyone to run and goes back to idling.
>
> Now, suppose the vcpu has X, but *not* Z, in its node-affinity (while
> it has a full vcpu-affinity, i.e., it can run everywhere). In this
> case, a vcpu that could have run on a pcpu within its node-affinity
> ends up executing outside of it. That happens because the
> NODE_BALANCE_STEP in csched_load_balance(), when invoked by Z, won't
> find anything suitable to steal (provided there actually isn't any
> vcpu waiting in any runqueue with node-affinity to Z), while the
> CPU_BALANCE_STEP will find our vcpu. :-(
>
> So, what I wanted was something that could tell me whether the pcpu
> stealing work is the one that was actually tickled to do so. I was
> using the pcpu's idleness as a (cheap and easy to check) indication
> of that, but I now see it has side effects I never meant to cause.
> Sorry for that; I probably spent so much time buried, as you were
> saying, in the various nested loops and calls that I lost the context
> a little bit! :-P

OK, that makes sense -- I figured it was something like that. Don't
feel too bad about missing that connection -- we're all fairly blind
to our own code, and I only caught it because I was trying to figure
out what was going on. That's why we do patch review. :-)

Honestly, the whole "steal work" idea seemed a bit backwards to begin
with, but now that we're not just dealing with "possible" and "not
possible", but with "better" and "worse", the work-stealing method of
load balancing rather falls down. It does make sense to do the
load-balancing work on idle cpus rather than on already-busy ones; but
I wonder whether, before idling, a pcpu should instead pick a "busy"
pcpu and do a global load-balancing pass on its behalf -- i.e., pcpu 1
would look at pcpu 5's runqueue and consider moving the vcpus on that
runqueue not just to itself but to any available cpu. That way, in
your example, Z might wake up, look at Y's runqueue, and say, "This
would probably run well on X -- I'll migrate it there." But that's
kind of a half-baked idea at this point.

> Ok, I think the problem I was describing is real, and I've seen it
> happen and cause performance degradation. However, since a good
> solution is going to be more complex than I thought, I'd better
> repost without this function and deal with it in a separate, future
> patch (after having figured out the best way of doing so). Is that
> fine with you?
>
> Thanks,
> Dario

Yes, that's fine.
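
For reference, the call site in question is in csched_schedule(): work
stealing is only attempted at all when the next local vcpu has gone over
its credits. A minimal paraphrase from memory (the exact arguments of
csched_load_balance() in the tree may differ):

    /* Paraphrased csched_schedule() fragment: only if the next local
     * vcpu is at or below CSCHED_PRI_TS_OVER do we look at other
     * pcpus' runqueues for something better to run. */
    if ( snext->pri > CSCHED_PRI_TS_OVER )
        __runq_remove(snext);                     /* urgent: run it here */
    else
        snext = csched_load_balance(cpu, snext);  /* try to steal work */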
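
Inside csched_load_balance(), the series then makes stealing a two-pass
affair, which is exactly where the scenario above bites: Z's
NODE_BALANCE_STEP pass finds nothing, but its CPU_BALANCE_STEP pass grabs
the vcpu X was tickled for. A rough sketch of that shape, assuming
NODE_BALANCE_STEP orders before CPU_BALANCE_STEP, and with
csched_runq_steal() and the peer mask assumed rather than quoted:

    /* Two-pass stealing sketch: pass 1 (NODE_BALANCE_STEP) only takes
     * vcpus that have this cpu in their node-affinity; pass 2
     * (CPU_BALANCE_STEP) falls back to plain vcpu-affinity. */
    for ( step = NODE_BALANCE_STEP; step <= CPU_BALANCE_STEP; step++ )
    {
        for_each_cpu ( peer_cpu, &peers )   /* runqueues worth raiding */
        {
            speer = csched_runq_steal(peer_cpu, cpu, snext->pri, step);
            if ( speer != NULL )
                return speer;               /* run the stolen vcpu */
        }
    }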
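
And, for what it's worth, the half-baked "balance on behalf of a busy
peer before idling" idea might look something like the following. Every
helper named here is hypothetical, purely to illustrate the flow:

    /* Hypothetical sketch only -- none of these helpers exist in the
     * tree.  Before going idle, a pcpu balances a busy peer's runqueue
     * globally, instead of merely stealing for itself. */
    static void balance_before_idle(int cpu)
    {
        int peer = pick_busiest_pcpu(cpu);       /* hypothetical */
        struct vcpu *vc;

        for_each_runq_vcpu ( peer, vc )          /* hypothetical */
        {
            /* Prefer a pcpu inside vc's node-affinity, idle ones first. */
            int best = best_pcpu_for(vc);        /* hypothetical */

            if ( best != peer && pcpu_is_idle(best) )
                migrate_vcpu(vc, best);          /* e.g. move Y's vcpu to X */
        }
    }

That way the balancing work still happens on the cpu that is about to
idle, but the vcpu ends up wherever its node-affinity says it runs best.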
 -George