Re: [Xen-devel] planned csched improvements?
On Fri, Oct 9, 2009 at 3:53 PM, Jan Beulich <JBeulich@xxxxxxxxxx> wrote:
> After the original announcement of plans to do some work on csched there
> wasn't much activity, so I'd like to ask about some observations that I made
> with the current implementation and whether it would be expected that
> those planned changes would take care of them.

There has been activity, but nothing worth sharing yet. :-)  I'm working on
the new "fairness" algorithm (perhaps called credits, perhaps not), which is
a prerequisite for any further work such as load-balancing, power
consumption, and so on.  Unfortunately, for the last several months I
haven't been able to work on it for more than a week at a time before being
interrupted by other work-related tasks. :-(

Re the items you bring up below:

I believe my planned changes to load-balancing should address the first.
First, I plan to make all cores that share an L2 cache share a runqueue.
This will automatically spread work among those cores without any special
load-balancing being needed.  Then, I plan to actually calculate:
* The per-runqueue load over the last time period
* The amount each vcpu contributes to that load

Load balancing then won't be a matter of looking at instantaneous runqueue
lengths (as it is currently), but of looking at the actual amount of
"business" each runqueue has seen over a period of time.  Load balancing
will be just that: actually moving vcpus around to make the loads more
balanced.  Balancing operations will happen at fixed intervals, rather than
"whenever a runqueue is idle".  (There's a rough sketch of what I mean
below, after your quoted mail.)

But those are just plans now; not a line of code has been written, and
schedulers especially are notorious for the Law of Unexpected Consequences.

Re soft-lockups: that really shouldn't be possible with the current
scheduler; if it happens, it's a bug.  Have you pulled from xen-unstable
recently?  There was a bug introduced a few weeks ago that would cause
problems; Keir checked in a fix for it last week.  Otherwise, if you're
sure it's not a long-hypercall issue, there must be a bug somewhere.  The
new scheduler will be an almost complete rewrite, so it will probably erase
this bug (and introduce its own).  However, I doubt it will be ready by
3.5, so this one is probably worth tracking down and fixing if we can.

Hope that answers your question. :-)

 -George

> On a lightly loaded many-core non-hyperthreaded system (e.g. a single
> CPU bound process in one VM, and only some background load elsewhere),
> I see this CPU bound vCPU permanently switch between sockets, which is
> a result of csched_cpu_pick() eagerly moving vCPU-s to "more idle"
> sockets. It would seem that some minimal latency consideration might be
> useful to get added here, so that a very brief interruption by another
> vCPU doesn't result in unnecessary migration.
>
> As a consequence of that eager moving, in the vast majority of cases
> the vCPU in question then (within a very short period of time) either
> triggers a cascade of other vCPU migrations, or begins a series of
> ping-pongs between (usually two) pCPU-s - until things settle again for
> a while. Again, some minimal latency added here might help avoiding
> that.
>
> Finally, in the complete inverse scenario of severely overcommitted
> systems (more than two fully loaded vCPU-s per pCPU) I frequently
> see Linux' softlockup watchdog kick in, now and then even resulting
> in the VM hanging. I had always thought that starvation of a vCPU
> for several seconds shouldn't be an issue that early - am I wrong
> here?
>
> Jan
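
P.S. To make the load-tracking idea a bit more concrete, here is a
stand-alone toy sketch of the kind of thing I mean.  To be clear, none of
this is real Xen code and none of it exists in the tree: the names
(demo_vcpu, demo_runq, account_tick(), balance()) and all the constants are
invented purely for the illustration.

/* Toy model of per-runqueue load tracking plus fixed-interval balancing.
 * All names and numbers are made up; this is not Xen code.              */
#include <stdio.h>

#define NR_RUNQS   4
#define NR_VCPUS   8
#define LOAD_MAX   256        /* fixed point: 256 == 100% busy           */
#define DECAY      4          /* old load keeps 15/16 of its weight      */

struct demo_vcpu {
    int runq;                 /* which runqueue this vcpu sits on        */
    unsigned int load;        /* decayed "busy" average, fixed point     */
};

struct demo_runq {
    unsigned int load;        /* sum of the decayed loads of its vcpus   */
};

static struct demo_vcpu vcpus[NR_VCPUS];
static struct demo_runq runqs[NR_RUNQS];

/* Accounting tick: fold the fraction of the last period each vcpu
 * actually ran (0..256) into its decaying average, and recompute the
 * per-runqueue totals from those averages.                              */
static void account_tick(const unsigned int ran[NR_VCPUS])
{
    int i;

    for (i = 0; i < NR_RUNQS; i++)
        runqs[i].load = 0;

    for (i = 0; i < NR_VCPUS; i++) {
        struct demo_vcpu *v = &vcpus[i];

        v->load = v->load - (v->load >> DECAY) + (ran[i] >> DECAY);
        runqs[v->runq].load += v->load;
    }
}

/* Balancer, run at a fixed interval rather than whenever a runqueue
 * happens to be idle: move one vcpu from the busiest runqueue to the
 * idlest one, and only if the accumulated imbalance is worth it.        */
static void balance(void)
{
    int i, busiest = 0, idlest = 0, victim = -1;
    unsigned int gap;

    for (i = 1; i < NR_RUNQS; i++) {
        if (runqs[i].load > runqs[busiest].load)
            busiest = i;
        if (runqs[i].load < runqs[idlest].load)
            idlest = i;
    }

    gap = runqs[busiest].load - runqs[idlest].load;
    if (gap < LOAD_MAX / 4)
        return;               /* not enough imbalance to bother          */

    /* Pick the heaviest vcpu on the busiest runqueue whose load does not
     * exceed the gap, so the destination can never end up busier than
     * the source was before the move.                                   */
    for (i = 0; i < NR_VCPUS; i++)
        if (vcpus[i].runq == busiest && vcpus[i].load <= gap &&
            (victim < 0 || vcpus[i].load > vcpus[victim].load))
            victim = i;

    if (victim >= 0) {
        printf("moving vcpu %d: runq %d -> runq %d\n",
               victim, busiest, idlest);
        vcpus[victim].runq = idlest;
    }
}

int main(void)
{
    /* vcpus 0 and 4 are 100% busy and both land on runq 0; the rest
     * are idle and spread over the other runqueues.                     */
    unsigned int ran[NR_VCPUS] = { [0] = 256, [4] = 256 };
    int i, t;

    for (i = 0; i < NR_VCPUS; i++)
        vcpus[i].runq = i % NR_RUNQS;

    for (t = 0; t < 32; t++)
        account_tick(ran);

    balance();                /* moves one of the busy vcpus off runq 0  */
    return 0;
}

Run for long enough, account_tick() converges each vcpu's load to the
fraction of time it actually ran, and balance() only reacts when the
accumulated imbalance is worth a migration -- the opposite of reacting to
an instantaneously idle runqueue, which is what gets you the ping-pong you
describe.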
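
P.P.S. Re your first two points: I think the "minimal latency
consideration" you suggest amounts to a check along these lines before the
pick logic is allowed to choose a remote pcpu.  Again, this is purely an
illustration, not proposed code for csched_cpu_pick(): demo_vcpu,
migration_allowed() and the 1ms threshold are all made up for the example.

/* Toy illustration of migration hysteresis; not real Xen code.          */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MIGRATION_DELAY_US  1000   /* tunable: 1ms of "stickiness"       */

struct demo_vcpu {
    uint64_t last_migrate_us;      /* when we last moved this vcpu       */
    uint64_t became_runnable_us;   /* when it last started waiting       */
};

/* Would migrating this vcpu to a "more idle" pcpu be allowed right now? */
static bool migration_allowed(const struct demo_vcpu *v, uint64_t now_us)
{
    /* Recently migrated: stay put and give the cache a chance.          */
    if (now_us - v->last_migrate_us < MIGRATION_DELAY_US)
        return false;

    /* Only been waiting a very short time: the local pcpu will probably
     * pick it up again in a moment, so don't move it yet.               */
    if (now_us - v->became_runnable_us < MIGRATION_DELAY_US / 2)
        return false;

    return true;
}

int main(void)
{
    struct demo_vcpu v = { .last_migrate_us = 0, .became_runnable_us = 10000 };

    /* Interrupted 100us ago: stay on the current socket.                */
    printf("at 10100us: %s\n",
           migration_allowed(&v, 10100) ? "migrate" : "stay");

    /* Still waiting 2ms later: now a move is worth the cache cost.      */
    printf("at 12000us: %s\n",
           migration_allowed(&v, 12000) ? "migrate" : "stay");

    return 0;
}

Whether the threshold should be a fixed number of microseconds or something
derived from the accounting period is exactly the sort of thing that needs
experimenting with; the point is just that a vcpu interrupted for a few
hundred microseconds shouldn't get bounced to another socket.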