Re: [Xen-devel] Questions regarding Xen Credit Scheduler
On Mon, Jul 12, 2010 at 4:05 AM, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
>> 2. __runq_tickle: Tickle the CPU even if the new VCPU has the same
>> priority but a higher amount of credits left. Current code just looks
>> at the priority.
> [snip]
>> 5. csched_schedule: Always call csched_load_balance. In the
>> csched_load_balance and csched_runq_steal functions, change the logic
>> to grab a VCPU with higher credit. Current code just works on
>> priority.
>
> I'm much more wary of these ideas. The problem here is that doing
> runqueue tickling and load balancing isn't free -- IPIs can be
> expensive, especially if your VMs are running with hardware
> virtualization. In fact, with the current scheduler, you get a sort
> of n^2 effect, where the time the system spends doing IPIs due to load
> balancing squares with the number of schedulable entities. In
> addition, frequent migration will reduce cache effectiveness and
> increase congestion on the memory bus.
>
> I presume you want to do this to decrease the latency? Lee et al [1]
> actually found that *decreasing* the cpu migrations of their soft
> real-time workload led to an overall improvement in quality. The
> paper doesn't delve deeply into why, but it seems reasonable to
> conclude that although the vcpus may have been able to start their
> task sooner (although even that's not guaranteed -- it may have taken
> longer to migrate than to get to the front of the runqueue), they
> ended their task later, presumably due to cpu stalls on cacheline
> misses and so on.

Thanks for this paper. It gives a very interesting analysis of what can
go wrong with applications that fall in the middle (need CPU, but are
latency-sensitive as well). In my experiments, I see some servers, like
mysql db-servers, fall into this category. As expected, they do not do
well with CPU-intensive jobs running in the background, even if I give
them the highest possible weight (65535). I guess very aggressive
migrations might not be a good idea, but there needs to be some way to
guarantee that such apps get their fair share at the right time.
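To be concrete about the tickle change in item 2 above, here is a toy
model of the decision I have in mind. It is a standalone sketch, not
the actual sched_credit.c code: the CSCHED_PRI_* values mirror my
reading of the source, struct toy_vcpu is made up, and the credit
tiebreak is the change I am proposing, not current behaviour.

/* Toy model of the __runq_tickle decision -- NOT the real code. */
#include <stdbool.h>
#include <stdio.h>

#define CSCHED_PRI_TS_BOOST   0   /* woken "inactive" vcpus */
#define CSCHED_PRI_TS_UNDER  -1   /* credit remaining */
#define CSCHED_PRI_TS_OVER   -2   /* credit exhausted */

struct toy_vcpu {
    int pri;      /* higher value == higher priority */
    int credit;   /* credits remaining */
};

/* Current logic: tickle (IPI the CPU to force a reschedule) only when
 * the waking vcpu has strictly higher priority than the running one. */
static bool should_tickle_current(const struct toy_vcpu *woken,
                                  const struct toy_vcpu *cur)
{
    return woken->pri > cur->pri;
}

/* Proposed logic: also tickle on equal priority when the waking vcpu
 * has more credit left. Every extra "true" here is an extra IPI,
 * which is exactly the cost you are pointing at. */
static bool should_tickle_proposed(const struct toy_vcpu *woken,
                                   const struct toy_vcpu *cur)
{
    return woken->pri > cur->pri ||
           (woken->pri == cur->pri && woken->credit > cur->credit);
}

int main(void)
{
    struct toy_vcpu cur   = { CSCHED_PRI_TS_UNDER,  50 };
    struct toy_vcpu woken = { CSCHED_PRI_TS_UNDER, 120 };

    printf("current policy tickles:  %d\n",
           should_tickle_current(&woken, &cur));
    printf("proposed policy tickles: %d\n",
           should_tickle_proposed(&woken, &cur));
    return 0;
}

On this toy example the current policy leaves the running vcpu alone
(equal priority), while the proposed one sends the IPI -- which is
where the extra cost comes from.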
> I think a much better approach would be:
> * To have long-term effective placement, if possible: i.e., distribute
> latency-sensitive vcpus
> * If two latency-sensitive vcpus are sharing a cpu, do shorter
> time-slices.

These are very interesting ideas indeed.

>> 4. csched_acct: If the credit of a VCPU crosses 300, then set it to
>> 300, not 0. I am still not sure why the VCPU is being marked as
>> inactive? Can't I just update the credit and let it be active?
>
> So what credit1 does is assume that all workloads fall into two
> categories: "active" VMs, which consume as much cpu as they can, and
> "inactive" (or "I/O-bound") VMs, which use almost no cpu. "Inactive"
> VMs essentially run at BOOST priority, and run whenever they want to.
> Then the credit for each timeslice is divided among the "active" VMs.
> This way the ones that are consuming cpu don't get too far behind.
>
> The problem, of course, is that most server workloads fall in the
> middle: they spend a significant time processing, but also a
> significant time waiting for more network packets.

This is precisely the problem we are facing.

> I looked at the idea of "capping" credit, as you say; but the
> steady-state when I worked out the algorithms by hand was that all the
> VMs were at their cap all the time, which screwed up other aspects of
> the algorithm. Credits need to be thrown away; my proposal was to
> divide the credits by 2, rather than setting to 0. This should be a
> good mid-way.

Sure, dividing by 2 could be a good middle ground. Could we
additionally avoid marking such VCPUs inactive? To see how the policies
differ, I mocked them up in the toy program below.
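Here is that sketch -- a standalone toy, not the real csched_acct():
the 300 upper bound is from your description, and the per-period
earn/burn numbers are made up just to show the shapes.

/* Toy comparison of three end-of-accounting-period credit policies. */
#include <stdio.h>

#define CREDIT_CAP 300

static int acct_set_to_zero(int credit)  /* current behaviour */
{
    return credit > CREDIT_CAP ? 0 : credit;
}

static int acct_cap(int credit)          /* my original suggestion */
{
    return credit > CREDIT_CAP ? CREDIT_CAP : credit;
}

static int acct_halve(int credit)        /* your divide-by-2 proposal */
{
    return credit > CREDIT_CAP ? credit / 2 : credit;
}

int main(void)
{
    /* A middling vcpu: earns 100 credits per accounting period but
     * burns only 20, so credit builds up over time. */
    int zero = 0, cap = 0, halve = 0;

    for (int t = 1; t <= 10; t++) {
        zero  = acct_set_to_zero(zero  + 100 - 20);
        cap   = acct_cap(cap     + 100 - 20);
        halve = acct_halve(halve + 100 - 20);
        printf("period %2d: zero=%3d cap=%3d halve=%3d\n",
               t, zero, cap, halve);
    }
    return 0;
}

Running this, set-to-zero sawtooths (climbs, then drops back to 0), the
cap policy pins at 300 from period 4 onward -- the degenerate steady
state you describe -- and the halving policy settles into a bounded
oscillation below the cap while still throwing surplus credit away.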
> These things are actually really subtle. I've spent hours and hours
> with pencil-and-paper, working out different algorithms by hand, to
> see exactly what effect the different changes would have. I even
> wrote a discrete event simulator, to make the process a bit faster.
> (But of course, to understand why things look the way they do, you
> still have to trace through the algorithm manually.) If you're really
> keen, I can tar it up and send it to you. :-)

I am just figuring out how non-trivial these apparently small problems
are :-) It would be great if you could share your simulator! I will
keep you posted on my changes and tests.

Thanks,
-Gaurav

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel