[Xen-devel] [ED] New credit, experimental, v1
A couple of things are clear from the previous discussion, and from other work on the scheduler: the 10ms granularity of credit charging is much too coarse for latency-sensitive workloads. The next version of the scheduler will keep track of credits more precisely. That alone won't fix it, however. The credit scheduler is still designed to cross between UNDER and OVER on a regular basis, and to discriminate against VMs which go into OVER.

For a while I worked on different algorithms that were similar in nature, but I kept running up against some fundamental limitations. Credit1's method of "active" vs "inactive" works well if VMs fall directly into those two camps and are not interactive: active VMs burn all the credits they get, inactive VMs don't get credits but hardly need any cpu. But a number of VMs fall into the middle.

(From here on is where I'm a lot less sure of what I'm saying, and I expect to hear that I'm mistaken, or that I've missed something.)

If VMs are allowed to accumulate credit, then some VMs will continually gain credit. Credit1 solved this by capping credit, and actually discarding it when a VM reached the cap. The problems with this have been discussed before. If you don't discard the credit, however, then the "stable state" of the system is for all VMs to have max credit, and the cpu will effectively fall back to round-robin. (Briefly: if all VMs have MAX_CREDITS, and we assign up to 30ms of credit every 30ms, allowing extra credit from one VM to flow into another VM, then after burning some combination of credits, all VMs will be filled back up to their max.)

I experimented for a while with the idea of discarding credits and assigning new credits from scratch every 100ms. The problem with this was again dealing with assigning credits not burned by some VMs "fairly" (i.e., by weight) to other VMs. The fundamental limitation is that we can't know ahead of time how much a VM will want to use. Even past performance isn't a good measure: just because a VM *didn't* run doesn't mean it *wouldn't* have run given the chance.

* Dynamic retroactive earning

The solution I'm presenting now is actually very simple computationally. However, it has some very subtle side-effects. The algorithm is this:

* Set everyone's credits according to their weights, scaled so that the total amount of assigned credit can be burned up in 50-100ms.
* Then burn credits at a fixed rate, running the VM with the highest amount of credits first.
* If a vcpu with zero credits comes to the front of the queue (i.e., if there are no runnable VMs with credits), then assign credits again; but do not *add* credits, merely *set* them proportional to their weights. Any unused credit is effectively discarded. (A rough code sketch of this reset step follows the list of attributes below.)

Here are some of the attributes of this algorithm:

* The credit assignment effectively divides time into "earning periods".
* The amount of credit added into the system each earning period is equal to the amount burned, so overall system credit and overall system burn will be the same.
* Assigning a fixed amount of credit, but varying the time at which it is delivered, changes the effective rate of earning. However, because the thing that varies the rate of earning happens at the end rather than the beginning, the time at which credits are assigned retroactively sets the rate of earning for that earning period. (Think carefully about this for a bit.)
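To make the reset step concrete, here is a minimal sketch in C of the "set, don't add" credit reset. This is purely illustrative: the names (sched_vcpu, reset_credit, CREDIT_PERIOD_US) are made up and do not correspond to the attached patches, and it ignores locking, per-cpu runqueues, sleeping vcpus, and so on.

/*
 * Illustrative sketch only; names do not match the attached patches.
 */
#include <stdint.h>

#define CREDIT_PERIOD_US  100000   /* total assigned credit ~= 100ms of cpu time */

struct sched_vcpu {
    struct sched_vcpu *next;    /* runqueue link, kept sorted by credit */
    int                weight;
    int64_t            credit;  /* microseconds of cpu time "owned" */
};

/* Burn credit at a fixed rate: one credit per microsecond actually run. */
static void burn_credit(struct sched_vcpu *v, int64_t ran_us)
{
    v->credit -= ran_us;
}

/*
 * Reset (not add!) everyone's credit in proportion to weight, scaled so
 * that the whole pool adds up to roughly CREDIT_PERIOD_US.  Unused credit
 * from the previous earning period is simply overwritten, i.e. discarded.
 * For simplicity this only walks the runnable vcpus; the real thing would
 * have to cover sleeping vcpus as well.
 */
static void reset_credit(struct sched_vcpu *runq)
{
    struct sched_vcpu *v;
    int total_weight = 0;

    for ( v = runq; v != NULL; v = v->next )
        total_weight += v->weight;

    for ( v = runq; v != NULL; v = v->next )
        v->credit = (int64_t)CREDIT_PERIOD_US * v->weight / total_weight;
}

/*
 * Pick the next vcpu to run: the runqueue is kept sorted so the head has
 * the most credit.  If even the head is out of credit, a new earning
 * period starts.
 */
static struct sched_vcpu *pick_next(struct sched_vcpu *runq)
{
    if ( runq != NULL && runq->credit <= 0 )
        reset_credit(runq);
    return runq;
}

Note that because reset_credit() only runs once the vcpu at the head of the queue is out of credit, the length of each earning period (and therefore the effective rate of earning) is only determined after the fact; that is the "retroactive" part.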
So this effectively addresses the two problems we had before: adding too much credit into the system (because earn == burn, always), and not knowing in advance how much credit a VM would use (because we set the effective rate of earning not at the beginning, but at the end).

* Assigning credit only when we first have a runnable vcpu without credit scales everyone's earning rate; it discards the credits of those who don't use all of theirs, while making sure that those who do use all of their credits get time proportional to their weight.

A couple of conditions are required to make this algorithm work properly in a mix of burn and wake-and-sleep workloads. First, the "time slice" given to a "burn" VM must be a fraction (1/2 at most, probably preferably less) of the amount of credit that VM has.

Anyway, take a look, give it a spin, and let me know what you think. I'm still working on the wake latency properties, but scp to an HVM domU, in competition with spinners, works pretty well for me, and scales well with setting the weight. Test some other latency-sensitive workloads and let me know how it fares.

I still have some more things to tweak, but I wanted to get the basic idea out there so people could discuss it / ask questions / give feedback / turn up workloads or situations where it fails miserably.

I should make it clear: this is totally an early design prototype. It still needs work; there are some ugly hacks, scant attention paid to efficient algorithms, and promised features (such as cap and reservation) are missing.

 -George
Attachment: credit2-hypervisor.diff
Attachment: credit2-tools.diff

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel