[Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
In the interest of openness (as well as in the interest of taking advantage of all the smart people out there), I'm posting a very early design prototype of the credit2 scheduler. We've had a lot of contributors to the scheduler recently, so I hope that those with interest and knowledge will take a look and let me know what they think at a high level.

This first e-mail will discuss the overall goals: the target "sweet spot" use cases to consider, measurable goals for the scheduler, and the target interface / features. This is for general comment. The subsequent e-mail(s?) will include some specific algorithms and changes currently under consideration, as well as some bleeding-edge patches; that will be for people with a specific interest in the details of the scheduling algorithms.

Please feel free to comment / discuss / suggest improvements.

1. Design targets

We have three general use cases in mind: server consolidation, virtual desktop providers, and clients (e.g. XenClient).

For servers, our target "sweet spot" for which we will optimize is a system with 2 sockets, 4 cores per socket, and SMT (16 logical cpus). Ideal performance is expected to be reached at about 80% total system cpu utilization, but the system should function reasonably well up to a utilization of 800% (i.e., a load of 8).

For virtual desktop systems, we will have a large number of interactive VMs with a lot of shared memory. Most of these will be single-vcpu, or at most 2 vcpus.

For client systems, we expect to have 3-4 VMs (including dom0). Systems will probably have a single socket with 2 cores and SMT (4 logical cpus). Many VMs will be using PCI pass-through to access network, video, and audio cards. They'll also be running video and audio workloads, which are extremely latency-sensitive.

2. Design goals

For each of the target systems and workloads above, we have some high-level goals for the scheduler:

* Fairness. In this context, we define "fairness" as the ability to get cpu time proportional to weight. We want this to hold even for latency-sensitive workloads such as networking, where long scheduling latency can reduce throughput, and thus the total amount of time the VM can effectively use. (A brief sketch of what proportional-to-weight sharing means in practice follows this list.)

* Good scheduling for latency-sensitive workloads. To the degree we are able, we want this to be true even for workloads which use a significant amount of cpu: that is, my audio shouldn't break up if I start a cpu hog in the VM playing the audio.

* HT-aware. Running on a logical processor with an idle peer thread is not the same as running on a logical processor with a busy peer thread. The scheduler needs to take this into account when accounting for fairness.

* Power-aware. Using as many sockets / cores as possible can increase the total cache size available to VMs, and thus (in the absence of inter-VM sharing) increase total computing power; but keeping multiple sockets and cores powered up also increases the electrical power used by the system. We want a configurable way to balance maximizing processing power against minimizing electrical power.
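To make the fairness goal concrete, here is a minimal sketch of what "cpu time proportional to weight" works out to when several cpu-bound VMs compete. It is purely illustrative; the structure and code are mine, not taken from credit1 or credit2.

    /* Illustrative only: each VM's share of contended cpu time is
     * weight_i / sum(weights).  Two cpu-hog VMs with weights 256 and
     * 512 competing for one cpu get roughly 33% and 67% respectively. */
    #include <stdio.h>

    struct vm {
        const char *name;
        unsigned int weight;
    };

    int main(void)
    {
        struct vm vms[] = { { "dom1", 256 }, { "dom2", 512 } };
        unsigned int total = 0;
        unsigned int i, n = sizeof(vms) / sizeof(vms[0]);

        for (i = 0; i < n; i++)
            total += vms[i].weight;

        for (i = 0; i < n; i++)
            printf("%s: %.1f%% of contended cpu time\n",
                   vms[i].name, 100.0 * vms[i].weight / total);

        return 0;
    }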
3. Target interface

The target interface will be similar to credit1's:

* The basic unit is the VM "weight". When competing for cpu resources, VMs will get a share of the resources proportional to their weight. (E.g., two cpu-hog workloads with weights of 256 and 512 will get 33% and 67% of the cpu, respectively.)

* Additionally, we will be introducing a "reservation" or "floor". (I'm open to name changes on this one.) This will be a minimum amount of cpu time that a VM can get if it wants it. For example, one could give dom0 a "reservation" of 50% but leave its weight at 256. No matter how many other VMs run with a weight of 256, dom0 will be guaranteed to get 50% of one cpu if it wants it.

* The "cap" functionality of credit1 will be retained. This is a maximum amount of cpu time that a VM can get: i.e., a VM with a cap of 50% will only get half of one cpu, even if the rest of the system is completely idle.

* We will also have an interface to the cpu-vs-electrical-power trade-off. This is yet to be defined. At the hypervisor level, it will probably be a number representing the "badness" of powering up extra cpus / cores. At the tools level, there will probably be the option of either specifying the number directly, or of using one of two or three pre-defined values {power, balance, green/battery}.

A rough sketch of how these per-VM parameters might be represented is below.
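Here is a minimal sketch of how the per-VM parameters and the system-wide power/performance knob described above might be represented. The structure, field names, and validity check are hypothetical illustrations of the proposal, not the actual credit1 or credit2 data structures.

    /* Hypothetical per-VM scheduling parameters for the proposed
     * interface; not the actual Xen scheduler structures. */
    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct sched_vm_params {
        uint16_t weight;      /* relative share under contention
                               * (e.g. 256 vs 512 -> a 1:2 split)        */
        uint16_t reservation; /* "floor": minimum % of one cpu the VM
                               * is guaranteed if it wants it (0 = none)  */
        uint16_t cap;         /* maximum % of one cpu the VM may use,
                               * as in credit1 (0 = uncapped)             */
    };

    /* System-wide knob: the "badness" of powering up an extra core or
     * socket.  0 = always spread work for maximum performance; larger
     * values make consolidating onto fewer cores more attractive.       */
    unsigned int powerup_badness = 0;

    /* A non-zero cap below the reservation would be contradictory. */
    bool params_valid(const struct sched_vm_params *p)
    {
        return p->cap == 0 || p->cap >= p->reservation;
    }

    int main(void)
    {
        /* The dom0 example from the text: a 50% floor, the default
         * weight of 256, and no cap. */
        struct sched_vm_params dom0 = { .weight = 256, .reservation = 50, .cap = 0 };
        printf("dom0 parameters valid: %s\n", params_valid(&dom0) ? "yes" : "no");
        return 0;
    }

The one non-obvious interaction in such an interface is between the reservation and the cap: a cap below a non-zero reservation could not be honoured, which is why the sketch includes a consistency check.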