Re: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
George Dunlap wrote:
> 1. Design targets
>
> We have three general use cases in mind: Server consolidation, virtual desktop providers, and clients (e.g. XenClient).
>
> For servers, our target "sweet spot" for which we will optimize is a system with 2 sockets, 4 cores per socket, and SMT (16 logical cpus). Ideal performance is expected to be reached at about 80% total system cpu utilization; but the system should function reasonably well up to a utilization of 800% (e.g., a load of 8).

Is that forward-looking enough? That hardware is currently available; what's going to be commonplace in 2-3 years?

> For virtual desktop systems, we will have a large number of interactive VMs with a lot of shared memory. Most of these will be single-vcpu, or at most 2 vcpus.
>
> For client systems, we expect to have 3-4 VMs (including dom0). Systems will probably have a single socket with 2 cores and SMT (4 logical cpus). Many VMs will be using PCI pass-through to access network, video, and audio cards. They'll also be running video and audio workloads, which are extremely latency-sensitive.
>
> 2. Design goals
>
> For each of the target systems and workloads above, we have some high-level goals for the scheduler:
>
> * Fairness. In this context, we define "fairness" as the ability to get cpu time proportional to weight. We want to try to make this true even for latency-sensitive workloads such as networking, where long scheduling latency can reduce the throughput, and thus the total amount of time the VM can effectively use.
> * Good scheduling for latency-sensitive workloads. To the degree we are able, we want this to be true even for those which use a significant amount of cpu power: That is, my audio shouldn't break up if I start a cpu hog process in the VM playing the audio.
> * HT-aware. Running on a logical processor with an idle peer thread is not the same as running on a logical processor with a busy peer thread. The scheduler needs to take this into account when deciding "fairness".
Would it be worth just pair-scheduling HT threads so they're always running in the same domain?

> * Power-aware. Using as many sockets / cores as possible can increase the total cache size available to VMs, and thus (in the absence of inter-VM sharing) increase total computing power; but keeping multiple sockets and cores powered up also increases the electrical power used by the system. We want a configurable way to balance between maximizing processing power vs minimizing electrical power.

I don't remember if there's a proper term for this, but what about having multiple domains sharing the same scheduling context, so that a stub domain can be co-scheduled with its main domain, rather than having them treated separately?

Also, a somewhat related point: some kind of directed schedule, so that when one vcpu is synchronously waiting on another vcpu, it directly hands over its pcpu to avoid any cross-cpu overhead (including the ability to take advantage of directly using hot cache lines). That would be useful for intra-domain IPIs, etc, but also inter-domain context switches (domain<->stub, frontend<->backend, etc).

> 3. Target interface:
>
> The target interface will be similar to credit1:
>
> * The basic unit is the VM "weight". When competing for cpu resources, VMs will get a share of the resources proportional to their weight. (e.g., two cpu-hog workloads with weights of 256 and 512 will get 33% and 67% of the cpu, respectively).
> * Additionally, we will be introducing a "reservation" or "floor". (I'm open to name changes on this one.) This will be a minimum amount of cpu time that a VM can get if it wants it. For example, one could give dom0 a "reservation" of 50%, but leave the weight at 256. No matter how many other VMs run with a weight of 256, dom0 will be guaranteed to get 50% of one cpu if it wants it.

How does the reservation interact with the credits? Is the reservation in addition to its credits, or does using the reservation consume them?
> * The "cap" functionality of credit1 will be retained. This is a maximum amount of cpu time that a VM can get: i.e., a VM with a cap of 50% will only get half of one cpu, even if the rest of the system is completely idle.
> * We will also have an interface to the cpu-vs-electrical power trade-off. This is yet to be defined. At the hypervisor level, it will probably be a number representing the "badness" of powering up extra cpus / cores. At the tools level, there will probably be the option of either specifying the number, or of using one of 2/3 pre-defined values {power, balance, green/battery}.

Is it worth taking into account the power cost of cache misses vs hits?

Do vcpus running on pcpus running at less than 100% speed consume fewer credits?

Is there any explicit interface to cpu power state management, or would that be decoupled?

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
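
For reference, the weight / reservation / cap arithmetic discussed in the target interface can be sketched in a few lines. This is only an illustration, not the credit1 code or the proposed scheduler: the function name `allocate`, the tuple layout, and the `capacity` parameter are all made up for the example. It also assumes one particular answer to the open question about reservations, namely that the reservation is granted first and the remaining cpu time is then split by weight (i.e. the reservation is in addition to the weighted share). The other interpretation would need different arithmetic.

```python
def allocate(vms, capacity=100.0):
    """Split cpu time among VMs by weight, honouring floors and caps.

    vms maps a VM name to (weight, reservation, cap). reservation and cap
    are percentages of one cpu; cap=None means "uncapped". capacity is the
    total cpu time on offer, also in percent of one cpu.

    ASSUMPTION: reservations are handed out first, and only the remainder
    is divided in proportion to weight.
    """
    for name, (weight, reservation, cap) in vms.items():
        if weight <= 0:
            raise ValueError(f"{name}: weight must be positive")
        if cap is not None and cap < reservation:
            raise ValueError(f"{name}: cap is below reservation")

    # Step 1: every VM starts with its reservation (floor).
    alloc = {name: float(res) for name, (_, res, _) in vms.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reservations exceed capacity")

    # Step 2: hand out the rest by weight. Any VM that would overshoot its
    # cap is pinned at the cap, and its unused share is redistributed
    # among the VMs that are still uncapped.
    active = set(vms)
    while remaining > 1e-9 and active:
        total_weight = sum(vms[n][0] for n in active)
        capped = [
            n for n in active
            if vms[n][2] is not None
            and alloc[n] + remaining * vms[n][0] / total_weight >= vms[n][2]
        ]
        if not capped:
            for n in active:
                alloc[n] += remaining * vms[n][0] / total_weight
            break
        for n in capped:
            remaining -= vms[n][2] - alloc[n]
            alloc[n] = float(vms[n][2])
            active.remove(n)
    return alloc


if __name__ == "__main__":
    # Two cpu hogs with weights 256 and 512: roughly 33% and 67%.
    print(allocate({"a": (256, 0, None), "b": (512, 0, None)}))
    # A lone VM with a cap of 50% gets half a cpu even on an idle system.
    print(allocate({"a": (256, 0, 50)}))
    # dom0 with a 50% reservation competing with three equal-weight guests.
    print(allocate({
        "dom0": (256, 50, None),
        "g1": (256, 0, None),
        "g2": (256, 0, None),
        "g3": (256, 0, None),
    }))
```

Under the stated assumption, dom0 in the last case ends up with its 50% floor plus a quarter of what is left. If using the reservation instead consumed credits, dom0 would get exactly 50% there and the guests would split the other half.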