Re: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
George Dunlap wrote:

> 1. Design targets
>
> We have three general use cases in mind: server consolidation, virtual desktop providers, and clients (e.g. XenClient).
>
> For servers, our target "sweet spot" for which we will optimize is a system with 2 sockets, 4 cores each socket, and SMT (16 logical cpus).  Ideal performance is expected to be reached at about 80% total system cpu utilization; but the system should function reasonably well up to a utilization of 800% (e.g., a load of 8).

Is that forward-looking enough?  That hardware is currently available; what's going to be commonplace in 2-3 years?

> For virtual desktop systems, we will have a large number of interactive VMs with a lot of shared memory.  Most of these will be single-vcpu, or at most 2 vcpus.
>
> For client systems, we expect to have 3-4 VMs (including dom0).  Systems will probably have a single socket with 2 cores and SMT (4 logical cpus).  Many VMs will be using PCI pass-through to access network, video, and audio cards.  They'll also be running video and audio workloads, which are extremely latency-sensitive.
>
> 2. Design goals
>
> For each of the target systems and workloads above, we have some high-level goals for the scheduler:
>
> * Fairness.  In this context, we define "fairness" as the ability to get cpu time proportional to weight.  We want to try to make this true even for latency-sensitive workloads such as networking, where long scheduling latency can reduce the throughput, and thus the total amount of time the VM can effectively use.
>
> * Good scheduling for latency-sensitive workloads.  To the degree we are able, we want this to be true even for those which use a significant amount of cpu power: that is, my audio shouldn't break up if I start a cpu-hog process in the VM playing the audio.
>
> * HT-aware.  Running on a logical processor with an idle peer thread is not the same as running on a logical processor with a busy peer thread.  The scheduler needs to take this into account when deciding "fairness".

Would it be worth just pair-scheduling HT threads so they're always running in the same domain?

> * Power-aware.  Using as many sockets / cores as possible can increase the total cache size available to VMs, and thus (in the absence of inter-VM sharing) increase total computing power; but keeping multiple sockets and cores powered up also increases the electrical power used by the system.  We want a configurable way to balance maximizing processing power against minimizing electrical power.

I don't remember if there's a proper term for this, but what about having multiple domains share the same scheduling context, so that a stub domain can be co-scheduled with its main domain, rather than having them treated separately?

Also, a somewhat related point: some kind of directed schedule, so that when one vcpu is synchronously waiting on another vcpu, it can directly hand over its pcpu to avoid any cross-cpu overhead (including the ability to take advantage of directly using hot cache lines).  That would be useful for intra-domain IPIs, etc., but also for inter-domain context switches (domain<->stub, frontend<->backend, etc.).

> 3. Target interface:
>
> The target interface will be similar to credit1:
>
> * The basic unit is the VM "weight".  When competing for cpu resources, VMs will get a share of the resources proportional to their weight.  (E.g., two cpu-hog workloads with weights of 256 and 512 will get 33% and 67% of the cpu, respectively.)
>
> * Additionally, we will be introducing a "reservation" or "floor".  (I'm open to name changes on this one.)
>   This will be a minimum amount of cpu time that a VM can get if it wants it.  For example, one could give dom0 a "reservation" of 50%, but leave the weight at 256.  No matter how many other VMs run with a weight of 256, dom0 will be guaranteed to get 50% of one cpu if it wants it.

How does the reservation interact with the credits?  Is the reservation in addition to its credits, or does using the reservation consume them?

Is it worth taking into account the power cost of cache misses vs hits?  Do vcpus running on pcpus running at less than 100% speed consume fewer credits?  Is there any explicit interface to cpu power state management, or would that be decoupled?

    J
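To make the reservation question concrete, here is a minimal C sketch (purely illustrative, not taken from the credit1 code; all names and numbers are hypothetical) of one possible policy: each VM's floor is carved out of the accounting period first, and only the remaining credits are split by weight, so the reservation is effectively in addition to the weighted share.

/*
 * Illustrative sketch only: one possible way to turn per-VM weights and
 * reservations ("floors") into per-period credit grants.  All names here
 * are hypothetical; this is not the credit1 implementation.
 */
#include <stdio.h>

#define NVMS            3
#define PERIOD_CREDITS  1000  /* credits per accounting period (one pcpu's worth, for simplicity) */

struct vm {
    const char *name;
    unsigned int weight;       /* relative share, as in credit1 */
    unsigned int reservation;  /* guaranteed credits per period (the "floor") */
    unsigned int credits;      /* credits granted this period */
};

/* Carve out each VM's reservation first, then split what is left by weight. */
static void allocate_credits(struct vm *vms, int n)
{
    unsigned int reserved = 0, total_weight = 0, remaining;
    int i;

    for (i = 0; i < n; i++) {
        vms[i].credits = vms[i].reservation;
        reserved      += vms[i].reservation;
        total_weight  += vms[i].weight;
    }

    if (total_weight == 0)
        return;

    remaining = PERIOD_CREDITS > reserved ? PERIOD_CREDITS - reserved : 0;

    for (i = 0; i < n; i++)
        vms[i].credits += remaining * vms[i].weight / total_weight;
}

int main(void)
{
    struct vm vms[NVMS] = {
        { "dom0", 256, 500, 0 },  /* 50% of the period reserved */
        { "vm1",  256,   0, 0 },
        { "vm2",  512,   0, 0 },  /* twice vm1's weight */
    };
    int i;

    allocate_credits(vms, NVMS);
    for (i = 0; i < NVMS; i++)
        printf("%-5s weight=%3u reservation=%3u credits=%u\n",
               vms[i].name, vms[i].weight, vms[i].reservation, vms[i].credits);
    return 0;
}

With these numbers dom0 ends up with 625 of the 1000 credits (its 500-credit floor plus a weighted share of the remainder), vm1 with 125, and vm2 with 250.  The opposite reading of the question above would grant credits purely by weight and let time spent under the reservation consume them, which would cap how far a VM with a floor can burst above it.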