Re: [Xen-devel] [RFC] Scheduler work, part 1: High-level goals and interface.
Tian, Kevin wrote:
>> From: Jeremy Fitzhardinge
>> Sent: April 10, 2009 2:42
>>
>> George Dunlap wrote:
>>
>>> 1. Design targets
>>>
>>> We have three general use cases in mind: Server consolidation, virtual
>>> desktop providers, and clients (e.g. XenClient).
>>>
>>> For servers, our target "sweet spot" for which we will optimize is a
>>> system with 2 sockets, 4 cores each socket, and SMT (16 logical cpus).
>>> Ideal performance is expected to be reached at about 80% total system
>>> cpu utilization; but the system should function reasonably well up to
>>> a utilization of 800% (e.g., a load of 8).
>>>
>> Is that forward-looking enough?  That hardware is currently available;
>> what's going to be commonplace in 2-3 years?
>>
>
> good point.
>
>>> * HT-aware.
>>>
>>> Running on a logical processor with an idle peer thread is not the
>>> same as running on a logical processor with a busy peer thread.  The
>>> scheduler needs to take this into account when deciding "fairness".
>>>
>> Would it be worth just pair-scheduling HT threads so they're always
>> running in the same domain?
>>
>
> running same domain doesn't help fairness and instead, it worsens.
>

I don't know what the performance characteristics of modern HT are, but
in P4-HT the throughput of a given thread was very dependent on what the
other thread was doing.  If it's competing with some other arbitrary
domain, then it's hard to make any estimates about what the throughput
of a given vcpu's thread is.

If we present them as sibling pairs to guests, then it becomes the guest
OS's problem (i.e., we don't try to hide the true nature of these pcpus).
That's fairer for the guest, because they know what they're getting, and
Xen can charge the guest for cpu use on a thread-pair, rather than
trying to work out how the two threads compete.
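As a rough sketch of what charging per thread-pair could look like (not
existing Xen code: the function name, the constants and the relative
throughput figures below are all made up for illustration, and a real
scheduler would use measured values):

```python
# Hypothetical relative throughputs of one core. These numbers are
# assumptions for illustration, not measurements of any real CPU.
MAX_THREAD_THROUGHPUT = 1.0   # one sibling running, the other idle
MAX_CORE_THROUGHPUT = 1.25    # both siblings running


def pair_charge(running_threads: int, duration_ns: int,
                perf_scale: float = 1.0) -> float:
    """Charge one guest for its use of a sibling pair.

    running_threads: how many of the pair's two threads ran guest vcpus.
    duration_ns:     length of the accounting interval.
    perf_scale:      assumed scaling for the core's performance mode.
    """
    if running_threads == 0:
        return 0.0
    if running_threads == 1:
        rate = MAX_THREAD_THROUGHPUT
    elif running_threads == 2:
        rate = MAX_CORE_THROUGHPUT
    else:
        raise ValueError("a thread pair has at most two running threads")
    return rate * perf_scale * duration_ns
```

Because both siblings always belong to the same guest, the charge depends
only on how many of them were running, so Xen never has to guess how two
unrelated domains slowed each other down.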
In other words, if only one thread is running, then it can charge
max-thread-throughput; if both are running, it can charge
max-core-throughput (possibly scaled by whatever performance mode the
core is running in).

>>> * Power-aware.
>>>
>>> Using as many sockets / cores as possible can increase the total cache
>>> size available to VMs, and thus (in the absence of inter-VM sharing)
>>> increase total computing power; but by keeping multiple sockets and
>>> cores powered up, also increases the electrical power used by the
>>> system.  We want a configurable way to balance between maximizing
>>> processing power vs minimizing electrical power.
>>>
>> I don't remember if there's a proper term for this, but what about
>> having multiple domains sharing the same scheduling context, so that a
>> stub domain can be co-scheduled with its main domain, rather than
>> having them treated separately?
>>
>
> This is really desired.
>
>> Also, a somewhat related point, some kind of directed schedule so that
>> when one vcpu is synchronously waiting on another vcpu, have it
>> directly hand over its pcpu to avoid any cross-cpu overhead (including
>> the ability to take advantage of directly using hot cache lines).  That
>> would be useful for intra-domain IPIs, etc, but also inter-domain
>> context switches (domain<->stub, frontend<->backend, etc).
>>
>
> The hard part here is to find the hint on WHICH vcpu that given
> cpu is waiting, which is not straightforward. Of course stub
> domain is most possible example, but it may be already cleanly
> addressed if above co-scheduling could be added? :-)
>

I'm being unclear by conflating two issues.

One is that when dom0 (or driver domain) does some work on behalf of a
guest, it seems like it would be useful for the time used to be credited
against the guest rather than against dom0.
My thought is that, rather than having the scheduler parameters be the
implicit result of "vcpu A belongs to domain X, charge X", each vcpu has
a charging domain which can be updated via (privileged) hypercall.  When
dom0 is about to do some work, it updates the charging domain
accordingly (with some machinery to make that a per-task property within
the kernel so that task context switches update the vcpu state
appropriately).

A further extension would be the idea of charging grants, where domain A
could grant domain B charging rights, and B could set its vcpus to
charge A as an unprivileged operation.  As with grant tables, revocation
poses some interesting problems.

This is a generalization of coscheduled stub domains, because you could
achieve the same effect by making the stub domain simply switch all its
vcpus to charge its main domain.

How to schedule vcpus?  They could either be scheduled as if they were
part of the other domain; or be scheduled with their "home" domain, but
their time spent is charged against the other domain.  The former is
effectively priority inheritance, and raises all the normal issues - but
it would be appropriate for co-scheduled stub domains.  The latter makes
more sense for dom0, but it's less clear what it actually means: does it
consume any home domain credits?  What happens if the other domain's
credits are all consumed?  Could two domains collude to get more than
their fair share of cpu?

The second issue is trying to share pcpu resources between vcpus where
appropriate.  The obvious case is doing some kind of cross-domain copy
operation, where the data could well be hot in cache, so if you use the
same pcpu you can just get cache hits.  Of course there's the tradeoff
that you're necessarily serialising things which could be done in
parallel, so perhaps it doesn't work well in practice.

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel