[Xen-devel] [PATCH 00 of 10 v2] NUMA aware credit scheduling
Hello Everyone,

Here is take 2 of the NUMA aware credit scheduling series. Sorry it took a bit, but I had to take care of those nasty bugs causing scheduling anomalies, as they were getting in the way and messing up the numbers when trying to evaluate the performance of this! :-)

I also rewrote most of the core of the two-step vCPU and node affinity balancing algorithm, as per George's suggestion during the last round, to try to squeeze out a little bit more performance improvement.

As already and repeatedly said, what the series does is provide the (credit) scheduler with knowledge of a domain's node-affinity. It will then always try to run the domain's vCPUs on one of those nodes first. Only if that turns out to be impossible does it fall back to the old behaviour (a rough, illustrative sketch of this two-step logic is appended at the bottom of this mail). BTW, for any update on the status of my "quest" for improving NUMA support in Xen, see http://wiki.xen.org/wiki/Xen_NUMA_Roadmap.

I reran my usual benchmark, SpecJBB2005, plus some others, i.e., some configurations of sysbench and lmbench. A little bit more about them follows:

 * SpecJBB is all about throughput, so pinning is likely the ideal solution.

 * Sysbench-memory measures the time it takes to write a fixed amount of memory (and it is then the throughput that is reported). What we expect is locality to be important, but at the same time the potential imbalances due to pinning could have a say in it.

 * Lmbench-proc measures the time it takes for a process to fork a fixed number of children. This is much more about latency than throughput, with locality of memory accesses playing a smaller role and, again, imbalances due to pinning being a potential issue.

On a 2-node, 16-core system, where I can have 2 to 10 VMs (2 vCPUs each) executing the benchmarks concurrently, here's what I get:

------------------------------------------------------
| SpecJBB2005, throughput (the higher the better)    |
------------------------------------------------------
| #VMs | No affinity |   Pinning   | NUMA scheduling |
|   2  |  43451.853  |  49876.750  |    49693.653    |
|   6  |  29368.589  |  33782.132  |    33692.936    |
|  10  |  19138.934  |  21950.696  |    21413.311    |
------------------------------------------------------
| Sysbench memory, throughput (the higher the better)|
------------------------------------------------------
| #VMs | No affinity |   Pinning   | NUMA scheduling |
|   2  |  484.42167  |  552.32667  |    552.86167    |
|   6  |  404.43667  |  440.00056  |    449.42611    |
|  10  |  296.45600  |  315.51733  |    331.49067    |
------------------------------------------------------
| LMBench proc, latency (the lower the better)       |
------------------------------------------------------
| #VMs | No affinity |   Pinning   | NUMA scheduling |
|   2  |  824.00437  |  749.51892  |    741.42952    |
|   6  |  942.39442  |  985.02761  |    974.94700    |
|  10  |  1254.3121  |  1363.0792  |    1301.2917    |
------------------------------------------------------

In terms of % performance increase/decrease, this means NUMA aware scheduling does as follows, compared to no affinity at all and to pinning (e.g., for SpecJBB with 2 VMs, 49693.653 vs. 43451.853 is a +14.36% gain over no affinity):

--------------------------------
| SpecJBB2005 (throughput)     |
--------------------------------
| #VMs | No affinity | Pinning |
|   2  |   +14.36%   |  -0.36% |
|   6  |   +14.72%   |  -0.26% |
|  10  |   +11.88%   |  -2.44% |
--------------------------------
| Sysbench memory (throughput) |
--------------------------------
| #VMs | No affinity | Pinning |
|   2  |   +14.12%   |  +0.09% |
|   6  |   +11.12%   |  +2.14% |
|  10  |   +11.81%   |  +5.06% |
--------------------------------
| LMBench proc (latency)       |
--------------------------------
| #VMs | No affinity | Pinning |
|   2  |   +10.02%   |  +1.07% |
|   6  |   +3.45%    |  +1.02% |
|  10  |   +2.94%    |  +4.53% |
--------------------------------

The numbers seem to show that we are successfully taking advantage of both the improved locality (when compared to no affinity) and the greater flexibility that the NUMA aware scheduling approach gives us (when compared to pinning). In fact, when throughput alone is concerned (the SpecJBB case), it behaves almost on par with pinning, and a lot better than no affinity at all. Moreover, we are even able to do better than both of them when latency comes a little bit more into the game and the imbalances caused by pinning would make things worse than not having any affinity, as in the sysbench and, especially, the LMBench case.

Here are the patches included in the series. I '*'-ed the ones that already received one or more acks during v1. However, some patches were significantly reworked since then. In those cases, I ignored the acks and left the patches with my SOB only, as I think they definitely need to be re-reviewed. :-)

 * [ 1/10] xen, libxc: rename xenctl_cpumap to xenctl_bitmap
 * [ 2/10] xen, libxc: introduce node maps and masks
   [ 3/10] xen: sched_credit: let the scheduler know about node-affinity
   [ 4/10] xen: allow for explicitly specifying node-affinity
 * [ 5/10] libxc: allow for explicitly specifying node-affinity
 * [ 6/10] libxl: allow for explicitly specifying node-affinity
   [ 7/10] libxl: optimize the calculation of how many VCPUs can run on a candidate
 * [ 8/10] libxl: automatic placement deals with node-affinity
 * [ 9/10] xl: add node-affinity to the output of `xl list`
   [10/10] docs: rearrange and update NUMA placement documentation

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
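As mentioned above, here is a minimal, standalone sketch of the two-step "node-affinity first, plain cpu-affinity as fallback" idea, for anyone who wants a more concrete picture. This is NOT the code from patch 3/10: the data structures, the bitmask representation and all the names (struct vcpu_affinity, pick_pcpu, pcpu_is_idle, NR_PCPUS) are made up purely for illustration, and the real scheduler obviously does a lot more than looking for an idle pCPU.

/*
 * Sketch of the two-step pCPU selection described in this cover letter:
 * first restrict the search to the pCPUs belonging to the domain's
 * node-affinity; only if nothing suitable is found there, fall back to
 * the plain (v)cpu-affinity, i.e., the old behaviour.
 *
 * All names and data structures are hypothetical, for illustration only.
 */
#include <stdio.h>
#include <stdint.h>

#define NR_PCPUS 16

/* Hypothetical per-vCPU view of the two affinities, as plain bitmasks. */
struct vcpu_affinity {
    uint32_t cpu_affinity;   /* pCPUs the vCPU is allowed to run on      */
    uint32_t node_affinity;  /* pCPUs belonging to the domain's node(s)  */
};

/* Hypothetical "is this pCPU idle?" predicate. */
static int pcpu_is_idle(uint32_t idle_mask, int cpu)
{
    return (idle_mask >> cpu) & 1;
}

/*
 * Step 1: look for an idle pCPU within both the cpu- and node-affinity.
 * Step 2: if that fails, retry with the cpu-affinity alone.
 * Returns the chosen pCPU, or -1 if no idle pCPU is available at all.
 */
static int pick_pcpu(const struct vcpu_affinity *aff, uint32_t idle_mask)
{
    uint32_t masks[2] = {
        aff->cpu_affinity & aff->node_affinity, /* preferred: node-affine */
        aff->cpu_affinity,                      /* fallback: any allowed  */
    };

    for (int step = 0; step < 2; step++)
        for (int cpu = 0; cpu < NR_PCPUS; cpu++)
            if (((masks[step] >> cpu) & 1) && pcpu_is_idle(idle_mask, cpu))
                return cpu;

    return -1;
}

int main(void)
{
    /* Domain's node-affinity covers pCPUs 0-7 (node 0 of a 2-node box). */
    struct vcpu_affinity aff = {
        .cpu_affinity  = 0xffff, /* no pinning: all 16 pCPUs allowed */
        .node_affinity = 0x00ff,
    };

    /* pCPUs 3 and 12 are idle: the node-affine pCPU 3 is preferred. */
    printf("picked pCPU %d\n", pick_pcpu(&aff, (1u << 3) | (1u << 12)));

    /* Node 0 fully busy: we fall back to pCPU 12 on the other node. */
    printf("picked pCPU %d\n", pick_pcpu(&aff, 1u << 12));

    return 0;
}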