Xen project Mailing List

Re: [Xen-devel] [PATCH 0 of 8] NUMA Awareness for the Credit Scheduler

To: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>

Date: Tue, 09 Oct 2012 12:02:00 +0200

Cc: Marcus Granado <Marcus.Granado@xxxxxxxxxxxxx>, Andre Przywara <andre.przywara@xxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Anil Madhavapeddy <anil@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx, Jan Beulich <JBeulich@xxxxxxxx>, Daniel De Graaf <dgdegra@xxxxxxxxxxxxx>, Matt Wilson <msw@xxxxxxxxxx>

Delivery-date: Tue, 09 Oct 2012 10:02:32 +0000


List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 05.10.2012 16:08, Dario Faggioli wrote:

Hi Everyone,

Here comes a patch series instilling some NUMA awareness in the Credit
scheduler.

What the patches do is teach Xen's scheduler how to try to maximize
performance on a NUMA host, taking advantage of the information coming from
the automatic NUMA placement we have in libxl.  Right now, the
placement algorithm runs and selects a node (or a set of nodes) where it is best
to put a new domain. Then, all the memory for the new domain is allocated
from those node(s) and all the vCPUs of the new domain are pinned to the pCPUs
of those node(s). What we do here is, instead of statically pinning the domain's
vCPUs to the nodes' pCPUs, have the (Credit) scheduler _prefer_ running them
there. That enables most of the performance benefits of "real" pinning, but
without its intrinsic lack of flexibility.

The above happens by extending to the scheduler the knowledge of a domain's
node-affinity. We then ask it to first try to run the domain's vCPUs on one of
the nodes the domain has affinity with. Of course, if that turns out to be
impossible, it falls back on the old behaviour (i.e., considering vcpu-affinity
only).
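
To illustrate the two-step idea above, here is a minimal Python sketch. It is
only a model of "prefer node-affinity, then fall back to vcpu-affinity"; it is
not the code from the series, and the function name, its arguments and the
choice of the lowest-numbered pCPU are assumptions made for the example. The
real Credit scheduler also has to deal with priorities and load balancing,
which are ignored here.

```python
def pick_pcpu(idle_pcpus, vcpu_affinity, node_affinity_pcpus):
    """Pick an idle pCPU for a vCPU, preferring the domain's node-affinity.

    All three arguments are sets of pCPU ids. This is a hypothetical helper
    for illustration only, not a function that exists in Xen.
    """
    # pCPUs that are idle and that the vCPU is allowed to run on at all.
    allowed = idle_pcpus & vcpu_affinity

    # Step 1: try the pCPUs of the nodes the domain has affinity with.
    preferred = allowed & node_affinity_pcpus
    if preferred:
        return min(preferred)

    # Step 2: fall back on the old behaviour (vcpu-affinity only).
    if allowed:
        return min(allowed)

    # Nothing suitable is idle right now.
    return None


# Hypothetical 2-node host: node 0 has pCPUs 0-3, node 1 has pCPUs 4-7.
all_pcpus = set(range(8))
node1_pcpus = {4, 5, 6, 7}

print(pick_pcpu({1, 5}, all_pcpus, node1_pcpus))  # idle pCPU on node 1 wins
print(pick_pcpu({1, 2}, all_pcpus, node1_pcpus))  # node 1 busy, falls back
```

The point of the second step is that, unlike with static pinning, a vCPU is
never left waiting just because its preferred node is busy.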

Just allow me to mention that NUMA-aware scheduling is not only one of the
items on the NUMA roadmap I'm trying to maintain here:
http://wiki.xen.org/wiki/Xen_NUMA_Roadmap. It is also one of the features we
decided we want for Xen 4.3 (and thus it is part of the list of such features
that George is maintaining).

Up to now, I've been able to thoroughly test this only on my 2-node NUMA
test box, by running the SpecJBB2005 benchmark concurrently on multiple VMs, and
the results look really nice.  A full set of what I got can be found inside my
presentation from last XenSummit, which is available here:

  
http://www.slideshare.net/xen_com_mgr/numa-and-virtualization-the-case-of-xen?ref=http://www.xen.org/xensummit/xs12na_talks/T9.html

However, I reran some of the tests in the last few days (since I changed some
bits of the implementation) and here's what I got:

-------------------------------------------------------
  SpecJBB2005 Total Aggregate Throughput
-------------------------------------------------------
#VMs       No NUMA affinity    NUMA affinity &    +/- %
                                   scheduling
-------------------------------------------------------
    2            34653.273          40243.015    +16.13%
    4            29883.057          35526.807    +18.88%
    6            23512.926          27015.786    +14.89%
    8            19120.243          21825.818    +14.15%
   10            15676.675          17701.472    +12.91%
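
For anyone wanting to compare their own runs in the same way, the "+/- %"
column can be reproduced as the relative gain over the "No NUMA affinity"
baseline. The snippet below is a small sketch of that arithmetic; the function
name is an assumption, and the input figures are simply copied from the table
above.

```python
def relative_gain_percent(baseline, improved):
    """Relative throughput change of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# (number of VMs, no NUMA affinity, NUMA affinity & scheduling), from the table.
rows = [
    (2, 34653.273, 40243.015),
    (4, 29883.057, 35526.807),
    (6, 23512.926, 27015.786),
    (8, 19120.243, 21825.818),
    (10, 15676.675, 17701.472),
]

for vms, baseline, improved in rows:
    gain = relative_gain_percent(baseline, improved)
    print(f"{vms:>4} VMs: {gain:+.3f}%")
```

The computed values agree with the printed column to within 0.01 percentage
points.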

Basically, results are consistent with what is shown in the super-nice graphs I
have in the slides above! :-) As said, this looks nice to me, especially
considering that my test machine is quite small, i.e., its 2 nodes are very
close to each other from a latency point of view. I really expect more
improvement on bigger hardware, where a much greater NUMA effect is to be
expected.  Of course, I myself will continue benchmarking (hopefully, on
systems with more than 2 nodes too), but should anyone want to run their own
tests, that would be great, so feel free to do that and report results to me
and/or to the list!

A little bit more about the series:

  1/8 xen, libxc: rename xenctl_cpumap to xenctl_bitmap
  2/8 xen, libxc: introduce node maps and masks

Is some preparation work.

  3/8 xen: let the (credit) scheduler know about `node affinity`

Is where the vcpu load balancing logic of the credit scheduler is modified to
support node-affinity.

  4/8 xen: allow for explicitly specifying node-affinity
  5/8 libxc: allow for explicitly specifying node-affinity
  6/8 libxl: allow for explicitly specifying node-affinity
  7/8 libxl: automatic placement deals with node-affinity

Is what wires the in-scheduler node-affinity support to the external world.
Please note that patch 4 touches XSM and Flask, which is the area with which I
have the least experience and the least chance to test properly. So, if Daniel
and/or anyone interested in that could take a look and comment, that would be
awesome.

  8/8 xl: report node-affinity for domains

Is just some small output enhancement.

Apart from the minor comment to Patch 3:

Acked-by: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>

--
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                   Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
