Re: [Xen-devel] [PATCH 1 of 3 v5/leftover] libxl: enable automatic placement of guests on NUMA nodes



On 20.07.2012 10:26, Andre Przywara wrote:
On 07/20/2012 10:20 AM, Dario Faggioli wrote:
On Thu, 2012-07-19 at 16:22 +0200, Dario Faggioli wrote:
Interesting. That's really the kind of testing we need in order to
fine-tune the details. Thanks for doing this.

Then I started 32 guests, each with 4 vCPUs and 1 GB of RAM.
Since the code prefers free memory so strongly over free CPUs, the
placement was the following:
node0: guests 2,5,8,11,14,17,20,25,30
node1: guests 21,27
node2: none
node3: none
node4: guests 1,4,7,10,13,16,19,23,29
node5: guests 24,31
node6: guests 3,6,9,12,15,18,22,28
node7: guests 26,32

As you can see, the nodes with more memory are _way_ overloaded, while
the lower-memory ones are underutilized. In fact, the first 20 guests
didn't use the other nodes at all.
I don't care so much about the two memory-less nodes, but I'd like to
know how you arrived at the magic "3" in the formula:

+
+ return sign(3*freememkb_diff + nrdomains_diff);
+}
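
(For readers without the series at hand, here is a minimal, self-contained
sketch of what such a comparator looks like conceptually. The struct name,
field names, the sign() helper and the normalization are guesses made for
illustration; only the weighted return line is taken from the quoted patch.)

#include <stdint.h>

/* Illustrative sketch only -- not the actual code from the series. */
struct candidate {
    uint64_t free_memkb;   /* free memory on the candidate's node(s) */
    int nr_domains;        /* domains already placed there */
};

static int sign(double x)
{
    return x > 0 ? 1 : (x < 0 ? -1 : 0);
}

/* Normalized difference in [-1, 1], assuming non-negative inputs. */
static double normalized_diff(double a, double b)
{
    double m = a > b ? a : b;
    return m ? (a - b) / m : 0.0;
}

static int candidate_cmp(const struct candidate *c1,
                         const struct candidate *c2)
{
    /* > 0 when c1 has more free memory than c2 */
    double freememkb_diff = normalized_diff(c1->free_memkb, c2->free_memkb);
    /* > 0 when c1 hosts fewer domains than c2 */
    double nrdomains_diff = normalized_diff(c2->nr_domains, c1->nr_domains);

    /* The 3x weight is what makes free memory dominate the domain count. */
    return sign(3*freememkb_diff + nrdomains_diff);
}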


That all being said, this is the first time the patch series has had the
chance to run on such a big system, so I'm definitely open to suggestions
on how to make that formula better reflect what we think is The Right
Thing!

Thinking more about this, I realize that I was implicitly assuming some
symmetry in the amount of memory each node comes with, which is
probably something I shouldn't have done...

I'm really not sure what to do here. Perhaps treat the two metrics
more evenly? Or maybe even reverse the logic and give nr_domains more
weight?

I already replaced the 3 with a 1, but that didn't change much. I think we
should more or less reverse the importance and give node load more weight,
since starving for CPU time is much worse than bad memory latency. I will
do some experiments...
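
(Expressed against the illustrative sketch above, and purely as an example
of what "reversing the importance" could mean, the change stays inside that
one weighted sum; the weight of 3 is just a placeholder here:)

/* Illustrative only: let the number of hosted domains dominate, with
 * free memory acting as the tie-breaker, instead of the other way round. */
return sign(freememkb_diff + 3*nrdomains_diff);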

I was also wondering whether it could be worthwhile to consider the total
number of vcpus on a node rather than the number of domains, but again,
that's not guaranteed to be any more meaningful (suppose there are a lot
of idle vcpus)...

Right, that was my thinking on the ride to work also ;-)
What about this: 1P and 2P guests really use their vCPUs, but for bigger
guests we assume only a fractional usage?
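
(One way that idea could be written down, purely as a sketch: the
guest_info struct, the 2-vcpu threshold and the 0.5 factor are made up for
illustration, not part of the patch or of any concrete proposal.)

/* Illustrative sketch only: 1- and 2-vcpu guests count at full weight,
 * while each further vcpu of a bigger guest counts as assumed fractional
 * usage. The threshold (2) and the factor (0.5) are arbitrary. */
struct guest_info {
    int nr_vcpus;
};

static double node_vcpu_load(const struct guest_info *guests, int nr_guests)
{
    double load = 0.0;

    for (int i = 0; i < nr_guests; i++) {
        int vcpus = guests[i].nr_vcpus;

        if (vcpus <= 2)
            load += vcpus;                 /* small guests: full weight */
        else
            load += 2 + 0.5 * (vcpus - 2); /* bigger guests: fractional */
    }
    return load;
}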

Hmm, I wouldn't be so sure about this. I would guess there is a reason a
guest has more than 1 vcpu; normally it is because the guest needs more
compute power.

Having only 1 vcpu means that this much is enough; such a guest probably
needs only 0.1 vcpus anyway.

A guest with many vcpus suffers much more from a heavily loaded node: lock
contention in the guest increases with the number of vcpus, and vcpus end
up waiting for a vcpu that holds a lock but has been preempted.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
PDG ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

