Re: [Xen-devel] [PATCH 1 of 3 v5/leftover] libxl: enable automatic placement of guests on NUMA nodes



On 20.07.2012 10:26, Andre Przywara wrote:
On 07/20/2012 10:20 AM, Dario Faggioli wrote:
On Thu, 2012-07-19 at 16:22 +0200, Dario Faggioli wrote:
Interesting. That's really the kind of testing we need in order to
fine-tune the details. Thanks for doing this.

Then I started 32 guests, each with 4 vCPUs and 1 GB of RAM.
Since the code prefers free memory so strongly over free CPUs, the
placement was the following:
node0: guests 2,5,8,11,14,17,20,25,30
node1: guests 21,27
node2: none
node3: none
node4: guests 1,4,7,10,13,16,19,23,29
node5: guests 24,31
node6: guests 3,6,9,12,15,18,22,28
node7: guests 26,32

As you can see, the nodes with more memory are _way_ overloaded, while
the lower-memory ones are underutilized. In fact, the first 20 guests
didn't use the other nodes at all.
I don't care so much about the two memory-less nodes, but I'd like to
know how you arrived at the magic "3" in the formula:

+
+ return sign(3*freememkb_diff + nrdomains_diff);
+}
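
(For readers without the series at hand, here is a minimal, self-contained
sketch of what such a comparator looks like conceptually. The struct name,
field names, the sign() helper and the normalization are guesses made for
illustration; only the weighted return line is taken from the quoted patch.)

#include <stdint.h>

/* Illustrative sketch only -- not the actual code from the series. */
struct candidate {
    uint64_t free_memkb;   /* free memory on the candidate's node(s) */
    int nr_domains;        /* domains already placed there */
};

static int sign(double x)
{
    return x > 0 ? 1 : (x < 0 ? -1 : 0);
}

/* Normalized difference in [-1, 1], assuming non-negative inputs. */
static double normalized_diff(double a, double b)
{
    double m = a > b ? a : b;
    return m ? (a - b) / m : 0.0;
}

static int candidate_cmp(const struct candidate *c1,
                         const struct candidate *c2)
{
    /* > 0 when c1 has more free memory than c2 */
    double freememkb_diff = normalized_diff(c1->free_memkb, c2->free_memkb);
    /* > 0 when c1 hosts fewer domains than c2 */
    double nrdomains_diff = normalized_diff(c2->nr_domains, c1->nr_domains);

    /* The 3x weight is what makes free memory dominate the domain count. */
    return sign(3*freememkb_diff + nrdomains_diff);
}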


That all being said, this is the first time the patch series has had the
chance to run on such a big system, so I'm definitely open to suggestions
on how to make that formula better reflect what we think is The Right
Thing!

Thinking more about this, I realize that I was implicitly assuming some
symmetry in the amount of memory each node comes with, which is
probably something I shouldn't have done...

I'm really not sure what to do here. Perhaps treat the two metrics
more evenly? Or maybe even reverse the logic and give nr_domains more
weight?

I already replaced the 3 with a 1, but that didn't change much. I think we
should more or less reverse the importance and give node load more weight,
since starving for CPU time is much worse than bad memory latency. I will
do some experiments...
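
(Expressed against the illustrative sketch above, and purely as an example
of what "reversing the importance" could mean, the change stays inside that
one weighted sum; the weight of 3 is just a placeholder here:)

/* Illustrative only: let the number of hosted domains dominate, with
 * free memory acting as the tie-breaker, instead of the other way round. */
return sign(freememkb_diff + 3*nrdomains_diff);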

I was also wondering whether it could be worthwhile to consider the total
number of vcpus on a node rather than the number of domains, but again,
that's not guaranteed to be any more meaningful (suppose there are a lot
of idle vcpus)...

Right, that was my thinking on the ride to work also ;-)
What about this: 1P and 2P guests really use their vCPUs, but for bigger
guests we assume only a fractional usage?
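
(One way that idea could be written down, purely as a sketch: the
guest_info struct, the 2-vcpu threshold and the 0.5 factor are made up for
illustration, not part of the patch or of any concrete proposal.)

/* Illustrative sketch only: 1- and 2-vcpu guests count at full weight,
 * while each further vcpu of a bigger guest counts as assumed fractional
 * usage. The threshold (2) and the factor (0.5) are arbitrary. */
struct guest_info {
    int nr_vcpus;
};

static double node_vcpu_load(const struct guest_info *guests, int nr_guests)
{
    double load = 0.0;

    for (int i = 0; i < nr_guests; i++) {
        int vcpus = guests[i].nr_vcpus;

        if (vcpus <= 2)
            load += vcpus;                 /* small guests: full weight */
        else
            load += 2 + 0.5 * (vcpus - 2); /* bigger guests: fractional */
    }
    return load;
}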

Hmm, I wouldn't be so sure about this. I would guess there is a reason a
guest has more than 1 vcpu; normally it is because the guest needs more
compute power.

Having only 1 vcpu means that this much is enough; such a guest probably
needs only 0.1 vcpus anyway.

A guest with many vcpus suffers much more from a heavily loaded node: lock
contention in the guest increases with the number of vcpus, and vcpus end
up waiting for a vcpu that holds a lock but has been preempted.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
PDG ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

