[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 08 of 10 v2] libxl: enable automatic placement of guests on NUMA nodes



On 15/06/12 18:04, Dario Faggioli wrote:
If a domain does not have a VCPU affinity, try to pin it automatically to some
PCPUs. This is done taking into account the NUMA characteristics of the host.
In fact, we look for a combination of host's NUMA nodes with enough free memory
and number of PCPUs for the new domain, and pin it to the VCPUs of those nodes.

Once we know which ones, among all the possible combinations, represents valid
placement candidates for a domain, use some heuistics for deciding which is the
best. For instance, smaller candidates are considered to be better, both from
the domain's point of view (fewer memory spreading among nodes) and from the
system as a whole point of view (fewer memoy fragmentation).  In case of
candidates of equal sizes (i.e., with the same number of nodes), the one with
the greater amount of memory wins, as this is also good for keeping memory
fragmentation under control.

This all happens internally to libxl, and no API for driving the mechanism is
provided for now. This matches what xend already does.

Signed-off-by: Dario Faggioli<dario.faggioli@xxxxxxxxxx>
Overall I think this approach is much better. I haven't done an extremely detailed review, but I do have some comments below.
+    /*
+     * Round up and down some of the constraints. For instance, the minimum
+     * number of cpus a candidate should have must at least be non-negative.
+     * Regarding the minimum number of NUMA nodes, if not explicitly specified
+     * (i.e., min_nodes<= 0), we try to figure out a sensible number of nodes
+     * from where to start generating candidates, if possible (or just start
+     * from 1 otherwise). The maximum number of nodes should not exceed the
+     * number of existent NUMA nodes on the host, or the candidate genaration
+     * won't work properly.
+     */
+    min_cpus = min_cpus<= 0 ? 0 : min_cpus;
Wouldn't it just make more sense to specify that "min_cpus" (and other parameters) had to be >=0?

In any case, this is a very complicated way of saying "if (min_cpus<0) min_cpus = 0;".
+    if (min_nodes<= 0) {
+        int cpus_per_node;
+
+        cpus_per_node = cpus_per_node_count(tinfo, nr_cpus, ninfo, nr_nodes);
+        if (cpus_per_node == 0)
+            min_nodes = 1;
+        else
+            min_nodes = (min_cpus + cpus_per_node - 1) / cpus_per_node;
+    }
Same here; you could just write "if(!min_nodes) {..."
+    min_nodes = min_nodes>  nr_nodes ? nr_nodes : min_nodes;
+    if (max_nodes<= 0)
+        max_nodes = nr_nodes;
+    else
+        max_nodes = max_nodes>  nr_nodes ? nr_nodes : max_nodes;
if (max_nodes == 0 || max_nodes > nr_nodes) max_nodes = nr_nodes;
+
+/*
+ * The NUMA placement candidates are reordered according to the following
+ * heuristics:
+ *  - candidates involving fewer nodes come first. In case two (or
+ *    more) candidates span the same number of nodes,
+ *  - candidates with greater amount of free memory come first. In
+ *    case two (or more) candidates differ in their amount of free
+ *    memory by less than 10%,
Interesting idea -- sounds pretty reasonable.
+ *  - candidates with fewer domains insisting on them at the time of
+ *    this call come first.
Do you mean "existing"?  I think "assigned to" is probably better.
+ */
+static int numa_cmpf(const void *v1, const void *v2)
+{
+    const libxl__numa_candidate *c1 = (const libxl__numa_candidate*) v1;
+    const libxl__numa_candidate *c2 = (const libxl__numa_candidate*) v2;
+    double mem_diff = labs(c1->free_memkb - c2->free_memkb);
+    double mem_avg = (c1->free_memkb + c2->free_memkb) / 2.0;
+
+    if (c1->nr_nodes != c2->nr_nodes)
+        return c1->nr_nodes - c2->nr_nodes;
+
+    if ((mem_diff / mem_avg) * 100.0<  10.0&&
+        c1->nr_domains != c2->nr_domains)
+        return c1->nr_domains - c2->nr_domains;
I realize this isn't a hot path, but it seems like moving into FP is really unnecessary. You can just do this:

if ( ((mem_diff * 100) / mem_avg) < 10 ...

One minor note: I personally think it's a lot more readable to put the '&&' and '||' on the same line as the next item, rather than the previous item; i.e.:
 if ( expression_a
&& expression_b )

One more thing: Is there a reason why you put get_numa_candidates() in libxl_internal.h, but not sort_numa_candidates()? It seems like both or neither should go. :-)

That's all I have for now. I'm OK with the general approach, so here's a "weak ack", so if a maintainer is happy with the code, he can check it in:

Acked-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>

I'll try to come back and give a more detailed code review if I get a chance.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.