Re: [Xen-devel] [PATCH 1 of 3 v4/leftover] libxl: enable automatic placement of guests on NUMA nodes
On Tue, 2012-07-17 at 16:55 +0100, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH 1 of 3 v4/leftover] libxl: enable automatic
> placement of guests on NUMA nodes"):
> > If a domain does not have a VCPU affinity, try to pin it automatically to
> > some
> > PCPUs. This is done taking into account the NUMA characteristics of the
> > host.
> > In fact, we look for a combination of host's NUMA nodes with enough free
> > memory
> > and number of PCPUs for the new domain, and pin it to the VCPUs of those
> > nodes.
>
> Thanks for this admirably clear patch.
>
Thanks to you for looking at it.
> Can you please rewrap your commit messages to around 70 columns ? Many
> VCSs indent them in the log in some situations, and as you see here
> mail programs indent them when quoting too.
>
That should have happened in the first place. As I'm reposting, I'll
pay extra attention to it, sorry.
> > +/* Subtract two values and translate the result in [0, 1] */
> > +static double normalized_diff(double a, double b)
> > +{
> > +#define max(a, b) (a > b ? a : b)
> > +    if (!a && a == b)
> > +        return 0.0;
> > +    return (a - b) / max(a, b);
> > +}
>
> 1. This macro max() should be in libxl_internal.h.
> 2. It should be MAX so people are warned it's a macro
> 3. It should have all the necessary ()s for macro precedence safety
>
Ok, will do that.
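Just to make sure we mean the same thing, this is roughly what I'd put in
libxl_internal.h (only a sketch, the exact name and placement may still
change when I respin):

    /* Precedence-safe; the usual caveat applies that both arguments
     * are evaluated twice. */
    #define MAX(a, b) ((a) > (b) ? (a) : (b))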
> > +    double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb);
> > +    double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains);
> > +
> > +    if (c1->nr_nodes != c2->nr_nodes)
> > +        return c1->nr_nodes - c2->nr_nodes;
> > +
> > +    return sign(3*freememkb_diff + nrdomains_diff);
>
> The reason you need what sign() does is that you need to convert from
> double to int, I guess.
>
Mostly for that, and to make what's happening even clearer.
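For reference, sign() is just something like the below (quoting it here from
memory, so the actual patch may differ in the details):

    /* Collapse a double into -1, 0 or +1, so the comparison helper
     * can return a plain int. */
    static int sign(double a)
    {
        if (a > 0)
            return 1;
        if (a < 0)
            return -1;
        return 0;
    }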
> > +
> > +    /*
> > +     * Check if the domain has any CPU affinity. If not, try to build up one.
> > +     * In case numa_place_domain() find at least a suitable candidate, it will
> > +     * affect info->cpumap accordingly; if it does not, it just leaves it
> > +     * as it is. This means (unless some weird error manifests) the subsequent
> > +     * call to libxl_set_vcpuaffinity_all() will do the actual placement,
> > +     * whatever that turns out to be.
> > +     */
> > +    if (libxl_bitmap_is_full(&info->cpumap)) {
> > +        int rc = numa_place_domain(gc, info);
> > +        if (rc)
> > +            return rc;
> > +    }
>
> I guess it would be preferable to do this only if the bitmap was full
> by default, so that setting the bitmap explicitly to all cpus still
> works.
>
> I'm not sure that that's essential to have, though.
>
I was thinking about this just these days, and I think it should be
exactly as you say, as one needs a mechanism for disabling this thing as
a whole. I really don't think it should take too much to put something
together, even as a separate, future patch, if these get checked in.
Thanks.
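Something along these lines is what I have in mind (just a sketch: the
numa_placement field does not exist in the current patch, it's an assumption
about how such a knob could look):

    /* Attempt automatic placement only if the (hypothetical) knob is
     * enabled AND no explicit affinity has been provided. */
    if (libxl_defbool_val(info->numa_placement) &&
        libxl_bitmap_is_full(&info->cpumap)) {
        int rc = numa_place_domain(gc, info);
        if (rc)
            return rc;
    }

That alone would not distinguish "full by default" from "explicitly set to
all CPUs", but it at least gives users a way to turn the whole thing off.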
> > +        /*
> > +         * Conditions are met, we can add this combination to the
> > +         * NUMA placement candidates list. We first make sure there
> > +         * is enough space in there, and then we initialize the new
> > +         * candidate element with the node map corresponding to the
> > +         * combination we are dealing with. Memory allocation for
> > +         * expanding the array that hosts the list happens in chunks
> > +         * equal to the number of NUMA nodes in the system (to
> > +         * avoid allocating memory each and every time we find a
> > +         * new candidate).
> > +         */
> > +        if (*nr_cndts == array_size)
> > +            array_size += nr_nodes;
> > +        GCREALLOC_ARRAY(new_cndts, array_size);
>
> This part of the algorithm is quadratic in the number of combinations
> divided by the number of nodes. So the algorithm is
> O( ( C( nr_nodes, min_nodes ) / min_nodes )^2 )
> which is quite bad really.
>
I might well be wrong, but I was thinking of it as something like this:
 O( C(nr_nodes,(nr_nodes/2)) * nr_nodes )
That's because the outer while() is repeated, at most, nr_nodes times
(if min_nodes=1 and max_nodes=nr_nodes). Each of these steps hosts a
for() which visits all the combinations, the maximum number of which
ISTR to be C(nr_nodes,(nr_nodes/2)).
I'm not sure what you meant by putting min_nodes up there in your
formula (min_nodes is likely to be 1 in most cases...), so I can't
get numbers and compare them, but it looked (looks?) a bit less bad to
me... Or did I make some obvious mistake I'm not seeing right now? :-(
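To get a feeling for the actual numbers, this is the kind of back-of-the-envelope
check I'm doing on my side (a standalone snippet, not part of the patch; it just
prints C(nr_nodes, nr_nodes/2) * nr_nodes for a few box sizes):

    #include <stdio.h>

    /* C(n, k) via the usual multiplicative formula; the intermediate
     * result stays integral at every step. */
    static unsigned long binomial(unsigned n, unsigned k)
    {
        unsigned long r = 1;
        unsigned i;

        for (i = 1; i <= k; i++)
            r = r * (n - k + i) / i;
        return r;
    }

    int main(void)
    {
        unsigned n;

        for (n = 2; n <= 16; n *= 2)
            printf("nr_nodes=%2u -> at most %lu candidate visits\n",
                   n, binomial(n, n / 2) * n);
        return 0;
    }

which gives 4, 24, 560 and 205920 visits for 2, 4, 8 and 16 nodes respectively.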
> At the very least this needs to be an exponential allocation, eg
> + array_size += nr_nodes + array_size;
>
> But if you didn't insist on creating the whole list and sorting it,
> you would avoid this allocation entirely, wouldn't you ?
>
I'll kill the separation between candidate identification and sorting:
that's easy, quick, and will make George happy as well. :-)
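Just to show the shape of what I mean, here's a toy, self-contained
illustration (the metrics, the comparator and the bitmask representation of
node combinations are all dummies, only the "keep the best seen so far, no
array, no qsort()" structure matters; it also uses gcc's __builtin_popcount):

    #include <stdio.h>

    struct candidate {
        unsigned nodemap;        /* bitmask of nodes in the combination */
        unsigned nr_nodes;       /* how many nodes it spans */
        unsigned long free_memkb;
    };

    /* Dummy comparator: fewer nodes wins, then more free memory
     * (ties resolved arbitrarily). */
    static int candidate_cmp(const struct candidate *a,
                             const struct candidate *b)
    {
        if (a->nr_nodes != b->nr_nodes)
            return (int)a->nr_nodes - (int)b->nr_nodes;
        return a->free_memkb > b->free_memkb ? -1 : 1;
    }

    int main(void)
    {
        unsigned nr_nodes = 4, mask;
        struct candidate best, cur;
        int have_best = 0;

        /* Walk all non-empty node combinations and keep only the best
         * one seen so far, instead of storing and sorting them all. */
        for (mask = 1; mask < (1u << nr_nodes); mask++) {
            cur.nodemap = mask;
            cur.nr_nodes = (unsigned)__builtin_popcount(mask);
            cur.free_memkb = 1024UL * cur.nr_nodes;   /* dummy metric */

            if (!have_best || candidate_cmp(&cur, &best) < 0) {
                best = cur;
                have_best = 1;
            }
        }

        printf("best nodemap: 0x%x\n", best.nodemap);
        return 0;
    }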
> Should we be worried that this algorithm will be too slow even if it
> involves just
> O( C(nr_nodes,min_nodes) )
> iterations ?
>
I'm commenting on this in the other thread you opened...
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)