Re: [Xen-devel] [PATCH 1 of 3] libxl: take node distances into account during NUMA placement

On Thu, 2012-10-18 at 16:17 +0100, George Dunlap wrote:
> On Tue, Oct 16, 2012 at 6:26 PM, Dario Faggioli
> <dario.faggioli@xxxxxxxxxx> wrote:
> > In fact, among placement candidates with the same number of nodes, the
> > closer the various nodes are to each others, the better the performances
> > for a domain placed there.
> Looks good overall -- my only worry is the N^2 nature of the
> algorithm.  We're already doing some big combinatorial thing to
> generate the candidates, right?  
It is, with N being the number of nodes. We discussed this thoroughly a
couple of months ago and reached consensus that N will stay below 8 for
the next 5 (and probably more) years. :-)

In any case, if something really unexpected happens and N jumps to
anything bigger than 16, the placement algorithm won't even start, and
we'll never reach this point!
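
To make the bound concrete, here is a minimal Python sketch (not libxl
code). It assumes the worst case where every non-empty subset of nodes
could be a candidate, which is an upper bound and not necessarily what
the candidate generation actually enumerates. The names
`MAX_NODES_FOR_PLACEMENT`, `worst_case_candidates` and
`should_attempt_placement` are hypothetical; only the value 16 comes
from the text above.

```python
# Threshold quoted in the message; the actual libxl constant is not shown here.
MAX_NODES_FOR_PLACEMENT = 16


def worst_case_candidates(nnodes):
    """Number of non-empty node subsets, an upper bound on candidates."""
    return 2 ** nnodes - 1


def should_attempt_placement(nnodes, limit=MAX_NODES_FOR_PLACEMENT):
    """Mirror the rule stated above: give up once N exceeds the limit."""
    return nnodes <= limit
```

With 8 nodes the upper bound is 255 subsets, and with 16 it is 65535,
which is why a hard cap keeps the combinatorial part under control.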

Moreover, given the numbers we're playing with, I don't think this
specific patch adds much complexity: we already have the function that
counts the number of vCPUs bound to a candidate (as xend did), which is
Ndoms*Nvcpus, and we're very likely going to have many more domains
than nodes. :-)

> And now we're doing N^2 for each
> candidate? 
Again, yes, but that turns it from Ndoms*Nvcpus into
Ndoms*Nvcpus+Nnodes^2, which is still dominated by the first term. IIRC,
Andre tried to start >50 domains with 2 vCPUs each on an 8-node system,
which means 50*2 vs 8*8.
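
To put numbers on that comparison, here is a minimal Python sketch (not
the patch itself). It assumes a candidate is scored by summing the
pairwise distances among its nodes, and that `dist[i][j]` holds the
distance between nodes i and j; both are assumptions for illustration.
The names `candidate_distance` and `per_candidate_cost` are
hypothetical.

```python
from itertools import combinations


def candidate_distance(nodes, dist):
    """Sum of pairwise distances among the nodes of one candidate.

    Visits every unordered pair of nodes, so the work grows as
    O(len(nodes)^2).
    """
    return sum(dist[a][b] for a, b in combinations(nodes, 2))


def per_candidate_cost(ndoms, nvcpus, nnodes):
    """Rough operation counts for the two terms discussed above."""
    vcpu_term = ndoms * nvcpus    # walking every vCPU of every domain
    dist_term = nnodes * nnodes   # upper bound on the pairwise-distance work
    return vcpu_term, dist_term
```

For the case mentioned above, `per_candidate_cost(50, 2, 8)` gives
`(100, 64)`: the vCPU-counting term is already the larger of the two.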

> Suppose we get an ARM system with 4096 cores and 128 NUMA
> nodes?  If Xen 4.4 doesn't come out until March 2014, there will still
> be distros using 4.3 through mid-2015.
Right, but I really don't think that monster is actually made out of
4096 cores arranged in 128 _NUMA_ nodes on which you run the same
instance of the hypervisor.

I also recall hearing the numbers and the use of the word "node", but I
really think they rather referred to a cluster architecture where "a
node" means something more like "a server", each one running its own
copy of Xen (although they'll all be packed together in the same rack,
talking via some super-fast interconnect).
I'm pretty sure I remember Stefano speculating about the need to use
some orchestration layer (like {Cloud,Open}Stack) _within_ those big
irons to deal exactly with that... Stefano, am I talking nonsense? :-D

Finally, allow me to say that the whole placement algorithm already
interacts quite nicely with cpupools. Thus, even in the unlikely event
of an actual machine with 128 NUMA nodes, you can have, say, 16 cpupools
with 8 nodes each (or vice versa), and the algorithm will be back to
dealing with _no_more_than_ 8 (or 16) nodes. Yes, right now this would
require someone to manually set up the pools and decide which domain to
put where. However, it would be very easy to add, at that point,
something that does this pooling and coarser placement automatically
(and quickly). In fact, we can even think about having it for 4.3, if
you really believe it's necessary.
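
As an illustration of the pooling idea, here is a minimal Python sketch
that only computes a grouping of node IDs; it does not create any
cpupool. It assumes that consecutive node IDs are close to each other,
which a real implementation would have to check against the distance
information. The name `split_into_pools` is hypothetical.

```python
def split_into_pools(nnodes, pool_size):
    """Group node IDs 0..nnodes-1 into consecutive pools of `pool_size`.

    The last pool may be smaller if nnodes is not a multiple of pool_size.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return [list(range(start, min(start + pool_size, nnodes)))
            for start in range(0, nnodes, pool_size)]
```

`split_into_pools(128, 8)` yields 16 pools of 8 nodes each, so placement
inside any single pool never has to consider more than 8 nodes.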

> I seem to remember having a discussion about this issue already, but I
> can't remember what the outcome was...
Yep, we did, and the outcome was exactly what I tried to summarize
above. :-)

Thanks and Regards,

<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
