Re: [Xen-devel] [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
On Thu, 2012-10-18 at 16:17 +0100, George Dunlap wrote:
> On Tue, Oct 16, 2012 at 6:26 PM, Dario Faggioli
> <dario.faggioli@xxxxxxxxxx> wrote:
> > In fact, among placement candidates with the same number of nodes, the
> > closer the various nodes are to each other, the better the performance
> > for a domain placed there.
>
> Looks good overall -- my only worry is the N^2 nature of the
> algorithm. We're already doing some big combinatorial thing to
> generate the candidates, right?
>
It is, with N being the number of nodes, which we already discussed thoroughly a couple of months ago, reaching consensus that N will stay below 8 for the next 5 (but probably even more) years. :-) In any case, if something really unexpected happens and N jumps to anything bigger than 16, the placement algorithm won't even start, and we'll never reach this point!

Moreover, given the numbers we're playing with, I don't think this specific patch adds much complexity: we already have the function that counts the number of vCPUs (as it was for xend) bound to a candidate, which is Ndoms*Nvcpus, and we're very likely going to have many more domains than nodes. :-)

> And now we're doing N^2 for each
> candidate?
>
Again, yes, but that turns it from Ndoms*Nvcpus into Ndoms*Nvcpus + Nnodes^2, which is still dominated by the first term. IIRC, Andre tried to start >50 domains with 2 vCPUs each on an 8-node system, which means 50*2 vs. 8*8.

> Suppose we get an ARM system with 4096 cores and 128 NUMA
> nodes? If Xen 4.4 doesn't come out until March 2014, there will still
> be distros using 4.3 through mid-2015.
>
Right, but I really don't think that monster will actually be made of 4096 cores arranged in 128 _NUMA_ nodes all running the same instance of the hypervisor.
I also recall hearing those numbers and the use of the word "node", but I really think they were referring to a cluster architecture where "a node" means something more like "a server", each one running its own copy of Xen (although they'll all be packed together in the same rack, talking via some super-fast interconnect). I'm pretty sure I remember Stefano speculating about the need to use some orchestration layer (like {Cloud,Open}Stack) _within_ those big irons to deal with exactly that... Stefano, am I talking nonsense? :-D

Finally, allow me to say that the whole placement algorithm already interacts quite nicely with cpupools. Thus, even in the unlikely event of an actual 128-NUMA-node machine, you could have, say, 16 cpupools with 8 nodes each (or vice versa), and the algorithm would be back to dealing with _no_more_than_ 8 (or 16) nodes. Yes, right now this would require someone to manually set up the pools and decide which domain to put where. However, it would be very easy to add, at that point, something doing this pooling and more coarse placement automatically (and quickly). In fact, we can even think about having it for 4.3, if you really believe it's necessary.

> I seem to remember having a discussion about this issue already, but I
> can't remember what the outcome was...
>
Yep, we did, and the outcome was exactly what I tried to summarize above. :-)

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)