
Re: [Xen-devel] [PATCH 10 of 10 [RFC]] xl: Some automatic NUMA placement documentation



On Thu, 2012-04-12 at 10:11 +0100, Ian Campbell wrote:
> On Wed, 2012-04-11 at 14:17 +0100, Dario Faggioli wrote:
> > Add some rationale and usage documentation for the new automatic
> > NUMA placement feature of xl.
> > 
> > TODO: * Decide whether we want to have things like "Future Steps/Roadmap"
> >         and/or "Performances/Benchmarks Results" here as well.
> 
> I think these would be better in the list archives and on the wiki
> respectively.
> 
Ok, fine. I have already posted the link in this thread and will keep
doing so, as I am putting together a blog post and a wiki page about the
benchmarks.

As for future steps/roadmap, let's first see what comes out of this
series... :-)

> > Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> > 
> > diff --git a/docs/misc/xl-numa-placement.txt b/docs/misc/xl-numa-placement.txt
> > new file mode 100644
> > --- /dev/null
> > +++ b/docs/misc/xl-numa-placement.txt
> 
> It looks like you are using something approximating markdown syntax
> here, so you might as well name this xl-numa-placement.markdown and get
> a .html version etc almost for free.
> 
Actually, that was another question I had and forgot to ask, i.e., what
format this file should be in. I sort of took inspiration from
xl-disk-configuration.txt and went for a plain text file, but I can of
course go for full-fledged markdown syntax. Thanks.

> > +Of course, if a domain is known to only run on a subset of the physical
> > +CPUs of the host, it is very easy to turn all its memory accesses into
> > +local ones, by just constructing it's node affinity (in Xen) basing on
> 
>                                                                 ^based
> 
Ok to this, and to all the other typos/English howlers as well. Thanks a
lot for looking into this! :-)

> > + * `nodes = [ '0', '1' ]` and `cpus = "0"`, with CPU 0 within node 0:
> > +   (i.e., cpu affinity subset of node affinity):
> > +     domain's vcpus can and will only run on host CPU 0. As node affinity
> > +     is being explicitly set to host NUMA nodes 0 and 1 --- which includes
> > +     CPU 0 --- all the memory access of the domain will be local;
> 
> In this case won't some of (half?) the memory come from node 1 and
> therefore be non-local to cpu 0?
> 
Oops, yep, you're right, that's not what I meant to write!

> > +
> > + * `nodes = [ '0', '1' ]` and `cpus = "0, 4", with CPU 0 in node 0 but
> > +   CPU 4 in, say, node 2 (i.e., cpu affinity superset of node affinity):
> > +     domain's vcpus can run on host CPUs 0 and 4, with CPU 4 not being within
> > +     the node affinity (explicitly set to host NUMA nodes 0 and 1). The
> > +     (credit) scheduler will try to keep memory accesses local by scheduling
> > +     the domain's vcpus on CPU 0, but it may not achieve 100% success;
> > +
> > + * `nodes = [ '0', '1' ]` and `cpus = "4"`, with CPU 4 within, say, node 2
> 
> These examples might be a little clearer if you defined up front what
> the nodes and cpus were and then used that for all of them?
> 
Good idea, I will do that.

> A bunch of what follows would be good to have in the xl or xl.cfg man
> pages too/instead. (I started with this docs patch so I haven't actually
> looked at the earlier ones yet, perhaps this is already the case)
> 
The individual patches that introduce the various features try to
document them as well, but not at this level of detail. I'm fine with
putting there whatever you think would fit, just let me know, perhaps in
the comments on those patches, or however you like.

> > +
> > + * "auto": automatic placement by means of a not better specified (xl
> > +           implementation dependant) algorithm. It is basically for those
> > +           who do want automatic placement, but have no idea what policy
> > +           or algorithm would be better... <<Just give me a sane default!>>
> > +
> > + * "ffit": automatic placement via the First Fit algorithm, applied checking
> > +           the memory requirement of the domain against the amount of free
> > +           memory in the various host NUMA nodes;
> > +
> > + * "bfit": automatic placement via the Best Fit algorithm, applied checking
> > +           the memory requirement of the domain against the amount of free
> > +           memory in the various host NUMA nodes;
> > +
> > + * "wfit": automatic placement via the Worst Fit algorithm, applied checking
> > +           the memory requirement of the domain against the amount of free
> > +           memory in the various host NUMA nodes;
> >
> > <snip>
> >
> > + * `nodes_policy="auto"` (or `"ffit"`, `"bfit"`, `"wfit"`) and `nodes=2`:
> > +     xl will try fitting the domain on the host NUMA nodes by using the
> > +     requested policy and only the number of nodes specified in `nodes=`
> > +     (2 in this example).
> 
> Number of nodes rather than specifically node 2? This is different to
> the examples in the preceding section?
> 
It is. I'll try to clarify things as per your suggestion. However,
talking about syntax, here's what the series allows "nodes" and
"nodes_policy" to be:

 * "nodes=": - a list (`[ '0', '3' ]`), and in this case the elements 
               of the list are specific nodes you want to use;
             - an integer (`2`), and in this case that is the _number_
               of nodes you want to use, with the algorithm free to
               arbitrarily decide which ones to pick;
             - the string `"auto"`, and in this case you tell the 
               algorithm: <<please, do whatever you like and make me  
               happy>> :-)

 * "nodes_policy=" - the string `"auto"`, the same as above
                   - the strings `"ffit"`, `"bfit"` and `"wfit"`, with
                     the meaning reported by the doc in the patch.
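
To make the three named policies concrete, here is a minimal Python
sketch of how First/Best/Worst Fit could pick a single node by comparing
the domain's memory requirement with the free memory of each host NUMA
node. This is only an illustration, not the code from the series: the
function name `pick_node`, its arguments and the single-node
simplification are all assumptions, and `"auto"` is left out on purpose
because its algorithm is implementation dependent.

```python
# Hypothetical sketch: choose one host NUMA node for a domain, given the
# free memory of each node. Names and units are illustrative only.

def pick_node(free_mem, needed, policy="ffit"):
    """Return the index of the chosen node, or None if no node fits.

    free_mem: list with the free memory of each node (same unit as needed).
    needed:   memory requirement of the domain.
    policy:   "ffit" (First Fit), "bfit" (Best Fit) or "wfit" (Worst Fit).
    """
    # Only nodes with enough free memory are candidates at all.
    candidates = [(i, free) for i, free in enumerate(free_mem) if free >= needed]
    if not candidates:
        return None
    if policy == "ffit":
        # First Fit: the first node (lowest index) that is large enough.
        return candidates[0][0]
    if policy == "bfit":
        # Best Fit: the fitting node with the least free memory.
        return min(candidates, key=lambda c: c[1])[0]
    if policy == "wfit":
        # Worst Fit: the fitting node with the most free memory.
        return max(candidates, key=lambda c: c[1])[0]
    raise ValueError("unknown policy: %r" % (policy,))
```

For instance, with `free_mem = [4096, 1024, 8192, 2048]` and
`needed = 2000`, this sketch picks node 0 for `"ffit"`, node 3 for
`"bfit"` and node 2 for `"wfit"`.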

There is some overlap, but I wanted to make it possible to write
things as simple as:

nodes = [ '0', '3' ]

or: 

nodes = "auto"

or:

nodes_policy = "wfit"
nodes = 2

without introducing too many different options. On the downside, this
could obviously lead to awkward or nonsensical combinations... I tried
to intercept the worst of them during config file parsing, and can
surely push this further.
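
As a rough illustration of that kind of check, the Python sketch below
classifies a `nodes` value into the three forms listed above and rejects
one combination. It is not the parser from the series: the helper names
`parse_nodes` and `check_placement` are hypothetical, and treating an
explicit node list together with `nodes_policy` as an error is purely an
assumption made for the example.

```python
# Hypothetical sketch of validating "nodes" and "nodes_policy" together.

VALID_POLICIES = ("auto", "ffit", "bfit", "wfit")

def parse_nodes(value):
    """Classify a nodes value as ("list", ids), ("count", n) or ("auto", None)."""
    if isinstance(value, list):
        # A list names the specific nodes to use, e.g. [ '0', '3' ].
        return ("list", [int(n) for n in value])
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise ValueError("nodes must be a list, an integer or 'auto'")
    if isinstance(value, int):
        # An integer is the number of nodes to use, not a node id.
        if value <= 0:
            raise ValueError("number of nodes must be positive")
        return ("count", value)
    if value == "auto":
        return ("auto", None)
    raise ValueError("nodes must be a list, an integer or 'auto'")

def check_placement(nodes, nodes_policy=None):
    """Return (kind, payload, policy), or raise ValueError on a bad combination."""
    if nodes_policy is not None and nodes_policy not in VALID_POLICIES:
        raise ValueError("unknown nodes_policy: %r" % (nodes_policy,))
    kind, payload = parse_nodes(nodes)
    # Assumption for this sketch: an explicit node list leaves nothing for a
    # policy to decide, so the combination is rejected as nonsensical.
    if kind == "list" and nodes_policy is not None:
        raise ValueError("nodes_policy makes no sense with an explicit node list")
    return (kind, payload, nodes_policy)
```

With this sketch, `check_placement(2, "wfit")` is accepted as "any two
nodes, chosen by Worst Fit", while `check_placement(['0', '3'], "wfit")`
raises an error.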

So the important question here is: besides the fact that I'll try to
explain things better, do you think the interface is both comprehensive
and clear enough? Or should we think of something different?

Thanks a lot again and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

