
Re: [Xen-devel] [PATCH v10 9/9] libxl: vnuma topology configuration parser and doc



So, as far as I understood it, v11 did not include a toolstack part, so
this version is what I should be looking at, is this correct?

I'm also looking at the commits in the github repo (for v11), and the
code seems to be the same as here.

Please do correct me if I'm wrong. Hopefully, the comments I'm leaving
about this version will be useful anyway.

Also, since we agreed on changing the libxl data structures for this
feature (i.e., the vNUMA-related field names and types), the review will
focus on the behavior of the xl parser (and on the structure of the
parsing code, of course), and neglect that part.

So, first of all, about the subject: 'libxl: vnuma...'. This patch looks
more about xl than libxl (so I'd expect something like 'xl: vnuma...').

On mer, 2014-09-03 at 00:24 -0400, Elena Ufimtseva wrote:
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -264,6 +264,83 @@ if the values of B<memory=> and B<maxmem=> differ.
>  A "pre-ballooned" HVM guest needs a balloon driver, without a balloon driver
>  it will crash.
>
In general, about the config file interface, what I think we want a
user to be able to specify is:

 1. number of virtual NUMA nodes
    (default: 1)
 2. amount of memory on each virtual NUMA node
    (default: total memory / nr_nodes)
 3. the 'distances' from each node to each other one
    (default: something sane :-) )
 4. what vcpus on each virtual NUMA node
    (default: distribute evenly)
 5. virtual node to physical node mapping
    (default: driven by the existing automatic placement logic, or
    by pinning)

Which is pretty much what this patch allows one to do, so, good! :-D

The syntax used for each of these parameters is also mostly fine, IMO,
with the only exception of the mapping between vCPUs and vNODEs (more on
that below).

About the name of the parameters, in this patch, we have:

 1. "vnuma_nodes"
 2. "vnuma_mem"
 3. "vdistance"
 4. "vnuma_cpumap"
 5. "vnuma_nodemap"

I have to admit, I don't especially like these, so here's my proposal.
I'm of course open to other views here, I'm actually just tossing names
out there (with a brief rationale and explanation)... :-)

 1. "vnodes": for vCPUs, we have "vcpus", I think something along the 
              same line would make sense;
 2. "vnodes_memory" : same reasoning as above (BTW, we may want to go
                      for something like "vnodes_maxmem", as that is
                      probably what we're actually specifying, isn't it?
                      I'm not sure how important it is to stress that,
                      though);
 3. "vnodes_distance[s]": or just "vnodes_dist[s]"
 4. "vcpus_to_vnodes"
 5. "vnodes_to_pnodes": not particularly proud of this, but can't think
                        of anything better right now.
 
Thoughts?
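
Just to make this concrete (purely illustrative, of course, since
nothing is settled yet), a config fragment using these names could look
like:

    vnodes = 2
    vnodes_memory = [2048, 2048]
    vnodes_distances = [10, 20]
    vcpus_to_vnodes = [0, 0, 1, 1]
    vnodes_to_pnodes = [2, 3]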

> +=item B<vnuma_nodes=N>
> +
> +Number of vNUMA nodes the guest will be initialized with on boot.
> +PV guest by default will have one vnuma node.
> +
It's not only PV guests that will have, by default, just one node...
This is true for every guest (the reason being that vNUMA is not even
implemented for the other guest types!! :-) ).

As it stands, it looks to me like other kinds of guests than PV can have
a different number of NUMA nodes by default.

> +=item B<vnuma_mem=[vmem1, vmem2, ...]>
> +
> +List of memory sizes for each node, defined in MBytes. Number of items 
> listed must
> +match nr_vnodes. 
>
"must match the number of virtual NUMA nodes specified by
B<vnuma_nodes=>"

> If the sum of all vnode memories does not match the domain memory
>
", specified by B<memory=> and/or B<maxmem=>,"

> +or there are missing nodes, it will fail.
> +If not specified, memory will be equally split between vnodes. Current 
> minimum
> +memory size for one node is limited by 32MB.
> +
> +Example: vnuma_mem=[1024, 1024, 2048, 2048]
> +Total amount of memory in guest: 6GB
> +

> +=item B<vnuma_vcpumap=[node_nr, node_nr, ...]>
> +
> +Defines vcpu to vnode mapping as a list of integers. The position in the list
> +is a vcpu number, and the value is the vnode number to which the vcpu will be
> +assigned to.
> +Current limitations:
> +- vNUMA node must have at least one vcpu, otherwise default vcpu_to_vnode 
> will be used.
> +- Total number of vnodes cannot be bigger then number of vcpus.
> +
> +Example:
> +Map of 4 vcpus to 2 vnodes:
> +0,1 vcpu -> vnode0
> +2,3 vcpu -> vnode1:
> +
> +vnuma_vcpumap = [0, 0, 1, 1]
> + 4 vcpus here -  0  1  2  3
> +
>
I wonder whether it would be better to turn this into something like
the below:

["0,1,2,3", "4-7", "8,10,12,14", "9,11,13,15"]

With the following meaning: there are 4 nodes, and vCPUs 0,1,2,3 belong
to node 0, vCPUs 4,5,6,7 belong to node 1, etc.

It looks a lot easier to use to me. With the current syntax, the same
configuration would be expressed like this:

[0,0,0,0,1,1,1,1,2,3,2,3,2,3,2,3]

Quite hard to get right when both writing and reading it, isn't it?

Elena's example looks like this:

["0,1", "2,3"] (with my notation)
[0, 0, 1, 1]   (with this patch notation)

From an implementation point of view, the code for parsing strings like
these (and also for parsing lists of strings like this) is already there
in xl, so it shouldn't even be too big of a piece of work...
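
FWIW, something along these lines (a rough sketch only, the helper name
is made up) would be enough to turn each of those strings into entries
of the vcpu-to-vnode array:

    /* parse one "0,1,2,3" / "4-7" style string and assign the listed
     * vcpus to virtual node 'vnode'. */
    static int assign_vcpus_to_vnode(const char *buf, unsigned int vnode,
                                     unsigned int *vcpu_to_vnode,
                                     unsigned int max_vcpus)
    {
        char *s = strdup(buf), *tok, *saveptr;

        if (!s)
            return -1;
        for (tok = strtok_r(s, ",", &saveptr); tok;
             tok = strtok_r(NULL, ",", &saveptr)) {
            unsigned long a, b;
            char *ep;

            a = b = strtoul(tok, &ep, 10);
            if (*ep == '-')
                b = strtoul(ep + 1, &ep, 10);
            if (ep == tok || *ep != '\0' || a > b || b >= max_vcpus) {
                free(s);
                return -1;
            }
            for (; a <= b; a++)
                vcpu_to_vnode[a] = vnode;
        }
        free(s);
        return 0;
    }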

> +=item B<vnuma_vnodemap=[p1, p2, ..., pn]>
> +
> +List of physical node numbers, position in the list represents vnode number.
> +Used for manual placement of vnuma nodes to physical NUMA nodes.
> +Will not be used if automatic numa placement is active.
> +
> +Example:
> +assume NUMA machine with 4 physical nodes. Placing vnuma node 0 to pnode 2,
> +vnuma node 1 to pnode 3:
> +vnode0 -> pnode2
> +vnode1 -> pnode3
> +
> +vnuma_vnodemap=[2, 3]
> +first vnode will be placed on node 2, second on node 3.
> +
>
I'm fine with this.

> +=item B<vnuma_autoplacement=[0|1]>
> +
To be killed. :-)

>  =head3 Event Actions
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 409a795..1af2250 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -40,6 +40,7 @@
>  #include "libxl_json.h"
>  #include "libxlutil.h"
>  #include "xl.h"
> +#include "libxl_vnuma.h"
>  
>  /* For calls which return an errno on failure */
>  #define CHK_ERRNOVAL( call ) ({                                         \
> @@ -797,6 +798,432 @@ static void parse_vcpu_affinity(libxl_domain_build_info 
> *b_info,
>      }
>  }
>  
> +static unsigned int get_list_item_uint(XLU_ConfigList *list, unsigned int i)
> +{
> +    const char *buf;
> +    char *ep;
> +    unsigned long ul;
> +    int rc = -EINVAL;
> +
> +    buf = xlu_cfg_get_listitem(list, i);
> +    if (!buf)
> +        return rc;
> +    ul = strtoul(buf, &ep, 10);
> +    if (ep == buf)
> +        return rc;
> +    if (ul >= UINT16_MAX)
> +        return rc;
> +    return (unsigned int)ul;
> +}
> +
I see what you want to achieve here, but I don't particularly like
having this as a helper here. If we want something this generic,
perhaps put this in xlu? Although, I'm not sure about this either...

It's probably not such a great deal, and it's the tools maintainer's
call, I think.

> +static void vdistance_set(unsigned int *vdistance,
> +                                unsigned int nr_vnodes,
> +                                unsigned int samenode,
> +                                unsigned int othernode)
> +{
> +    unsigned int idx, slot;
> +    for (idx = 0; idx < nr_vnodes; idx++)
> +        for (slot = 0; slot < nr_vnodes; slot++)
> +            *(vdistance + slot * nr_vnodes + idx) =
> +                idx == slot ? samenode : othernode;
> +}
> +
> +static void vcputovnode_default(unsigned int *cpu_to_node,
> +                                unsigned int nr_vnodes,
> +                                unsigned int max_vcpus)
> +{
> +    unsigned int cpu;
> +    for (cpu = 0; cpu < max_vcpus; cpu++)
> +        cpu_to_node[cpu] = cpu % nr_vnodes;
> +}
> +
> +/* Split domain memory between vNUMA nodes equally. */
> +static int split_vnumamem(libxl_domain_build_info *b_info)
> +{
> +    unsigned long long vnodemem = 0;
> +    unsigned long n;
> +    unsigned int i;
> +
> +    if (b_info->vnodes == 0)
> +        return -1;
> +
This is never called (and that's ok!) when b_info->vnodes==0, so I
find the check more confusing than helpful.

An assert(), maybe?
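
I mean something like the below (with <assert.h> included, of course):

    /* split_vnumamem() is only ever reached with vnodes > 0 */
    assert(b_info->vnodes != 0);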

> +    vnodemem = (b_info->max_memkb >> 10) / b_info->vnodes;
> +    if (vnodemem < MIN_VNODE_SIZE)
> +        return -1;
> +    /* reminder in MBytes. */
> +    n = (b_info->max_memkb >> 10) % b_info->vnodes;
> +    /* get final sizes in MBytes. */
> +    for (i = 0; i < (b_info->vnodes - 1); i++)
> +        b_info->vnuma_mem[i] = vnodemem;
> +    /* add the reminder to the last node. */
> +    b_info->vnuma_mem[i] = vnodemem + n;
> +    return 0;
> +}
> +
> +static void vnuma_vnodemap_default(unsigned int *vnuma_vnodemap,
> +                                   unsigned int nr_vnodes)
> +{
> +    unsigned int i;
> +    for (i = 0; i < nr_vnodes; i++)
> +        vnuma_vnodemap[i] = VNUMA_NO_NODE;
> +}
> +
> +/*
> + * init vNUMA to "zero config" with one node and all other
> + * topology parameters set to default.
> + */
> +static int vnuma_default_config(libxl_domain_build_info *b_info)
> +{
> +    b_info->vnodes = 1;
> +    /* all memory goes to this one vnode, as well as vcpus. */
> +    if (!(b_info->vnuma_mem = (uint64_t *)calloc(b_info->vnodes,
> +                                sizeof(*b_info->vnuma_mem))))
> +        goto bad_vnumazerocfg;
> +
> +    if (!(b_info->vnuma_vcpumap = (unsigned int *)calloc(b_info->max_vcpus,
> +                                sizeof(*b_info->vnuma_vcpumap))))
> +        goto bad_vnumazerocfg;
> +
> +    if (!(b_info->vdistance = (unsigned int *)calloc(b_info->vnodes *
> +                                b_info->vnodes, sizeof(*b_info->vdistance))))
> +        goto bad_vnumazerocfg;
> +
> +    if (!(b_info->vnuma_vnodemap = (unsigned int *)calloc(b_info->vnodes,
> +                                sizeof(*b_info->vnuma_vnodemap))))
> +        goto bad_vnumazerocfg;
> +
> +    b_info->vnuma_mem[0] = b_info->max_memkb >> 10;
> +
> +    /* all vcpus assigned to this vnode. */
> +    vcputovnode_default(b_info->vnuma_vcpumap, b_info->vnodes,
> +                        b_info->max_vcpus);
> +
> +    /* default vdistance is 10. */
> +    vdistance_set(b_info->vdistance, b_info->vnodes, 10, 10);
> +
> +    /* VNUMA_NO_NODE for vnode_to_pnode. */
> +    vnuma_vnodemap_default(b_info->vnuma_vnodemap, b_info->vnodes);
> +
> +    /*
> +     * will be placed to some physical nodes defined by automatic
> +     * numa placement or VNUMA_NO_NODE will not request exact node.
> +     */
> +    libxl_defbool_set(&b_info->vnuma_autoplacement, true);
> +    return 0;
> +
> + bad_vnumazerocfg:
> +    return -1;
> +}
> +
Can't this, and with it most of the above helpers, be implemented in
such a way that it fits inside libxl__domain_build_info_setdefault()
(and, if yes, moved to a libxl-related patch)?

I'm not 100% sure, but it really looks similar to the kind of things we
usually do there rather than in xl.
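
Just to sketch what I have in mind (rough and hypothetical, the helper
name is made up, the field names are the ones from this patch),
something like this, called from there:

    /* default to a "zero config" vNUMA topology: one virtual node */
    static int vnuma_setdefault(libxl_domain_build_info *b_info)
    {
        if (b_info->vnodes)
            return 0;   /* the user asked for something specific */

        b_info->vnodes = 1;
        b_info->vnuma_mem = calloc(1, sizeof(*b_info->vnuma_mem));
        b_info->vdistance = calloc(1, sizeof(*b_info->vdistance));
        b_info->vnuma_vcpumap = calloc(b_info->max_vcpus,
                                       sizeof(*b_info->vnuma_vcpumap));
        b_info->vnuma_vnodemap = calloc(1, sizeof(*b_info->vnuma_vnodemap));
        if (!b_info->vnuma_mem || !b_info->vdistance ||
            !b_info->vnuma_vcpumap || !b_info->vnuma_vnodemap)
            return ERROR_NOMEM;

        b_info->vnuma_mem[0] = b_info->max_memkb >> 10;
        b_info->vdistance[0] = 10;
        /* all vcpus already map to vnode 0 thanks to calloc() */
        b_info->vnuma_vnodemap[0] = VNUMA_NO_NODE;
        return 0;
    }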

> +static void free_vnuma_info(libxl_domain_build_info *b_info)
> +{
> +    free(b_info->vnuma_mem);
> +    free(b_info->vdistance);
> +    free(b_info->vnuma_vcpumap);
> +    free(b_info->vnuma_vnodemap);
> +
> +    b_info->vnuma_mem = NULL;
> +    b_info->vdistance = NULL;
> +    b_info->vnuma_vcpumap = NULL;
> +    b_info->vnuma_vnodemap = NULL;
> +
> +    b_info->vnodes = 0;
> +    b_info->vmemranges = 0;
> +}
> +
This appears to be called only once, right before an "exit(1)", in xl.
Do we need it? For sure we don't need the =NULL and =0 thing, do we?

> +static int parse_vnuma_mem(XLU_Config *config,
> +                            libxl_domain_build_info **b_info)
> +{
> +    libxl_domain_build_info *dst;
>
Why the **b_info and the *dst thing?

> +    XLU_ConfigList *vnumamemcfg;
> +    int nr_vnuma_regions, i;
> +    unsigned long long vnuma_memparsed = 0;
> +    unsigned long ul;
> +    const char *buf;
> +    char *ep;
> +
> +    dst = *b_info;
> +    if (!xlu_cfg_get_list(config, "vnuma_mem",
> +                          &vnumamemcfg, &nr_vnuma_regions, 0)) {
> +
> +        if (nr_vnuma_regions != dst->vnodes) {
> +            fprintf(stderr, "Number of numa regions (vnumamem = %d) is \
> +                    incorrect (should be %d).\n", nr_vnuma_regions,
> +                    dst->vnodes);
>
I find the term 'regions' rather confusing, both in the variable name
and in the error message. I'd go for something like nr_vnuma_mem, for
the variable. For the message, maybe something like this:

 "Specified the amount of memory for %d virtual nodes,\n"
 "while number of virtual node is %d!\n"

But maybe it's just me. :-)

Oh, the "(vnumamem = %d)" part, at least, makes very little sense to me,
as I don't think the word 'vnumamem', written like that and associated
with an integer, makes much sense to a user. At least that should
change, I think.

> +            goto bad_vnuma_mem;
> +        }
> +
> +        dst->vnuma_mem = calloc(dst->vnodes,
> +                                 sizeof(*dst->vnuma_mem));
>
If you go for the libxl__build_info_setdefault() route I suggested
above, this may need to change a bit, as you'll probably find
b_info->vnuma_mem allocated already.

Anyway, I don't think it would be a big deal to, in that case, either
use realloc() here, or use NULL as default value (and hence set it to
that in libxl__build_info_setdefault() ).

> +        if (dst->vnuma_mem == NULL) {
> +            fprintf(stderr, "Unable to allocate memory for vnuma ranges.\n");
>
Ditto, about 'ranges'.

> +            goto bad_vnuma_mem;
> +        }
> +
> +        /*
> +         * Will parse only nr_vnodes times, even if we have more/less 
> regions.
>
How can it be that you have fewer or more entries than b_info->vnodes? 

> +         * Take care of it later if less or discard if too many regions.
>
You took care of that above already...

> +         */
> +        for (i = 0; i < dst->vnodes; i++) {
> +            buf = xlu_cfg_get_listitem(vnumamemcfg, i);
> +            if (!buf) {
> +                fprintf(stderr,
> +                        "xl: Unable to get element %d in vnuma memory 
> list.\n", i);
> +                goto bad_vnuma_mem;
>
The common pattern in xl seems to be to exit() right away, if this
happens (whether to exit with 1 or -1, that seems a bit inconsistent to
me, at a quick glance).

> +            }
> +
> +            ul = strtoul(buf, &ep, 10);
> +            if (ep == buf) {
> +                fprintf(stderr, "xl: Invalid argument parsing vnumamem: 
> %s.\n", buf);
> +                goto bad_vnuma_mem;
>
And the same (i.e., just exit() ) applies here... And pretty much
everywhere, actually! :-)
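
I.e., something like:

            ul = strtoul(buf, &ep, 10);
            if (ep == buf) {
                fprintf(stderr, "xl: Invalid argument parsing vnumamem: %s.\n", buf);
                exit(1);
            }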

> +            }
> +
> +            /* 32Mb is a min size for a node, taken from Linux */
> +            if (ul >= UINT32_MAX || ul < MIN_VNODE_SIZE) {
>
What's the idea behind UINT32_MAX here?

I guess what I mean is, if there is, for any reason, a maximum virtual
node size, then it may be useful to #define it, e.g., MAX_VNODE_SIZE
(just as you're doing for MIN_VNODE_SIZE). If there is not, then we can
just live with what the type can handle, and it will be lower layers
that will return appropriate errors if values are wrong/unrealistic.

Like this, you're saying that such maximum virtual node size is 4096 TB
(if I'm not doing the math wrong), which I don't think makes much
sense... :-P
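
I mean something like the below (the actual value is of course totally
made up, just to illustrate):

    #define MAX_VNODE_SIZE 131072  /* MB */

            if (ul < MIN_VNODE_SIZE || ul > MAX_VNODE_SIZE) {
                fprintf(stderr, "xl: memory for virtual node %d is outside of "
                        "[%u, %u] MB.\n", i, MIN_VNODE_SIZE, MAX_VNODE_SIZE);
                exit(1);
            }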

> +                fprintf(stderr, "xl: vnuma memory %lu is not within %u - %u 
> range.\n",
> +                        ul, MIN_VNODE_SIZE, UINT32_MAX);
>
"xl: memory for virtual node %lu is outside of [%u, %u] MB.\n"

> +                goto bad_vnuma_mem;
> +            }
> +
> +            /* memory in MBytes */
> +            dst->vnuma_mem[i] = ul;
> +        }
> +
> +        /* Total memory for vNUMA parsed to verify */
> +        for (i = 0; i < nr_vnuma_regions; i++)
> +            vnuma_memparsed = vnuma_memparsed + (dst->vnuma_mem[i]);
> +
Not a big deal, but can't you accumulate while doing the parsing?
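
I mean, inside the loop above, just do:

            /* memory in MBytes */
            dst->vnuma_mem[i] = ul;
            vnuma_memparsed += ul;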

> +        /* Amount of memory for vnodes same as total? */
> +        if ((vnuma_memparsed << 10) != (dst->max_memkb)) {
> +            fprintf(stderr, "xl: vnuma memory is not the same as domain \
> +                    memory size.\n");
> +            goto bad_vnuma_mem;
> +        }
> +    } else {
> +        dst->vnuma_mem = calloc(dst->vnodes,
> +                                      sizeof(*dst->vnuma_mem));
> +        if (dst->vnuma_mem == NULL) {
> +            fprintf(stderr, "Unable to allocate memory for vnuma ranges.\n");
> +            goto bad_vnuma_mem;
> +        }
> +
The allocation is the same as above. It can live outside of the if,
can't it? Especially considering that, as said before, you can just
exit() on failure.
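
I.e., just do the allocation (and the check) once, before looking at the
config list:

    dst->vnuma_mem = calloc(dst->vnodes, sizeof(*dst->vnuma_mem));
    if (dst->vnuma_mem == NULL) {
        fprintf(stderr, "xl: unable to allocate memory for vnuma nodes.\n");
        exit(1);
    }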

> +        fprintf(stderr, "WARNING: vNUMA memory ranges were not 
> specified.\n");
> +        fprintf(stderr, "Using default equal vnode memory size %lu Kbytes \
> +                to cover %lu Kbytes.\n",
> +                dst->max_memkb / dst->vnodes, dst->max_memkb);
> +
Does this deserve a warning? In the man page, we say it's legal not to
specify this, so I may well have read that, and may be doing this on
purpose...

> +        if (split_vnumamem(dst) < 0) {
> +            fprintf(stderr, "Could not split vnuma memory into equal 
> chunks.\n");
> +            goto bad_vnuma_mem;
> +        }
> +    }
> +    return 0;
> +
> + bad_vnuma_mem:
> +    return -1;
> +}
> +
> +static int parse_vnuma_distance(XLU_Config *config,
> +                                libxl_domain_build_info **b_info)
> +{
> +    libxl_domain_build_info *dst;
>
Same as above.

> +    XLU_ConfigList *vdistancecfg;
> +    int nr_vdist;
> +
> +    dst = *b_info;
> +    dst->vdistance = calloc(dst->vnodes * dst->vnodes,
> +                               sizeof(*dst->vdistance));
> +    if (dst->vdistance == NULL)
> +        goto bad_distance;
> +
> +    if (!xlu_cfg_get_list(config, "vdistance", &vdistancecfg, &nr_vdist, 0)) 
> {
> +        int d1, d2, i;
> +        /*
> +         * First value is the same node distance, the second as the
> +         * rest of distances. The following is required right now to
>
I'm not a native English speaker, but 'The latter' sounds more correct
than 'The following' (if that is what you mean; if it's not, then I
didn't understand what you mean! :-) ).

> +         * avoid non-symmetrical distance table as it may break latest 
> kernel.
> +         * TODO: Better way to analyze extended distance table, possibly
> +         * OS specific.
> +         */
> +
Ok. So, AFAIUI, for now, you want the list to be two elements long, is
that right? If yes, should we check that nr_vdist==2 ?

What happens if I have b_info->vnodes > 1, and I pass a list only one
element long ?
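
If the answer is "two elements, always", then I'd just check for it,
something like:

        if (nr_vdist != 2) {
            fprintf(stderr, "xl: vdistance must have exactly 2 elements, "
                    "got %d.\n", nr_vdist);
            exit(1);
        }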

> +        for (i = 0; i < nr_vdist; i++) {
> +            d1 = get_list_item_uint(vdistancecfg, i);
> +        }
> +
Mmm... What's this?

> +        d1 = get_list_item_uint(vdistancecfg, 0);
> +        if (dst->vnodes > 1)
> +           d2 = get_list_item_uint(vdistancecfg, 1);
> +        else
> +           d2 = d1;
> +
> +        if (d1 >= 0 && d2 >= 0) {
> +            if (d1 < d2)
> +                fprintf(stderr, "WARNING: vnuma distance d1 < d2, %u < 
> %u\n", d1, d2);
>
I agree this looks weird, and deserves a warning.

I'd mention why that is the case, in the warning message, i.e., because
the distance from a node to _itself_ is being set to a bigger value than
the distance between two different nodes.

> +            vdistance_set(dst->vdistance, dst->vnodes, d1, d2);
> +        } else {
> +            fprintf(stderr, "WARNING: vnuma distance values are 
> incorrect.\n");
> +            goto bad_distance;
>
After printing this "WARNING", you goto bad_distance, which means this
returns -1, which means we exit(1) below.

So, this really looks like an error, rather than a warning to me.

> +        }
> +    } else {
> +        fprintf(stderr, "Could not parse vnuma distances.\n");
> +        vdistance_set(dst->vdistance, dst->vnodes, 10, 20);
> +    }
> +    return 0;
> +
> + bad_distance:
> +    return -1;
>
As said before, xl often just exit on failures like these.

> +}
> +
> +static int parse_vnuma_vcpumap(XLU_Config *config,
> +                                libxl_domain_build_info **b_info)
> +{
> +    libxl_domain_build_info *dst;
>
...

> +    XLU_ConfigList *vcpumap;
> +    int nr_vcpumap, i;
> +
> +    dst = *b_info;
> +    dst->vnuma_vcpumap = (unsigned int *)calloc(dst->max_vcpus,
> +                                     sizeof(*dst->vnuma_vcpumap));
>
You don't need the cast.

> +    if (dst->vnuma_vcpumap == NULL)
> +        goto bad_vcpumap;
> +
exit(1);

> +    if (!xlu_cfg_get_list(config, "vnuma_vcpumap",
> +                          &vcpumap, &nr_vcpumap, 0)) {
> +        if (nr_vcpumap == dst->max_vcpus) {
>
    if (nr_vcpumap != b_info->max_vcpus) {
        fprintf(stderr, "xl: mapping of vCPUs to vNODEs specified for "
                        "%d vCPUs, while the domain has %d\n",
                nr_vcpumap, b_info->max_vcpus);
        exit(1);
    }

and then continue with one less level of indentation. :-)

> +            unsigned int  vnode, vcpumask = 0, vmask;
> +
> +            vmask = ~(~0 << nr_vcpumap);
> +            for (i = 0; i < nr_vcpumap; i++) {
> +                vnode = get_list_item_uint(vcpumap, i);
> +                if (vnode >= 0 && vnode < dst->vnodes) {
> +                    vcpumask |= (1 << i);
> +                    dst->vnuma_vcpumap[i] = vnode;
> +                }
> +            }
> +
> +            /* Did it covered all vnodes in the vcpu mask? */
> +            if ( !(((vmask & vcpumask) + 1) == (1 << nr_vcpumap)) ) {
> +                fprintf(stderr, "WARNING: Not all vnodes were covered \
> +                        in numa_cpumask.\n");
> +                goto bad_vcpumap;
> +            }
>
As we settled on using libxl_bitmap for this, I expect this to change
quite a bit (e.g., we'd have to build a bitmap per each node, the
libxl_bitmap manipulating functions should be used, etc.).
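
Just to give an idea of what I mean (names made up, nothing about the
new interface is settled yet), it could end up looking something like
this:

    /* one bitmap of vcpus per virtual node */
    libxl_bitmap *vnode_to_vcpus;
    int i;

    vnode_to_vcpus = calloc(b_info->vnodes, sizeof(*vnode_to_vcpus));
    if (!vnode_to_vcpus)
        exit(1);
    for (i = 0; i < b_info->vnodes; i++) {
        if (libxl_bitmap_alloc(ctx, &vnode_to_vcpus[i], b_info->max_vcpus)) {
            fprintf(stderr, "xl: failed to allocate vcpu map for vnode %d\n", i);
            exit(1);
        }
    }

    /* ...and, while parsing the entry for vnode i, for each vcpu found: */
    libxl_bitmap_set(&vnode_to_vcpus[i], vcpu);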

> +        } else {
> +            fprintf(stderr, "WARNING:  Bad vnuma_vcpumap.\n");
> +            goto bad_vcpumap;
> +        }
> +    }
> +    else
> +        vcputovnode_default(dst->vnuma_vcpumap,
> +                            dst->vnodes,
> +                            dst->max_vcpus);
> +    return 0;
> +
> + bad_vcpumap:
> +    return -1;
> +}
> +
> +static int parse_vnuma_vnodemap(XLU_Config *config,
> +                                libxl_domain_build_info **b_info)
> +{
> +    libxl_domain_build_info *dst;
No need?

> +    XLU_ConfigList *vnodemap;
> +    int nr_vnodemap, i;
> +
> +    dst = *b_info;
> +
> +    /* There is mapping to NUMA physical nodes? */
> +    dst->vnuma_vnodemap = (unsigned int *)calloc(dst->vnodes,
> +                           sizeof(*dst->vnuma_vnodemap));
>
no cast.

> +    if (dst->vnuma_vnodemap == NULL)
> +        goto bad_vnodemap;
> +
> +    if (!xlu_cfg_get_list(config, "vnuma_vnodemap",
> +                          &vnodemap, &nr_vnodemap, 0)) {
> +        /*
> +         * If not specified or incorrect, will be defined
> +         * later based on the machine architecture, configuration
> +         * and memory availble when creating domain.
> +         */
> +        libxl_defbool_set(&dst->vnuma_autoplacement, false);
>
As agreed in the other part of this thread, there is already an
automatic placement switch to act on, in this case.

Note that the way that switch is being used right now in libxl code may
need to change a bit, but that is, nevertheless, the right thing to do.
So have a look at that, please. :-)

> +        if (nr_vnodemap == dst->vnodes) {
> +            unsigned int vnodemask = 0, pnode, smask;
> +            smask = ~(~0 << dst->vnodes);
> +            for (i = 0; i < dst->vnodes; i++) {
> +                pnode = get_list_item_uint(vnodemap, i);
> +                if (pnode >= 0) {
> +                    vnodemask |= (1 << i);
> +                    dst->vnuma_vnodemap[i] = pnode;
> +                }
> +            }
> +
> +            /* Did it covered all vnodes in the mask? */
> +            if ( !(((vnodemask & smask) + 1) == (1 << nr_vnodemap)) ) {
> +                fprintf(stderr, "WARNING: Not all vnodes were covered \
> +                        vnuma_vnodemap.\n");
> +                fprintf(stderr, "Automatic placement will be used for 
> vnodes.\n");
> +                libxl_defbool_set(&dst->vnuma_autoplacement, true);
> +                vnuma_vnodemap_default(dst->vnuma_vnodemap, dst->vnodes);
> +            }
> +        }
> +        else {
> +            fprintf(stderr, "WARNING: Incorrect vnuma_vnodemap.\n");
> +            fprintf(stderr, "Automatic placement will be used for 
> vnodes.\n");
> +            libxl_defbool_set(&dst->vnuma_autoplacement, true);
> +            vnuma_vnodemap_default(dst->vnuma_vnodemap, dst->vnodes);
> +        }
> +    }
>
This is also about to change quite substantially, so I'm avoiding
commenting on it...

> +static void parse_vnuma_config(XLU_Config *config,
> +                               libxl_domain_build_info *b_info)
> +{
> +    long l;
> +
> +    if (!xlu_cfg_get_long (config, "vnodes", &l, 0)) {
> +        if (l > MAX_VNUMA_NODES) {
> +            fprintf(stderr, "Too many vnuma nodes, max %d is allowed.\n",
> +                    MAX_VNUMA_NODES);
> +            goto bad_vnuma_config;
> +        }
> +        b_info->vnodes = l;
> +
> +        if (!xlu_cfg_get_defbool(config, "vnuma_autoplacement",
> +                    &b_info->vnuma_autoplacement, 0))
> +            libxl_defbool_set(&b_info->vnuma_autoplacement, false);
> +
"vnuma_autoplacement" is, AFAIUI, going away, so no comments. ;-P


Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

