[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC v2 4/7] libxl/vNUMA: vNUMA libxl functionality.



On Fri, Sep 13, 2013 at 9:50 AM, Elena Ufimtseva <ufimtseva@xxxxxxxxx> wrote:
> vNUMA libxl supporting functionality.
> libxl supporting functionality for vNUMA includes:
> * having vNUMA memory areas sizes, transforms it to
> start and end pfns based on domain e820 map;
> * contructs vnode_to_pnode map for vNUMA nodes memory
> allocation and pass it to Xen; the mechanism considers
> automatic NUMA placement in case of presence of hardware
> NUMA; In best case scenario all vnodes will be allocated
> within one pnode. If the domain spans different pnodes,
> the vnodes will be one-by-one placed to pnodes. If such
> allocation is impossible due to the memory constraints,
> the allocation will use default mechanism; this is worst
> case scenario.

Why would someone want to make a VM with two vnodes and then put them
on the same pnode?  Apart from testing, of course, but our defaults
should be for the common case of real users.


[snip]

> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index a78c91d..f7b1aeb 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -308,3 +308,89 @@ int libxl__arch_domain_create(libxl__gc *gc, 
> libxl_domain_config *d_config,
>
>      return ret;
>  }
> +
> +unsigned long e820_memory_hole_size(unsigned long start, unsigned long end, 
> struct e820entry e820[], int nr)
> +{
> +#define clamp(val, min, max) ({             \
> +    typeof(val) __val = (val);              \
> +    typeof(min) __min = (min);              \
> +    typeof(max) __max = (max);              \
> +    (void) (&__val == &__min);              \
> +    (void) (&__val == &__max);              \
> +    __val = __val < __min ? __min: __val;   \
> +    __val > __max ? __max: __val; })

What on earth is going on here?

Are these comparison of pointers to work around some gdb "unused
variable" warnings or something?

This would be much better as a static inline function that explicitly
returns a value.

> +    int i;
> +    unsigned long absent, start_pfn, end_pfn;
> +    absent = start - end;
> +    for(i = 0; i < nr; i++) {
> +        if(e820[i].type == E820_RAM) {
> +            start_pfn = clamp(e820[i].addr, start, end);
> +            end_pfn =   clamp(e820[i].addr + e820[i].size, start, end);
> +            absent -= end_pfn - start_pfn;
> +        }
> +    }
> +    return absent;
> +}
> +
> +/*
> + * Aligns memory blocks in domain info for linux NUMA build image.
> + * Takes libxl_domain_build_info memory sizes and returns aligned to
> + * domain e820 map linux numa blocks in guest pfns.
> + */
> +int libxl__vnuma_align_mem(libxl__gc *gc,
> +                            uint32_t domid,
> +                            libxl_domain_build_info *b_info, /* IN: mem 
> sizes */
> +                            vnuma_memblk_t *memblks)        /* OUT: linux 
> numa blocks in pfn */
> +{
> +#ifndef roundup
> +#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
> +#endif
> +    int i, rc;
> +    unsigned long shift = 0, size, node_min_size = 1, limit;
> +    unsigned long end_max;
> +    uint32_t nr;
> +    struct e820entry map[E820MAX];
> +
> +    libxl_ctx *ctx = libxl__gc_owner(gc);
> +    rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
> +    if (rc < 0) {
> +        errno = rc;
> +        return -EINVAL;
> +    }
> +    nr = rc;
> +    rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
> +                       (b_info->max_memkb - b_info->target_memkb) +
> +                       b_info->u.pv.slack_memkb);
> +    if (rc)
> +        return ERROR_FAIL;
> +    end_max = map[nr-1].addr + map[nr-1].size;
> +    shift = 0;
> +    memset(memblks, 0, sizeof(*memblks)*b_info->nr_vnodes);
> +    memblks[0].start = map[0].addr;
> +    for(i = 0; i < b_info->nr_vnodes; i++) {
> +        memblks[i].start += shift;

OK, at this point memblks[i] should be 0, since we zeroed it up
there... why are you += instead of just =?

And the code below is confusing and I think isn't doing what you think
it does.  Walk me through this.

> +        memblks[i].end += shift + b_info->vnuma_memszs[i];

.end started at 0, so this is effectively an obfuscated '='.

So suppose vnuma_memszs is 256MiB; so now .end == .start + 256MiB.

> +        limit = size = memblks[i].end - memblks[i].start;

So now limit and size are both 256MiB.

> +        while (memblks[i].end - memblks[i].start -
> +                e820_memory_hole_size(memblks[i].start, memblks[i].end, map, 
> nr)
> +                < size) {

And suppose there was a hole that overlapped here, so we enter this loop.

> +            memblks[i].end += node_min_size;
> +            shift += node_min_size;
> +            if (memblks[i].end - memblks[i].start >= limit) {

Now, at this point, unless node_min_size is negative, this is *always*
going to be true.

> +                memblks[i].end = memblks[i].start + limit;
> +                break;

So we set .end back to what it was before (.start + 256MiB) and break
out of the loop?

> +            }
> +            if (memblks[i].end == end_max) {
> +                memblks[i].end = end_max;
> +                break;
> +            }
> +        }
> +        shift = memblks[i].end;

And throw away whatever we did to shift before?

What was it you were actually trying to do here?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.