[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH RFC v2 4/7] libxl/vNUMA: vNUMA libxl functionality.
On Fri, Sep 13, 2013 at 9:50 AM, Elena Ufimtseva <ufimtseva@xxxxxxxxx> wrote: > vNUMA libxl supporting functionality. > libxl supporting functionality for vNUMA includes: > * having vNUMA memory areas sizes, transforms it to > start and end pfns based on domain e820 map; > * contructs vnode_to_pnode map for vNUMA nodes memory > allocation and pass it to Xen; the mechanism considers > automatic NUMA placement in case of presence of hardware > NUMA; In best case scenario all vnodes will be allocated > within one pnode. If the domain spans different pnodes, > the vnodes will be one-by-one placed to pnodes. If such > allocation is impossible due to the memory constraints, > the allocation will use default mechanism; this is worst > case scenario. Why would someone want to make a VM with two vnodes and then put them on the same pnode? Apart from testing, of course, but our defaults should be for the common case of real users. [snip] > diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c > index a78c91d..f7b1aeb 100644 > --- a/tools/libxl/libxl_x86.c > +++ b/tools/libxl/libxl_x86.c > @@ -308,3 +308,89 @@ int libxl__arch_domain_create(libxl__gc *gc, > libxl_domain_config *d_config, > > return ret; > } > + > +unsigned long e820_memory_hole_size(unsigned long start, unsigned long end, > struct e820entry e820[], int nr) > +{ > +#define clamp(val, min, max) ({ \ > + typeof(val) __val = (val); \ > + typeof(min) __min = (min); \ > + typeof(max) __max = (max); \ > + (void) (&__val == &__min); \ > + (void) (&__val == &__max); \ > + __val = __val < __min ? __min: __val; \ > + __val > __max ? __max: __val; }) What on earth is going on here? Are these comparison of pointers to work around some gdb "unused variable" warnings or something? This would be much better as a static inline function that explicitly returns a value. > + int i; > + unsigned long absent, start_pfn, end_pfn; > + absent = start - end; > + for(i = 0; i < nr; i++) { > + if(e820[i].type == E820_RAM) { > + start_pfn = clamp(e820[i].addr, start, end); > + end_pfn = clamp(e820[i].addr + e820[i].size, start, end); > + absent -= end_pfn - start_pfn; > + } > + } > + return absent; > +} > + > +/* > + * Aligns memory blocks in domain info for linux NUMA build image. > + * Takes libxl_domain_build_info memory sizes and returns aligned to > + * domain e820 map linux numa blocks in guest pfns. > + */ > +int libxl__vnuma_align_mem(libxl__gc *gc, > + uint32_t domid, > + libxl_domain_build_info *b_info, /* IN: mem > sizes */ > + vnuma_memblk_t *memblks) /* OUT: linux > numa blocks in pfn */ > +{ > +#ifndef roundup > +#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y)) > +#endif > + int i, rc; > + unsigned long shift = 0, size, node_min_size = 1, limit; > + unsigned long end_max; > + uint32_t nr; > + struct e820entry map[E820MAX]; > + > + libxl_ctx *ctx = libxl__gc_owner(gc); > + rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX); > + if (rc < 0) { > + errno = rc; > + return -EINVAL; > + } > + nr = rc; > + rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb, > + (b_info->max_memkb - b_info->target_memkb) + > + b_info->u.pv.slack_memkb); > + if (rc) > + return ERROR_FAIL; > + end_max = map[nr-1].addr + map[nr-1].size; > + shift = 0; > + memset(memblks, 0, sizeof(*memblks)*b_info->nr_vnodes); > + memblks[0].start = map[0].addr; > + for(i = 0; i < b_info->nr_vnodes; i++) { > + memblks[i].start += shift; OK, at this point memblks[i] should be 0, since we zeroed it up there... why are you += instead of just =? And the code below is confusing and I think isn't doing what you think it does. Walk me through this. > + memblks[i].end += shift + b_info->vnuma_memszs[i]; .end started at 0, so this is effectively an obfuscated '='. So suppose vnuma_memszs is 256MiB; so now .end == .start + 256MiB. > + limit = size = memblks[i].end - memblks[i].start; So now limit and size are both 256MiB. > + while (memblks[i].end - memblks[i].start - > + e820_memory_hole_size(memblks[i].start, memblks[i].end, map, > nr) > + < size) { And suppose there was a hole that overlapped here, so we enter this loop. > + memblks[i].end += node_min_size; > + shift += node_min_size; > + if (memblks[i].end - memblks[i].start >= limit) { Now, at this point, unless node_min_size is negative, this is *always* going to be true. > + memblks[i].end = memblks[i].start + limit; > + break; So we set .end back to what it was before (.start + 256MiB) and break out of the loop? > + } > + if (memblks[i].end == end_max) { > + memblks[i].end = end_max; > + break; > + } > + } > + shift = memblks[i].end; And throw away whatever we did to shift before? What was it you were actually trying to do here? -George _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |