Re: [Xen-devel] [PATCH 10 of 10 v3] Some automatic NUMA placement documentation
On 04/07/12 17:18, Dario Faggioli wrote:
> # HG changeset patch
> # User Dario Faggioli <raistlin@xxxxxxxx>
> # Date 1341416324 -7200
> # Node ID f1523c3dc63746e07b11fada5be3d461c3807256
> # Parent 885e2f385601d66179058bfb6bd3960f17d5e068
> Some automatic NUMA placement documentation
>
> About rationale, usage and (some small bits of) API.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Acked-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
>
> Changes from v1:
>  * API documentation moved close to the actual functions.
>
> diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
> new file mode 100644
> --- /dev/null
> +++ b/docs/misc/xl-numa-placement.markdown
> @@ -0,0 +1,91 @@
> +# Guest Automatic NUMA Placement in libxl and xl #
> +
> +## Rationale ##
> +
> +NUMA means the memory accessing times of a program running on a CPU depends on
> +the relative distance between that CPU and that memory. In fact, most of the
> +NUMA systems are built in such a way that each processor has its local memory,
> +on which it can operate very fast. On the other hand, getting and storing data
> +from and on remote memory (that is, memory local to some other processor) is
> +quite more complex and slow. On these machines, a NUMA node is usually defined
> +as a set of processor cores (typically a physical CPU package) and the memory
> +directly attached to the set of cores.
> +
> +The Xen hypervisor deals with Non-Uniform Memory Access (NUMA]) machines by
> +assigning to its domain a "node affinity", i.e., a set of NUMA nodes of the
> +host from which it gets its memory allocated.
> +
> +NUMA awareness becomes very important as soon as many domains start running
> +memory-intensive workloads on a shared host. In fact, the cost of accessing non
> +node-local memory locations is very high, and the performance degradation is
> +likely to be noticeable.
> +
> +## Guest Placement in xl ##
> +
> +If using xl for creating and managing guests, it is very easy to ask for both
> +manual or automatic placement of them across the host's NUMA nodes.
> +
> +Note that xm/xend does the very same thing, the only differences residing in
> +the details of the heuristics adopted for the placement (see below).
> +
> +### Manual Guest Placement with xl ###
> +
> +Thanks to the "cpus=" option, it is possible to specify where a domain should
> +be created and scheduled on, directly in its config file. This affects NUMA
> +placement and memory accesses as the hypervisor constructs the node affinity of
> +a VM basing right on its CPU affinity when it is created.
> +
> +This is very simple and effective, but requires the user/system administrator
> +to explicitly specify affinities for each and every domain, or Xen won't be
> +able to guarantee the locality for their memory accesses.
> +
> +It is also possible to deal with NUMA by partitioning the system using cpupools
> +(available in the upcoming release of Xen, 4.2). Again, this could be "The
> +Right Answer" for many needs and occasions, but has to to be carefully
> +considered and manually setup by hand.
> +
> +### Automatic Guest Placement with xl ###
> +
> +In case no "cpus=" option is specified in the config file, libxl tries to

I think "If no 'cpus=' option..." is better here.

> +figure out on its own on which node(s) the domain could fit best. It is
> +worthwhile noting that optimally fitting a set of VMs on the NUMA nodes of an
> +host host is an incarnation of the Bin Packing Problem. In fact, the various

"host host"

I think you can just say "...is an incarnation of the Bin Packing Problem,
which is known to be NP-hard. We will therefore be using some heuristics."
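(Purely as an illustration of the "Manual Guest Placement with xl" section
quoted above, a domain config fragment using the "cpus=" option might look
like the sketch below. The name and numbers are made up, and it assumes that
pCPUs 4-7 all belong to the same NUMA node on the host in question.)

    # Hypothetical xl domain config fragment: pin the guest's vCPUs to
    # pCPUs 4-7, so that the node affinity derived from this pinning (and
    # hence the guest's memory) stays on the node owning those pCPUs.
    name   = "guest1"
    memory = 1024
    vcpus  = 2
    cpus   = "4-7"

With something along these lines in the config file, the hypervisor builds the
domain's node affinity from its CPU affinity at creation time, as the quoted
text explains.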
> +VMs with different memory sizes are the items to be packed, and the host nodes
> +are the bins. That is known to be NP-hard, thus, it is probably better to
> +tackle the problem with some sort of hauristics, as we do not have any oracle
> +available!

(nb the spelling of "heuristics" as well.)

> +
> +The first thing to do is finding a node, or even a set of nodes, that have
> +enough free memory and enough physical CPUs for accommodating the one new
> +domain. The idea is to find a spot for the domain with at least as much free
> +memory as it has configured, and as much pCPUs as it has vCPUs. After that,
> +the actual decision on which solution to go for happens accordingly to the
> +following heuristics:
> +
> + * candidates involving fewer nodes come first. In case two (or more)
> +   candidates span the same number of nodes,
> + * the amount of free memory and the number of domains assigned to the
> +   candidates are considered. In doing that, candidates with greater amount
> +   of free memory and fewer assigned domains are preferred, with free memory
> +   "weighting" three times as much as number of domains.
> +
> +Giving preference to small candidates ensures better performance for the guest,

I think I would say "candidates with fewer nodes" here; "small candidates"
doesn't convey "fewer nodes" to me.

> +as it avoid spreading its memory among different nodes. Favouring the nodes
> +that have the biggest amounts of free memory helps keeping the memory

We normally don't say "big amount", but "large amount" (don't ask me why --
just sounds a bit funny to me). So this would be "largest amount".

> +fragmentation small, from a system wide perspective. However, in case more

Again, s/in case/if/;

Other than that, looks good to me.

 -George

> +candidates fulfil these criteria by roughly the same extent, having the number
> +of domains the candidates are "hosting" helps balancing the load on the various
> +nodes.
> +
> +## Guest Placement within libxl ##
> +
> +xl achieves automatic NUMA just because libxl does it interrnally.
> +No API is provided (yet) for interacting with this feature and modify
> +the library behaviour regarding automatic placement, it just happens
> +by default if no affinity is specified (as it is with xm/xend).
> +
> +For actually looking and maybe tweaking the mechanism and the algorithms it
> +uses, all is implemented as a set of libxl internal interfaces and facilities.
> +Look at the comment "Automatic NUMA placement" in libxl\_internal.h.
> +
> +Note this may change in future versions of Xen/libxl.
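For a more concrete picture of the selection step described in the quoted
heuristics (fewer nodes first; then more free memory and fewer already-placed
domains, with free memory weighting three times as much), a rough sketch of
such a comparison could look like the code below. This is only an illustration
under those assumptions, not the actual libxl code: the structure, field and
function names are all made up.

    #include <stdint.h>

    /* Hypothetical summary of a placement candidate (not the real libxl type). */
    struct candidate {
        int      nr_nodes;     /* how many NUMA nodes the candidate spans       */
        uint64_t free_memkb;   /* free memory summed over those nodes, in kB    */
        int      nr_domains;   /* number of domains already placed on the nodes */
    };

    /* Score used once two candidates span the same number of nodes: the share
     * of the pair's free memory weights three times as much as the share of
     * the pair's domains (fewer domains is better, hence the 1.0 - domfrac). */
    static double cand_score(const struct candidate *c, double memtot, double domtot)
    {
        double memfrac = memtot > 0 ? (double)c->free_memkb / memtot : 0.5;
        double domfrac = domtot > 0 ? (double)c->nr_domains / domtot : 0.5;
        return 3.0 * memfrac + (1.0 - domfrac);
    }

    /* Returns < 0 if c1 should be preferred to c2, > 0 for the opposite,
     * and 0 if the two candidates are considered equally good. */
    static int candidate_cmp(const struct candidate *c1, const struct candidate *c2)
    {
        /* Candidates involving fewer nodes always come first. */
        if (c1->nr_nodes != c2->nr_nodes)
            return c1->nr_nodes - c2->nr_nodes;

        double memtot = (double)c1->free_memkb + (double)c2->free_memkb;
        double domtot = (double)c1->nr_domains + (double)c2->nr_domains;
        double s1 = cand_score(c1, memtot, domtot);
        double s2 = cand_score(c2, memtot, domtot);

        return s1 > s2 ? -1 : (s1 < s2 ? 1 : 0);
    }

Sorting the suitable candidates with a comparison of this kind and picking the
first one is, in essence, what the quoted document says happens after the free
memory / pCPU filtering step; the real interfaces live behind the "Automatic
NUMA placement" comment in libxl_internal.h, as noted above.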