[Xen-devel] [PATCH RFC v2 0/7] xen: vNUMA introduction
This series of patches introduces vNUMA topology awareness and provides the interfaces and data structures needed to enable vNUMA for PV domU guests. vNUMA topology support must also be present in the PV guest kernel, i.e. the corresponding Linux patches have to be applied.

Introduction
------------

vNUMA topology is exposed to the PV guest to improve performance when running workloads on NUMA machines. The Xen vNUMA implementation provides a way to create vNUMA-enabled guests on NUMA/UMA hosts and to map the vNUMA topology to the physical NUMA topology in an optimal way.

Xen vNUMA support
-----------------

The current set of patches introduces a subop hypercall that is available to enlightened PV guests with the vNUMA patches applied. The domain structure was modified to keep the per-domain vNUMA topology for use by other vNUMA-aware subsystems (e.g. ballooning).

libxc
-----

libxc provides interfaces to build PV guests with vNUMA support and, on NUMA machines, performs the initial memory allocation on the physical NUMA nodes. This is implemented by utilizing the nodemap formed by automatic NUMA placement. Details are in patch #3.

libxl
-----

libxl provides a way to predefine the vNUMA topology in the VM config: the number of vnodes, the memory arrangement, the vcpu-to-vnode assignment, and the distance map.

PV guest
--------

As of now, only PV guests can take advantage of the vNUMA functionality. The vNUMA Linux patches have to be applied and NUMA support has to be compiled into the kernel.
Example of booting a vNUMA-enabled PV domU:

NUMA machine:

cpu_topology           :
cpu:    core    socket    node
  0:       0        0        0
  1:       1        0        0
  2:       2        0        0
  3:       3        0        0
  4:       0        1        1
  5:       1        1        1
  6:       2        1        1
  7:       3        1        1
numa_info              :
node:    memsize    memfree    distances
   0:      17664      12243      10,20
   1:      16384      11929      20,10

VM config:

memory = 16384
vcpus = 8
name = "rcbig"
vnodes = 8
vnumamem = "2g, 2g, 2g, 2g, 2g, 2g, 2g, 2g"
vcpu_to_vnode = "5 6 7 4 3 2 1 0"

root@superpipe:~# xl list -n
Name        ID    Mem VCPUs    State    Time(s) NODE Affinity
Domain-0     0   4096     1    r-----     581.5 any node
r9           1   2048     1    -b----      19.9 0
rc9k1        2   2048     6    -b----      21.1 1
*rcbig       6  16384     8    -b----       4.9 any node

xl debug-keys u:

(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 1048576):
(XEN)     Node 0: 510411
(XEN)     Node 1: 538165
(XEN) Domain 2 (total: 524288):
(XEN)     Node 0: 0
(XEN)     Node 1: 524288
(XEN) Domain 3 (total: 4194304):
(XEN)     Node 0: 2621440
(XEN)     Node 1: 1572864
(XEN)     Domain has 8 vnodes
(XEN)         pnode 0: vnodes: 0 (2048), 1 (2048), 2 (2048), 3 (2048), 4 (2048),
(XEN)         pnode 1: vnodes: 5 (2048), 6 (2048), 7 (2048),
(XEN)     Domain vcpu to vnode: 5 6 7 4 3 2 1 0

PV Linux boot (domain 3):

[    0.000000] init_memory_mapping: [mem 0x00100000-0x37fffffff]
[    0.000000]  [mem 0x00100000-0x37fffffff] page 4k
[    0.000000] RAMDISK: [mem 0x01dd6000-0x0347dfff]
[    0.000000] vNUMA: memblk[0] - 0x0 0x80000000
[    0.000000] vNUMA: memblk[1] - 0x80000000 0x100000000
[    0.000000] vNUMA: memblk[2] - 0x100000000 0x180000000
[    0.000000] vNUMA: memblk[3] - 0x180000000 0x200000000
[    0.000000] vNUMA: memblk[4] - 0x200000000 0x280000000
[    0.000000] vNUMA: memblk[5] - 0x280000000 0x300000000
[    0.000000] vNUMA: memblk[6] - 0x300000000 0x380000000
[    0.000000] vNUMA: memblk[7] - 0x380000000 0x400000000
[    0.000000] NUMA: Initialized distance table, cnt=8
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x7fffffff]
[    0.000000]   NODE_DATA [mem 0x7ffd9000-0x7fffffff]
[    0.000000] Initmem setup node 1 [mem 0x80000000-0xffffffff]
[    0.000000]   NODE_DATA [mem 0xfffd9000-0xffffffff]
[    0.000000] Initmem setup node 2 [mem 0x100000000-0x17fffffff]
[    0.000000]   NODE_DATA [mem 0x17ffd9000-0x17fffffff]
[    0.000000] Initmem setup node 3 [mem 0x180000000-0x1ffffffff]
[    0.000000]   NODE_DATA [mem 0x1fffd9000-0x1ffffffff]
[    0.000000] Initmem setup node 4 [mem 0x200000000-0x27fffffff]
[    0.000000]   NODE_DATA [mem 0x27ffd9000-0x27fffffff]
[    0.000000] Initmem setup node 5 [mem 0x280000000-0x2ffffffff]
[    0.000000]   NODE_DATA [mem 0x2fffd9000-0x2ffffffff]
[    0.000000] Initmem setup node 6 [mem 0x300000000-0x37fffffff]
[    0.000000]   NODE_DATA [mem 0x37ffd9000-0x37fffffff]
[    0.000000] Initmem setup node 7 [mem 0x380000000-0x3ffffffff]
[    0.000000]   NODE_DATA [mem 0x3fdff7000-0x3fe01dfff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x3ffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node 0: [mem 0x00001000-0x0009ffff]
[    0.000000]   node 0: [mem 0x00100000-0x7fffffff]
[    0.000000]   node 1: [mem 0x80000000-0xffffffff]
[    0.000000]   node 2: [mem 0x100000000-0x17fffffff]
[    0.000000]   node 3: [mem 0x180000000-0x1ffffffff]
[    0.000000]   node 4: [mem 0x200000000-0x27fffffff]
[    0.000000]   node 5: [mem 0x280000000-0x2ffffffff]
[    0.000000]   node 6: [mem 0x300000000-0x37fffffff]
[    0.000000]   node 7: [mem 0x380000000-0x3ffffffff]
[    0.000000] On node 0 totalpages: 524191
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 21 pages reserved
[    0.000000]   DMA zone: 3999 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 7112 pages used for memmap
[    0.000000]   DMA32 zone: 520192 pages, LIFO batch:31
[    0.000000] On node 1 totalpages: 524288
[    0.000000]   DMA32 zone: 7168 pages used for memmap
[    0.000000]   DMA32 zone: 524288 pages, LIFO batch:31
[    0.000000] On node 2 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 3 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 4 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 5 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 6 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 7 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[    0.000000] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
[    0.000000] No local APIC present
[    0.000000] APIC: disable apic facility
[    0.000000] APIC: switched to apic NOOP
[    0.000000] nr_irqs_gsi: 16
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.4-unstable (preserve-AD)
[    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:8 nr_node_ids:8
[    0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007fc00000 s85120 r8192 d21376 u2097152
[    0.000000] pcpu-alloc: s85120 r8192 d21376 u2097152 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 [1] 1 [2] 2 [3] 3 [4] 4 [5] 5 [6] 6 [7] 7
[    0.000000] Built 8 zonelists in Node order, mobility grouping on.
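The `xl debug-keys u` output above shows vnodes 0-4 placed on pnode 0 and vnodes 5-7 on pnode 1. The actual placement in this series comes from the nodemap formed by automatic NUMA placement (patch #3, in libxc), not from the toy model below; but as a rough illustration, a simple first-fit over per-pnode free memory reproduces the same layout for this example:

```python
def first_fit(vnode_sizes, pnode_free):
    """Assign each vnode to the first pnode with enough free memory.

    Illustrative model only, not the algorithm used by libxc.
    """
    free = list(pnode_free)
    placement = []
    for size in vnode_sizes:
        for pnode, avail in enumerate(free):
            if avail >= size:
                free[pnode] -= size
                placement.append(pnode)
                break
        else:
            raise MemoryError("no pnode can hold a vnode of %d MB" % size)
    return placement

vnodes = [2048] * 8          # vnumamem = "2g, ..." (8 x 2048 MB)
pnodes = [12243, 11929]      # memfree per pnode from numa_info above

print(first_fit(vnodes, pnodes))
# -> [0, 0, 0, 0, 0, 1, 1, 1]: vnodes 0-4 on pnode 0 (5 x 2048 = 10240 MB
#    fits in 12243 MB, the sixth does not), vnodes 5-7 on pnode 1,
#    matching the debug-keys output.
```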
Total pages: 4136842

numactl within the running guest:

root@heatpipe:~# numactl --ha
available: 8 nodes (0-7)
node 0 cpus: 7
node 0 size: 2047 MB
node 0 free: 2001 MB
node 1 cpus: 6
node 1 size: 2048 MB
node 1 free: 2008 MB
node 2 cpus: 5
node 2 size: 2048 MB
node 2 free: 2010 MB
node 3 cpus: 4
node 3 size: 2048 MB
node 3 free: 2009 MB
node 4 cpus: 3
node 4 size: 2048 MB
node 4 free: 2009 MB
node 5 cpus: 0
node 5 size: 2048 MB
node 5 free: 1982 MB
node 6 cpus: 1
node 6 size: 2048 MB
node 6 free: 2008 MB
node 7 cpus: 2
node 7 size: 2048 MB
node 7 free: 1944 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  20  20  20  20  20  20
  1:  20  10  20  20  20  20  20  20
  2:  20  20  10  20  20  20  20  20
  3:  20  20  20  10  20  20  20  20
  4:  20  20  20  20  10  20  20  20
  5:  20  20  20  20  20  10  20  20
  6:  20  20  20  20  20  20  10  20
  7:  20  20  20  20  20  20  20  10

root@heatpipe:~# numastat -c

Per-node numastat info (in MBs):
               Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
               ------ ------ ------ ------ ------ ------ ------ ------ -----
Numa_Hit           37     43     35     42     43     97     45     58   401
Numa_Miss           0      0      0      0      0      0      0      0     0
Numa_Foreign        0      0      0      0      0      0      0      0     0
Interleave_Hit      7      7      7      7      7      7      7      7    56
Local_Node         28     34     26     33     34     97     36     49   336
Other_Node          9      9      9      9      9      0      9      9    65

The patchset applies to the latest Xen tree, commit e008e9119d03852020b93e1d4da9a80ec1af9c75, and is available at http://git.gitorious.org/xenvnuma/xenvnuma.git

Elena Ufimtseva (7):
  Xen vNUMA for PV guests.
  Per-domain vNUMA initialization.
  vNUMA nodes allocation on NUMA nodes.
  vNUMA libxl supporting functionality.
  vNUMA VM config parsing functions.
  xl.cfg documentation update for vNUMA.
  NUMA debug-key additional output for vNUMA.

 docs/man/xl.cfg.pod.5        |   50 +++++++++++
 tools/libxc/xc_dom.h         |    9 ++
 tools/libxc/xc_dom_x86.c     |   77 ++++++++++++++--
 tools/libxc/xc_domain.c      |   57 ++++++++++++
 tools/libxc/xenctrl.h        |    9 ++
 tools/libxc/xg_private.h     |    1 +
 tools/libxl/libxl.c          |   19 ++++
 tools/libxl/libxl.h          |   20 ++++-
 tools/libxl/libxl_arch.h     |    5 ++
 tools/libxl/libxl_dom.c      |  105 +++++++++++++++++++++-
 tools/libxl/libxl_internal.h |    3 +
 tools/libxl/libxl_types.idl  |    5 +-
 tools/libxl/libxl_x86.c      |   86 ++++++++++++++++++
 tools/libxl/xl_cmdimpl.c     |  205 ++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/numa.c          |   23 ++++-
 xen/common/domain.c          |   25 +++++-
 xen/common/domctl.c          |   68 +++++++++++++-
 xen/common/memory.c          |   56 ++++++++++++
 xen/include/public/domctl.h  |   15 +++-
 xen/include/public/memory.h  |    9 +-
 xen/include/xen/domain.h     |   11 +++
 xen/include/xen/sched.h      |    1 +
 xen/include/xen/vnuma.h      |   27 ++++++
 23 files changed, 869 insertions(+), 17 deletions(-)
 create mode 100644 xen/include/xen/vnuma.h

-- 
1.7.10.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel