
Re: [Xen-devel] [PATCH v11 0/4] vnuma introduction



Hello

I sent v11 of the patches, which includes only patches 1-4, the ones that
did not have serious objections.
Patches 5-9 will need to be changed slightly to make sure the tools will
not break on ARM.
The patches sent do not include the tools side and only have the libxc
hypercall wrapper; the rest is hypervisor code only.

Konrad,
I think I will need more time, past Sept 10, to finish the libxl patches.
Do I have to send a separate email to request an exemption from the cutoff
for them?

PS.
I will be away until Monday September 8.

Thank you
Elena


On Fri, Sep 5, 2014 at 12:06 AM, Elena Ufimtseva <ufimtseva@xxxxxxxxx> wrote:
> vNUMA introduction
>
> This series of patches introduces vNUMA topology awareness and
> provides interfaces and data structures to enable vNUMA for
> PV guests. There is a plan to extend this support for dom0 and
> HVM domains.
>
> v11 of the series has patches 1 to 4; the libxl and libxc patches will
> follow in the next version, after the required changes have been made.
>
> vNUMA topology support must also be present in the PV guest kernel;
> the corresponding kernel patches need to be applied.
>
> Introduction
> -------------
>
> vNUMA topology is exposed to the PV guest to improve performance when running
> workloads on NUMA machines. vNUMA-enabled guests may also run on non-NUMA
> machines and still have a virtual NUMA topology visible to them.
> The Xen vNUMA implementation provides a way to run vNUMA-enabled guests on
> both NUMA and UMA hardware, and to flexibly map the virtual NUMA topology
> onto the physical NUMA topology.
>
> Mapping to the physical NUMA topology may be done manually or automatically.
> By default, every PV domain has one vNUMA node; it is populated with default
> parameters and does not affect performance. To have the vNUMA topology
> initialized automatically, the configuration file only needs to define the
> number of vNUMA nodes. Any vNUMA topology parameters left undefined are
> initialized to default values.
>
> vNUMA topology is currently defined as a set of parameters (see the
> illustrative sketch below):
>     number of vNUMA nodes;
>     number of vNUMA memory regions;
>     distance table;
>     vrange memory sizes*;
>     vcpu-to-vnode mapping;
>     vnode-to-pnode mapping (for NUMA machines).
>
> * As this series introduces vNUMA for PV guests, there is a limit of one
> memory range per vNUMA node.
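>
> To make the shape of this data concrete, here is a small C sketch of how a
> per-domain vNUMA description could be laid out. The struct and field names
> below are illustrative only; the actual structures are the ones defined in
> the patches.
>
> /* Illustrative sketch only: hypothetical names, not the structures
>  * introduced by this series. */
> #include <stdint.h>
>
> struct example_vnuma_topology {
>     unsigned int nr_vnodes;        /* number of vNUMA nodes */
>     unsigned int nr_vcpus;         /* number of virtual cpus */
>     unsigned int *vdistance;       /* nr_vnodes x nr_vnodes distance table */
>     unsigned int *vcpu_to_vnode;   /* vcpu -> vnode map, nr_vcpus entries */
>     unsigned int *vnode_to_pnode;  /* vnode -> pnode map, nr_vnodes entries */
>     uint64_t *vmemrange_size;      /* per-vnode memory size; one range per
>                                       vnode for PV guests in this series */
> };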
>
> This set of patches introduces two hypercall subops: XEN_DOMCTL_setvnumainfo
> and XENMEM_get_vnuma_info.
>
>     XEN_DOMCTL_setvnumainfo is used by the toolstack to populate the domain
> vNUMA topology with either a user-defined configuration or the default
> parameters. vNUMA is defined for every PV domain; if no vNUMA configuration
> is found, one vNUMA node is initialized and all cpus are assigned to it,
> with all other parameters set to their default values.
>
>     XENMEM_get_vnuma_info is used by the PV domain to obtain the vNUMA
> topology information from the hypervisor. The guest passes the sizes of the
> buffers it has allocated for the various vNUMA parameters, and the
> hypervisor fills them with the topology. Future work is required in the
> toolstack and in the hypervisor to allow HVM guests to use these hypercalls.
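>
> As a rough sketch of how the toolstack side might drive
> XEN_DOMCTL_setvnumainfo through the libxc wrapper introduced in patch 4:
> the xc_domain_setvnuma() prototype used below is assumed for illustration
> and may differ from the one actually added by the series.
>
> /* Sketch only: the xc_domain_setvnuma() signature is assumed, not
>  * quoted from the patch. */
> #include <xenctrl.h>
>
> static int example_set_vnuma(xc_interface *xch, uint32_t domid)
> {
>     unsigned int vdistance[4]      = { 10, 20, 20, 10 }; /* 2x2 table */
>     unsigned int vcpu_to_vnode[2]  = { 0, 1 };           /* vcpu -> vnode */
>     unsigned int vnode_to_pnode[2] = { 0, 1 };           /* vnode -> pnode */
>     uint64_t     vmemsize[2]       = { 2000ULL << 20, 2000ULL << 20 };
>
>     /* Push the vNUMA description into Xen, which stores it in the domain
>      * structure for later retrieval by the guest. */
>     return xc_domain_setvnuma(xch, domid, 2 /* vnodes */, 2 /* vcpus */,
>                               vmemsize, vdistance,
>                               vcpu_to_vnode, vnode_to_pnode);
> }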
>
> libxl
>
> libxl allows the vNUMA topology to be defined in the domain configuration
> file and verifies that the configuration is correct. libxl also verifies the
> vnode-to-pnode mapping and uses it on NUMA machines when automatic placement
> is disabled. In the case of an incorrect or insufficient configuration, a
> single vNUMA node is initialized and populated with default values.
>
> libxc
>
> libxc builds the vnode memory address ranges for the guest and applies the
> necessary alignment to them, taking the guest e820 memory map into account.
> The domain memory is then allocated, and the vnode-to-pnode mapping is used
> to determine the target physical node for each vnode. If this mapping was
> not defined, if the host is not a NUMA machine, or if automatic NUMA
> placement is enabled, the default (non node-specific) allocation is used.
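>
> A simplified sketch of the kind of range construction described above:
> evenly splitting the domain memory into per-vnode ranges and aligning each
> boundary. This is illustration only; the real libxc code also has to
> respect the guest e820 holes, which are omitted here.
>
> /* Sketch: split dom_size_mb of guest memory into nr_vnodes ranges,
>  * aligning each internal boundary down to a 2 MB granule. */
> #include <stdint.h>
>
> #define GRANULE_MB 2
>
> static void example_build_vnode_ranges(uint64_t dom_size_mb,
>                                        unsigned int nr_vnodes,
>                                        uint64_t *start_mb, uint64_t *end_mb)
> {
>     uint64_t per_node = dom_size_mb / nr_vnodes;
>     uint64_t next = 0;
>
>     for ( unsigned int i = 0; i < nr_vnodes; i++ )
>     {
>         start_mb[i] = next;
>         next += per_node;
>         next -= next % GRANULE_MB;              /* align boundary down */
>         end_mb[i] = (i == nr_vnodes - 1) ? dom_size_mb : next;
>     }
> }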
>
> hypervisor vNUMA initialization
>
> PV guest
>
> As of now, only a PV guest can take advantage of the vNUMA functionality.
> Such a guest allocates the memory for the NUMA topology and sets the number
> of nodes and cpus, so the hypervisor knows how much memory the guest has
> preallocated for the vNUMA topology. The guest then issues the
> XENMEM_get_vnuma_info subop hypercall.
> If for some reason the vNUMA topology cannot be initialized, a Linux guest
> will come up with only one NUMA node (standard Linux behaviour).
> To enable all this, the vNUMA supporting patches must be applied to the PV
> Linux kernel.
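>
> In rough pseudo-C, the guest-side flow described above looks like this.
> The struct layout and field names are illustrative; the real
> vnuma_topology_info interface is the one defined in the public headers
> added by this series.
>
> /* Guest-side sketch (illustrative names): the guest sizes its buffers
>  * from the node/cpu counts it advertises, then asks Xen to fill them. */
> #include <xen/interface/memory.h>   /* XENMEM_get_vnuma_info (from the
>                                        series' guest-side headers) */
> #include <asm/xen/hypercall.h>      /* HYPERVISOR_memory_op */
>
> struct example_vnuma_info {
>     unsigned int nr_vnodes;        /* in: space allocated; out: actual */
>     unsigned int nr_vcpus;
>     unsigned int *vdistance;       /* nr_vnodes * nr_vnodes entries */
>     unsigned int *vcpu_to_vnode;   /* nr_vcpus entries */
>     unsigned long *vmemrange;      /* per-vnode memory ranges */
> };
>
> static int example_get_vnuma(struct example_vnuma_info *info)
> {
>     /* Subop name from this series; the call shape here is a sketch. */
>     return HYPERVISOR_memory_op(XENMEM_get_vnuma_info, info);
> }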
>
> Linux kernel patch is available here:
> https://git.gitorious.org/vnuma/linux_vnuma.git
> git://gitorious.org/vnuma/linux_vnuma.git
>
> Automatic vNUMA placement
>
> Automatic vNUMA placement is used if automatic NUMA placement is enabled
> or, when it is disabled, if the vnode-to-pnode mapping is incorrect. If the
> vnode-to-pnode mapping is correct and automatic NUMA placement is disabled,
> vNUMA nodes are allocated on the physical nodes specified in the guest
> config file.
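>
> The placement decision described above boils down to something like the
> following sketch, assuming a helper elsewhere has already validated the
> user-supplied vnode-to-pnode map:
>
> /* Sketch of the placement decision: use the map from the config file only
>  * when automatic NUMA placement is off and the map is valid; otherwise
>  * fall back to automatic vNUMA placement. */
> #include <stdbool.h>
>
> static bool example_use_config_vnodemap(bool auto_numa_placement,
>                                         bool vnodemap_valid)
> {
>     if ( auto_numa_placement )
>         return false;              /* automatic placement takes over */
>     return vnodemap_valid;         /* manual map only if it checks out */
> }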
>
> Xen patchset is available here:
> https://git.gitorious.org/vnuma/xen_vnuma.git
> git://gitorious.org/vnuma/xen_vnuma.git
>
>
> Examples of booting a vNUMA-enabled PV Linux guest on a real NUMA machine:
>
> memory = 4000
> vcpus = 2
> # The name of the domain, change this if you want more than 1 VM.
> name = "null"
> vnodes = 2
> #vnumamem = [3000, 1000]
> #vnumamem = [4000,0]
> vdistance = [10, 20]
> vnuma_vcpumap = [1, 0]
> vnuma_vnodemap = [1]
> vnuma_autoplacement = 0
> #e820_host = 1
>
> [    0.000000] Linux version 3.15.0-rc8+ (assert@superpipe) (gcc version 
> 4.7.2 (Debian 4.7.2-5) ) #43 SMP Fri Jun 27 01:23:11 EDT 2014
> [    0.000000] Command line: root=/dev/xvda1 ro earlyprintk=xen debug 
> loglevel=8 debug print_fatal_signals=1 loglvl=all guest_loglvl=all LOGLEVEL=8 
> earlyprintk=xen sched_debug
> [    0.000000] ACPI in unprivileged domain disabled
> [    0.000000] e820: BIOS-provided physical RAM map:
> [    0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
> [    0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
> [    0.000000] Xen: [mem 0x0000000000100000-0x00000000f9ffffff] usable
> [    0.000000] bootconsole [xenboot0] enabled
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] DMI not present or invalid.
> [    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> [    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
> [    0.000000] No AGP bridge found
> [    0.000000] e820: last_pfn = 0xfa000 max_arch_pfn = 0x400000000
> [    0.000000] Base memory trampoline at [ffff88000009a000] 9a000 size 24576
> [    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [    0.000000]  [mem 0x00000000-0x000fffff] page 4k
> [    0.000000] init_memory_mapping: [mem 0xf9e00000-0xf9ffffff]
> [    0.000000]  [mem 0xf9e00000-0xf9ffffff] page 4k
> [    0.000000] BRK [0x019c8000, 0x019c8fff] PGTABLE
> [    0.000000] BRK [0x019c9000, 0x019c9fff] PGTABLE
> [    0.000000] init_memory_mapping: [mem 0xf8000000-0xf9dfffff]
> [    0.000000]  [mem 0xf8000000-0xf9dfffff] page 4k
> [    0.000000] BRK [0x019ca000, 0x019cafff] PGTABLE
> [    0.000000] BRK [0x019cb000, 0x019cbfff] PGTABLE
> [    0.000000] BRK [0x019cc000, 0x019ccfff] PGTABLE
> [    0.000000] BRK [0x019cd000, 0x019cdfff] PGTABLE
> [    0.000000] init_memory_mapping: [mem 0x80000000-0xf7ffffff]
> [    0.000000]  [mem 0x80000000-0xf7ffffff] page 4k
> [    0.000000] init_memory_mapping: [mem 0x00100000-0x7fffffff]
> [    0.000000]  [mem 0x00100000-0x7fffffff] page 4k
> [    0.000000] RAMDISK: [mem 0x01dd8000-0x035c5fff]
> [    0.000000] Nodes received = 2
> [    0.000000] NUMA: Initialized distance table, cnt=2
> [    0.000000] Initmem setup node 0 [mem 0x00000000-0x7cffffff]
> [    0.000000]   NODE_DATA [mem 0x7cfd9000-0x7cffffff]
> [    0.000000] Initmem setup node 1 [mem 0x7d000000-0xf9ffffff]
> [    0.000000]   NODE_DATA [mem 0xf9828000-0xf984efff]
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
> [    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x00001000-0x0009ffff]
> [    0.000000]   node   0: [mem 0x00100000-0x7cffffff]
> [    0.000000]   node   1: [mem 0x7d000000-0xf9ffffff]
> [    0.000000] On node 0 totalpages: 511903
> [    0.000000]   DMA zone: 64 pages used for memmap
> [    0.000000]   DMA zone: 21 pages reserved
> [    0.000000]   DMA zone: 3999 pages, LIFO batch:0
> [    0.000000]   DMA32 zone: 7936 pages used for memmap
> [    0.000000]   DMA32 zone: 507904 pages, LIFO batch:31
> [    0.000000] On node 1 totalpages: 512000
> [    0.000000]   DMA32 zone: 8000 pages used for memmap
> [    0.000000]   DMA32 zone: 512000 pages, LIFO batch:31
> [    0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
> [    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
> [    0.000000] nr_irqs_gsi: 16
> [    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
> [    0.000000] e820: [mem 0xfa000000-0xffffffff] available for PCI devices
> [    0.000000] Booting paravirtualized kernel on Xen
> [    0.000000] Xen version: 4.5-unstable (preserve-AD)
> [    0.000000] setup_percpu: NR_CPUS:20 nr_cpumask_bits:20 nr_cpu_ids:2 
> nr_node_ids:2
> [    0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007ac00000 s85888 r8192 
> d20608 u2097152
> [    0.000000] pcpu-alloc: s85888 r8192 d20608 u2097152 alloc=1*2097152
> [    0.000000] pcpu-alloc: [0] 0 [1] 1
> [    0.000000] xen: PV spinlocks enabled
> [    0.000000] Built 2 zonelists in Node order, mobility grouping on.  Total 
> pages: 1007882
> [    0.000000] Policy zone: DMA32
> [    0.000000] Kernel command line: root=/dev/xvda1 ro earlyprintk=xen debug 
> loglevel=8 debug print_fatal_signals=1 loglvl=all guest_loglvl=all LOGLEVEL=8 
> earlyprintk=xen sched_debug
> [    0.000000] Memory: 3978224K/4095612K available (4022K kernel code, 769K 
> rwdata, 1744K rodata, 1532K init, 1472K bss, 117388K reserved)
> [    0.000000] Enabling automatic NUMA balancing. Configure with 
> numa_balancing= or the kernel.numa_balancing sysctl
> [    0.000000] installing Xen timer for CPU 0
> [    0.000000] tsc: Detected 2394.276 MHz processor
> [    0.004000] Calibrating delay loop (skipped), value calculated using timer 
> frequency.. 4788.55 BogoMIPS (lpj=9577104)
> [    0.004000] pid_max: default: 32768 minimum: 301
> [    0.004179] Dentry cache hash table entries: 524288 (order: 10, 4194304 
> bytes)
> [    0.006782] Inode-cache hash table entries: 262144 (order: 9, 2097152 
> bytes)
> [    0.007216] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
> [    0.007288] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 
> bytes)
> [    0.007935] CPU: Physical Processor ID: 0
> [    0.007942] CPU: Processor Core ID: 0
> [    0.007951] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
> [    0.007951] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
> [    0.007951] tlb_flushall_shift: 6
> [    0.021249] cpu 0 spinlock event irq 17
> [    0.021292] Performance Events: unsupported p6 CPU model 45 no PMU driver, 
> software events only.
> [    0.022162] NMI watchdog: disabled (cpu0): hardware events not enabled
> [    0.022625] installing Xen timer for CPU 1
>
> root@heatpipe:~# numactl --ha
> available: 2 nodes (0-1)
> node 0 cpus: 0
> node 0 size: 1933 MB
> node 0 free: 1894 MB
> node 1 cpus: 1
> node 1 size: 1951 MB
> node 1 free: 1926 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
>
> root@heatpipe:~# numastat
>                            node0           node1
> numa_hit                   52257           92679
> numa_miss                      0               0
> numa_foreign                   0               0
> interleave_hit              4254            4238
> local_node                 52150           87364
> other_node                   107            5315
>
> root@superpipe:~# xl debug-keys u
>
> (XEN) Domain 7 (total: 1024000):
> (XEN)     Node 0: 1024000
> (XEN)     Node 1: 0
> (XEN)     Domain has 2 vnodes, 2 vcpus
> (XEN)         vnode 0 - pnode 0, 2000 MB, vcpu nums: 0
> (XEN)         vnode 1 - pnode 0, 2000 MB, vcpu nums: 1
>
>
> memory = 4000
> vcpus = 8
> # The name of the domain, change this if you want more than 1 VM.
> name = "null1"
> vnodes = 8
> #vnumamem = [3000, 1000]
> vdistance = [10, 40]
> #vnuma_vcpumap = [1, 0, 3, 2]
> vnuma_vnodemap = [1, 0, 1, 1, 0, 0, 1, 1]
> vnuma_autoplacement = 1
> e820_host = 1
>
> [    0.000000] Freeing ac228-fa000 pfn range: 318936 pages freed
> [    0.000000] 1-1 mapping on ac228->100000
> [    0.000000] Released 318936 pages of unused memory
> [    0.000000] Set 343512 page(s) to 1-1 mapping
> [    0.000000] Populating 100000-14ddd8 pfn range: 318936 pages added
> [    0.000000] e820: BIOS-provided physical RAM map:
> [    0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
> [    0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
> [    0.000000] Xen: [mem 0x0000000000100000-0x00000000ac227fff] usable
> [    0.000000] Xen: [mem 0x00000000ac228000-0x00000000ac26bfff] reserved
> [    0.000000] Xen: [mem 0x00000000ac26c000-0x00000000ac57ffff] unusable
> [    0.000000] Xen: [mem 0x00000000ac580000-0x00000000ac5a0fff] reserved
> [    0.000000] Xen: [mem 0x00000000ac5a1000-0x00000000ac5bbfff] unusable
> [    0.000000] Xen: [mem 0x00000000ac5bc000-0x00000000ac5bdfff] reserved
> [    0.000000] Xen: [mem 0x00000000ac5be000-0x00000000ac5befff] unusable
> [    0.000000] Xen: [mem 0x00000000ac5bf000-0x00000000ac5cafff] reserved
> [    0.000000] Xen: [mem 0x00000000ac5cb000-0x00000000ac5d9fff] unusable
> [    0.000000] Xen: [mem 0x00000000ac5da000-0x00000000ac5fafff] reserved
> [    0.000000] Xen: [mem 0x00000000ac5fb000-0x00000000ac6b5fff] unusable
> [    0.000000] Xen: [mem 0x00000000ac6b6000-0x00000000ac7fafff] ACPI NVS
> [    0.000000] Xen: [mem 0x00000000ac7fb000-0x00000000ac80efff] unusable
> [    0.000000] Xen: [mem 0x00000000ac80f000-0x00000000ac80ffff] ACPI data
> [    0.000000] Xen: [mem 0x00000000ac810000-0x00000000ac810fff] unusable
> [    0.000000] Xen: [mem 0x00000000ac811000-0x00000000ac812fff] ACPI data
> [    0.000000] Xen: [mem 0x00000000ac813000-0x00000000ad7fffff] unusable
> [    0.000000] Xen: [mem 0x00000000b0000000-0x00000000b3ffffff] reserved
> [    0.000000] Xen: [mem 0x00000000fed20000-0x00000000fed3ffff] reserved
> [    0.000000] Xen: [mem 0x00000000fed50000-0x00000000fed8ffff] reserved
> [    0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
> [    0.000000] Xen: [mem 0x00000000ffa00000-0x00000000ffa3ffff] reserved
> [    0.000000] Xen: [mem 0x0000000100000000-0x000000014ddd7fff] usable
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] DMI not present or invalid.
> [    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> [    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
> [    0.000000] No AGP bridge found
> [    0.000000] e820: last_pfn = 0x14ddd8 max_arch_pfn = 0x400000000
> [    0.000000] e820: last_pfn = 0xac228 max_arch_pfn = 0x400000000
> [    0.000000] Base memory trampoline at [ffff88000009a000] 9a000 size 24576
> [    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [    0.000000]  [mem 0x00000000-0x000fffff] page 4k
> [    0.000000] init_memory_mapping: [mem 0x14da00000-0x14dbfffff]
> [    0.000000]  [mem 0x14da00000-0x14dbfffff] page 4k
> [    0.000000] BRK [0x019cd000, 0x019cdfff] PGTABLE
> [    0.000000] BRK [0x019ce000, 0x019cefff] PGTABLE
> [    0.000000] init_memory_mapping: [mem 0x14c000000-0x14d9fffff]
> [    0.000000]  [mem 0x14c000000-0x14d9fffff] page 4k
> [    0.000000] BRK [0x019cf000, 0x019cffff] PGTABLE
> [    0.000000] BRK [0x019d0000, 0x019d0fff] PGTABLE
> [    0.000000] BRK [0x019d1000, 0x019d1fff] PGTABLE
> [    0.000000] BRK [0x019d2000, 0x019d2fff] PGTABLE
> [    0.000000] init_memory_mapping: [mem 0x100000000-0x14bffffff]
> [    0.000000]  [mem 0x100000000-0x14bffffff] page 4k
> [    0.000000] init_memory_mapping: [mem 0x00100000-0xac227fff]
> [    0.000000]  [mem 0x00100000-0xac227fff] page 4k
> [    0.000000] init_memory_mapping: [mem 0x14dc00000-0x14ddd7fff]
> [    0.000000]  [mem 0x14dc00000-0x14ddd7fff] page 4k
> [    0.000000] RAMDISK: [mem 0x01dd8000-0x0347ffff]
> [    0.000000] Nodes received = 8
> [    0.000000] NUMA: Initialized distance table, cnt=8
> [    0.000000] Initmem setup node 0 [mem 0x00000000-0x1f3fffff]
> [    0.000000]   NODE_DATA [mem 0x1f3d9000-0x1f3fffff]
> [    0.000000] Initmem setup node 1 [mem 0x1f800000-0x3e7fffff]
> [    0.000000]   NODE_DATA [mem 0x3e7d9000-0x3e7fffff]
> [    0.000000] Initmem setup node 2 [mem 0x3e800000-0x5dbfffff]
> [    0.000000]   NODE_DATA [mem 0x5dbd9000-0x5dbfffff]
> [    0.000000] Initmem setup node 3 [mem 0x5e000000-0x7cffffff]
> [    0.000000]   NODE_DATA [mem 0x7cfd9000-0x7cffffff]
> [    0.000000] Initmem setup node 4 [mem 0x7d000000-0x9c3fffff]
> [    0.000000]   NODE_DATA [mem 0x9c3d9000-0x9c3fffff]
> [    0.000000] Initmem setup node 5 [mem 0x9c800000-0x10f5d7fff]
> [    0.000000]   NODE_DATA [mem 0x10f5b1000-0x10f5d7fff]
> [    0.000000] Initmem setup node 6 [mem 0x10f800000-0x12e9d7fff]
> [    0.000000]   NODE_DATA [mem 0x12e9b1000-0x12e9d7fff]
> [    0.000000] Initmem setup node 7 [mem 0x12f000000-0x14ddd7fff]
> [    0.000000]   NODE_DATA [mem 0x14ddad000-0x14ddd3fff]
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
> [    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
> [    0.000000]   Normal   [mem 0x100000000-0x14ddd7fff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x00001000-0x0009ffff]
> [    0.000000]   node   0: [mem 0x00100000-0x1f3fffff]
> [    0.000000]   node   1: [mem 0x1f400000-0x3e7fffff]
> [    0.000000]   node   2: [mem 0x3e800000-0x5dbfffff]
> [    0.000000]   node   3: [mem 0x5dc00000-0x7cffffff]
> [    0.000000]   node   4: [mem 0x7d000000-0x9c3fffff]
> [    0.000000]   node   5: [mem 0x9c400000-0xac227fff]
> [    0.000000]   node   5: [mem 0x100000000-0x10f5d7fff]
> [    0.000000]   node   6: [mem 0x10f5d8000-0x12e9d7fff]
> [    0.000000]   node   7: [mem 0x12e9d8000-0x14ddd7fff]
> [    0.000000] On node 0 totalpages: 127903
> [    0.000000]   DMA zone: 64 pages used for memmap
> [    0.000000]   DMA zone: 21 pages reserved
> [    0.000000]   DMA zone: 3999 pages, LIFO batch:0
> [    0.000000]   DMA32 zone: 1936 pages used for memmap
> [    0.000000]   DMA32 zone: 123904 pages, LIFO batch:31
> [    0.000000] On node 1 totalpages: 128000
> [    0.000000]   DMA32 zone: 2000 pages used for memmap
> [    0.000000]   DMA32 zone: 128000 pages, LIFO batch:31
> [    0.000000] On node 2 totalpages: 128000
> [    0.000000]   DMA32 zone: 2000 pages used for memmap
> [    0.000000]   DMA32 zone: 128000 pages, LIFO batch:31
> [    0.000000] On node 3 totalpages: 128000
> [    0.000000]   DMA32 zone: 2000 pages used for memmap
> [    0.000000]   DMA32 zone: 128000 pages, LIFO batch:31
> [    0.000000] On node 4 totalpages: 128000
> [    0.000000]   DMA32 zone: 2000 pages used for memmap
> [    0.000000]   DMA32 zone: 128000 pages, LIFO batch:31
> [    0.000000] On node 5 totalpages: 128000
> [    0.000000]   DMA32 zone: 1017 pages used for memmap
> [    0.000000]   DMA32 zone: 65064 pages, LIFO batch:15
> [    0.000000]   Normal zone: 984 pages used for memmap
> [    0.000000]   Normal zone: 62936 pages, LIFO batch:15
> [    0.000000] On node 6 totalpages: 128000
> [    0.000000]   Normal zone: 2000 pages used for memmap
> [    0.000000]   Normal zone: 128000 pages, LIFO batch:31
> [    0.000000] On node 7 totalpages: 128000
> [    0.000000]   Normal zone: 2000 pages used for memmap
> [    0.000000]   Normal zone: 128000 pages, LIFO batch:31
> [    0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
> [    0.000000] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
> [    0.000000] nr_irqs_gsi: 16
> [    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac228000-0xac26bfff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac26c000-0xac57ffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac580000-0xac5a0fff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac5a1000-0xac5bbfff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac5bc000-0xac5bdfff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac5be000-0xac5befff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac5bf000-0xac5cafff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac5cb000-0xac5d9fff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac5da000-0xac5fafff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac5fb000-0xac6b5fff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac6b6000-0xac7fafff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac7fb000-0xac80efff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac80f000-0xac80ffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac810000-0xac810fff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac811000-0xac812fff]
> [    0.000000] PM: Registered nosave memory: [mem 0xac813000-0xad7fffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xad800000-0xafffffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xb0000000-0xb3ffffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xb4000000-0xfed1ffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xfed20000-0xfed3ffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xfed40000-0xfed4ffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xfed50000-0xfed8ffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xfed90000-0xfedfffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xfee00000-0xfeefffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xfef00000-0xff9fffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xffa00000-0xffa3ffff]
> [    0.000000] PM: Registered nosave memory: [mem 0xffa40000-0xffffffff]
> [    0.000000] e820: [mem 0xb4000000-0xfed1ffff] available for PCI devices
> [    0.000000] Booting paravirtualized kernel on Xen
> [    0.000000] Xen version: 4.5-unstable (preserve-AD)
> [    0.000000] setup_percpu: NR_CPUS:20 nr_cpumask_bits:20 nr_cpu_ids:8 
> nr_node_ids:8
> [    0.000000] PERCPU: Embedded 28 pages/cpu @ffff88001e800000 s85888 r8192 
> d20608 u2097152
> [    0.000000] pcpu-alloc: s85888 r8192 d20608 u2097152 alloc=1*2097152
> [    0.000000] pcpu-alloc: [0] 0 [1] 1 [2] 2 [3] 3 [4] 4 [5] 5 [6] 6 [7] 7
> [    0.000000] xen: PV spinlocks enabled
> [    0.000000] Built 8 zonelists in Node order, mobility grouping on.  Total 
> pages: 1007881
> [    0.000000] Policy zone: Normal
> [    0.000000] Kernel command line: root=/dev/xvda1 ro console=hvc0 debug  
> kgdboc=hvc0 nokgdbroundup  initcall_debug debug
> [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
> [    0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340
> [    0.000000] Checking aperture...
> [    0.000000] No AGP bridge found
> [    0.000000] Memory: 3976748K/4095612K available (4022K kernel code, 769K 
> rwdata, 1744K rodata, 1532K init, 1472K bss, 118864K reserved)
>
> root@heatpipe:~# numactl --ha
> maxn: 7
> available: 8 nodes (0-7)
> node 0 cpus: 0
> node 0 size: 458 MB
> node 0 free: 424 MB
> node 1 cpus: 1
> node 1 size: 491 MB
> node 1 free: 481 MB
> node 2 cpus: 2
> node 2 size: 491 MB
> node 2 free: 482 MB
> node 3 cpus: 3
> node 3 size: 491 MB
> node 3 free: 485 MB
> node 4 cpus: 4
> node 4 size: 491 MB
> node 4 free: 485 MB
> node 5 cpus: 5
> node 5 size: 491 MB
> node 5 free: 484 MB
> node 6 cpus: 6
> node 6 size: 491 MB
> node 6 free: 486 MB
> node 7 cpus: 7
> node 7 size: 476 MB
> node 7 free: 471 MB
> node distances:
> node   0   1   2   3   4   5   6   7
>   0:  10  40  40  40  40  40  40  40
>   1:  40  10  40  40  40  40  40  40
>   2:  40  40  10  40  40  40  40  40
>   3:  40  40  40  10  40  40  40  40
>   4:  40  40  40  40  10  40  40  40
>   5:  40  40  40  40  40  10  40  40
>   6:  40  40  40  40  40  40  10  40
>   7:  40  40  40  40  40  40  40  10
>
> root@heatpipe:~# numastat
>                            node0           node1           node2           node3
> numa_hit                  182203           14574           23800           17017
> numa_miss                      0               0               0               0
> numa_foreign                   0               0               0               0
> interleave_hit              1016            1010            1051            1030
> local_node                180995           12906           23272           15338
> other_node                  1208            1668             528            1679
>
>                            node4           node5           node6           node7
> numa_hit                   10621           15346            3529            3863
> numa_miss                      0               0               0               0
> numa_foreign                   0               0               0               0
> interleave_hit              1026            1017            1031            1029
> local_node                  8941           13680            1855            2184
> other_node                  1680            1666            1674            1679
>
> root@superpipe:~# xl debug-keys u
>
> (XEN) Domain 6 (total: 1024000):
> (XEN)     Node 0: 321064
> (XEN)     Node 1: 702936
> (XEN)     Domain has 8 vnodes, 8 vcpus
> (XEN)         vnode 0 - pnode 1, 500 MB, vcpu nums: 0
> (XEN)         vnode 1 - pnode 0, 500 MB, vcpu nums: 1
> (XEN)         vnode 2 - pnode 1, 500 MB, vcpu nums: 2
> (XEN)         vnode 3 - pnode 1, 500 MB, vcpu nums: 3
> (XEN)         vnode 4 - pnode 0, 500 MB, vcpu nums: 4
> (XEN)         vnode 5 - pnode 0, 1841 MB, vcpu nums: 5
> (XEN)         vnode 6 - pnode 1, 500 MB, vcpu nums: 6
> (XEN)         vnode 7 - pnode 1, 500 MB, vcpu nums: 7
>
> Current problems:
>
> This was marked as a separate problem, but I am leaving it here for reference.
> Warning on CPU bring-up on another node
>
>     The cpus in the guest which belong to different NUMA nodes are configured
>     to share the same L2 cache and are thus considered siblings, even though
>     they are on different nodes. One can see the following WARNING during boot:
>
> [    0.022750] SMP alternatives: switching to SMP code
> [    0.004000] ------------[ cut here ]------------
> [    0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/smpboot.c:303 
> topology_sane.isra.8+0x67/0x79()
> [    0.004000] sched: CPU #1's smt-sibling CPU #0 is not on the same node! 
> [node: 1 != 0]. Ignoring dependency.
> [    0.004000] Modules linked in:
> [    0.004000] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc8+ #43
> [    0.004000]  0000000000000000 0000000000000009 ffffffff813df458 
> ffff88007abe7e60
> [    0.004000]  ffffffff81048963 ffff88007abe7e70 ffffffff8102fb08 
> ffffffff00000100
> [    0.004000]  0000000000000001 ffff8800f6e13900 0000000000000000 
> 000000000000b018
> [    0.004000] Call Trace:
> [    0.004000]  [<ffffffff813df458>] ? dump_stack+0x41/0x51
> [    0.004000]  [<ffffffff81048963>] ? warn_slowpath_common+0x78/0x90
> [    0.004000]  [<ffffffff8102fb08>] ? topology_sane.isra.8+0x67/0x79
> [    0.004000]  [<ffffffff81048a13>] ? warn_slowpath_fmt+0x45/0x4a
> [    0.004000]  [<ffffffff8102fb08>] ? topology_sane.isra.8+0x67/0x79
> [    0.004000]  [<ffffffff8102fd2e>] ? set_cpu_sibling_map+0x1c9/0x3f7
> [    0.004000]  [<ffffffff81042146>] ? numa_add_cpu+0xa/0x18
> [    0.004000]  [<ffffffff8100b4e2>] ? cpu_bringup+0x50/0x8f
> [    0.004000]  [<ffffffff8100b544>] ? cpu_bringup_and_idle+0x1d/0x28
> [    0.004000] ---[ end trace 0e2e2fd5c7b76da5 ]---
> [    0.035371] x86: Booted up 2 nodes, 2 CPUs
>
> The workaround is to specify a cpuid mask in the config file so that SMT is
> not used. I will soon come up with a more acceptable solution.
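>
> For example, something along these lines in the guest config file should
> hide hyperthreading from the guest. The exact cpuid syntax is described in
> xl.cfg(5); using "htt" as the feature name here is an assumption on my
> part.
>
> # Sketch of the workaround: mask the HTT feature bit so the guest does not
> # treat vcpus as SMT siblings.
> cpuid = "host,htt=0"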
>
> Incorrect amount of memory for nodes in the debug-keys output
>
>     Since the per-domain node ranges are stored as guest addresses, the
>     memory reported for some nodes is incorrect because of the holes in the
>     guest e820 memory map.
>
> TODO:
>     - some modifications to automatic vNUMA placement may be needed;
>     - an extended vdistance configuration parser will need to be in place;
>     - the SMT siblings problem (see above) will need a solution (in a
>       different series);
>
> Changes since v10:
>     - excluded patches 5-9 as they need some more work; they will be posted
>       in the next version;
>
> Changes since v9:
>     - in XENMEM_get_vnuma_info, added a condition so the hypercall completes
>     successfully if the vnuma topology was changed while the temporary
>     arrays were being allocated, provided the allocated arrays are big
>     enough to hold the new values;
>
> Changes since v8:
>     - added support in the hypervisor for multi-range nodes for non-PV guests;
>     - added padding to the structures and made sure the padding is set to
>       zero and checked;
>
> Changes since v7:
>     - extended the interface to support multi-region vnuma nodes; currently
>     the toolstack does not use regions and they are treated as vnuma nodes
>     with one region;
>     - added explicit padding to vnuma_topology_info for proper alignment;
>     - added copying of the required values back to the caller on the failure
>     path of XEN_DOMCTL_setvnumainfo so the caller can restart;
>     - coding style fixes and other recommended minor changes;
>     - changed the minimum vnuma node size to match the Linux x86 arch limit;
>
> Changes since v6:
>     - added a limit on the number of vNUMA nodes per domain (32) on the Xen
>     side; this will be increased in the next version as this limit seems not
>     to be big enough;
>     - added a read/write lock to the domain structure to synchronize access
>     to the vnuma structure;
>     - added copying of the actual number of vcpus back to the guest;
>     - added xsm example policies;
>     - reorganized the series so that the xl implementation goes after the
>     libxl definitions;
>     - changed the idl names for the vnuma structure members in libxc;
>     - changed the failure path in Xen when setting the vnuma topology: do not
>     create a default node, but fail instead, so as not to introduce
>     different views of vnuma between the toolstack and Xen;
>     - changed the failure path when parsing the vnuma config to just fail
>     instead of creating a single default node;
>
> Changes since v5:
>     - reorganized patches;
>     - modified the domctl hypercall and added locking;
>     - added XSM hypercalls with basic policies;
>     - verified 32-bit compatibility;
>
> Elena Ufimtseva (4):
>   xen: vnuma topology and subop hypercalls
>   xsm bits for vNUMA hypercalls
>   vnuma hook to debug-keys u
>   libxc: Introduce xc_domain_setvnuma to set vNUMA
>
>  tools/flask/policy/policy/modules/xen/xen.if |    3 +-
>  tools/flask/policy/policy/modules/xen/xen.te |    2 +-
>  tools/libxc/xc_domain.c                      |   68 +++++++++++++
>  tools/libxc/xenctrl.h                        |   10 ++
>  xen/arch/x86/numa.c                          |   35 ++++++-
>  xen/common/domain.c                          |    3 +
>  xen/common/domctl.c                          |  136 
> ++++++++++++++++++++++++++
>  xen/common/memory.c                          |  133 +++++++++++++++++++++++++
>  xen/include/public/domctl.h                  |   31 ++++++
>  xen/include/public/memory.h                  |   52 +++++++++-
>  xen/include/xen/domain.h                     |   12 +++
>  xen/include/xen/sched.h                      |    4 +
>  xen/include/xsm/dummy.h                      |    6 ++
>  xen/include/xsm/xsm.h                        |    7 ++
>  xen/xsm/dummy.c                              |    1 +
>  xen/xsm/flask/hooks.c                        |   10 ++
>  xen/xsm/flask/policy/access_vectors          |    4 +
>  17 files changed, 513 insertions(+), 4 deletions(-)
>
> --
> 1.7.10.4
>



-- 
Elena

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

