Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
On Thu, Jul 16, 2015 at 12:32:42PM +0200, Dario Faggioli wrote:
> Hey,
>
> This started on IRC, but it's actually appropriate to have the
> conversation here.
>
> I just discovered an issue with vNUMA when PV guests are used. In fact,
> creating a 4-vCPU PV guest, and arranging things so that all 4 vCPUs
> should be busy, I see this:
>
> root@Zhaman:~# xl vcpu-list test
> Name    ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
> test     4     0     5   r--     1481.9  all / 0-7
> test     4     1     2   r--     1479.4  all / 0-7
> test     4     2    15   -b-        7.5  all / 8-15
> test     4     3    10   -b-     1324.8  all / 8-15
>
> Checking inside the guest confirms that *everything* runs on vCPUs 0
> and 1. However, using schedtool or taskset, I can force tasks to
> execute on vCPUs 2 and 3.
>
> Inspecting the guest's dmesg, I've seen this:
>
> [ 0.128416] ------------[ cut here ]------------
> [ 0.128416] WARNING: CPU: 2 PID: 0 at ../arch/x86/kernel/smpboot.c:317 topology_sane.isra.2+0x74/0x88()
> [ 0.128416] sched: CPU #2's smt-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
> [ 0.128416] Modules linked in:
> [ 0.128416] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0+ #1
> [ 0.128416] 0000000000000009 ffff88001ee3bdd0 ffffffff81657c7b ffffffff810bbd2c
> [ 0.128416] ffff88001ee3be20 ffff88001ee3be10 ffffffff81081510 ffff88001ee3bea0
> [ 0.128416] ffffffff8103aa02 ffff88003ea0a001 0000000000000000 ffff88001f20a040
> [ 0.128416] Call Trace:
> [ 0.128416] [<ffffffff81657c7b>] dump_stack+0x4f/0x7b
> [ 0.128416] [<ffffffff810bbd2c>] ? up+0x39/0x3e
> [ 0.128416] [<ffffffff81081510>] warn_slowpath_common+0xa1/0xbb
> [ 0.128416] [<ffffffff8103aa02>] ? topology_sane.isra.2+0x74/0x88
> [ 0.128416] [<ffffffff81081570>] warn_slowpath_fmt+0x46/0x48
> [ 0.128416] [<ffffffff8101eeb1>] ? __cpuid.constprop.0+0x15/0x19
> [ 0.128416] [<ffffffff8103aa02>] topology_sane.isra.2+0x74/0x88
> [ 0.128416] [<ffffffff8103ac70>] set_cpu_sibling_map+0x21a/0x444
> [ 0.128416] [<ffffffff81056ac3>] ? numa_add_cpu+0x98/0x9f
> [ 0.128416] [<ffffffff8100b8f2>] cpu_bringup+0x63/0xa8
> [ 0.128416] [<ffffffff8100b945>] cpu_bringup_and_idle+0xe/0x1a
> [ 0.128416] ---[ end trace 95bff1aef57ee1b1 ]---
>
> So, basically, Linux is complaining that we're trying to put two vCPUs
> that appear to be SMT siblings on different NUMA nodes. And, yes, I
> think this is quite disruptive for the Linux scheduler's internal logic.
>
> The vnuma bits of the guest config are these:
>
> vnuma = [ [ "pnode=0","size=512","vcpus=0-1","vdistances=10,20" ],
>           [ "pnode=1","size=512","vcpus=2-3","vdistances=20,10" ] ]
>
> From inside the guest, the topology looks like this:
>
> root@test:~# numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 1
> node 0 size: 475 MB
> node 0 free: 382 MB
> node 1 cpus: 2 3
> node 1 size: 495 MB
> node 1 free: 475 MB
> node distances:
> node   0   1
>   0:  10  10
>   1:  20  10
>
> root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
> 0-1
> root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
> 0-3
> root@test:~# cat /sys/devices/system/cpu/cpu2/topology/thread_siblings_list
> 2-3
> root@test:~# cat /sys/devices/system/cpu/cpu2/topology/core_siblings_list
> 0-3
>
> So the complaint during boot seems to be against 'core_siblings' (which
> was not what I expected, but perhaps I misremember the meaning of
> "core_siblings" vs. "thread_siblings" vs. smt-siblings in Linux; I'll
> double check).
>
> Anyway, is there anything we can do to fix or work around things?
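
My suspicion is that this comes from cpuid: a PV guest executes cpuid
more or less natively, so the initial APIC IDs that Linux feeds into
set_cpu_sibling_map() reflect the *host's* topology, which has nothing
to do with the vNUMA layout. A quick userspace probe along these lines
(an untested sketch; __get_cpuid() is the GCC/clang wrapper around the
raw instruction) should confirm what each vCPU actually sees:

/* cpuid-apicid.c: print the initial APIC ID each vCPU observes.
 * Build with: gcc -O2 -o cpuid-apicid cpuid-apicid.c */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <cpuid.h>              /* __get_cpuid() */

int main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

    for (long cpu = 0; cpu < ncpus; cpu++) {
        cpu_set_t set;
        unsigned int eax, ebx, ecx, edx;

        /* Pin ourselves to the target vCPU before issuing cpuid. */
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set)) {
            perror("sched_setaffinity");
            return 1;
        }

        /* Leaf 1: EBX bits 31:24 hold the initial APIC ID. */
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 1;
        printf("vCPU %ld: initial APIC ID %u\n", cpu, ebx >> 24);
    }
    return 0;
}

If vCPUs 0 and 1 report APIC IDs that pair them on one host core, that
alone would explain the smt-sibling warning above, regardless of the
vNUMA placement.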
IIRC Linux already consumes some bits returned by cpuid anyway. Is it
possible to generate a "dummy" layout in the Linux kernel according to
the vNUMA information? I had this idea long ago but wasn't quite sure
whether it was dumb or not.

Wei.

> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
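
To make Wei's suggestion concrete: the simplest possible "dummy" layout
would flatten everything, i.e. one thread per core, one core per vCPU,
and package == virtual node. With no SMT siblings at all, the
topology_sane() check can never trip, and core siblings stay within one
node by construction. A hypothetical userspace model of that mapping
(names invented for illustration, not actual kernel or toolstack code):

/* dummy-topology.c: model of a flattened, vNUMA-derived topology. */
#include <stdio.h>

#define NR_VCPUS 4

/* vcpu -> vnode map, as it would come from the vNUMA info
 * (vcpus=0-1 on node 0, vcpus=2-3 on node 1, as in the config above). */
static const int vcpu_to_vnode[NR_VCPUS] = { 0, 0, 1, 1 };

int main(void)
{
    for (int cpu = 0; cpu < NR_VCPUS; cpu++) {
        int thread_id  = 0;                  /* no SMT siblings   */
        int core_id    = cpu;                /* one core per vCPU */
        int package_id = vcpu_to_vnode[cpu]; /* package == vnode  */

        printf("vCPU %d: package %d, core %d, thread %d\n",
               cpu, package_id, core_id, thread_id);
    }
    return 0;
}

Whether such a layout is better synthesized by having the
hypervisor/toolstack filter cpuid, or by having the Linux PV boot path
override the sibling maps it computed, is exactly the open question.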