Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
On Fri, 2015-07-17 at 14:17 -0400, Boris Ostrovsky wrote:
> On 07/17/2015 03:27 AM, Dario Faggioli wrote:
> > In the meanwhile, what should we do? Document this? How? "don't use
> > vNUMA with PV guest in SMT enabled systems" seems a bit harsh... Is
> > there a workaround we can put in place/suggest?
>
> I haven't been able to reproduce this on my Intel box because I think I
> have different core enumeration.
>
Yes, most likely, that's highly topology dependant. :-(

> Can you try adding
>   cpuid=['0x1:ebx=xxxxxxxx00000001xxxxxxxxxxxxxxxx']
> to your config file?
>
Done (sorry for the delay, the testbox was busy doing other stuff).

Still no joy (.101 is the IP address of the guest, domain id 3):

root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# xl vcpu-list 3
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
test                                 3     0     4   r--      23.6  all / 0-7
test                                 3     1     9   r--      19.8  all / 0-7
test                                 3     2     8   -b-       0.4  all / 8-15
test                                 3     3     4   -b-       0.2  all / 8-15

*HOWEVER* it seems to have an effect. In fact, now, topology as it is
shown in /sys/... is different:

root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0
(it was 0-1)

This, OTOH, is still the same:

root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
0-3

Also, I now see this:

[    0.150560] ------------[ cut here ]------------
[    0.150560] WARNING: CPU: 2 PID: 0 at ../arch/x86/kernel/smpboot.c:317 topology_sane.isra.2+0x74/0x88()
[    0.150560] sched: CPU #2's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.150560] Modules linked in:
[    0.150560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0+ #1
[    0.150560]  0000000000000009 ffff88001ee2fdd0 ffffffff81657c7b ffffffff810bbd2c
[    0.150560]  ffff88001ee2fe20 ffff88001ee2fe10 ffffffff81081510 ffff88001ee2fea0
[    0.150560]  ffffffff8103aa02 ffff88003ea0a001 0000000000000000 ffff88001f20a040
[    0.150560] Call Trace:
[    0.150560]  [<ffffffff81657c7b>] dump_stack+0x4f/0x7b
[    0.150560]  [<ffffffff810bbd2c>] ? up+0x39/0x3e
[    0.150560]  [<ffffffff81081510>] warn_slowpath_common+0xa1/0xbb
[    0.150560]  [<ffffffff8103aa02>] ? topology_sane.isra.2+0x74/0x88
[    0.150560]  [<ffffffff81081570>] warn_slowpath_fmt+0x46/0x48
[    0.150560]  [<ffffffff8101eeb1>] ? __cpuid.constprop.0+0x15/0x19
[    0.150560]  [<ffffffff8103aa02>] topology_sane.isra.2+0x74/0x88
[    0.150560]  [<ffffffff8103acd0>] set_cpu_sibling_map+0x27a/0x444
[    0.150560]  [<ffffffff81056ac3>] ? numa_add_cpu+0x98/0x9f
[    0.150560]  [<ffffffff8100b8f2>] cpu_bringup+0x63/0xa8
[    0.150560]  [<ffffffff8100b945>] cpu_bringup_and_idle+0xe/0x1a
[    0.150560] ---[ end trace 63d204896cce9f68 ]---

Notice that it now says 'llc-sibling', while, before, it was saying
'smt-sibling'.

> On AMD, BTW, we fail a different test so some other bits probably need
> to be tweaked. You may fail it too (the LLC sanity check).
>
Yep, that's the one I guess. Should I try something more/else?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
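For readers following the cpuid= suggestion in the message above, here is a minimal Python sketch of what that mask string appears to do to CPUID leaf 0x1 EBX. It assumes the usual xl.cfg reading of the mask (32 characters, most significant bit first, '0'/'1' force a bit, 'x' leaves the default alone) and only handles those three characters. The function names and the sample EBX value are made up for illustration; they are not part of xl or of the thread. Bits 23:16 of that register hold the logical processor count per package, which would explain why thread_siblings_list changed from 0-1 to 0 in the guest.

```python
# Mask taken verbatim from the suggested config line:
#   cpuid=['0x1:ebx=xxxxxxxx00000001xxxxxxxxxxxxxxxx']
MASK = "xxxxxxxx00000001xxxxxxxxxxxxxxxx"


def apply_cpuid_mask(default: int, mask: str) -> int:
    """Apply a 32-character register mask to a 32-bit default value.

    The mask is read most significant bit first. '0' and '1' force the
    bit, 'x' keeps the bit from `default`. Other mask characters that xl
    may accept are deliberately not handled in this sketch.
    """
    if len(mask) != 32:
        raise ValueError("mask must be exactly 32 characters")
    result = 0
    for i, ch in enumerate(mask):
        bit = 31 - i  # first character describes bit 31
        if ch == "1":
            result |= 1 << bit
        elif ch == "x":
            result |= default & (1 << bit)
        elif ch != "0":
            raise ValueError(f"unsupported mask character: {ch!r}")
    return result


def logical_processor_count(ebx: int) -> int:
    """Return bits 23:16 of CPUID.1:EBX (logical processors per package)."""
    return (ebx >> 16) & 0xFF


# Hypothetical default EBX: APIC ID 4, 16 logical processors, CLFLUSH size 8.
SAMPLE_EBX = 0x04100800
MASKED_EBX = apply_cpuid_mask(SAMPLE_EBX, MASK)
```

With the sample value, only the count field changes: the APIC ID in bits 31:24 and the low 16 bits pass through untouched, and logical_processor_count(MASKED_EBX) drops from 16 to 1. That matches the observation that the SMT sibling check stops firing while the LLC check, which depends on other CPUID data, still does.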