Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
On 07/20/2015 10:43 AM, Boris Ostrovsky wrote:
> On 07/20/2015 10:09 AM, Dario Faggioli wrote:
>> On Fri, 2015-07-17 at 14:17 -0400, Boris Ostrovsky wrote:
>>> On 07/17/2015 03:27 AM, Dario Faggioli wrote:
>>>> In the meanwhile, what should we do? Document this? How? "Don't use
>>>> vNUMA with PV guests on SMT-enabled systems" seems a bit harsh... Is
>>>> there a workaround we can put in place/suggest?
>>>
>>> I haven't been able to reproduce this on my Intel box because I think
>>> I have different core enumeration.
>>
>> Yes, most likely; that's highly topology dependent. :-(
>>
>>> Can you try adding
>>>
>>>   cpuid=['0x1:ebx=xxxxxxxx00000001xxxxxxxxxxxxxxxx']
>>>
>>> to your config file?
>>
>> Done (sorry for the delay, the testbox was busy doing other stuff).
>> Still no joy (.101 is the IP address of the guest, domain id 3):
>>
>> root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
>> root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
>> root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
>> root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
>> root@Zhaman:~# xl vcpu-list 3
>> Name        ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
>> test         3     0     4   r--      23.6  all / 0-7
>> test         3     1     9   r--      19.8  all / 0-7
>> test         3     2     8   -b-       0.4  all / 8-15
>> test         3     3     4   -b-       0.2  all / 8-15
>>
>> *HOWEVER*, it seems to have an effect. In fact, the topology as shown
>> in /sys/... is now different:
>>
>> root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
>> 0
>>
>> (it was 0-1). This, OTOH, is still the same:
>>
>> root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
>> 0-3
>>
>> Also, I now see this:
>>
>> [    0.150560] ------------[ cut here ]------------
>> [    0.150560] WARNING: CPU: 2 PID: 0 at ../arch/x86/kernel/smpboot.c:317 topology_sane.isra.2+0x74/0x88()
>> [    0.150560] sched: CPU #2's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
>> [    0.150560] Modules linked in:
>> [    0.150560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0+ #1
>> [    0.150560]  0000000000000009 ffff88001ee2fdd0 ffffffff81657c7b ffffffff810bbd2c
>> [    0.150560]  ffff88001ee2fe20 ffff88001ee2fe10 ffffffff81081510 ffff88001ee2fea0
>> [    0.150560]  ffffffff8103aa02 ffff88003ea0a001 0000000000000000 ffff88001f20a040
>> [    0.150560] Call Trace:
>> [    0.150560]  [<ffffffff81657c7b>] dump_stack+0x4f/0x7b
>> [    0.150560]  [<ffffffff810bbd2c>] ? up+0x39/0x3e
>> [    0.150560]  [<ffffffff81081510>] warn_slowpath_common+0xa1/0xbb
>> [    0.150560]  [<ffffffff8103aa02>] ? topology_sane.isra.2+0x74/0x88
>> [    0.150560]  [<ffffffff81081570>] warn_slowpath_fmt+0x46/0x48
>> [    0.150560]  [<ffffffff8101eeb1>] ? __cpuid.constprop.0+0x15/0x19
>> [    0.150560]  [<ffffffff8103aa02>] topology_sane.isra.2+0x74/0x88
>> [    0.150560]  [<ffffffff8103acd0>] set_cpu_sibling_map+0x27a/0x444
>> [    0.150560]  [<ffffffff81056ac3>] ? numa_add_cpu+0x98/0x9f
>> [    0.150560]  [<ffffffff8100b8f2>] cpu_bringup+0x63/0xa8
>> [    0.150560]  [<ffffffff8100b945>] cpu_bringup_and_idle+0xe/0x1a
>> [    0.150560] ---[ end trace 63d204896cce9f68 ]---
>>
>> Notice that it now says 'llc-sibling', while, before, it was saying
>> 'smt-sibling'.
>
> Exactly. You are now passing the first topology test, which was to see
> that threads are on the same node. And since each processor has only
> one thread (as evidenced by thread_siblings_list), we are good.
>
> The second test checks that cores (i.e. things that share the last
> level cache) are on the same node. And they are not.
>
>>> On AMD, BTW, we fail a different test, so some other bits probably
>>> need to be tweaked. You may fail it too (the LLC sanity check).
>>
>> Yep, that's the one, I guess. Should I try something more/else?
>
> I'll need to see how LLC IDs are calculated, probably also from some
> CPUID bits.

No, can't do this: the LLC ID is calculated from CPUID leaf 4 (on
Intel), which uses indexes in the ECX register, and xl syntax doesn't
allow you to override CPUIDs for such leaves.
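To make the two encodings discussed above concrete, here is a small illustrative sketch (mine, not from the thread) of the relevant bit fields, following the layouts documented in the Intel SDM: leaf 0x1 EBX bits 23:16 carry the logical-processor count per package (the byte the xl override string pins to 1), and each ECX-indexed leaf 0x4 entry carries, in EAX bits 25:14, the number of logical CPUs sharing that cache, which is what the LLC ID is derived from. All register values below are invented for illustration, not read from real hardware:

```python
def leaf1_logical_per_package(ebx: int) -> int:
    """CPUID leaf 0x1, EBX bits 23:16: max addressable logical CPU IDs
    per package. The override 'xxxxxxxx00000001xxxxxxxxxxxxxxxx' forces
    exactly this byte to 1, hiding SMT siblings from the guest."""
    return (ebx >> 16) & 0xFF

def leaf4_sharing_and_size(eax: int, ebx: int, ecx: int):
    """CPUID leaf 0x4 (cache index passed in ECX when executing CPUID):
    EAX bits 25:14 encode, minus one, the max number of logical CPUs
    sharing this cache; EBX/ECX encode the cache geometry."""
    sharing = ((eax >> 14) & 0xFFF) + 1
    ways = ((ebx >> 22) & 0x3FF) + 1
    partitions = ((ebx >> 12) & 0x3FF) + 1
    line_size = (ebx & 0xFFF) + 1
    sets = ecx + 1
    return sharing, ways * partitions * line_size * sets

# Made-up leaf 0x1 EBX with bits 23:16 == 1, as the override produces:
assert leaf1_logical_per_package(0x00010800) == 1

# A plausible LLC entry: shared by 16 logical CPUs (field value 15),
# 16 ways, 1 partition, 64-byte lines, 8192 sets -> 8 MiB.
eax = 15 << 14
ebx = (15 << 22) | (0 << 12) | 63
ecx = 8191
print(leaf4_sharing_and_size(eax, ebx, ecx))  # (16, 8388608)
```

This also shows why xl can't help here: the override syntax addresses a leaf by its EAX input only, while leaf 0x4 returns a different entry per ECX index.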
-boris

> The question, though, will be: what do we do with how cache sizes (and
> TLB sizes, for that matter) are presented to the guests? Do we scale
> them down per thread?
>
> -boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel