Re: [Xen-devel] [PATCH v6 00/10] vnuma introduction
On Tue, Jul 22, 2014 at 04:03:44PM +0200, Dario Faggioli wrote:
> On ven, 2014-07-18 at 12:48 +0100, Wei Liu wrote:
> > On Fri, Jul 18, 2014 at 12:13:36PM +0200, Dario Faggioli wrote:
> > > On ven, 2014-07-18 at 10:53 +0100, Wei Liu wrote:
> > > >
> > > > I've also encountered this. I suspect that even if you disable SMT with
> > > > cpuid in the config file, the cpu topology in the guest might still be wrong.
> > > >
> > > Can I ask why?
> > >
> > Because for a PV guest (currently) the guest kernel sees the real "ID"s
> > for a cpu. See those "ID"s I change in my hacky patch.
> >
> Right, now I see/remember it. Well, this is, I think, something we
> should try to fix _independently_ from vNUMA, isn't it?
>
> I mean, even right now, PV guests see completely random cache-sharing
> topology, and that does (at least potentially) affect performance, as
> the guest scheduler will make incorrect/inconsistent assumptions.
>

Correct. It's just that the problem might be more obvious to see with
vNUMA.
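
(To see what I mean: from inside the guest, something along these lines
should dump the raw IDs the kernel derived from the unfiltered CPUID.
Just a quick sketch from memory -- untested, but the /proc and /sys
files are standard on any recent x86 Linux:

  root@debian:~# egrep 'processor|physical id|core id|apicid' /proc/cpuinfo
  root@debian:~# grep . /sys/devices/system/cpu/cpu*/topology/physical_package_id
  root@debian:~# grep . /sys/devices/system/cpu/cpu*/topology/core_id

Depending on which pcpus the vcpus happen to be running on when they are
brought up, those values can be pretty much anything, and need not be
consistent with each other.)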
> I'm not sure what the correct fix is. Probably something similar to what
> you're doing in your hack... but, indeed, I think we should do something
> about this!
>
> > > > What do hwloc-ls and lscpu show? Do you see any weird topology like one
> > > > core belongs to one node while three belong to another?
> > > >
> > > Yep, that would be interesting to see.
> > >
> > > > (I suspect not
> > > > because your vcpus are already pinned to a specific node)
> > > >
> > > Sorry, I'm not sure I follow here... Are you saying that things probably
> > > work ok, but that is (only) because of pinning?
> >
> > Yes, given that you derive numa memory allocation from cpu pinning or
> > use a combination of cpu pinning, vcpu to vnode map and vnode to pnode
> > map, in those cases those IDs might reflect the right topology.
> >
> Well, pinning does (should?) not always happen, as a consequence of a
> virtual topology being used.
>

That's true. I was just referring to the current status of the patch
series. AIUI that's how it is implemented now, not necessarily the way
it has to be.

> So, again, I don't think we should rely on pinning to have a sane and,
> more important, consistent SMT and cache sharing topology.
>
> Linux maintainers, any ideas?
>
>
> BTW, I tried a few examples, on the following host:
>
> root@benny:~# xl info -n
> ...
> nr_cpus                : 8
> max_cpu_id             : 15
> nr_nodes               : 1
> cores_per_socket       : 4
> threads_per_core       : 2
> cpu_mhz                : 3591
> ...
> cpu_topology           :
> cpu:    core    socket    node
>   0:       0        0        0
>   1:       0        0        0
>   2:       1        0        0
>   3:       1        0        0
>   4:       2        0        0
>   5:       2        0        0
>   6:       3        0        0
>   7:       3        0        0
> numa_info              :
> node:    memsize    memfree    distances
>    0:      34062      31029      10
>
> With the following guest configuration, in terms of vcpu pinning:
>
> 1) 2 vCPUs ==> same pCPUs

4 vcpus, I think.

> root@benny:~# xl vcpu-list
> Name                   ID  VCPU   CPU State   Time(s) CPU Affinity
> debian.guest.osstest    9     0     0   -b-       2.7  0
> debian.guest.osstest    9     1     0   -b-       5.2  0
> debian.guest.osstest    9     2     7   -b-       2.4  7
> debian.guest.osstest    9     3     7   -b-       4.4  7
>
> 2) no SMT
>
> root@benny:~# xl vcpu-list
> Name                   ID  VCPU   CPU State   Time(s) CPU Affinity
> debian.guest.osstest   11     0     0   -b-       0.6  0
> debian.guest.osstest   11     1     2   -b-       0.4  2
> debian.guest.osstest   11     2     4   -b-       1.5  4
> debian.guest.osstest   11     3     6   -b-       0.5  6
>
> 3) Random
>
> root@benny:~# xl vcpu-list
> Name                   ID  VCPU   CPU State   Time(s) CPU Affinity
> debian.guest.osstest   12     0     3   -b-       1.6  all
> debian.guest.osstest   12     1     1   -b-       1.4  all
> debian.guest.osstest   12     2     5   -b-       2.4  all
> debian.guest.osstest   12     3     7   -b-       1.5  all
>
> 4) yes SMT
>
> root@benny:~# xl vcpu-list
> Name                   ID  VCPU   CPU State   Time(s) CPU Affinity
> debian.guest.osstest   14     0     1   -b-       1.0  1
> debian.guest.osstest   14     1     2   -b-       1.8  2
> debian.guest.osstest   14     2     6   -b-       1.1  6
> debian.guest.osstest   14     3     7   -b-       0.8  7
>
> And, in *all* these 4 cases, here's what I see:
>
> root@debian:~# cat /sys/devices/system/cpu/cpu*/topology/core_siblings_list
> 0-3
> 0-3
> 0-3
> 0-3
>
> root@debian:~# cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
> 0-3
> 0-3
> 0-3
> 0-3
>
> root@debian:~# lstopo
> Machine (488MB) + Socket L#0 + L3 L#0 (8192KB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>   PU L#0 (P#0)
>   PU L#1 (P#1)
>   PU L#2 (P#2)
>   PU L#3 (P#3)
>

I won't be surprised if the guest builds up a wrong topology, as what
real "ID"s it sees depends very much on which pcpus you pick. Have you
tried pinning the vcpus to pcpus [0, 1, 2, 3]? That way you should be
able to see the same topology as the one you saw in Dom0.
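
Something like this in the guest config file is what I have in mind
(from memory, so double-check the exact syntax against the xl.cfg
documentation):

  vcpus = 4
  cpus  = ["0", "1", "2", "3"]

You can get a similar effect at runtime with `xl vcpu-pin
debian.guest.osstest 0 0' and so on for the other vcpus, but I think the
pinning needs to be in place when the guest boots, because that's when
the kernel samples the IDs and builds its topology. So the config option
(or re-pinning followed by a reboot) is probably the thing to try.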
> root@debian:~# lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    4
> Core(s) per socket:    1
> Socket(s):             1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 60
> Stepping:              3
> CPU MHz:               3591.780
> BogoMIPS:              7183.56
> Hypervisor vendor:     Xen
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              8192K
>
> I.e., no matter how I pin the vcpus, the guest sees the 4 vcpus as if
> they were all SMT siblings, within the same core, sharing all cache
> levels.
>
> This is not the case for dom0 where (I booted with dom0_max_vcpus=4 on
> the xen command line) I see this:
>

I guess this is because you're basically picking pcpu 0-3 for Dom0. It
doesn't matter if you pin them or not.

Wei.

> root@benny:~# lstopo
> Machine (422MB)
>   Socket L#0 + L3 L#0 (8192KB)
>     L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>       PU L#0 (P#0)
>       PU L#1 (P#1)
>     L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>       PU L#2 (P#2)
>       PU L#3 (P#3)
>
> root@benny:~# lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    2
> Core(s) per socket:    2
> Socket(s):             1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 60
> Stepping:              3
> CPU MHz:               3591.780
> BogoMIPS:              7183.56
> Hypervisor vendor:     Xen
> Virtualization type:   none
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              8192K
>
> What am I doing wrong, or what am I missing?
>
> Thanks and Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel