Re: [Xen-devel] [PATCH v6 00/10] vnuma introduction
On Tue, Jul 22, 2014 at 04:03:44PM +0200, Dario Faggioli wrote:
> On ven, 2014-07-18 at 12:48 +0100, Wei Liu wrote:
> > On Fri, Jul 18, 2014 at 12:13:36PM +0200, Dario Faggioli wrote:
> > > On ven, 2014-07-18 at 10:53 +0100, Wei Liu wrote:
> > > >
> > > > I've also encountered this. I suspect that even if you disable SMT with
> > > > cpuid in the config file, the cpu topology in the guest might still be wrong.
> > > >
> > > Can I ask why?
> > >
> > Because for a PV guest (currently) the guest kernel sees the real "ID"s
> > for a cpu. See those "ID"s I change in my hacky patch.
> >
> Right, now I see/remember it. Well, this is, I think, something we
> should try to fix _independently_ from vNUMA, isn't it?
>
> I mean, even right now, PV guests see completely random cache-sharing
> topology, and that does (at least potentially) affect performance, as
> the guest scheduler will make incorrect/inconsistent assumptions.
>

Correct. It's just that the problem might be more obvious to see with
vNUMA.
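
(To see what I mean: from inside the guest, something along these lines
should dump the raw IDs the kernel derived from the unfiltered CPUID.
Just a quick sketch from memory -- untested, but the /proc and /sys
files are standard on any recent x86 Linux:

  root@debian:~# egrep 'processor|physical id|core id|apicid' /proc/cpuinfo
  root@debian:~# grep . /sys/devices/system/cpu/cpu*/topology/physical_package_id
  root@debian:~# grep . /sys/devices/system/cpu/cpu*/topology/core_id

Depending on which pcpus the vcpus happen to be running on when they are
brought up, those values can be pretty much anything, and need not be
consistent with each other.)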
> I'm not sure what the correct fix is. Probably something similar to what
> you're doing in your hack... but, indeed, I think we should do something
> about this!
>
> > > > What do hwloc-ls and lscpu show? Do you see any weird topology like one
> > > > core belongs to one node while three belong to another?
> > > >
> > > Yep, that would be interesting to see.
> > >
> > > > (I suspect not
> > > > because your vcpus are already pinned to a specific node)
> > > >
> > > Sorry, I'm not sure I follow here... Are you saying that things probably
> > > work ok, but that is (only) because of pinning?
> >
> > Yes, given that you derive numa memory allocation from cpu pinning or
> > use a combination of cpu pinning, vcpu to vnode map and vnode to pnode
> > map, in those cases those IDs might reflect the right topology.
> >
> Well, pinning does (should?) not always happen, as a consequence of a
> virtual topology being used.
>

That's true. I was just referring to the current status of the patch
series. AIUI that's how it is implemented now, not necessarily the way
it has to be.

> So, again, I don't think we should rely on pinning to have a sane and,
> more important, consistent SMT and cache sharing topology.
>
> Linux maintainers, any ideas?
>
>
> BTW, I tried a few examples, on the following host:
>
> root@benny:~# xl info -n
> ...
> nr_cpus                : 8
> max_cpu_id             : 15
> nr_nodes               : 1
> cores_per_socket       : 4
> threads_per_core       : 2
> cpu_mhz                : 3591
> ...
> cpu_topology           :
> cpu:    core    socket    node
>   0:       0        0        0
>   1:       0        0        0
>   2:       1        0        0
>   3:       1        0        0
>   4:       2        0        0
>   5:       2        0        0
>   6:       3        0        0
>   7:       3        0        0
> numa_info              :
> node:    memsize    memfree    distances
>    0:      34062      31029      10
>
> With the following guest configuration, in terms of vcpu pinning:
>
> 1) 2 vCPUs ==> same pCPUs

4 vcpus, I think.

> root@benny:~# xl vcpu-list
> Name                   ID  VCPU   CPU State   Time(s) CPU Affinity
> debian.guest.osstest    9     0     0   -b-       2.7  0
> debian.guest.osstest    9     1     0   -b-       5.2  0
> debian.guest.osstest    9     2     7   -b-       2.4  7
> debian.guest.osstest    9     3     7   -b-       4.4  7
>
> 2) no SMT
>
> root@benny:~# xl vcpu-list
> Name                   ID  VCPU   CPU State   Time(s) CPU Affinity
> debian.guest.osstest   11     0     0   -b-       0.6  0
> debian.guest.osstest   11     1     2   -b-       0.4  2
> debian.guest.osstest   11     2     4   -b-       1.5  4
> debian.guest.osstest   11     3     6   -b-       0.5  6
>
> 3) Random
>
> root@benny:~# xl vcpu-list
> Name                   ID  VCPU   CPU State   Time(s) CPU Affinity
> debian.guest.osstest   12     0     3   -b-       1.6  all
> debian.guest.osstest   12     1     1   -b-       1.4  all
> debian.guest.osstest   12     2     5   -b-       2.4  all
> debian.guest.osstest   12     3     7   -b-       1.5  all
>
> 4) yes SMT
>
> root@benny:~# xl vcpu-list
> Name                   ID  VCPU   CPU State   Time(s) CPU Affinity
> debian.guest.osstest   14     0     1   -b-       1.0  1
> debian.guest.osstest   14     1     2   -b-       1.8  2
> debian.guest.osstest   14     2     6   -b-       1.1  6
> debian.guest.osstest   14     3     7   -b-       0.8  7
>
> And, in *all* these 4 cases, here's what I see:
>
> root@debian:~# cat /sys/devices/system/cpu/cpu*/topology/core_siblings_list
> 0-3
> 0-3
> 0-3
> 0-3
>
> root@debian:~# cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
> 0-3
> 0-3
> 0-3
> 0-3
>
> root@debian:~# lstopo
> Machine (488MB) + Socket L#0 + L3 L#0 (8192KB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>   PU L#0 (P#0)
>   PU L#1 (P#1)
>   PU L#2 (P#2)
>   PU L#3 (P#3)
>

I won't be surprised if the guest builds up a wrong topology, as what
real "ID"s it sees depends very much on which pcpus you pick. Have you
tried pinning the vcpus to pcpus [0, 1, 2, 3]? That way you should be
able to see the same topology as the one you saw in Dom0.
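
Something like this in the guest config file is what I have in mind
(from memory, so double-check the exact syntax against the xl.cfg
documentation):

  vcpus = 4
  cpus  = ["0", "1", "2", "3"]

You can get a similar effect at runtime with `xl vcpu-pin
debian.guest.osstest 0 0' and so on for the other vcpus, but I think the
pinning needs to be in place when the guest boots, because that's when
the kernel samples the IDs and builds its topology. So the config option
(or re-pinning followed by a reboot) is probably the thing to try.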
> root@debian:~# lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    4
> Core(s) per socket:    1
> Socket(s):             1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 60
> Stepping:              3
> CPU MHz:               3591.780
> BogoMIPS:              7183.56
> Hypervisor vendor:     Xen
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              8192K
>
> I.e., no matter how I pin the vcpus, the guest sees the 4 vcpus as if
> they were all SMT siblings, within the same core, sharing all cache
> levels.
>
> This is not the case for dom0 where (I booted with dom0_max_vcpus=4 on
> the xen command line) I see this:
>

I guess this is because you're basically picking pcpu 0-3 for Dom0. It
doesn't matter if you pin them or not.

Wei.

> root@benny:~# lstopo
> Machine (422MB)
>   Socket L#0 + L3 L#0 (8192KB)
>     L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>       PU L#0 (P#0)
>       PU L#1 (P#1)
>     L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>       PU L#2 (P#2)
>       PU L#3 (P#3)
>
> root@benny:~# lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    2
> Core(s) per socket:    2
> Socket(s):             1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 60
> Stepping:              3
> CPU MHz:               3591.780
> BogoMIPS:              7183.56
> Hypervisor vendor:     Xen
> Virtualization type:   none
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              8192K
>
> What am I doing wrong, or what am I missing?
>
> Thanks and Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel