
Re: [Xen-devel] [PATCH v6 00/10] vnuma introduction

To: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
From: Wei Liu <wei.liu2@xxxxxxxxxx>
Date: Fri, 18 Jul 2014 12:48:34 +0100
Cc: keir@xxxxxxx, Ian.Campbell@xxxxxxxxxx, stefano.stabellini@xxxxxxxxxxxxx, george.dunlap@xxxxxxxxxxxxx, msw@xxxxxxxxx, lccycc123@xxxxxxxxx, ian.jackson@xxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxx, JBeulich@xxxxxxxx, Elena Ufimtseva <ufimtseva@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>
Delivery-date: Fri, 18 Jul 2014 11:48:45 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Fri, Jul 18, 2014 at 12:13:36PM +0200, Dario Faggioli wrote:
> On ven, 2014-07-18 at 10:53 +0100, Wei Liu wrote:
> > Hi! Another new series!
> > 
> :-)
> 
> > On Fri, Jul 18, 2014 at 01:49:59AM -0400, Elena Ufimtseva wrote:
> 
> > > The workaround is to specify cpuid in the config file and not use SMT.
> > > But soon I will come up with some other acceptable solution.
> > > 
> > 
> For Elena: what kind of workaround?
> 
> > I've also encountered this. I suspect that even if you disable SMT with
> > cpuid in the config file, the CPU topology in the guest might still be wrong.
> >
> Can I ask why?
> 

Because for a PV guest (currently) the guest kernel sees the real CPU
"ID"s. See the "ID"s I change in my hacky patch.

> > What do hwloc-ls and lscpu show? Do you see any weird topology, like one
> > core belonging to one node while three belong to another?
> >
> Yep, that would be interesting to see.
> 
> > (I suspect not
> > because your vcpus are already pinned to a specific node.)
> > 
> Sorry, I'm not sure I follow here... Are you saying that things probably
> work OK, but (only) because of pinning?

Yes. Given that you derive NUMA memory allocation from CPU pinning, or
use a combination of CPU pinning, a vcpu-to-vnode map and a vnode-to-pnode
map, in those cases the IDs might reflect the right topology.

> 
> I may be missing something here, but would it be possible to at least
> try to make sure that the virtual topology and the topology related
> content of CPUID actually agree? And I mean doing it automatically (if

This is what I'm doing in my hack. :-)

> only one of the two is specified) and to either error or warn if that is
> not possible (if both are specified and they disagree)?
> 
> I admit I'm not a CPUID expert, but I always thought this could be a
> good solution...
> 
> > What I did was to manipulate various "id"s in the Linux kernel, so that I
> > create a topology with a 1 core : 1 cpu : 1 socket mapping.
> >
> And how does this topology map to, or interact with, the virtual topology
> we want the guest to have?
> 

Say you have a two-node guest with 4 vcpus: you now have two sockets
per node, each socket has one CPU, and each CPU has one core.

Node 0:
  Socket 0:
    CPU 0:
      Core 0
  Socket 1:
    CPU 1:
      Core 1
Node 1:
  Socket 2:
    CPU 2:
      Core 2
  Socket 3:
    CPU 3:
      Core 3

> > In that case the
> > guest scheduler won't be able to make any assumptions about individual CPUs
> > sharing caches with each other.
> > 
> And, apart from SMT, what topology does the guest see then?
> 

See above.

> In any case, if this only alters SMT-ness (where "alter"="disable"), I
> think that is fine too. What I'm failing to see is whether and why
> this approach is more powerful than manipulating CPUID from the config file.
> 
> I'm insisting because, if they are equivalent in terms of results, I
> think it's easier, cleaner and more correct to deal with CPUID in xl and
> libxl (automatically or semi-automatically).
> 

SMT is just one aspect of the story that easily surfaces.

In my opinion, if we don't manually create some kind of topology for the
guest, the guest might end up with something weird. For example, on a
system with 2 nodes, 4 sockets, 8 CPUs and 8 cores, the guest might see

Node 0:
  Socket 0:
    CPU 0
  Socket 1:
    CPU 1
Node 1:
  Socket 2:
    CPU 3
    CPU 4

all of which stems from the guest knowing the real CPU "ID"s.

And this topology is simply wrong: at best it reflects vcpu placement at
guest creation time. Xen is free to schedule vcpus on different pcpus
later, so the guest scheduler will make wrong decisions based on
erroneous information.

That's why I chose a 1 core : 1 cpu : 1 socket mapping, so that the
guest makes no assumptions about cache sharing etc. It's suboptimal but
should provide predictable average performance. What do you think?

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

