
Re: [Xen-devel] [PATCH v4 21/21] xl: vNUMA support



On Thu, Jan 29, 2015 at 11:10:39AM +0000, Ian Campbell wrote:
> On Wed, 2015-01-28 at 22:52 +0000, Wei Liu wrote:
> > > guests, is preballooning allowed there too?
> > 
> > I need to check PV boot sequence to have a definite answer.
> > 
> > Currently memory allocation in libxc only deals with a chunk of
> > contiguous memory. Not sure if I change that will break some assumptions
> > that guest kernel makes.
> 
> Please do check, and if it doesn't work today we really ought to have
> plan on how to integrate in the future, in case (as seems likely) it
> requires cooperation between tools and kernel -- so we can think about
> steps now to make it easier on ourselves then...
> 

I have only looked at the Linux kernel, so this is very Linux-centric -- though
I wonder if there are any other PV kernels in the wild.

Libxc allocates a contiguous chunk of memory and then the guest kernel
remaps memory inside non-RAM regions. (This leads me to think I need to
rewrite the patch that allocates memory to also take memory holes into
account, but that is another matter.)

Speaking of pre-ballooned PV guests, in theory it should still work if we
allocate memory in one contiguous chunk. But something more complex, like
partially populating multiple vnodes, might not, because code in the
Linux kernel assumes that pre-ballooned pages are appended to the end of
populated memory.
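
(By "pre-ballooned" I mean a guest started with memory lower than maxmem,
e.g.:

  memory = 1024
  maxmem = 2048

so the top 1024M of the guest's memory starts out unpopulated and can be
ballooned in later.)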

> > > > +=item B<vnuma_pnode_map=[ NUMBER, NUMBER, ... ]>
> > > > +
> > > > +Specifiy which physical NUMA node a specific virtual NUMA node maps 
> > > > to. The
> > > 
> > > "Specify" again.
> > > 
> > > > +number of elements in this list should be equal to the number of 
> > > > virtual
> > > > +NUMA nodes defined in B<vnuma_memory=>.
> > > 
> > > Would it make sense to instead have a single array or e.g. "NODE:SIZE"
> > > or something?
> > > 
> > 
> > Or "PNODE:SIZE:VCPUS"?
> 
> That seems plausible.
> 
> One concern would be future expansion, perhaps foo=bar,baz=wibble?
> 

I'm fine with that. We can use a nested list:

vnuma = [ [node=0,size=1024,vcpus=...], [...] ]
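
For instance, a two-node guest might then look something like this (purely
illustrative, key names not final):

  vnuma = [ [node=0,size=1024,vcpus="0-1"],
            [node=1,size=1024,vcpus="2-3"] ]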

> > > > +=item B<vnuma_vdistance=[ [NUMBER, ..., NUMBER], [NUMBER, ..., 
> > > > NUMBER], ... ]>
> > > > +
> > > > +Two dimensional list to specify distances among nodes.
> > > > +
> > > > +The number of elements in the first dimension list equals the number 
> > > > of virtual
> > > > +nodes. Each element in position B<i> is a list that specifies the 
> > > > distances
> > > > +from node B<i> to other nodes.
> > > > +
> > > > +For example, for a guest with 2 virtual nodes, user can specify:
> > > > +
> > > > +  vnuma_vdistance = [ [10, 20], [20, 10] ]
> > > 
> > > Any guidance on how a user should choose these numbers?
> > > 
> > 
> > I think using the number from numactl is good enough.
> 
> Worth mentioning in the docs I think.
> 
> > > Do we support a mode where something figures this out based on the
> > > underlying distances between the pnode to which a vnode is assigned?
> > > 
> > 
> > Dario is working on that.
> 
> I thought he was working on automatic numa placement (i.e. figuring out
> the best set of pnodes to map the vnodes to), whereas what I was
> suggesting was that given the user has specified a vnode->pnode mapping
> it should be possible to construct a distances table pretty trivially
> from that. Is Dario working on that too?
> 

Right. That's trivial inside libxl.

What I meant was that since Dario was about to touch all this automation
stuff anyway, it might be trivial for him to just do it all in one go. Of
course, if he has not done that, I can add the logic myself.
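
Something along these lines (plain C sketch, not actual libxl code; all
names are made up for illustration):

  /* Hypothetical helper: derive the guest's vdistance table from the
   * vnode->pnode mapping and the host distance table.  host_dist is
   * the nr_pnodes x nr_pnodes matrix the hardware reports (i.e. what
   * numactl shows). */
  static void derive_vdistance(unsigned int nr_vnodes,
                               unsigned int nr_pnodes,
                               const unsigned int *vnode_to_pnode,
                               const unsigned int *host_dist,
                               unsigned int *vdist /* nr_vnodes x nr_vnodes */)
  {
      for (unsigned int i = 0; i < nr_vnodes; i++)
          for (unsigned int j = 0; j < nr_vnodes; j++)
              vdist[i * nr_vnodes + j] =
                  host_dist[vnode_to_pnode[i] * nr_pnodes +
                            vnode_to_pnode[j]];
  }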

Wei.

> Ian.
