Re: [Xen-devel] vNUMA for PV guest: kernel and toolstack interaction regarding e820_host=1



On Fri, Feb 06, 2015 at 07:42:15PM +0000, David Vrabel wrote:
> On 06/02/15 19:32, Wei Liu wrote:
> > Hi all
> > 
> > I've run into a problem that I would like some advice on. It's PV
> > specific because P2M manipulation is only required by PV.
> > 
> > Current memory allocation scheme:
> > 
> > 1. Libxc populates a contiguous chunk of pages and fills in the initial
> >    P2M. The holes in the e820 map are in fact filled with pages.
> > 
> > 2. The guest kernel reads the e820 map from Xen, remaps the pages in
> >    e820 holes if there are any, and updates the P2M as it sees fit.
> >    (Holes are normally present when e820_host=1 is set.)
> > 
> > This is not ideal for PV vNUMA, because the remapped pages may
> > end up in the wrong vnode.
> > 
> > What I have in mind is:
> > 
> > 1. Libxc populates pages, but skips e820 holes. The initial P2M is the
> >    final P2M the guest sees.
> > 2. The guest kernel skips remapping. But Linux still needs to set up a
> >    1-1 mapping for holes.
> > 
> > In order to avoid misconfiguration, we would need to introduce a new
> > feature flag to indicate that the guest is able to skip remapping. Libxc
> > will check that feature flag when building the domain.
> > 
> > Does the above scheme make sense?
> 
> I'm really not keen on any additional complexity in PV guest memory setup.

I agree. I would like to avoid as much complexity as possible. That's
why I'm asking before implementing anything on the guest side.

FWIW the toolstack side already makes sense to me (sans the new feature
flag). It's the Linux side where I'm not very sure what to do.

>  Particularly as I don't see a long term future for x86 PV guests.  I
> also don't think we should bake into the Xen ABI the behaviour of one
> particular guest.
> 
> Consider how you can fix this purely in the guest.
> 

Yeah, I guess I can make the memory movement more sensible inside Linux.
That is, take into consideration vNUMA information. There might be other
complex interactions; I will need to get my hands dirty first.
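
To make "take vNUMA information into consideration" a bit more concrete,
here is a rough toy model in Python, not kernel code. Everything in it is
an assumption made for illustration: the names vnode_of, remap_hole_pages,
vmemranges and spare_pfns are hypothetical, and it assumes the guest knows
a set of unbacked RAM PFNs per vnode to remap into. The only idea it shows
is that a page displaced from an e820 hole gets a new PFN in the same
vnode it was populated for, instead of simply going to the end of RAM.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # half-open PFN range [start, end)


def vnode_of(pfn: int, vmemranges: Dict[int, List[Range]]) -> Optional[int]:
    """Return the vnode whose memory ranges contain pfn, or None."""
    for vnode, ranges in vmemranges.items():
        for start, end in ranges:
            if start <= pfn < end:
                return vnode
    return None


def remap_hole_pages(hole: Range,
                     vmemranges: Dict[int, List[Range]],
                     spare_pfns: Dict[int, List[int]]) -> Dict[int, int]:
    """Map each PFN in an e820 hole to a spare PFN in the same vnode.

    vmemranges: hypothetical per-vnode PFN ranges as populated by the
                toolstack (contiguous, ignoring e820 holes).
    spare_pfns: hypothetical per-vnode list of RAM PFNs that have no
                backing page yet.
    """
    # Work on a copy so the caller's lists are not modified.
    spare = {vnode: list(pfns) for vnode, pfns in spare_pfns.items()}
    remap: Dict[int, int] = {}
    for pfn in range(hole[0], hole[1]):
        vnode = vnode_of(pfn, vmemranges)
        if vnode is None:
            continue  # no page was populated behind this PFN
        if not spare.get(vnode):
            raise RuntimeError("no spare PFN left in vnode %d" % vnode)
        remap[pfn] = spare[vnode].pop(0)
    return remap


if __name__ == "__main__":
    vmemranges = {0: [(0, 8)], 1: [(8, 16)]}
    spare_pfns = {0: [100, 101], 1: [200, 201]}
    # The hole covers PFNs 2 and 3, both populated for vnode 0.
    print(remap_hole_pages((2, 4), vmemranges, spare_pfns))
    # prints {2: 100, 3: 101}
```

The real code would have to update the P2M and deal with the actual e820
layout; this only illustrates the per-vnode target selection.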

So my conclusion for now is that I should proceed with my toolstack-side
patches first and fix Linux later.

Wei.

> David
