
Re: [Xen-devel] domain creation vs querying free memory (xend and xl)

> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxxxxx]
> Sent: Friday, October 05, 2012 5:40 AM
> To: Dan Magenheimer
> Cc: Andres Lagar-Cavilla; Ian Campbell; Tim (Xen.org); Olaf Hering; Keir 
> (Xen.org); Konrad Wilk; Kurt
> Hackel; Ian Jackson; xen-devel@xxxxxxxxxxxxx; George Shuklin; Dario Faggioli
> Subject: Re: [Xen-devel] domain creation vs querying free memory (xend and xl)

Hi George --

Thanks for your thoughts!
> On 04/10/12 17:54, Dan Magenheimer wrote:
> >>
> > Scanning through the archived message I am under the impression
> > that the focus is on a single server... i.e. "punt if actor is
> > not xl", i.e. it addressed "balloon-to-fit" and only tries to avoid
> > stepping on other memory overcommit technologies.  That makes it
> > almost orthogonal, I think, to the problem I originally raised.
> No, the idea was to allow the flexibility of different actors in
> different situations.  The plan was to start with a simple actor, but to
> add new ones as necessary.  But on reflection, it seems like the whole
> "actor" thing was actually something completely separate to what we're
> talking about here.  The idea behind the actor (IIRC) was that you could
> tell the toolstack, "Make VM A use X amount of host memory"; and the
> actor would determine the best way to do that -- either by only
> ballooning, or ballooning first and then swapping.  But it doesn't
> decide how to get the value X.

OK, so if the actor stuff is orthogonal, let's go back to the
original problem.  We do want to ensure the solution doesn't _break_
the actor idea... but IMHO any assumption that there is an actor
that can always sufficiently "control" memory allocation is suspect.

> This thread has been very hard to follow for some reason, so let me see
> if I can understand everything:
> * You are concerned about being able to predictably start VMs in the
> face of:
>   - concurrent requests, and
>   - dynamic memory technologies (including PoD, ballooning, paging, page
> sharing, and tmem)
> Any of which may change the amount of free memory between the time a
> decision is made and the time memory is actually allocated.
> * You have proposed a hypervisor-based solution that allows the
> toolstack to "reserve" a specific amount of memory to a VM that will not
> be used for something else; this allocation is transactional -- it will
> either completely succeed, or completely fail, and do it quickly.
> Is that correct?

Yes, good summary.
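To make the transactional semantics concrete, here is a toy model in Python (a sketch only -- the class, the reserve/alloc names, and their interfaces are hypothetical illustrations of the proposed hypercall, not actual Xen code).  The key property is all-or-nothing: a reservation either claims the full amount from unreserved free memory atomically, or fails with no side effects.

```python
class Hypervisor:
    """Toy model of host memory with transactional reservations.

    All names here are hypothetical illustrations of the proposed
    "reserve" hypercall, not real Xen interfaces.
    """
    def __init__(self, free_pages):
        self.free_pages = free_pages   # unreserved free memory
        self.reserved = {}             # domid -> pages reserved, not yet allocated

    def reserve(self, domid, pages):
        """All-or-nothing: succeed completely or fail with no side effects."""
        if pages > self.free_pages:
            return False
        self.free_pages -= pages
        self.reserved[domid] = self.reserved.get(domid, 0) + pages
        return True

    def alloc(self, domid, pages):
        """Actual allocation draws from the domain's own reservation first,
        then from unreserved free memory."""
        from_reservation = min(pages, self.reserved.get(domid, 0))
        remainder = pages - from_reservation
        if remainder > self.free_pages:
            return False
        self.reserved[domid] -= from_reservation
        self.free_pages -= remainder
        return True
```

With this model, memory reserved for one domain is invisible to a concurrent reservation for another domain, so the decision point and the allocation point can no longer race.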

> The problem with that solution, it seems to me, is that the hypervisor
> does not (and I think probably should not) have any insight into the
> policy for allocating or freeing memory as a result of other activities,
> such as ballooning or page sharing.  Suppose someone were ballooning
> down domain M to get 8GiB in order to start domain A; and at some point,
> another process looks and says, "Oh look, there's 4GiB free, that's
> enough to start domain B" and asks Xen to reserve that memory.  Xen has
> no way of knowing that the memory freed by domain M was "earmarked" for
> domain A, and so will happily give it to domain B, causing domain A's
> creation to fail (potentially).

I agree completely that the hypervisor shouldn't have any insight into
the _policy_ (though see below).  I'm just proposing an extension to the
existing mechanism and I am quite convinced that the hypervisor must
be involved (e.g. a new hypercall) for the extension to work properly.

In your example, the "someone" ballooning down domain M to get 8GiB
for domain A would need somehow to "reserve" the memory for domain A.
I didn't foresee the use of the proposed reservation mechanism beyond
domain creation, but it could probably be used for large ballooning
quantities as well.
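Your race can be replayed with a toy model (hypothetical interfaces again, not Xen code): domain M is ballooned down to free 8GiB earmarked for new domain A, while another actor grabs the freed memory for domain B.  Without a reservation, whoever asks first wins; with an up-front reservation for A, B's request fails cleanly instead of breaking A's creation.

```python
class Host:
    """Minimal host-memory model, in GiB units for clarity."""
    def __init__(self, free):
        self.free = free          # unreserved free GiB
        self.earmarked = {}       # domid -> GiB reserved for it

    def reserve(self, domid, gib):
        # Transactional: claims unreserved memory or fails with no effect.
        if gib > self.free:
            return False
        self.free -= gib
        self.earmarked[domid] = gib
        return True

def balloon_down_m(host, gib):
    # Memory released by domain M lands in the general free pool.
    host.free += gib

# Without a reservation: the freed memory is up for grabs.
racy = Host(free=0)
balloon_down_m(racy, 4)            # M has released 4 of 8 GiB so far
b_got_it = racy.reserve("B", 4)    # B swoops in on the freed memory
balloon_down_m(racy, 4)            # M finishes releasing
a_got_it = racy.reserve("A", 8)    # A's creation now fails

# With a reservation for A taken before B looks, the earmark holds.
safe = Host(free=0)
balloon_down_m(safe, 8)
a_safe = safe.reserve("A", 8)
b_safe = safe.reserve("B", 4)
```

In the racy run B succeeds and A fails; in the safe run A succeeds and B is refused, which is exactly the "earmarking" the hypervisor cannot infer on its own.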

> So it seems like we need to have the idea of a memory controller -- one
> central process (per host, as you say) that would know about all of the
> knobs -- ballooning, paging, page sharing, tmem, whatever -- that could
> be in charge of knowing where all the memory was coming from and where
> it was going.  So if xl wanted to start a new VM, it can ask the memory
> controller for 3GiB, and the controller could decide, "I'll take 1GiB
> from domain M and 2 from domain N, and give it to the new domain", and
> respond when it has the memory that it needs.  Similarly, it can know
> that it should try to keep X megabytes for un-sharing of pages, and it
> can be responsible for freeing up more memory if that memory becomes
> exhausted.

First, let me quibble about the term you used.  I raise it with you in
particular, George, because I know your previous Xen contributions.

IMHO, we are not talking about a "memory controller", we are talking
about a "memory scheduler".  In a CPU scheduler, one would never
assume that all demands for CPU time should be reviewed and granted
by some userland process in dom0 (and certainly not by some grand
central data center manager).  That would be silly.  Instead, we
provide some policy parameters and let each hypervisor make intelligent
dynamic decisions thousands of times every second based on those parameters.

IMHO, the example you give of asking a memory controller for a few GiB
of memory is equally silly.  Outside of some geek with a handful
of VMs on a single machine, there is inadequate information from
any VM to drive automatic memory allocation decisions and, even if
there was, it just doesn't scale.  It scales neither up, to many VMs
across many physical machines, nor down, to the instantaneous,
one-page-at-a-time needs of unsharing or tmem.

(Also see my previous comments to Tim about memory-overcommit-by-
undercommit: there isn't sufficient information to size an
emergency buffer for unsharing either... too big and you waste
memory, too small and it doesn't solve the underlying problem.)

> At the moment, the administrator himself (or the cloud orchestration
> layer) needs to be his own memory controller; that is, he needs to
> manually decide if there's enough free memory to start a VM; if there's
> not, he needs to figure out how to get that memory (either by ballooning
> or swapping).  Ballooning and swapping are both totally under his
> control; the only thing he doesn't control is the unsharing of pages.
> But as long as there was a way to tell the page sharing daemon not to
> allocate an amount of free memory, then this
> "administrator-as-memory-controller" should work just fine.
> Does that make sense?  Or am I still confused? :-)

It mostly makes sense until you get to host-swapping/unsharing;
see my comments above.  And tmem takes "doesn't control" to a
whole new level: IMHO it completely eliminates the possibility
of a "memory controller" and begs for a "memory scheduler".

Tmem really is a breakthrough in memory management for virtualized
systems.  I realize that many people are in the "if it doesn't
work on Windows, I don't care" camp.  And others never thought
it would make it into upstream Linux (or don't care because it isn't
completely functional in any distro yet... other than Oracle's...
but since all the parts are now upstream, it will be soon).  But there
are probably also many who just don't understand it... I guess I need
to work on fixing that.  Any thoughts on how to start?

In any case, though the reservation proposal is intended to cover
tmem as well, it is still needed to close the page-sharing and
domain-creation races.

