Re: [Xen-devel] domain creation vs querying free memory (xend and xl)

> From: George Dunlap [mailto:George.Dunlap@xxxxxxxxxxxxx]
> Subject: Re: [Xen-devel] domain creation vs querying free memory (xend and xl)
> On Mon, Oct 8, 2012 at 2:02 AM, Dan Magenheimer
> <dan.magenheimer@xxxxxxxxxx> wrote:
> > Tmem really is a breakthrough on memory management in a virtualized
> > system.  I realize that many people are in the "if it doesn't
> > work on Windows, I don't care" camp.  And others never thought
> > it would make it into upstream Linux (or don't care because it isn't
> > completely functional in any distros yet... other than Oracle's..
> > but since all parts are now upstream, it will be soon).  But there
> > probably are also many that just don't understand it... I guess I need
> > to work on fixing that.  Any thoughts on how to start?
> Well, I'm sorry to say this, but to start I think you need to work on
> your communication.  I had read this entire thread 2 or 3 times before
> writing my last response; and I have now read this e-mail half a dozen
> times, and I still don't have a good idea what it is you're talking
> about.  If I didn't respect you, I would have just given up on the 2nd
> try.
>   :
> If I still haven't understood where you're coming from, then I am
> sorry; but I have tried pretty hard, and I'm not the only one having
> that problem.

Hi George --

Thanks for the honest direct feedback.  I had no idea.  I have
been buried in this memory stuff since April 2008 and it is easy
for me to assume that people understand what I am talking about,
have read everything I've written about it, seen/remember my
presentations etc.  Further, the conversational delays due to timezone
differences and the fact that we all are juggling many different
deliverables make it difficult to maintain all the context necessary
to drive/converge a complex discussion.

So I am truly sorry and I really appreciate that you've stuck with me.

Let me ponder how to improve, but try to maintain some forward
progress in the interim by continuing this thread.

There are two things being mixed here: (A) the very general concepts
of how to deal with RAM capacity as a resource and how best to "control"
"sharing" of that resource among virtual machines; and (B) how to solve a
very specific known problem that occurs due to "races" for memory capacity.
Solving (B) requires some assumptions about (A), which is why (A)
keeps coming up.

I'll mark my comments below with (A) and (B) to make it clear
which is being discussed.

> In my summary, I mentioned just 2 things: the problem of domain
> creation, and the solution of a hypercall to allocate a big chunk of
> memory to a domain.  You answered by saying it was a good summary.
> But then you said:
> > I'm just proposing an extension to the
> > existing mechanism and I am quite convinced that the hypervisor must
> > be involved (e.g. a new hypercall) for the extension to work properly.
> Now you're talking about an extension...

This is (B).

Extension == new hypercall.  (It's an extension to the way memory
has previously been allocated by the hypervisor.)

> then you mention a "memory
> scheduler" (which we don't yet have), and say:
> > ...there is inadequate information from
> > any VM to drive automatic memory allocation decisions and, even if
> > there was, it just doesn't scale.
> But you don't say where or who *could* have adequate information;
> which again hints at something else which you have in mind, but you
> haven't actually talked about very explicitly yet.  If you have been
> trying to talk about it, and it wasn't in my summary, why didn't you
> say something about it, instead of saying, "Yes that's right"?  And if
> you haven't talked about it, why are you speaking as though we all
> know already what you're talking about?


My bad.  The premise of tmem (and IMHO the thorn in the side of
all memory capacity management in virtualized systems) is that *nobody*
has adequate information.  The guest OS has some "demand" information,
though not in any externally-communicable form, and the host/hypervisor
has "supply" information.  Tmem uses a small handful of kernel changes
and some hypercalls to tie these together in a surprisingly useful way.

> Furthermore, you say things like this:
> > IMHO, the example you give for asking a memory controller for GiB
> > of memory is equally silly.  Outside of some geek with a handful
> > of VMs on a single machine, there is inadequate information from
> > any VM to drive automatic memory allocation decisions and, even if
> > there was, it just doesn't scale.  It doesn't scale either up, to
> > many VMs across many physical machines, or down, to instantaneous
> > needs of one-page-at-a-time requests for unsharing or for tmem.
> What do you mean, "doesn't scale up or across"?  Why not?  Why is
> there inadequate information inside dom0 for a toolstack-based memory
> controller?  And if there's not enough information there, who *does*
> have the information?  It's just a bunch of vague assertions with no
> justification and no alternative proposed.  It doesn't bring any light
> to the discussion (which is no doubt why the thread has died without
> conclusion).


There is inadequate information, period.  OSes have forever been
designed to manage a fixed amount of RAM, not to communicate very
well about if and when the OS needs more RAM (and how much) or can
get by with less RAM (and how much).  So any external "memory controller"
is (IMHO) doomed to failure, limited to approximations based on pieces of
guest-OS-externally-visible, usually-out-of-date information collected
at a relatively low frequency.

Collecting/analyzing/acting-on the information across hundreds/thousands
of guests is very difficult (doesn't "scale up").  Doing the same across
hundreds of machines, each with hundreds/thousands of guests, has
exponential communication and bin-packing problems (doesn't "scale
across").  And if the memory demand is a high-frequency stream of single
pages (e.g. with page-unsharing), sampling by the memory controller
can't possibly keep up (doesn't "scale down").

This is only slightly better than a bunch of vague assertions, but if
you disagree, let's take it down a level in a separate thread.
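
To put a rough shape on the "doesn't scale down" point, here is a toy
Python calculation.  It is only a sketch: the function name, the event
stream and the sampling periods are all made up for illustration and
are not measurements from any real system.  It counts how many
single-page events can pile up between two consecutive samples taken
by an external controller.

```python
def worst_case_lag(event_times_ms, sample_period_ms):
    """Largest number of single-page events falling inside one sampling
    interval, i.e. changes the controller cannot see until its next sample."""
    counts = {}
    for t in event_times_ms:
        interval = t // sample_period_ms   # which sampling window t lands in
        counts[interval] = counts.get(interval, 0) + 1
    return max(counts.values(), default=0)


if __name__ == "__main__":
    # Hypothetical demand: one page event per millisecond for five seconds.
    events = range(5000)
    for period in (1, 100, 1000):
        print(period, "ms sampling ->", worst_case_lag(events, period), "unseen events")
```

With these assumed numbers, a controller sampling once per second can be
up to 1000 page events behind; to stay within one event it would have to
sample every millisecond, for every guest.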

My proposed alternative is tmem, which is why it may appear that I
haven't proposed anything... tmem already exists today.

> Nor does saying "see above" and "see below", when "above" and "below"
> are still equally unenlightening.

Oops, sorry. :-}  Just trying to avoid repeating myself.

> Maybe your grand designs for a "memory scheduler", where memory pages
> hop back and forth at millisecond quanta based on instantaneous data,
> between page sharing, paging, tmem, and so on, is a good one.  But
> that's not what we have now.


Tmem *is* essentially a memory scheduler.  A grand design is implemented,
works, and all the parts are upstream in open source.

> And that's not even what you're trying
> to promote.  Instead, you're trying to push a single hypercall that
> you think will be necessary for such a scheduler.


Strangely, tmem doesn't really need this hypercall.  It already has
a solution working in xm create called "tmem freeze/thaw".  But this
solution is a half-assed very heavy hammer.

The single "memory reservation" hypercall is intended to help solve a
known problem (IanJ said early in this thread: "This is a real problem")
with any environment where the amount of RAM used by a guest can change
dynamically without the knowledge of a not-in-hypervisor "memory controller",
and the toolstack then wishes to launch a new domain.  The problem can even
occur with multiple toolstack threads simultaneously launching domains.

After further thought, it appeared that the "memory reservation" hypercall
also eliminates the need for the half-assed tmem freeze/thaw.
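
To make the race in (B) concrete, here is a toy Python model.  It is a
sketch under loud assumptions: `Host`, `claim`, `build` and the page
counts are hypothetical names and numbers invented for this example,
not Xen, xend or xl interfaces.  It compares "query free memory, then
build the domain chunk by chunk" against "atomically reserve the
capacity first, then build" when two launches overlap.

```python
class Host:
    """Toy model of host memory accounting (hypothetical, not Xen code)."""

    def __init__(self, free_pages):
        self.free_pages = free_pages   # pages neither allocated nor reserved
        self.claimed = {}              # domain name -> pages still reserved

    def query_free(self):
        # What a toolstack sees when it asks how much memory is free.
        return self.free_pages

    def claim(self, domain, pages):
        # Hypothetical reservation: check and set aside in a single step.
        if pages > self.free_pages:
            return False
        self.free_pages -= pages
        self.claimed[domain] = self.claimed.get(domain, 0) + pages
        return True

    def allocate(self, domain, pages):
        # Draw on the domain's own reservation first, then on free memory.
        from_claim = min(pages, self.claimed.get(domain, 0))
        from_free = pages - from_claim
        if from_free > self.free_pages:
            return False
        if from_claim:
            self.claimed[domain] -= from_claim
        self.free_pages -= from_free
        return True


def build(host, domain, pages, chunk=100):
    """Populate a domain chunk by chunk; return pages obtained before failure."""
    done = 0
    while done < pages:
        step = min(chunk, pages - done)
        if not host.allocate(domain, step):
            break
        done += step
    return done


def racy_launch(host, wanted):
    # Both launchers check free memory before either one starts building.
    passed = {name: host.query_free() >= pages for name, pages in wanted.items()}
    return {name: build(host, name, pages) if passed[name] else 0
            for name, pages in wanted.items()}


def claimed_launch(host, wanted):
    # Each launcher reserves its full size up front; a refusal is immediate.
    granted = {name: host.claim(name, pages) for name, pages in wanted.items()}
    return {name: build(host, name, pages) if granted[name] else 0
            for name, pages in wanted.items()}


if __name__ == "__main__":
    wanted = {"domA": 600, "domB": 600}
    print("check-then-build:", racy_launch(Host(1000), wanted))
    print("claim-then-build:", claimed_launch(Host(1000), wanted))
```

In the check-then-build case both launches pass the free-memory check,
and the second one fails only after it has already consumed 400 pages
that then have to be unwound.  With the up-front reservation the second
launch is refused before any work is done.  The same late failure
appears if a running guest's memory use grows between the check and the
build.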

> Doesn't it make sense to *first* talk about your grand vision and come
> up with a reasonable plan for it, *then* propose an implementation?
> If in the course of your 15-patch series introducing a "memory
> scheduler", you also introduce a "reservation" hypercall, then
> everyone can see exactly what it accomplishes, and actually see if
> it's necessary, or if some other design would work better.
> Does that make sense?

If you reread my last response with this assumption in mind:

  "tmem == an instance of a memory scheduler == grand vision"

then does the discussion of the "memory reservation" hypercall
make more sense?

Thanks again for the pointed communication feedback.  Hopefully this
is a bit better and I will continue to ponder more communication

