
Re: [Xen-devel] domain creation vs querying free memory (xend and xl)

On Oct 17, 2012, at 1:35 PM, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:

> [Sorry, forgot to reply-to-all]
> On Tue, Oct 16, 2012 at 6:51 PM, Dan Magenheimer
> <dan.magenheimer@xxxxxxxxxx> wrote:
>> If you reread my last response with the assumption in mind:
>>  "tmem == an instance of a memory scheduler == grand vision"
>> then does the discussion of the "memory reservation" hypercall
>> make more sense?
> Sort of. :-) Unfortunately, I think it shows a bit of confusion, which
> is perhaps why it was hard to understand.
> But let's go back for a minute to the problem at hand: you're afraid
> of free memory disappearing between a toolstack checking for the
> memory, and the toolstack actually creating the VM.
> There are two ways this could happen:
> 1. Another admin command (perhaps by another administrator) has caused
> the memory to go away -- i.e., another admin has called "xl create",
> or has instructed a VM to balloon up to a higher amount of memory.
> 2. One of the self-directed processes in the system has allocated the
> memory: a balloon driver has ballooned up, or the swapper has swapped
> something in, or the page sharing daemon has had to un-share pages.
> In the case of #1, I think the right answer to that is, "Don't do
> that."  :-) The admins should co-ordinate with each other about what
> to start where; if they both want to use a bit of memory, that's a
> human interaction problem, not a technological one.  Alternately, if
> we're talking a cloud orchestration layer, the cloud orchestration
> should have an idea how much memory is available on each node, and not
> allow different users to issue commands which would violate those.
> In the case of #2, I think the answer is, "self-directed processes
> should not be allowed to consume free memory without permission from
> the toolstack".  The pager should not increase the memory footprint of
> a VM unless either told to by an admin or a memory controller which
> has been given authority by an admin.  (Yes, memory controller, not
> scheduler -- more on that in another e-mail.)  A VM should be given a
> fixed amount of memory above which the balloon driver cannot go.  The
> page-sharing daemon should have a small amount set aside to handle
> un-sharing requests; but this should be immediately replenished by
> other methods (preferably by ballooning a VM down, or if necessary by
> swapping pages out).  It should not be able to make arbitrarily large
> allocations without permission from the toolstack.

Something that I struggle with here is the notion that we need to extend the 
hypervisor for any aspect of the discussion we've had so far. I just don't see 
that. The toolstack has (or should definitely have) a non-racy view of the 
memory of the host. Reservations are therefore notions the toolstack manages. 
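To make the "reservations are a toolstack notion" point concrete, here is a minimal Python sketch of toolstack-side bookkeeping. Everything in it is hypothetical: the class and method names are illustrative and do not correspond to any existing xl, xend, or libxl interface, and it assumes every allocation on the host goes through this one ledger.

```python
import threading


class ReservationLedger:
    """Toolstack-side record of promised host memory (hypothetical, not a Xen API)."""

    def __init__(self, host_total_kib):
        self.host_total_kib = host_total_kib
        self._reserved = {}  # domain name -> KiB promised to that domain
        self._lock = threading.Lock()

    def unreserved_kib(self):
        """Memory not yet promised to any domain."""
        with self._lock:
            return self.host_total_kib - sum(self._reserved.values())

    def reserve(self, domain, kib):
        """Check and claim memory in one step; returns True on success."""
        with self._lock:
            free = self.host_total_kib - sum(self._reserved.values())
            if kib > free:
                return False
            self._reserved[domain] = self._reserved.get(domain, 0) + kib
            return True

    def release(self, domain):
        """Drop a domain's reservation and return the amount released."""
        with self._lock:
            return self._reserved.pop(domain, 0)
```

Because the check and the claim happen under the same lock, two concurrent "create" requests cannot both be granted the same memory, which is the race the thread is worried about. The sketch only holds if nothing else allocates host memory behind the ledger's back.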

Domains can be cajoled into obedience via the max_pages tweak -- which I 
profoundly dislike. If anything we should change the hypervisor to have a 
"current_allowance" or similar field with a more obvious meaning. The abuse of 
max_pages makes me cringe. Not to say I disagree with its usefulness.

Once you guarantee no "ex machina" entities fudging the view of the memory the 
toolstack has, then all known methods can be bounded in terms of their capacity 
to allocate memory unsupervised.

Note that this also implies that I don't see the need for a pool of "unshare" 
pages. It's all in the heap. The toolstack ensures there is something set aside.

I further think the pod cache could be converted to this model. Why have 
specific per-domain lists of cached pages in the hypervisor? Get them back from 
the heap! This obviously places a separate requirement on certain toolstack 
features, but it allows us to throw away a lot of complex code.

My two cents for the new iteration


> I was chatting with Konrad yesterday, and he brought up
> "self-ballooning" VMs, which apparently voluntarily choose to balloon
> down to *below* their toolstack-dictated balloon target, in order to
> induce Linux to swap some pages out to tmem, and will then balloon up to
> the toolstack-dictated target later.
> It seems to me that the Right Thing in this case is for the toolstack
> to know that this "free" memory isn't really free -- that if your 2GiB
> VM is only using 1.5GiB, you nonetheless don't touch that 0.5GiB,
> because you know it may use it later.  This is what xapi does.
> Alternately, if you don't want to do that accounting, and just want to
> use Xen's free memory to determine if you can start a VM, then you
> could just have your "self-ballooning" processes *not actually free
> the memory*.  That way the free memory would be an accurate
> representation of how much memory is actually present on a system.
> In all of this discussion, I don't see any reason to bring up tmem at
> all (except to note the reason why a VM may balloon down).  It's just
> another area to which memory can be allocated (along with Xen or a
> domain).  It also should not be allowed to allocate free Xen memory to
> itself without being specifically instructed by the toolstack, so it can't
> cause the problem you're talking about.
> Any system that follows the rules I've set above won't have to worry
> about free memory disappearing half-way through domain creation.
> I'm not fundamentally opposed to the idea of an "allocate memory to a
> VM" hypercall; but the arguments adduced to support this seem
> hopelessly confused, which does not bode well for the usefulness or
> maintainability of such a hypercall.
> -George
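
The accounting George describes for self-ballooning VMs can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the function names and the dictionary fields `target_kib` and `in_use_kib` are hypothetical, and Xen's own memory overhead is ignored.

```python
MIB = 1024  # KiB per MiB


def naive_free_kib(host_total_kib, domains):
    """Free memory as seen when only current usage is counted."""
    return host_total_kib - sum(d["in_use_kib"] for d in domains)


def committed_free_kib(host_total_kib, domains):
    """Free memory when each domain's full toolstack target is treated as taken."""
    return host_total_kib - sum(d["target_kib"] for d in domains)


# An 8 GiB host with one 2 GiB VM that has self-ballooned down to 1.5 GiB.
host_kib = 8192 * MIB
doms = [{"target_kib": 2048 * MIB, "in_use_kib": 1536 * MIB}]

naive = naive_free_kib(host_kib, doms)          # 6656 MiB
committed = committed_free_kib(host_kib, doms)  # 6144 MiB
```

The 512 MiB difference between the two figures is the memory the VM may balloon back into later. A toolstack that sizes new domains from the committed figure never hands that memory to another VM.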
