Xen project Mailing List

Re: [Xen-devel] domain creation vs querying free memory (xend and xl)

To: Tim Deegan <tim@xxxxxxx>

From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>

Date: Thu, 4 Oct 2012 09:36:25 -0700 (PDT)

Cc: Olaf Hering <olaf@xxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Konrad Wilk <konrad.wilk@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Kurt Hackel <kurt.hackel@xxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx, George Shuklin <george.shuklin@xxxxxxxxx>, Dario Faggioli <raistlin@xxxxxxxx>, Andres Lagar-Cavilla <andreslc@xxxxxxxxxxxxxx>

Delivery-date: Thu, 04 Oct 2012 16:37:13 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

> From: Tim Deegan [mailto:tim@xxxxxxx]
> Sent: Thursday, October 04, 2012 4:07 AM
> To: Dan Magenheimer
> Cc: Olaf Hering; Keir Fraser; Konrad Wilk; George Dunlap; Kurt Hackel; Ian
> Jackson; xen-devel@xxxxxxxxxxxxx; George Shuklin; Dario Faggioli; Andres Lagar-Cavilla
> Subject: Re: [Xen-devel] domain creation vs querying free memory (xend and xl)

Hi Tim --

Good discussion!

> At 14:56 -0700 on 02 Oct (1349189817), Dan Magenheimer wrote:
> > > AIUI xapi uses the domains' maximum allocations, centrally controlled,
> > > to place an upper bound on the amount of guest memory that can be in
> > > use. Within those limits there can be ballooning activity. But TBH I
> > > don't know the details.
> >
> > Yes, that's the same as saying there is no memory-overcommit.
>
> I'd say there is - but it's all done by ballooning, and it's centrally
> enforced by lowering each domain's maxmem to its balloon target, so a
> badly behaved guest can't balloon up and confuse things.

While I agree this conceivably is a form of memory overcommit, I discarded it as a workable overcommit solution in 2008. The short reason is: EVERY guest is badly behaved in that they all want to suck up as much memory as possible and they all need it _now_. This observation actually is what led to tmem.

> > The original problem occurs only if there are multiple threads
> > of execution that can be simultaneously asking the hypervisor
> > to allocate memory without the knowledge of a single centralized
> > "controller".
>
> Absolutely.
>
> > Tmem argues that doing "memory capacity transfers" at a page granularity
> > can only be done efficiently in the hypervisor. This is true for
> > page-sharing when it breaks a "share" also... it can't go ask the
> > toolstack to approve allocation of a new page every time a write to a shared
> > page occurs.
> >
> > Does that make sense?
>
> Yes.
> The page-sharing version can be handled by having a pool of
> dedicated memory for breaking shares, and the toolstack asynchronously
> replenish that, rather than allowing CoW to use up all memory in the
> system.

This is really just overcommit-by-undercommit. IMHO, any attempt to set aside a chunk of memory for a specific purpose just increases memory pressure on all the other memory users. Nobody has any clue a priori what the size of that dedicated memory pool should be; if it is too big, you are simply wasting memory and if it is too small, you haven't solved the real problem. Workloads vary too dramatically, instantaneously, and unpredictably across time in their need for memory. Sharing makes it even more complex.

> > (rough proposed design re-attached below)
>
> Thanks for that. It describes a sensible-looking hypervisor interface,
> but my question was really: what should xl do, in the presence of
> ballooning, sharing, paging and tmem, to
> - decide whether a VM can be started at all;
> - control those four systems to shuffle memory around; and
> - resolve races sensibly to avoid small VMs deferring large ones.
> (AIUI, xl already has some logic to handle the case of balloon-to-fit.)
>
> The second of those three is the interesting one. It seems to me that
> if the tools can't force all other actors to give up memory (and not
> immediately take it back) then they can't guarantee to be able to start
> a new VM, even with the new reservation hypercalls.

I agree the second one is interesting but the only real solution is for the controller to be an oracle for all the guests. That makes it less interesting to me, so balloon-to-fit is less interesting to me (even if it is the only overcommit option for legacy guests). IMHO, the problem is the same as for guest OS's that compute pi in the kernel when there are no runnable tasks, i.e. a virtualization environment is sometimes forced to partition resources, not virtualize those guests.
IOW, don't overcommit "unenlightened" legacy guests. [1]

So I don't think the design I wrote up solves the second one, nor do I think it makes it any worse. The design I wrote up is intended to solve the first and third.

I _think_ the reservation-transaction model described (X1 and X2) should work for libxl, in the presence of ballooning, sharing, paging, and tmem. And it neither helps nor hurts balloon-to-fit.

Given that, can you shoot holes in the design? Or are there parts that aren't clear? Or (admitting that I am a libxl idiot) is it unworkable for xl/libxl?

Thanks,
Dan

[1] By "unenlightened" here, I mean guests that are still under the notion that they "own" all of a fixed amount of RAM. A balloon driver makes them "semi-enlightened" :-)

> > > From: Dan Magenheimer
> > > Sent: Monday, October 01, 2012 2:04 PM
> > > :
> > > :
> > > Back to design brainstorming:
> > >
> > > The way I am thinking about it, the tools need to be involved
> > > to the extent that they would need to communicate to the
> > > hypervisor the following facts (probably via new hypercall):
> > >
> > > X1) I am launching a domain X and it is eventually going to
> > > consume up to a maximum of N MB. Please tell me if
> > > there is sufficient RAM available AND, if so, reserve
> > > it until I tell you I am done. ("AND" implies transactional
> > > semantics)
> > > X2) The launch of X is complete and I will not be requesting
> > > the allocation of any more RAM for it. Please release
> > > the reservation, whether or not I've requested a total
> > > of N MB.
> > >
> > > The calls may be nested or partially ordered, i.e.
> > > X1...Y1...Y2...X2
> > > X1...Y1...X2...Y2
> > > and the hypervisor must be able to deal with this.
> > >
> > > Then there would need to be two "versions" of "xm/xl free".
> > > We can quibble about which should be the default, but
> > > they would be:
> > >
> > > - "xl --reserved free" asks the hypervisor how much RAM
> > > is available taking into account reservations
> > > - "xm --raw free" asks the hypervisor for the instantaneous
> > > amount of RAM unallocated, not counting reservations
> > >
> > > When the tools are not launching a domain (that is there
> > > has been a matching X2 for all X1), the results of the
> > > above "free" queries are always identical.
> > >
> > > So, IanJ, does this match up with the design you were thinking
> > > about?
> > >
> > > Thanks,
> > > Dan
> > >
> > > [1] I think the core culprits are (a) the hypervisor accounts for
> > > memory allocation of pages strictly on a first-come-first-served
> > > basis and (b) the tools don't have any form of need-this-much-memory
> > > "transaction" model

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
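The X1/X2 reservation-transaction model and the two "free" views quoted above can be illustrated with a toy ledger. This is a minimal sketch under stated assumptions, not a Xen hypercall or libxl interface: every name (`HypotheticalMemoryLedger`, `reserve`, `allocate`, `release`, `free_raw`, `free_reserved`) is hypothetical, sizes are plain MB integers, and a single lock stands in for whatever serialization the hypervisor would actually use. Allocations made for a domain first draw down that domain's reservation; anything beyond it (for example a balloon, paging, or tmem request from some other actor) may only use memory that is neither allocated nor reserved, which is how a launch in progress is protected from concurrent allocators.

```python
import threading
from typing import Dict


class InsufficientMemory(Exception):
    """Raised when a reservation or allocation cannot be satisfied."""


class HypotheticalMemoryLedger:
    """Toy model of the X1/X2 reservation-transaction idea (not a real Xen API)."""

    def __init__(self, total_mb: int) -> None:
        self.total_mb = total_mb
        self.allocated_mb = 0
        # domid -> MB reserved for that launch but not yet allocated
        self.reservations: Dict[int, int] = {}
        self._lock = threading.Lock()

    def _unreserved_locked(self) -> int:
        # Caller must hold self._lock.
        return self.total_mb - self.allocated_mb - sum(self.reservations.values())

    def free_raw(self) -> int:
        """Like "--raw free": unallocated RAM, ignoring reservations."""
        with self._lock:
            return self.total_mb - self.allocated_mb

    def free_reserved(self) -> int:
        """Like "--reserved free": unallocated RAM minus outstanding reservations."""
        with self._lock:
            return self._unreserved_locked()

    def reserve(self, domid: int, n_mb: int) -> None:
        """X1: check for and reserve n_mb atomically, or fail with no side effects."""
        with self._lock:
            if domid in self.reservations:
                raise ValueError(f"dom{domid} already holds a reservation")
            available = self._unreserved_locked()
            if n_mb > available:
                raise InsufficientMemory(f"dom{domid}: need {n_mb} MB, {available} MB unreserved")
            self.reservations[domid] = n_mb

    def allocate(self, domid: int, n_mb: int) -> None:
        """Allocate for domid, using its reservation before any unreserved RAM."""
        with self._lock:
            held = self.reservations.get(domid, 0)
            from_reservation = min(held, n_mb)
            extra = n_mb - from_reservation
            if extra > self._unreserved_locked():
                raise InsufficientMemory(f"dom{domid}: {extra} MB beyond its reservation")
            if domid in self.reservations:
                self.reservations[domid] = held - from_reservation
            self.allocated_mb += n_mb

    def release(self, domid: int) -> None:
        """X2: drop whatever is left of domid's reservation, used up or not."""
        with self._lock:
            self.reservations.pop(domid, None)


# Partially ordered launches: X1...Y1...X2...Y2
ledger = HypotheticalMemoryLedger(total_mb=4096)
ledger.reserve(1, 2048)   # X1
ledger.reserve(2, 1024)   # Y1
ledger.allocate(1, 512)   # X's domain builder starts populating memory
print(ledger.free_raw(), ledger.free_reserved())   # 3584 1024

rejected = False
try:
    ledger.reserve(3, 2048)   # a third launch cannot take X's or Y's memory
except InsufficientMemory as err:
    rejected = True
    print("rejected:", err)

ledger.release(1)   # X2 before Y2
ledger.release(2)   # Y2
print(ledger.free_raw() == ledger.free_reserved())   # True: no launch in progress
```

Once every X1 has a matching X2, the two views agree, as the original proposal says they should. The sketch deliberately does not address the "second" question in the thread (forcing other actors to give memory back); it only shows how reservations can make the first-come-first-served race visible and resolvable.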

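Tim's suggestion of a dedicated pool for breaking shares that the toolstack replenishes asynchronously, and Dan's objection to fixing its size in advance, can be made concrete with a toy model. This is a minimal sketch, not Xen's page-sharing code: `HypotheticalShareBreakPool` and its methods are illustrative names, memory is counted in abstract pages, and what happens when the pool runs dry is recorded as a counter rather than modelled as a real policy (pausing, paging, and so on).

```python
class HypotheticalShareBreakPool:
    """Toy model of a dedicated pool used to break copy-on-write shares."""

    def __init__(self, target_pages: int, host_free_pages: int) -> None:
        if target_pages > host_free_pages:
            raise ValueError("pool target exceeds host free memory")
        self.target_pages = target_pages
        # The pool is carved out of host free memory up front.
        self.pool_pages = target_pages
        self.host_free_pages = host_free_pages - target_pages
        self.failed_breaks = 0

    def break_share(self) -> bool:
        """Hypervisor path on a write to a shared page; never waits for the toolstack."""
        if self.pool_pages == 0:
            # Pool exhausted before the toolstack caught up; the follow-up
            # policy (pause, page out, ...) is not modelled here.
            self.failed_breaks += 1
            return False
        self.pool_pages -= 1
        return True

    def replenish(self) -> int:
        """Toolstack path, run asynchronously: top the pool back up toward its target."""
        moved = min(self.target_pages - self.pool_pages, self.host_free_pages)
        self.pool_pages += moved
        self.host_free_pages -= moved
        return moved


pool = HypotheticalShareBreakPool(target_pages=4, host_free_pages=10)
results = [pool.break_share() for _ in range(5)]   # a burst of 5 CoW faults
print(results)                # [True, True, True, True, False]
print(pool.replenish())       # 4
print(pool.host_free_pages)   # 2
```

Raising `target_pages` reduces `failed_breaks` under bursty sharing but takes that memory from every other user up front; lowering it does the opposite. That is the sizing trade-off Dan describes as overcommit-by-undercommit.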