Xen project Mailing List

Re: [Xen-devel] Proposed new "memory capacity claim" hypercall/feature

> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxxxxx]
> Subject: Re: Proposed new "memory capacity claim" hypercall/feature
>
> On 30/10/12 15:43, Dan Magenheimer wrote:
> > a) Truly free memory (each free page is on the hypervisor free list)
> > b) Freeable memory ("ephemeral" memory managed by tmem)
> > c) Owned memory (pages allocated by the hypervisor or for a domain)
> >
> > The sum of these three is always a constant: The total number of
> > RAM pages in the system. However, when tmem is active, the values
> > of all _three_ of these change constantly. So if at the start of a
> > domain launch, the sum of free+freeable exceeds the intended size
> > of the domain, the domain allocation/launch can start.
>
> (And please don't start another rant about the bold new world of peace
> and love. Give me a freaking *technical* answer.)

<grin> /me removes seventies-style tie-dye t-shirt with peace logo and sadly withdraws the single daisy previously extended to George.

> Why free+freeable, rather than just "free"?

A free page is a page that is not used for anything at all; it is on the hypervisor's free list.

A freeable page contains tmem ephemeral data stored on behalf of a domain (or, if deduplication is enabled, on behalf of one or more domains). More specifically, for a tmem-enabled Linux guest, a freeable page contains a clean page cache page that the Linux guest OS has asked the hypervisor (via the tmem ABI) to hold, if it can, for as long as it can. On the Linux side, "cleancache" chooses the specific clean page cache pages and makes the call.

So, when tmem is working optimally, there are few or no free pages and many, many freeable pages (perhaps half of physical RAM or more). Freeable pages across all tmem-enabled guests are kept in a single LRU queue. When a request is made to the hypervisor allocator for a free page and its free list is empty, the allocator forces tmem to relinquish an ephemeral page (in LRU order).
Because this eviction is entirely up to the hypervisor and can happen at any time, freeable pages are not counted as "owned" by a domain, but they still have some value to a domain. In essence, a "free" page has zero value and a "freeable" page has a small, but non-zero, value that decays over time. So it's useful for a toolstack to know both quantities. (And, since this thread has gone in many directions, let me reiterate that all of this has been working in the hypervisor since 4.0 in 2009, and cleancache has been in Linux since mid-2011.)

> > But then
> > if "owned" increases enough, there may no longer be enough memory
> > and the domain launch will fail.
>
> Again, "owned" would not increase at all if the guest weren't handing
> memory back to Xen. Why is that necessary, or even helpful?

The guest _is_ handing memory back to Xen. This is the other half of the tmem functionality: persistent pages.

Answering your second question is going to require a little more background. Since nobody, not even the guest kernel, can predict the future needs of its workload, there are two choices: (1) allocate enough RAM that the supply always exceeds maximum demand, or (2) aggressively reduce RAM to a reasonable target guess and prepare for the likelihood that, sometimes, available RAM won't be enough. Tmem takes choice #2: self-ballooning aggressively drives RAM (or "current memory", as the hypervisor sees it) to a target level; in Linux, that target is Committed_AS modified by a formula similar to the one Novell derived for a minimum ballooning safety level. The target level changes constantly, but the self-ballooning code samples and adjusts only periodically. If memory demand spikes between samples, Linux has a memory shortage and responds as it must, namely by swapping. The frontswap code in Linux "intercepts" this swapping so that, in most cases, the pages go to a Xen tmem persistent pool instead of to a (virtual or physical) swap disk.
Unlike data in ephemeral pools, data in persistent pools is guaranteed to be maintained by the hypervisor until the guest invalidates it or dies. As a result, pages allocated for persistent pools increase the count of pages "owned" by the domain that requested them, until the guest explicitly invalidates them (or dies). The accounting also ensures that malicious domains can't absorb memory beyond the toolstack-specified limit ("maxmem"). Note that, if compression is enabled, a domain _may_ "logically" exceed maxmem, as long as it does not physically exceed it. (And, again, all of this has been in Xen since 4.0 in 2009, and self-ballooning has been in Linux since mid-2011, but frontswap was finally accepted into Linux only earlier in 2012.)

Ok, George, does that answer your questions, _technically_? I'll be happy to answer any others.

Thanks,
Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.