Re: [Xen-devel] Proposed new "memory capacity claim" hypercall/feature

> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxxxxx]
> Subject: Re: Proposed new "memory capacity claim" hypercall/feature
> On 30/10/12 15:43, Dan Magenheimer wrote:
> > a) Truly free memory (each free page is on the hypervisor free list)
> > b) Freeable memory ("ephemeral" memory managed by tmem)
> > c) Owned memory (pages allocated by the hypervisor or for a domain)
> >
> > The sum of these three is always a constant: The total number of
> > RAM pages in the system.  However, when tmem is active, the values
> > of all _three_ of these change constantly.  So if at the start of a
> > domain launch, the sum of free+freeable exceeds the intended size
> > of the domain, the domain allocation/launch can start.

> (And please don't start another rant about the bold new world of peace
> and love.  Give me a freaking *technical* answer.)

<grin> /Me removes seventies-style tie-dye tshirt with peace logo
and sadly withdraws single daisy previously extended to George.

> Why free+freeable, rather than just "free"?

A free page is a page that is not used for anything at all.
It is on the hypervisor's free list.  A freeable page contains tmem
ephemeral data stored on behalf of a domain (or, if dedup'ing
is enabled, on behalf of one or more domains).  More specifically
for a tmem-enabled Linux guest, a freeable page contains a clean
page cache page that the Linux guest OS has asked the hypervisor
(via the tmem ABI) to hold if it can for as long as it can.
The specific clean page cache pages are chosen and the call is
done on the Linux side via "cleancache".

So, when tmem is working optimally, there are few or no free
pages and many, many freeable pages (perhaps half of physical
RAM or more).

Freeable pages across all tmem-enabled guests are kept in a single
LRU queue.  When a request is made to the hypervisor allocator for
a free page and its free list is empty, the allocator will force
tmem to relinquish an ephemeral page (in LRU order).  Because
this is entirely up to the hypervisor and can happen at any
time, freeable pages are not counted as "owned" by a domain but
still have some value to a domain.

So, in essence, a "free" page has zero value and a "freeable"
page has a small, but non-zero value that decays over time.
So it's useful for a toolstack to know both quantities.
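
To make the accounting concrete, here is a toy model in Python (not
hypervisor code).  The names ToyHost, put_ephemeral and alloc_owned are
made up for this sketch, and it assumes that an ephemeral put goes
through the same allocation path as any other page request.

```python
from collections import OrderedDict


class ToyHost:
    """Toy model of free / freeable / owned pages. Hypothetical, not Xen code."""

    def __init__(self, total_pages):
        self.total = total_pages
        self.free = total_pages      # pages on the free list
        self.lru = OrderedDict()     # ephemeral pages, oldest first
        self.owned = 0               # pages owned by the hypervisor or a domain

    def freeable(self):
        return len(self.lru)

    def _take_page(self):
        """Take one free page; if none is free, evict the LRU ephemeral page."""
        if self.free == 0:
            if not self.lru:
                raise MemoryError("no free or freeable pages")
            self.lru.popitem(last=False)   # relinquish the oldest ephemeral page
            self.free += 1
        self.free -= 1

    def put_ephemeral(self, domid, key):
        """Guest asks the host to hold a clean page 'if it can'."""
        if (domid, key) in self.lru:
            self.lru.move_to_end((domid, key))
            return True
        try:
            self._take_page()
        except MemoryError:
            return False
        self.lru[(domid, key)] = None
        return True

    def alloc_owned(self, npages=1):
        """Allocate pages that become 'owned' (e.g. for a domain)."""
        for _ in range(npages):
            self._take_page()
            self.owned += 1

    def check(self):
        # The three classes always sum to the total number of RAM pages.
        assert self.free + self.freeable() + self.owned == self.total
```

In this model, an 8-page host with 2 owned pages and 6 ephemeral pages
has free == 0; allocating 3 more owned pages succeeds by evicting the 3
oldest ephemeral pages, and the sum of the three classes stays at 8.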

(And, since this thread has gone in many directions, let me
reiterate that all of this has been working in the hypervisor
since 4.0 in 2009, and cleancache in Linux since mid-2011.)

> >   But then
> > if "owned" increases enough, there may no longer be enough memory
> > and the domain launch will fail.
>
> Again, "owned" would not increase at all if the guest weren't handing
> memory back to Xen.  Why is that necessary, or even helpful?

The guest _is_ handing memory back to Xen.  This is the other half
of the tmem functionality, persistent pages.

Answering your second question is going to require a little more background.

Since nobody, not even the guest kernel, can guess the future
needs of its workload, there are two choices: (1) allocate enough
RAM so that the supply always exceeds max-demand, or (2) aggressively
reduce RAM to a reasonable guess for a target and prepare for the
probability that, sometimes, available RAM won't be enough.  Tmem does
choice #2; self-ballooning aggressively drives RAM (or "current memory"
as the hypervisor sees it) to a target level: in Linux, to Committed_AS
modified by a formula similar to the one Novell derived for a minimum
ballooning safety level.  The target level changes constantly, but the
selfballooning code samples and adjusts only periodically.  If, during
the time interval between samples, memory demand spikes, Linux
has a memory shortage and responds as it must, namely by swapping.
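
A minimal sketch of that sampling gap, in Python.  This is not the
selfballooning code: simulate_selfballoon is a hypothetical name,
demand_by_tick stands in for Committed_AS, and reserve_pages is only a
placeholder for the safety formula, whose actual form is not given here.

```python
def simulate_selfballoon(demand_by_tick, sample_every, reserve_pages=0):
    """Return, per tick, how many pages of demand exceed current memory.

    Current memory is re-targeted only on sampling ticks; between samples
    it stays where the last sample put it.
    """
    current = 0
    overflow = []
    for tick, demand in enumerate(demand_by_tick):
        if tick % sample_every == 0:
            # Sampling tick: balloon to the (placeholder) target level.
            current = demand + reserve_pages
        # Any demand above current memory is what the guest must swap.
        overflow.append(max(0, demand - current))
    return overflow
```

With demand [100, 100, 150, 150, 120] and a sample every 3 ticks, the
spike at tick 2 lands between samples and leaves 50 pages uncovered
until the next sample catches up.  Those are the pages Linux would swap.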

The frontswap code in Linux "intercepts" this swapping so that,
in most cases, it goes to a Xen tmem persistent pool instead of
to a (virtual or physical) swap disk.  Data in persistent pools,
unlike data in ephemeral pools, is guaranteed to be maintained by the
hypervisor until the guest invalidates it or until the guest dies.
As a result, pages allocated for persistent pools increase the count
of pages "owned" by the domain that requested the pages, until the guest
explicitly invalidates them (or dies).  The accounting also ensures
that malicious domains can't absorb memory beyond the toolstack-specified
limit ("maxmem").
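
The persistent-pool accounting can be sketched the same way.  Again a
toy Python model with hypothetical names (ToyDomain and its methods are
not the frontswap or tmem API); it ignores compression and assumes the
host always has a physical page available.

```python
class ToyDomain:
    """Toy model of persistent-pool accounting against maxmem."""

    def __init__(self, current_pages, maxmem_pages):
        self.owned = current_pages    # pages currently owned by the domain
        self.maxmem = maxmem_pages    # toolstack-specified limit
        self.persistent = {}          # pages held in the persistent pool
        self.swap_disk = {}           # pages that fell through to the swap disk

    def store(self, key, data):
        """Try the persistent pool first; fall back to the swap disk."""
        if key in self.persistent:
            self.persistent[key] = data   # overwrite, no extra page needed
            return "tmem"
        if self.owned + 1 > self.maxmem:
            self.swap_disk[key] = data    # put refused: would exceed maxmem
            return "disk"
        self.swap_disk.pop(key, None)     # drop any stale disk copy
        self.persistent[key] = data
        self.owned += 1                   # a persistent page counts as owned
        return "tmem"

    def load(self, key):
        if key in self.persistent:
            return self.persistent[key]
        return self.swap_disk.get(key)

    def invalidate(self, key):
        """Only an explicit invalidate gives the owned page back."""
        if key in self.persistent:
            del self.persistent[key]
            self.owned -= 1
        self.swap_disk.pop(key, None)
```

A domain at 6 pages with maxmem 8 can push two swapped pages into the
persistent pool; the third goes to disk, and nothing is returned to the
hypervisor until the guest invalidates a page.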

Note that, if compression is enabled, a domain _may_ "logically"
exceed maxmem, as long as it does not physically exceed it.

(And, again, all of this too has been in Xen since 4.0 in 2009,
and selfballooning has been in Linux since mid-2011, but frontswap
finally was accepted into Linux earlier in 2012.)

Ok, George, does that answer your questions, _technically_?  I'll
be happy to answer any others.

