
Re: [Xen-devel] Memory Sharing on HVM guests

On Mon, Aug 12, 2013 at 10:58 AM, Andres Lagar-Cavilla
<andreslc@xxxxxxxxxxxxxx> wrote:
> > > ... The "naive page scanner" as you put it, would scan, find a zero page,
> > > nominate for sharing, scan again to prevent TOCTTOU race between end
> > > of scan and nomination, and then attempt to share. Note that the sharing
> > > hypercall will automatically fail in the case of TOCTTOU between nominate
> > > and share actions.

Given that the hypervisor will reject the TOCTTOU case, doesn't that
eliminate any race hazard?  I saw the second scan mentioned in a paper
as well, but if it only shrinks, but does not close, the TOCTTOU
window, why bother?  Are you trying to avoid the expense of a
hypercall?  If that is an issue, maybe it is worth having versions of
xc_memshr_nominate_gfn() and xc_memshr_share_gfns() that handle more
than one pair of pages at a time.

> > In general, I probably would find it more useful if (most of) the
> > excess pages were available for other domains via sharing.  For
> > example, maybe it would be better for the excess pages to be part of
> > the page cache for a storage domain common to the other domains,
> > allowing it to make more holistic caching decisions and hopefully
> > already have the more active blocks in its cache - perhaps affording
> > some TMEM-like benefits to non-TMEM-capable OSes (which is pretty much
> > anything other than Linux?).
> That whole description really seems like TMEM.

As best as I understand it, this would provide only very limited
aspects of TMEM, and perhaps a substitute for ballooning.
Nevertheless, maybe there is a win in giving these more limited, but
still significant, benefits to non-TMEM- and/or non-ballooning-aware
guests.

TMEM-like aspect: The common caching storage domain is perhaps
something like cleancache.  However, this cache may not be readily
resizeable to respond to host-wide memory pressures.

Balloon-like aspect: If guest operating systems can be persuaded to
(1) limit the size of their page cache, and (2) zero out the remaining
free pages, zero-page merging might be an alternative to ballooning,
and could even avoid issues arising from the ballooning driver not
meeting kernel memory demands quickly enough (eg, allocate 8GB of
memory, of which 6GB is typically free, zeroed, and shared).  However,
Linux does not seem to be tunable to provide either (1) or (2); it
would have to be patched to do so, which I think is feasible.  Windows provides
SetSystemFileCacheSize(), which appears to often be used to reduce VM
memory footprint and provides (1), and my understanding is that (2) is
default Windows behavior.

>> The question was mainly: if I lazily/conservatively overallocate
>> excess memory to domains, and hope page sharing will automagically
>> minimize their footprint, will the use and dirtying of excess pages by
>> the page cache cripple their sharing?  If so, I am curious if would
>> make sense to cap the page cache, if possible, to say 100MB.  I
>> suspect total disabling of the page cache is impossible or destroys
>> performance.
> You just can't do that in Linux. Page cache is so intimately baked in, there
> is no notion of "turning it off".

It is not a surprise that it cannot be turned off.  Even if it were, I
imagine trying to run with uncached block I/O, even if a storage
domain held the data in its cache, would drop performance by several
orders of magnitude.

However, setting a maximum number of pages for the page cache seems
doable.  There is a clearly identifiable value (or sum of the value
across zones) in the kernel that I would like to put a ceiling on,
nr_file_pages, which can be seen in /proc/zoneinfo and directly
corresponds to the "buffers" and "cached" values reported by free
(with 4 KiB pages, nr_file_pages * 4 = buffers + cached, in KiB).  Perhaps all
that is needed is an additional sysctl value, max_nr_file_pages, which
would establish a ceiling for nr_file_pages, and to get the page cache
to respect it.  As the benefit of this is then realized by zeroing out
the remaining free pages for merging, zeroing would also ideally occur
right after pages were freed.  Exactly how this happens seems to be a
more delicate operation, since zeroing a page probably takes a
nontrivial amount of time as far as the VMM is concerned.  Does Linux
follow an MRU policy for reallocating free pages?  It would be helpful
not to waste time zeroing the next n pages up for allocation.
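The nr_file_pages arithmetic above can be checked directly. This one-liner sums nr_file_pages across all zones in /proc/zoneinfo and converts to KiB, assuming 4 KiB pages; the result should roughly match buffers + cached from free (field layout in /proc/zoneinfo can vary by kernel version):

```shell
# Sum nr_file_pages over all zones, then convert pages -> KiB (x4).
awk '/nr_file_pages/ { sum += $2 } END { printf "%d KiB\n", sum * 4 }' /proc/zoneinfo
```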

- Eric

Xen-devel mailing list


