Xen project Mailing List

Re: [Xen-devel] Memory Sharing on HVM guests

To: Andres Lagar-Cavilla <andreslc@xxxxxxxxxxxxxx>

From: Eric Shelton <eshelton@xxxxxxxxx>

Date: Wed, 14 Aug 2013 14:06:18 -0400

Cc: waitxie <xiexw24@xxxxxxxxx>, Nai Xia <nai.xia@xxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Wed, 14 Aug 2013 18:07:10 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Mon, Aug 12, 2013 at 10:58 AM, Andres Lagar-Cavilla
<andreslc@xxxxxxxxxxxxxx> wrote:

> > > ... The "naive page scanner" as you put it, would scan, find a zero page,
> > > nominate for sharing, scan again to prevent TOCTTOU race between end
> > > of scan and nomination, and then attempt to share. Note that the sharing
> > > hypercall will automatically fail in the case of TOCTTOU between nominate
> > > and share actions.

Given that the hypervisor will reject the TOCTTOU case, doesn't that
eliminate any race hazard? I saw the second scan mentioned in a paper
as well, but if it only shrinks the TOCTTOU window without closing it,
why bother? Are you trying to avoid the expense of a hypercall? If
that is an issue, maybe it is worth having versions of
xc_memshr_nominate_gfn() and xc_memshr_share_gfns() that handle more
than one pair of pages at a time.

> > In general, I probably would find it more useful if (most of) the
> > excess pages were available for other domains via sharing. For
> > example, maybe it would be better for the excess pages to be part of
> > the page cache for a storage domain common to the other domains,
> > allowing it to make more holistic caching decisions and hopefully
> > already have the more active blocks in its cache - perhaps affording
> > some TMEM-like benefits to non-TMEM-capable OSes (which is pretty much
> > anything other than Linux?).
> That whole description really seems like TMEM.

As best as I understand it, this would provide only very limited
aspects of TMEM, and perhaps a substitute for ballooning.
Nevertheless, maybe there is a win in giving these more limited, but
still significant, benefits to domains that are not TMEM-aware and/or
not ballooning-aware.

TMEM-like aspect: The common caching storage domain is perhaps
something like cleancache. However, this cache may not be readily
resizable in response to host-wide memory pressure.

Balloon-like aspect: If guest operating systems can be persuaded to
(1) limit the size of their page cache, and (2) zero out the remaining
free pages, zero-page merging might be an alternative to ballooning,
and could even avoid issues arising from the ballooning driver not
meeting kernel memory demands quickly enough (e.g., allocate 8GB of
memory, of which 6GB is typically free, zeroed, and shared). However,
Linux does not seem to be tunable to provide either (1) or (2); it
would have to be patched to do so, and I think it can be. Windows
provides SetSystemFileCacheSize(), which appears to be used often to
reduce VM memory footprint and provides (1), and my understanding is
that (2) is default Windows behavior.

>> The question was mainly: if I lazily/conservatively overallocate
>> excess memory to domains, and hope page sharing will automagically
>> minimize their footprint, will the use and dirtying of excess pages by
>> the page cache cripple their sharing? If so, I am curious if it would
>> make sense to cap the page cache, if possible, to say 100MB. I
>> suspect total disabling of the page cache is impossible or destroys
>> performance.
> You just can't do that in Linux. Page cache is so intimately baked in, there
> is no notion of "turning it off"

It is not a surprise that it cannot be turned off. Even if it could
be, I imagine trying to run with uncached block I/O, even if a storage
domain held the data in its cache, would drop performance by several
orders of magnitude.

However, setting a maximum number of pages for the page cache seems
doable. There is a clearly identifiable value (or sum of the value
across zones) in the kernel that I would like to put a ceiling on,
nr_file_pages, which can be seen in /proc/zoneinfo and directly
corresponds to the "buffers" and "cached" values indicated in free
(specifically, nr_file_pages * 4 = buffers + cached, with 4KB pages
and free reporting in KB).

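To make the nr_file_pages relationship concrete, here is a minimal Python sketch that sums the counter over a /proc/zoneinfo-style text and converts it to KB for comparison against Buffers + Cached from a /proc/meminfo-style text. The function names and the sample data are made up for illustration, and the 4KB page size is an assumption. On a real system the two figures may not match exactly, for instance when swap cache pages are present.

```python
import re

PAGE_KB = 4  # assumption: 4 KB pages, as in the arithmetic above


def sum_nr_file_pages(zoneinfo_text):
    """Sum every nr_file_pages counter found in /proc/zoneinfo-style text."""
    counts = re.findall(r"^[ \t]*nr_file_pages[ \t]+(\d+)[ \t]*$",
                        zoneinfo_text, re.MULTILINE)
    return sum(int(n) for n in counts)


def file_cache_kb(zoneinfo_text, page_kb=PAGE_KB):
    """Convert the summed page count to KB."""
    return sum_nr_file_pages(zoneinfo_text) * page_kb


def buffers_plus_cached_kb(meminfo_text):
    """Add the Buffers and Cached fields of /proc/meminfo-style text (in kB)."""
    fields = dict(re.findall(r"^(\w+):[ \t]+(\d+) kB[ \t]*$",
                             meminfo_text, re.MULTILINE))
    return int(fields["Buffers"]) + int(fields["Cached"])


# Made-up sample data, shaped like the real files but much shorter.
SAMPLE_ZONEINFO = """\
Node 0, zone      DMA
  pages free     3968
    nr_file_pages 10
Node 0, zone   Normal
  pages free     1000
    nr_file_pages 250
"""

SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
Buffers:             240 kB
Cached:              800 kB
SwapCached:            0 kB
"""

print("nr_file_pages * 4 =", file_cache_kb(SAMPLE_ZONEINFO), "kB")
print("buffers + cached  =", buffers_plus_cached_kb(SAMPLE_MEMINFO), "kB")
```

On a Linux guest, the same functions could be fed the contents of /proc/zoneinfo and /proc/meminfo read with open(...).read() to check how closely the relationship holds there.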
Perhaps all that is needed is an additional sysctl value,
max_nr_file_pages, which would establish a ceiling for nr_file_pages,
plus getting the page cache to respect it. Since the benefit is only
realized once the remaining free pages are zeroed for merging, zeroing
would ideally occur right after pages are freed.

Exactly how this happens seems to be a more delicate operation, since
zeroing a page probably takes a nontrivial amount of time as far as
the VMM is concerned. Does Linux follow an MRU policy for reallocating
free pages? It would be helpful not to waste time zeroing the next n
pages up for allocation.

- Eric

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
