 
	
| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Xen-devel] Re: [PATCH v2] [linux-3.0+ for xen] tmem: self-ballooning and frontswap-selfshrinking
 On Mon, Jun 20, 2011 at 03:57:18PM -0700, Dan Magenheimer wrote: > Here is V2 of the Xen tmem selfballooning/frontswap-selfshrinking > patch for review. Many thanks to the reviewers of the V1 > for your excellent comments and feedback. > > Major changes since V1 (most suggested by Ian Campbell): > - nearly all code moved from *balloon.c to xen-selfballoon.c > - accounting fixes found by Daniel Kiper > - alternate access to vm_committed_as; now no VM core changes > - macros for repetitive sysfs code > - rebased to 3.0-rc1 > - passes checkpatch.pl :-} > > This code complements the cleancache and frontswap patchsets to > optimize support for Xen Transcendent Memory. The policy it > implements is rudimentary, so will almost certainly need to > improve over time, but it does work well enough today. > > Thanks, > Dan > > P.S. Anybody who actually uses this patch is encouraged to read > http://oss.oracle.com/projects/tmem/dist/documentation/internals/linuxpatch > from which the following brief documentation was extracted, > which will probably be the git commit comment. > > = > > [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and > frontswap-selfshrinking Get rid of the 'linux-2.6.39..'. > > Selfballooning creates memory pressure by using the Xen balloon driver > to decrease and increase available kernel memory, tracking a target value > of "Committed_AS" (see /proc/meminfo). Sysfs tunables are provided to > adjust the frequency at which memory size is adjusted, the rate at > which ballooning is used to approach the target, and to disable > selfballooning completely. As of 100827, selfballooning runs in a > separate kernel thread, "selfballooning". As of 100827? That is quite a precise number :-) Is it necccessary to even include this in the patch? It is not like this patch had been in the upstream kernel since 2010. Why not just say: The kernel thread runs as "seflballooning". > > Frontswap shrinking accounts for the fact that pages swapped to disk > may sit on disk for a very long time, even if very rarely used or if > the kernel knows they will never be used again. This is because > the kernel need not reclaim disk space because the disk space costs > very little and can be overwritten when necessary. When the same > pages are in frontswap, however, they are taking up valuable RAM > real estate. So when frontswap activity is otherwise stable and the > guest kernel is not under memory pressure, the frontswap shrinker > removes some pages from frontswap and returns them to kernel memory. OK, and who pokes the shrinker? Is it this patch? If so you might want to mention that? IF not this thread, is the thread just pulling some levers to cause the shrinker to activate? If it is the first, perhaps a better name is 'shrinker' for the thread? Or 'kick_shrinker' since it would actually use an shrinker API that is widely implemented by other subsystems too. > Sysfs tunables are provided to adjust the frequency of shrinking > opportunities and the shrinking rate, and to create more "inertia". >From the description it sounds like this could be a generic shrinker thread that is not limited to only frontswap usage pattern? What are the other "API's that kick the shrinker into action? Would it be worth mentioning them and why they are not suitable or not good enough for this? > > Signed-off-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> > > Diffstat: > drivers/xen/Kconfig | 10 + > drivers/xen/Makefile | 1 > drivers/xen/xen-balloon.c | 2 > drivers/xen/xen-selfballoon.c | 350 > ++++++++++++++++++++++++++++++++++++++++++ > include/xen/balloon.h | 10 + > include/xen/tmem.h | 4 > 6 files changed, 377 insertions(+) > > diff -Napur -X linux-3.0-rc1/Documentation/dontdiff -x > scripts/selinux/mdp/mdp linux-3.0-rc1-frontswap/drivers/xen/Kconfig > linux-3.0-rc1-frontswap-selfballoon/drivers/xen/Kconfig > --- linux-3.0-rc1-frontswap/drivers/xen/Kconfig 2011-05-31 > 17:09:07.606910896 -0600 > +++ linux-3.0-rc1-frontswap-selfballoon/drivers/xen/Kconfig 2011-06-16 > 11:39:14.235875542 -0600 > @@ -9,6 +9,16 @@ config XEN_BALLOON > the system to expand the domain's memory allocation, or alternatively > return unneeded memory to the system. > > +config XEN_SELFBALLOONING > + bool "dynamically self-balloon kernel memory to target" > + depends on XEN && XEN_BALLOON && CLEANCACHE > + default y Heheheh.. That needs to be 'n'. But we can fix that in the next posting. > + help > + Self-ballooning dynamically balloons available kernel memory driven > + by the current usage of anonymous memory ("committed AS") and > + controlled by various sysfs-settable parameters. May be overridden > + by the noselfballooning kernel boot parameter > + > config XEN_SCRUB_PAGES > bool "Scrub pages before returning them to system" > depends on XEN_BALLOON > diff -Napur -X linux-3.0-rc1/Documentation/dontdiff -x > scripts/selinux/mdp/mdp linux-3.0-rc1-frontswap/drivers/xen/Makefile > linux-3.0-rc1-frontswap-selfballoon/drivers/xen/Makefile > --- linux-3.0-rc1-frontswap/drivers/xen/Makefile 2011-05-31 > 17:09:57.006875306 -0600 > +++ linux-3.0-rc1-frontswap-selfballoon/drivers/xen/Makefile 2011-06-16 > 11:39:06.123852289 -0600 > @@ -8,6 +8,7 @@ obj-$(CONFIG_BLOCK) += biomerge.o > obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o > obj-$(CONFIG_XEN_XENCOMM) += xencomm.o > obj-$(CONFIG_XEN_BALLOON) += xen-balloon.o > +obj-$(CONFIG_XEN_SELFBALLOONING) += xen-selfballoon.o > obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o > obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o > obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o > diff -Napur -X linux-3.0-rc1/Documentation/dontdiff -x > scripts/selinux/mdp/mdp linux-3.0-rc1-frontswap/drivers/xen/xen-balloon.c > linux-3.0-rc1-frontswap-selfballoon/drivers/xen/xen-balloon.c > --- linux-3.0-rc1-frontswap/drivers/xen/xen-balloon.c 2011-05-29 > 18:43:36.000000000 -0600 > +++ linux-3.0-rc1-frontswap-selfballoon/drivers/xen/xen-balloon.c > 2011-06-20 15:37:30.405798859 -0600 > @@ -98,6 +98,8 @@ static int __init balloon_init(void) > > register_balloon(&balloon_sysdev); > > + register_xen_selfballooning(&balloon_sysdev); > + > target_watch.callback = watch_target; > xenstore_notifier.notifier_call = balloon_init_watcher; > > diff -Napur -X linux-3.0-rc1/Documentation/dontdiff -x > scripts/selinux/mdp/mdp linux-3.0-rc1-frontswap/drivers/xen/xen-selfballoon.c > linux-3.0-rc1-frontswap-selfballoon/drivers/xen/xen-selfballoon.c > --- linux-3.0-rc1-frontswap/drivers/xen/xen-selfballoon.c 1969-12-31 > 17:00:00.000000000 -0700 > +++ linux-3.0-rc1-frontswap-selfballoon/drivers/xen/xen-selfballoon.c > 2011-06-20 15:43:08.613818583 -0600 > @@ -0,0 +1,350 @@ > +/****************************************************************************** > + * Xen selfballoon driver > + * > + * Copyright (c) 2009-2011, Dan Magenheimer, Oracle Corp. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License version 2 > + * as published by the Free Software Foundation; or, when distributed > + * separately from the Linux kernel or incorporated into other > + * software packages, subject to the following license: > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > copy > + * of this source file (the "Software"), to deal in the Software without > + * restriction, including without limitation the rights to use, copy, modify, > + * merge, publish, distribute, sublicense, and/or sell copies of the > Software, > + * and to permit persons to whom the Software is furnished to do so, subject > to > + * the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > THE > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > DEALINGS > + * IN THE SOFTWARE. > + */ > + > +#include <linux/kernel.h> > +#include <linux/mm.h> > +#include <linux/mman.h> > + > +#include <xen/balloon.h> > + > +#ifdef CONFIG_FRONTSWAP > +#include <linux/frontswap.h> > +#endif > +#include <xen/tmem.h> > + > +static int xen_selfballooning_enabled __read_mostly; > +static int use_selfballooning __read_mostly = 1; These guys could be in the __initdata section. > +static unsigned int selfballoon_interval __read_mostly; > +static unsigned int selfballoon_downhysteresis __read_mostly; > +static unsigned int selfballoon_uphysteresis __read_mostly; You should provide a comment explaining what 'up and down' hysteresis means? Is it the watermark level at which point the thread kicks in to action? Is the interval in seconds? miliseconds? Is the interval how often the thread checks those watermarks values? Perhaps a better name would be selfballoon_wakup_time ? > + > +static void selfballoon_process(struct work_struct *work); > +DECLARE_DELAYED_WORK(selfballoon_worker, selfballoon_process); static? > + > +#ifdef CONFIG_FRONTSWAP > +static bool frontswap_selfshrinking __read_mostly; > +static int use_frontswap_selfshrink __read_mostly = 1; __initdata? > +static unsigned int frontswap_hysteresis __read_mostly; > +static unsigned int frontswap_inertia __read_mostly; Ditto on the explanation. Not sure what 'frontswap_hysterisis' means. > + > +#define frontswap_selfshrinking_enabled frontswap_selfshrinking > + > +static unsigned long frontswap_inertia_counter; And what is the purpose of this fellow? > + > +static void frontswap_selfshrink(void) Hm, when I saw the name, combined with the #define: "frontswap_selfshrinking_enabled" I thought this function would return a boolean. But it looks as this function is actually kicking the shrinker in action. Perhaps: frontswap_shrink_now is a better name? > +{ > + static unsigned long cur_frontswap_pages; > + static unsigned long last_frontswap_pages; > + static unsigned long tgt_frontswap_pages; So you shrunk the 'current', 'target' to something smaller. Why not shrink frontswap too? Say f_swp ? tgt_fswp_pgs ? or tgt_frontswap_pgs ? > + > + if (!frontswap_selfshrinking) > + return; > + if (!frontswap_hysteresis) > + return; > + last_frontswap_pages = cur_frontswap_pages; > + cur_frontswap_pages = frontswap_curr_pages(); > + if (!cur_frontswap_pages || > + (cur_frontswap_pages > last_frontswap_pages)) { Is it possible for cur_frontswap_pages to return -1? Should the '||' be '&&' ? > + frontswap_inertia_counter = frontswap_inertia; > + return; > + } > + if (frontswap_inertia_counter && --frontswap_inertia_counter) > + return; No need for a spinlock on the frontswap_inertia_counter? Is that b/c there is only one caller for this function? > + if (cur_frontswap_pages <= frontswap_hysteresis) > + tgt_frontswap_pages = 0; > + else > + tgt_frontswap_pages = cur_frontswap_pages - > + (cur_frontswap_pages / frontswap_hysteresis); What if frontswap_hysteris is zero? > + frontswap_shrink(tgt_frontswap_pages); > +} > + > +static int __init no_frontswap_selfshrink_setup(char *s) > +{ > + use_frontswap_selfshrink = 0; > + return 1; > +} > + > +__setup("noselfshrink", no_frontswap_selfshrink_setup); In the patch description it was called 'noselfballooning' ? Which one should it be? [edit: ah, you have another option later on called noselfballooning. you should probably document the noselfshrink somwhere] > +#else /* !CONFIG_FRONTSWAP */ > +static inline void frontswap_selfshrink(void) > +{ > +} > +#define frontswap_selfshrinking_enabled 0 > +#endif > + > +static void selfballoon_process(struct work_struct *work) > +{ > + unsigned long cur_pages, goal_pages, tgt_pages; goal and target sound quite similar. Should tgt be just named 'pages'? > + int reset_timer = 0; How about just using a bool? > + > + if (xen_selfballooning_enabled) { > + tgt_pages = balloon_stats.current_pages; > + cur_pages = balloon_stats.current_pages; > + goal_pages = percpu_counter_read_positive(&vm_committed_as) + > + balloon_stats.current_pages - totalram_pages; Hm, you could just reference tgt_pages instead of re-reading the balloon_stats.current_pages as that value might change as you are reading it (since you are not using the balloon spinlock). Or is that the goal - you are trying to read it to have the latest value? If so, wouldn't it better to take the the balloon spinlock? Is there any way that goal_pages ends up being negative? > + if (cur_pages > goal_pages) > + tgt_pages = cur_pages - > + ((cur_pages - goal_pages) / > + selfballoon_downhysteresis); What if ..downhysterisis is zero? > + else if (cur_pages < goal_pages) > + tgt_pages = cur_pages + > + ((goal_pages - cur_pages) / > + selfballoon_uphysteresis); Ditto. No need to check for cur_pages == goal_pages? > + balloon_set_new_target(tgt_pages); > + reset_timer = 1; > + } > + if (frontswap_selfshrinking_enabled) { > + frontswap_selfshrink(); > + reset_timer = 1; > + } > + if (reset_timer) > + schedule_delayed_work(&selfballoon_worker, > + selfballoon_interval * HZ); > +} > + > +#ifdef CONFIG_SYSFS > +#define SELFBALLOON_SHOW(name, format, args...) > \ > + static ssize_t show_##name(struct sys_device *dev, \ > + struct sysdev_attribute *attr, \ > + char *buf) \ > + { \ > + return sprintf(buf, format, ##args); \ > + } > + > +SELFBALLOON_SHOW(selfballooning, "%d\n", xen_selfballooning_enabled); > + > +static ssize_t store_selfballooning(struct sys_device *dev, > + struct sysdev_attribute *attr, > + const char *buf, > + size_t count) > +{ > + char *endchar; > + int was_enabled = xen_selfballooning_enabled; Make it a bool. > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + > + xen_selfballooning_enabled = !!memparse(buf, &endchar); > + > + if (!was_enabled && xen_selfballooning_enabled) > + schedule_delayed_work(&selfballoon_worker, > + selfballoon_interval * HZ); > + > + return count; > +} > + > +static SYSDEV_ATTR(selfballooning, S_IRUGO | S_IWUSR, > + show_selfballooning, store_selfballooning); > + > +SELFBALLOON_SHOW(selfballoon_interval, "%d\n", selfballoon_interval); > + > +static ssize_t store_selfballoon_interval(struct sys_device *dev, > + struct sysdev_attribute *attr, > + const char *buf, > + size_t count) > +{ > + char *endchar; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + selfballoon_interval = memparse(buf, &endchar); No checking to see if the value is out of bounds? Say -1 (well, it would return zero from memparse) so the interval would be set to 0. Is that OK? > + return count; > +} > + > +static SYSDEV_ATTR(selfballoon_interval, S_IRUGO | S_IWUSR, > + show_selfballoon_interval, store_selfballoon_interval); > + > +SELFBALLOON_SHOW(selfballoon_downhys, "%d\n", selfballoon_downhysteresis); > + > +static ssize_t store_selfballoon_downhys(struct sys_device *dev, > + struct sysdev_attribute *attr, > + const char *buf, > + size_t count) > +{ > + char *endchar; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + selfballoon_downhysteresis = memparse(buf, &endchar); Ditto. What if I did 'echo -1000 > seflballoon_downhys' .. Then this value ends up being zero and the worker might end up dividing by zero. > + return count; > +} > + > +static SYSDEV_ATTR(selfballoon_downhysteresis, S_IRUGO | S_IWUSR, > + show_selfballoon_downhys, store_selfballoon_downhys); > + > + > +SELFBALLOON_SHOW(selfballoon_uphys, "%d\n", selfballoon_uphysteresis); > + > +static ssize_t store_selfballoon_uphys(struct sys_device *dev, > + struct sysdev_attribute *attr, > + const char *buf, > + size_t count) > +{ > + char *endchar; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + selfballoon_uphysteresis = memparse(buf, &endchar); Ditto. > + return count; > +} > + > +static SYSDEV_ATTR(selfballoon_uphysteresis, S_IRUGO | S_IWUSR, > + show_selfballoon_uphys, store_selfballoon_uphys); > + > +#ifdef CONFIG_FRONTSWAP > +SELFBALLOON_SHOW(frontswap_selfshrinking, "%d\n", frontswap_selfshrinking); > + > +static ssize_t store_frontswap_selfshrinking(struct sys_device *dev, > + struct sysdev_attribute *attr, > + const char *buf, > + size_t count) > +{ > + char *endchar; > + int was_enabled = frontswap_selfshrinking; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + > + frontswap_selfshrinking = !!memparse(buf, &endchar); > + > + if (!was_enabled && !xen_selfballooning_enabled && > + frontswap_selfshrinking) > + schedule_delayed_work(&selfballoon_worker, > + selfballoon_interval * HZ); > + > + return count; > +} > + > +static SYSDEV_ATTR(frontswap_selfshrinking, S_IRUGO | S_IWUSR, > + show_frontswap_selfshrinking, store_frontswap_selfshrinking); > + > +SELFBALLOON_SHOW(frontswap_inertia, "%d\n", frontswap_inertia); > + > +static ssize_t store_frontswap_inertia(struct sys_device *dev, > + struct sysdev_attribute *attr, > + const char *buf, > + size_t count) > +{ > + char *endchar; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + frontswap_inertia = memparse(buf, &endchar); > + frontswap_inertia_counter = frontswap_inertia; > + return count; > +} > + > +static SYSDEV_ATTR(frontswap_inertia, S_IRUGO | S_IWUSR, > + show_frontswap_inertia, store_frontswap_inertia); > + > +SELFBALLOON_SHOW(frontswap_hysteresis, "%d\n", frontswap_hysteresis); > + > +static ssize_t store_frontswap_hysteresis(struct sys_device *dev, > + struct sysdev_attribute *attr, > + const char *buf, > + size_t count) > +{ > + char *endchar; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + frontswap_hysteresis = memparse(buf, &endchar); > + return count; > +} > + > +static SYSDEV_ATTR(frontswap_hysteresis, S_IRUGO | S_IWUSR, > + show_frontswap_hysteresis, store_frontswap_hysteresis); > + > +#endif /* CONFIG_FRONTSWAP */ > + > +static struct attribute *selfballoon_attrs[] = { > + &attr_selfballooning.attr, > + &attr_selfballoon_interval.attr, > + &attr_selfballoon_downhysteresis.attr, > + &attr_selfballoon_uphysteresis.attr, > +#ifdef CONFIG_FRONTSWAP > + &attr_frontswap_selfshrinking.attr, > + &attr_frontswap_hysteresis.attr, > + &attr_frontswap_inertia.attr, No NULL at the end? > +#endif > +}; > + > +static struct attribute_group selfballoon_group = { > + .name = "selfballoon", > + .attrs = selfballoon_attrs > +}; > +#endif > + > +int register_xen_selfballooning(struct sys_device *sysdev) > +{ > + int error = -1; > + > +#ifdef CONFIG_SYSFS > + error = sysfs_create_group(&sysdev->kobj, &selfballoon_group); > +#endif > + return error; > +} > +EXPORT_SYMBOL(register_xen_selfballooning); > + > +static int __init noselfballooning_setup(char *s) > +{ > + use_selfballooning = 0; > + return 1; > +} > + > +__setup("noselfballooning", noselfballooning_setup); > + > +static int __init xen_selfballoon_init(void) > +{ > + if (!xen_domain()) > + return -ENODEV; > + > + pr_info("xen/balloon: Initializing Xen selfballooning driver.\n"); > +#ifdef CONFIG_FRONTSWAP > + pr_info("xen/balloon: Initializing frontswap selfshrinking driver.\n"); > +#endif > + > + xen_selfballooning_enabled = tmem_enabled && use_selfballooning; > + selfballoon_interval = 5; > + selfballoon_downhysteresis = 8; > + selfballoon_uphysteresis = 1; And these values are ... how did you come about them? > +#ifdef CONFIG_FRONTSWAP > + frontswap_selfshrinking = use_frontswap_selfshrink && frontswap_enabled; > + frontswap_hysteresis = 20; > + frontswap_inertia = 3; Ditto. > +#endif > + schedule_delayed_work(&selfballoon_worker, selfballoon_interval * HZ); > + > + return 0; > +} > + > +subsys_initcall(xen_selfballoon_init); > + > +MODULE_LICENSE("GPL"); > diff -Napur -X linux-3.0-rc1/Documentation/dontdiff -x > scripts/selinux/mdp/mdp linux-3.0-rc1-frontswap/include/xen/balloon.h > linux-3.0-rc1-frontswap-selfballoon/include/xen/balloon.h > --- linux-3.0-rc1-frontswap/include/xen/balloon.h 2011-05-29 > 18:43:36.000000000 -0600 > +++ linux-3.0-rc1-frontswap-selfballoon/include/xen/balloon.h 2011-06-20 > 14:58:24.975176230 -0600 > @@ -23,3 +23,13 @@ void balloon_set_new_target(unsigned lon > > int alloc_xenballooned_pages(int nr_pages, struct page** pages); > void free_xenballooned_pages(int nr_pages, struct page** pages); > + > +struct sys_device; > +#ifdef CONFIG_XEN_SELFBALLOONING > +extern int register_xen_selfballooning(struct sys_device *sysdev); > +#else > +static inline int register_xen_selfballooning(struct sys_device *sysdev) > +{ > + return -1; > +} > +#endif > diff -Napur -X linux-3.0-rc1/Documentation/dontdiff -x > scripts/selinux/mdp/mdp linux-3.0-rc1-frontswap/include/xen/tmem.h > linux-3.0-rc1-frontswap-selfballoon/include/xen/tmem.h > --- linux-3.0-rc1-frontswap/include/xen/tmem.h 1969-12-31 > 17:00:00.000000000 -0700 > +++ linux-3.0-rc1-frontswap-selfballoon/include/xen/tmem.h 2011-06-16 > 10:15:44.514850956 -0600 > @@ -0,0 +1,4 @@ > +#ifndef _XEN_TMEM_H > +#define _XEN_TMEM_H > +extern int tmem_enabled; Can you include a comment saying where the 'tmem_enabled' is actuall present? As in in what file? > +#endif /* _XEN_TMEM_H */ _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel 
 
 | 
|  | Lists.xenproject.org is hosted with RackSpace, monitoring our |