Re: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie saved domains
This patch looks good, I'll test it tomorrow. I think you don't need to mess
with reference counts *at all* however, since
domain_kill()->tmem_destroy()->client_flush()->client_free()->tmh_client_destroy()->put_domain().
But domain_kill() itself holds a domain reference, hence tmem_destroy's
put_domain() is never the last domain reference. IOW, since you have a hook
on the domain destruction path, you basically get a callback before a
domain's reference count falls to zero, and that's all you need.

What you *do* need to do when setting up a new tmem client is check that the
associated domain is not dying. Without that check the code is in fact
currently buggy: you can end up with a zombie domain that is a client of
tmem and will never stop being a client because it became a client after
tmem_destroy() was called on it.

Does that make sense?

 -- Keir

On 10/06/2010 18:54, "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx> wrote:

> Could you give the attached a try on your test case? If
> it passes and Jan thinks it is OK (as I backed out most of
> his patch at cs 20918), then:
>
> Tmem: fix domain refcount leak by returning to simpler model
> which claims a ref once when the tmem client is first associated
> with the domain, and puts it once when the tmem client is
> destroyed.
>
> Signed-off-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
>
>> If you have a handle on a domain already I wonder why you need to
>> continually look up by domid...
>
> Nearly all tmem uses are via current->domain. The remaining
> (such as from the tools) are via a specified domid. I don't
> keep a domid->domain lookup table around as the frequency is
> very low and the existing mechanism works fine (or it does
> if I use it correctly anyway ;-)
>
>> RCU locking
>> is fully implemented already. It's highly unlikely to change in future
>> and we can work out something else for your case if that happens.
>
> I guess I was confused by the fact that the rcu_lock/unlock macros
> are no-ops.
>
> In any case, I think I understand the semantics well enough now
> after your second reply pointing me to rcu_unlock_domain, so
> I think the attached patch should avoid special cases in the
> future.
>
>>> I'd like to do a get_domain_by_id() without doing a get_domain()
>>> as the tmem code need only get_domain() once on first use
>>> and put_domain() once when destroying, but frequently needs
>>> to lookup a domain by id.
>>
>> If you have a handle on a domain already I wonder why you need to
>> continually look up by domid...
>>
>>> It looks like rcu_lock_domain_by_id() does what I need, but
>>> I don't need any rcu critical sections (outside of the domain
>>> lookup itself) and am fearful that if rcu locking ever is fully
>>> implemented, my use of rcu_lock_domain_by_id() would become
>>> incorrect and I may have a problem. Should I create an equivalent
>>> get_domain_by_id_no_ref()? Or am I misunderstanding something?
>>
>> If you really know what you're doing, I suggest just have your own
>> tmh_lookup_domain() macro mapping onto rcu_lock_domain_by_id(). RCU
>> locking is fully implemented already. It's highly unlikely to change
>> in future and we can work out something else for your case if that
>> happens.
>>
>> I'm not keen on providing an explicitly synchronisation-free version
>> in common code. It just encourages people not to think about
>> synchronisation at all.
>>
>> -- Keir
>>
>>> Semi-related, rcu_lock_domain_by_id() has a "return d" inside
>>> the for loop without an rcu_read_unlock(). I see that this
>>> is irrelevant now because rcu_read_unlock() is a no-op anyway,
>>> but maybe this should be cleaned up for the same reason --
>>> in case rcu locking is ever fully implemented.
>>>
>>> Thanks,
>>> Dan
>>>
>>>> -----Original Message-----
>>>> From: Dan Magenheimer
>>>> Sent: Thursday, June 10, 2010 7:08 AM
>>>> To: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> Subject: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie
>>>> saved domains
>>>>
>>>> OK, will take a look.
>>>>
>>>> IIRC, Jan's work to fix the domain reference stuff just
>>>> before 4.0 shipped was a heavy hammer but since it seemed
>>>> to work, I didn't want to mess with it so close to release...
>>>> really there's only a need to take a reference once on
>>>> first use and release it at shutdown, rather than
>>>> take/release frequently. IIRC, I had used a macro that
>>>> took references when they weren't really needed and
>>>> Jan placed the matching macros that did the release.
>>>>
>>>> Dan
>>>>
>>>>> -----Original Message-----
>>>>> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>>>>> Sent: Thursday, June 10, 2010 3:47 AM
>>>>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>> Cc: Dan Magenheimer
>>>>> Subject: Bug in tmem: refcount leak leaves zombie saved domains
>>>>>
>>>>> Dan,
>>>>>
>>>>> Just doing some save/restore testing on xen-unstable tip, I noticed
>>>>> that:
>>>>> # xm create ./pv_config
>>>>> # xm save PV1
>>>>>
>>>>> Would leave the saved guest as a zombie in the DOMDYING_dead state
>>>>> with no pages, yet with refcnt=1. This happens absolutely
>>>>> consistently. Just as consistently, it does not happen when I boot
>>>>> Xen with no-tmem. My conclusion is that tmem is leaking a domain
>>>>> reference count during domain save. This doesn't happen if I merely
>>>>> "xm create ...; xm destroy ...".
>>>>>
>>>>> My pv_config file contains nothing exciting:
>>>>> kernel = "/nfs/keir/xen/xen64.hg/dist/install/boot/vmlinuz-2.6.18.8-xenU"
>>>>> memory = 750
>>>>> name = "PV1"
>>>>> vcpus = 2
>>>>> vif = [ 'mac=00:1a:00:00:01:01' ]
>>>>> disk = [ 'phy:/dev/VG/Suse10.1_64_1,sda1,w' ]
>>>>> root = "/dev/sda1 ro xencons=tty"
>>>>> extra = ""
>>>>> tsc_native = 1
>>>>> on_poweroff = 'destroy'
>>>>> on_reboot = 'restart'
>>>>> on_crash = 'preserve'
>>>>>
>>>>> The dom{0,U} kernels are tip of linux-2.6.18-xen, default -xen{0,U}
>>>>> configs.
>>>>>
>>>>> -- Keir
>>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
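
A minimal, self-contained sketch of the lifetime model agreed in the thread
above, using simplified stand-in types and helpers rather than the actual Xen
tmem code: one domain reference is taken when the client is first created,
creation is refused for a domain that is already dying, and the reference is
dropped exactly once in the destroy path. The point is that client creation
plus the dying check decide the reference count, and the destruction hook then
guarantees exactly one matching put.

/* Sketch only: simplified stand-ins for the real Xen structures and APIs. */
#include <stdbool.h>
#include <stdlib.h>

struct domain {
    int  domain_id;
    bool is_dying;     /* set once the destruction path has started */
    int  refcnt;       /* stand-in for the real atomic refcount */
};

struct tmem_client {
    struct domain *domain;   /* single reference held for the client's lifetime */
};

/* Stand-ins for get_domain()/put_domain(); the real get_domain() can also
 * fail once the domain has been fully destroyed. */
static bool get_domain_ref(struct domain *d) { d->refcnt++; return true; }
static void put_domain_ref(struct domain *d) { d->refcnt--; }

/* Create a tmem client for a domain: take exactly one reference, and refuse
 * if the domain is already dying, so a client can never appear after the
 * destruction path has already torn the tmem state down. */
struct tmem_client *client_create(struct domain *d)
{
    struct tmem_client *client;

    if ( d->is_dying )
        return NULL;                    /* too late to become a client */

    if ( !get_domain_ref(d) )
        return NULL;

    client = calloc(1, sizeof(*client));
    if ( client == NULL )
    {
        put_domain_ref(d);
        return NULL;
    }
    client->domain = d;
    return client;
}

/* Called from the destruction path (tmem_destroy() -> ... -> client_free()):
 * drop the single reference taken at creation time. Because domain_kill()
 * itself still holds a reference at this point, this put can never be the
 * final one. */
void client_free(struct tmem_client *client)
{
    put_domain_ref(client->domain);
    free(client);
}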
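
And a rough illustration of the lookup wrapper Keir suggests: a tmem-local
macro mapping onto rcu_lock_domain_by_id(), so each infrequent by-domid lookup
is bracketed by the lookup lock/unlock pair rather than taking another
long-lived reference. The wrapper names and the usage function are
hypothetical; only rcu_lock_domain_by_id()/rcu_unlock_domain() are the
interfaces named in the thread, and they are stubbed here so the fragment
stands alone.

/* Sketch only: hypothetical tmem-side lookup wrappers. */
#include <stddef.h>

typedef unsigned short domid_t;
struct domain { domid_t domain_id; };

/* Stubs standing in for the real lookup primitives. */
static struct domain *rcu_lock_domain_by_id(domid_t dom) { (void)dom; return NULL; }
static void rcu_unlock_domain(struct domain *d) { (void)d; }

#define tmh_lookup_domain(id)  rcu_lock_domain_by_id(id)
#define tmh_release_domain(d)  rcu_unlock_domain(d)

/* Usage pattern for an infrequent by-domid operation (e.g. from the tools):
 * the domain pointer is only used between lookup and release, so no extra
 * long-lived reference is taken on these paths. */
void tmh_example_op(domid_t domid)
{
    struct domain *d = tmh_lookup_domain(domid);

    if ( d == NULL )
        return;                     /* no such domain */

    /* ... operate on d here ... */

    tmh_release_domain(d);
}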