[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie saved domains
Could you give the attached a try on your test case? If it passes and Jan thinks it is OK (as I backed out most of his patch at cs 20918), then: Tmem: fix domain refcount leak by returning to simpler model which claims a ref once when the tmem client is first associated with the domain, and puts it once when the tmem client is destroyed. Signed-off-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> > If you have a handle on a domain already I wonder why you need to > continually look up by domid... Nearly all tmem uses are via current->domain. The remaining (such as from the tools) are via a specified domid. I don't keep a domid->domain lookup table around as the frequency is very low and the existing mechanism works fine (or it does if I use it correctly anyway ;-) > RCU locking > is fully implemented already. It's highly unlikely to change in future > and we can work out something else for your case if that happens. I guess I was confused by the fact that the rcu_lock/unlock macros are no-ops. In any case, I think I understand the semantics well enough now after your second reply pointing me to rcu_unlock_domain, so I think the attached patch should avoid special cases in the future. > > I'd like to do a get_domain_by_id() without doing a get_domain() > > as the tmem code need only get_domain() once on first use > > and put_domain() once when destroying, but frequently needs > > to lookup a domain by id. > > If you have a handle on a domain already I wonder why you need to > continually look up by domid... > > > It looks like rcu_lock_domain_by_id() does what I need, but > > I don't need any rcu critical sections (outside of the domain > > lookup itself) and am fearful that if rcu locking ever is fully > > implemented, my use of rcu_lock_domain_by_id() would become > > incorrect and I may have a problem. Should I create an equivalent > > get_domain_by_id_no_ref()? Or am I misunderstanding something? > > If you really know what you're doing, I suggest just have your own > tmh_lookup_domain() macro mapping onto rcu_lock_domain_by_id(). RCU > locking > is fully implemented already. It's highly unlikely to change in future > and > we can work out something else for your case if that happens. > > I'm not keen on providing an explicitly synchronisation-free version in > common code. It just encourages people not to think about > synchronisation at > all. > > -- Keir > > > Semi-related, rcu_lock_domain_by_id() has a "return d" inside > > the for loop without an rcu_read_unlock(). I see that this > > is irrelevant now because rcu_read_unlock() is a no-op anyway, > > but maybe this should be cleaned up for the same reason -- > > in case rcu locking is ever fully implemented. > > > > Thanks, > > Dan > > > >> -----Original Message----- > >> From: Dan Magenheimer > >> Sent: Thursday, June 10, 2010 7:08 AM > >> To: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx > >> Subject: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie > saved > >> domains > >> > >> OK, will take a look. > >> > >> IIRC, Jan's work to fix the domain reference stuff just > >> before 4.0 shipped was a heavy hammer but since it seemed > >> to work, I didn't want to mess with it so close to release... > >> really there's only a need to take a reference once on > >> first use and release it at shutdown, rather than > >> take/release frequently. IIRC, I had used a macro that > >> took references when they weren't really needed and > >> Jan placed the matching macros that did the release. > >> > >> Dan > >> > >>> -----Original Message----- > >>> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx] > >>> Sent: Thursday, June 10, 2010 3:47 AM > >>> To: xen-devel@xxxxxxxxxxxxxxxxxxx > >>> Cc: Dan Magenheimer > >>> Subject: Bug in tmem: refcount leak leaves zombie saved domains > >>> > >>> Dan, > >>> > >>> Just doing some save/restore testing on xen-unstable tip, I noticed > >>> that: > >>> # xm create ./pv_config > >>> # xm save PV1 > >>> > >>> Would leave the saved guest as a zombie in the DOMDYING_dead state > >> with > >>> no > >>> pages, yet with refcnt=1. This happens absolutely consistently. > Just > >> as > >>> consistently, it does not happen when I boot Xen with no-tmem. My > >>> conclusion > >>> is that tmem is leaking a domain reference count during domain > save. > >>> This > >>> doesn't happen if I merely "xm create ...; xm destroy ...". > >>> > >>> My pv_config file contains nothing exciting: > >>> kernel = "/nfs/keir/xen/xen64.hg/dist/install/boot/vmlinuz- > 2.6.18.8- > >>> xenU" > >>> memory = 750 > >>> name = "PV1" > >>> vcpus = 2 > >>> vif = [ 'mac=00:1a:00:00:01:01' ] > >>> disk = [ 'phy:/dev/VG/Suse10.1_64_1,sda1,w' ] > >>> root = "/dev/sda1 ro xencons=tty" > >>> extra = "" > >>> tsc_native = 1 > >>> on_poweroff = 'destroy' > >>> on_reboot = 'restart' > >>> on_crash = 'preserve' > >>> > >>> The dom{0,U} kernels are tip of linux-2.6.18-xen, default -xen{0,U} > >>> configs. > >>> > >>> -- Keir > >>> > >>> > >> > >> _______________________________________________ > >> Xen-devel mailing list > >> Xen-devel@xxxxxxxxxxxxxxxxxxx > >> http://lists.xensource.com/xen-devel > > Attachment:
tmem-fix-refcount-leak.patch _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |