RE: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie saved domains
> I think both of these should be backported for Xen 4.0.1?

Thanks for handling it. I agree it should go into Xen 4.0.1, but I won't
be able to give it much testing before that releases (unless it takes a
few more RCs), so if you are confident and Jan doesn't see any problems,
go for it.

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: Thursday, June 10, 2010 3:43 PM
> To: Dan Magenheimer; xen-devel@xxxxxxxxxxxxxxxxxxx
> Cc: JBeulich@xxxxxxxxxx
> Subject: Re: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie
> saved domains
>
> On 10/06/2010 21:06, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:
>
> > This patch looks good, I'll test it tomorrow.
>
> It tests okay, so I applied it as xen-unstable:21595.
>
> > What you *do* need to do when setting up a new tmem client is check
> > that the associated domain is not dying. Without that check the code
> > is in fact currently buggy: you can end up with a zombie domain that
> > is a client of tmem and will never stop being a client, because it
> > became a client after tmem_destroy() was called on it.
>
> I implemented this as xen-unstable:21596. Take a look. It's pretty
> straightforward.
>
> I think both of these should be backported for Xen 4.0.1?
>
>  -- Keir
>
> > Does that make sense?
> >
> >  -- Keir
> >
> > On 10/06/2010 18:54, "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx>
> > wrote:
> >
> >> Could you give the attached a try on your test case? If
> >> it passes and Jan thinks it is OK (as I backed out most of
> >> his patch at cs 20918), then:
> >>
> >> Tmem: fix domain refcount leak by returning to simpler model
> >> which claims a ref once when the tmem client is first associated
> >> with the domain, and puts it once when the tmem client is
> >> destroyed.
> >>
> >> Signed-off-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
> >>
> >>> If you have a handle on a domain already, I wonder why you need to
> >>> continually look up by domid...
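[The take-once/put-once model in the commit message above, together with
the "domain is not dying" check Keir describes, can be sketched as a
self-contained toy in C. All names here (toy_domain, client_create,
client_destroy) are illustrative stand-ins, not the actual Xen tmem code:]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the take-once/put-once refcount scheme described above,
 * plus the not-dying check.  Illustrative only, not the Xen sources. */
struct toy_domain {
    int refcnt;           /* models the domain reference count */
    bool is_dying;        /* models the domain's dying flag */
    bool has_client;      /* a tmem client is attached */
};

/* Claim exactly one reference when the client is first associated;
 * refuse to attach to a domain that is already dying, which would
 * otherwise leave a zombie that can never be freed. */
static bool client_create(struct toy_domain *d)
{
    if (d->is_dying || d->has_client)
        return false;
    d->refcnt++;          /* the single long-lived reference */
    d->has_client = true;
    return true;
}

/* Drop that one reference when the client is torn down. */
static void client_destroy(struct toy_domain *d)
{
    if (d->has_client) {
        d->has_client = false;
        d->refcnt--;
    }
}
```

[With this shape there is exactly one get per client lifetime, so a
leaked reference can only come from a create never paired with a
destroy, rather than from frequent take/release pairs getting out of
balance.]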
> >>
> >> Nearly all tmem uses are via current->domain. The remaining
> >> uses (such as from the tools) are via a specified domid. I don't
> >> keep a domid->domain lookup table around, as the frequency is
> >> very low and the existing mechanism works fine (or it does
> >> if I use it correctly, anyway ;-)
> >>
> >>> RCU locking
> >>> is fully implemented already. It's highly unlikely to change in
> >>> future and we can work out something else for your case if that
> >>> happens.
> >>
> >> I guess I was confused by the fact that the rcu_lock/unlock macros
> >> are no-ops.
> >>
> >> In any case, I think I understand the semantics well enough now,
> >> after your second reply pointing me to rcu_unlock_domain, so
> >> I think the attached patch should avoid special cases in the
> >> future.
> >>
> >>>> I'd like to do a get_domain_by_id() without doing a get_domain(),
> >>>> as the tmem code need only get_domain() once on first use
> >>>> and put_domain() once when destroying, but frequently needs
> >>>> to look up a domain by id.
> >>>
> >>> If you have a handle on a domain already, I wonder why you need to
> >>> continually look up by domid...
> >>>
> >>>> It looks like rcu_lock_domain_by_id() does what I need, but
> >>>> I don't need any RCU critical sections (outside of the domain
> >>>> lookup itself) and am fearful that if RCU locking ever is fully
> >>>> implemented, my use of rcu_lock_domain_by_id() would become
> >>>> incorrect and I may have a problem. Should I create an equivalent
> >>>> get_domain_by_id_no_ref()? Or am I misunderstanding something?
> >>>
> >>> If you really know what you're doing, I suggest just having your own
> >>> tmh_lookup_domain() macro mapping onto rcu_lock_domain_by_id(). RCU
> >>> locking is fully implemented already. It's highly unlikely to change
> >>> in future, and we can work out something else for your case if that
> >>> happens.
> >>>
> >>> I'm not keen on providing an explicitly synchronisation-free
> >>> version in common code.
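[The lookup/unlock pairing being discussed can be modeled as a
self-contained toy in C: a tmh_lookup_domain()-style helper that returns
a domain while an RCU-style read lock is held, to be released by a
matching unlock. The depth counter and all names here are illustrative;
the real Xen primitives are rcu_lock_domain_by_id() and
rcu_unlock_domain():]

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of lookup-by-id under an RCU-style read lock.
 * Illustrative only, not the Xen sources. */
struct toy_domain { int domid; };

static struct toy_domain toy_domains[] = { { .domid = 0 }, { .domid = 7 } };
static int rcu_depth;     /* models nesting depth of rcu_read_lock() */

/* On success, returns the domain with the read lock still held; the
 * caller must balance it with toy_unlock_domain().  On failure, the
 * lock is dropped before returning NULL. */
static struct toy_domain *toy_lookup_domain(int domid)
{
    rcu_depth++;          /* rcu_read_lock(): enter the read side */
    for (size_t i = 0; i < sizeof(toy_domains) / sizeof(toy_domains[0]); i++)
        if (toy_domains[i].domid == domid)
            return &toy_domains[i];  /* returned locked; caller unlocks */
    rcu_depth--;          /* not found: leave the read side here */
    return NULL;
}

static void toy_unlock_domain(struct toy_domain *d)
{
    if (d != NULL)
        rcu_depth--;      /* rcu_read_unlock(): matches a successful lookup */
}
```

[Even when the lock/unlock macros happen to compile to no-ops, keeping
every lookup paired with an unlock means the code stays correct if the
read-side primitives ever gain real teeth.]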
> >>> It just encourages people not to think about synchronisation at
> >>> all.
> >>>
> >>>  -- Keir
> >>>
> >>>> Semi-related: rcu_lock_domain_by_id() has a "return d" inside
> >>>> the for loop without an rcu_read_unlock(). I see that this
> >>>> is irrelevant now because rcu_read_unlock() is a no-op anyway,
> >>>> but maybe this should be cleaned up for the same reason --
> >>>> in case RCU locking is ever fully implemented.
> >>>>
> >>>> Thanks,
> >>>> Dan
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Dan Magenheimer
> >>>>> Sent: Thursday, June 10, 2010 7:08 AM
> >>>>> To: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx
> >>>>> Subject: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie
> >>>>> saved domains
> >>>>>
> >>>>> OK, will take a look.
> >>>>>
> >>>>> IIRC, Jan's work to fix the domain reference stuff just
> >>>>> before 4.0 shipped was a heavy hammer, but since it seemed
> >>>>> to work, I didn't want to mess with it so close to release...
> >>>>> really there's only a need to take a reference once on
> >>>>> first use and release it at shutdown, rather than
> >>>>> take/release frequently. IIRC, I had used a macro that
> >>>>> took references when they weren't really needed, and
> >>>>> Jan placed the matching macros that did the release.
> >>>>>
> >>>>> Dan
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> >>>>>> Sent: Thursday, June 10, 2010 3:47 AM
> >>>>>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> >>>>>> Cc: Dan Magenheimer
> >>>>>> Subject: Bug in tmem: refcount leak leaves zombie saved domains
> >>>>>>
> >>>>>> Dan,
> >>>>>>
> >>>>>> Just doing some save/restore testing on xen-unstable tip, I
> >>>>>> noticed that:
> >>>>>>   # xm create ./pv_config
> >>>>>>   # xm save PV1
> >>>>>> would leave the saved guest as a zombie in the DOMDYING_dead
> >>>>>> state, with no pages, yet with refcnt=1. This happens absolutely
> >>>>>> consistently.
> >>>>>> Just as consistently, it does not happen when I boot Xen with
> >>>>>> no-tmem. My conclusion is that tmem is leaking a domain
> >>>>>> reference count during domain save. This doesn't happen if I
> >>>>>> merely "xm create ...; xm destroy ...".
> >>>>>>
> >>>>>> My pv_config file contains nothing exciting:
> >>>>>>   kernel = "/nfs/keir/xen/xen64.hg/dist/install/boot/vmlinuz-2.6.18.8-xenU"
> >>>>>>   memory = 750
> >>>>>>   name = "PV1"
> >>>>>>   vcpus = 2
> >>>>>>   vif = [ 'mac=00:1a:00:00:01:01' ]
> >>>>>>   disk = [ 'phy:/dev/VG/Suse10.1_64_1,sda1,w' ]
> >>>>>>   root = "/dev/sda1 ro xencons=tty"
> >>>>>>   extra = ""
> >>>>>>   tsc_native = 1
> >>>>>>   on_poweroff = 'destroy'
> >>>>>>   on_reboot = 'restart'
> >>>>>>   on_crash = 'preserve'
> >>>>>>
> >>>>>> The dom{0,U} kernels are tip of linux-2.6.18-xen, default
> >>>>>> xen{0,U} configs.
> >>>>>>
> >>>>>>  -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel