Re: [Xen-API] [Xen-devel] bug in xenstored? No notification to subscription on @introduceDomain
On Fri, 2011-12-09 at 19:49 +0000, George Shuklin wrote:
> Good day.
>
> I think I have hit a strange bug in xenstored.

If you are using XCP then this will be using oxenstored. I've CC'd
xen-api@ since that is the correct place for XCP discussions.

It's also plausibly a bug in the C client library or the python bindings
to that library (or indeed your application).

> I have been using XCP for a long time, and all that time we have had a
> strange bug that we were not able to debug properly, because it only
> showed up in production and very rarely. Now we have been able to catch
> it in a testing environment and have done some research.
>
> We have a python application running in dom0 that waits for domains to
> appear. This is implemented via a subscription (watch) on the
> @introduceDomain xenstore key. Under some conditions we stop receiving
> notifications on that subscription. If we run the application as a
> second instance, that instance receives the notification; if we restart
> the application, it receives it too.

Do you lose both the @introduceDomain and @releaseDomain notifications,
or just @introduceDomain?

Does the app do any other XS operations, e.g. other watches or
reads/writes? Do these stop working as well?

oxenstored (at least in XCP) logs to /var/log/xenstore-access.log; do you
see any activity in there? There is also /var/log/xenstored.log.

Does strace show the daemon writing (or trying to write) to the socket
associated with this client? What about on the client side? (NB:
libxenstore uses a thread to handle watches, so be sure to use the
appropriate options to strace, e.g. -f to follow all threads.)
Identifying the fd associated with the connection on either end might be
tricky; /proc/<pid>/fd and/or netstat might help narrow it down.

The app being python presumably makes it hard to attach gdb to and get
anything sensible; likewise for the daemon being ocaml. If anyone has any
hints on attaching a debugger to an existing process of these types, that
might be useful.
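To follow up on the /proc/<pid>/fd suggestion, here is a minimal sketch, assuming Linux procfs and sufficient privileges (root in dom0 to inspect the daemon). It lists which fds of a process are sockets, together with their inode numbers, so they can be matched against the fd numbers seen in strace output. The helper name socket_inodes is hypothetical and not part of any Xen tool; the code avoids Python-3-only syntax so it should also run on a dom0 Python 2.

```python
import os
import re

# /proc/<pid>/fd/<n> is a symlink; for sockets its target is "socket:[<inode>]".
SOCKET_RE = re.compile(r"^socket:\[(\d+)\]$")


def socket_inodes(pid):
    """Return a dict mapping fd number -> socket inode for every socket fd of pid."""
    result = {}
    fd_dir = "/proc/%d/fd" % pid
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink(); skip it.
            continue
        m = SOCKET_RE.match(target)
        if m:
            result[int(name)] = int(m.group(1))
    return result
```

The inode numbers can then be looked up in /proc/net/unix, which lists the inode of each Unix-domain socket and, for bound sockets, its path; that helps pick out which daemon-side fd belongs to which client.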
Other than that I'm afraid I really don't have any idea what might be
going wrong, or indeed what other next steps can be taken to diagnose the
issue :-(

Ian.

> I am unable to pinpoint the exact condition for this, but it:
> a) happens occasionally but consistently (about once a month, on at
>    least one host in a farm of 50 hosts);
> b) is not related to xenstored uptime;
> c) is not related to load on xen or dom0;
> d) is not related to the number of domains;
> e) occurs in at least XCP 0.5, 1.0 and 1.1 (I don't know how to get the
>    version from xenstored).
>
> Last time I hit this on two hosts in the lab at the same time (each with
> a single guest domain and no heavy load) and did some experiments, so I
> can state exactly what I wrote above.
>
> The relevant pieces of the python code we ran:
>
> from xen.lowlevel import xs
> conn = xs.xs()
> conn.watch("@introduceDomain", "+")
> conn.watch("@releaseDomain", "-")
> conn.read_watch()
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api
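The quoted snippet registers both watches and reads a single event. If an application tracks domains purely from individual events, one event it fails to process leaves its view wrong until it is restarted. As far as I know, the special @introduceDomain/@releaseDomain events carry only the special path, not the domid, so the application has to rescan anyway. A defensive pattern, sketched here under assumptions, is to treat every firing as a hint to rescan and diff the result against what was seen before. `conn` is anything with the `watch(path, token)` / `read_watch()` methods used in the snippet; `list_domids` and `on_change` are hypothetical application callbacks. This does not address the reported symptom of no events arriving at all.

```python
def reconcile(known, current):
    """Return (introduced, released): domids that appeared / disappeared
    in the freshly listed `current` relative to the set `known`."""
    current = set(current)
    return current - known, known - current


def watch_domains(conn, list_domids, on_change, max_events=-1):
    """Hypothetical event loop: treat each watch firing as a hint to rescan.

    conn        -- object with watch(path, token) and read_watch(), as in
                   the quoted snippet (assumed to be an xs.xs() handle)
    list_domids -- hypothetical callable returning the current domids
    on_change   -- called as on_change(introduced, released) with sets
    max_events  -- stop after this many events; negative means run forever
    Returns the final set of known domids.
    """
    conn.watch("@introduceDomain", "+")
    conn.watch("@releaseDomain", "-")
    known = set(list_domids())
    handled = 0
    while max_events < 0 or handled < max_events:
        # The event contents are not used: any firing triggers a full rescan.
        conn.read_watch()
        current = set(list_domids())
        introduced, released = reconcile(known, current)
        if introduced or released:
            on_change(introduced, released)
        known = current
        handled += 1
    return known
```

Because each rescan compares against the last known set, a change that happened without its own event is still picked up at the next firing of either watch.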