
Re: [Xen-devel] xenstored crashes with SIGSEGV

On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
> Hello,
> On 13.11.2014 10:12, Ian Campbell wrote:
> > On Thu, 2014-11-13 at 08:45 +0100, Philipp Hahn wrote:
> >> To me this looks like some memory corruption by some unknown code
> >> writing into some random memory space, which happens to be the tdb here.
> > 
> > I wonder if running xenstored under valgrind would be useful. I think
> > you'd want to stop xenstored from starting during normal boot and then
> > launch it with:
> >         valgrind /usr/local/sbin/xenstored -N
> > -N is to stay in the foreground, you might want to do this in a screen
> > session or something, alternatively you could investigate the --log-*
> > options in the valgrind manpage, together with the various
> > --trace-children* options in order to follow the process through its
> > daemonization.
> We did enable tracing and now have the xenstored-trace.log of one crash:
> It contains 1.6 billion lines and is 83 GiB.
> It just shows xenstored crashing on TRANSACTION_START.
> Is there some tool to feed that trace back into a newly launched xenstored?

Not that I know of, I'm afraid.
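
Until something like that exists, the end of the trace is probably the
most useful part. A rough sketch, assuming the log is plain
line-oriented text (I have not checked the exact format), that pulls the
last few lines out of a file that size without reading all of it.
tail_lines and its parameters are made-up names for illustration:

```python
import os


def tail_lines(path, count=20, block_size=1 << 20):
    """Return the last `count` lines of a file, reading backwards in blocks."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Keep prepending blocks from the end until more than `count`
        # newlines have been seen, so the oldest wanted line is complete.
        while pos > 0 and data.count(b"\n") <= count:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    lines = data.splitlines()
    # Decode leniently; a trace may contain non-UTF-8 bytes.
    return [line.decode("utf-8", errors="replace") for line in lines[-count:]]
```

Something like tail_lines("xenstored-trace.log", 50) would then show
what led up to the final TRANSACTION_START.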

Do you get a core dump when this happens? You might need to fiddle with
ulimits (some distros disable core dumps by default). IIRC there is also
some /proc knob which controls where core dumps go on the filesystem.
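
A minimal sketch of checking both settings, assuming a Linux host. It
reports the core size limit of the calling process (what "ulimit -c"
shows) and the contents of /proc/sys/kernel/core_pattern, which I
believe is the file in question. The helper names core_dump_settings and
describe are made up for illustration:

```python
import resource
from pathlib import Path

# Linux-specific: template for where the kernel writes core files.
CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def core_dump_settings():
    """Return (soft_limit, hard_limit, core_pattern) for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        pattern = CORE_PATTERN.read_text().strip()
    except OSError:
        pattern = None  # not Linux, or /proc not readable
    return soft, hard, pattern


def describe(limit):
    """Render an rlimit value the way ulimit would."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


if __name__ == "__main__":
    soft, hard, pattern = core_dump_settings()
    print("core size soft limit:", describe(soft))
    print("core size hard limit:", describe(hard))
    print("core_pattern:", pattern)
```

A soft limit of 0 means no core file is written. The limit is inherited
from the parent, so it has to be raised in whatever starts xenstored
rather than in an unrelated shell.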

> My hope would be that xenstored crashes again, because then we could use
> all those other tools like valgrind more easily.

That would be handy. My fear would be that this bug is likely to be a
race condition of some sort, and the granularity/accuracy of the
playback would possibly need to be quite high to trigger the issue.

> > Do you rm the xenstore db on boot? It might have a persistent
> > corruption, aiui most folks using C xenstored are doing so or even
> > placing it on a tmpfs for performance reasons.
> We're using a tmpfs for /var/lib/xenstored/, as we had a severe
> performance problem with something updating
> /local/domain/0/backend/console/*/0/uuid too often, which put xenstored
> in permanent D state.

But this is just a process crashing and not the whole host, so you still
have the db file at the point of the crash?

It might be interesting to see what happens if you preserve the db and
reboot, arranging for the new xenstored to start with the old file. If
the corruption is part of the file then maybe it can be induced to crash
again more quickly.
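
Since the db lives on a tmpfs it will not survive the reboot on its own,
so it would have to be copied somewhere persistent first. A small
sketch of that step; preserve_db is a made-up helper, and both paths in
the usage note are assumptions about your setup:

```python
import shutil
import time
from pathlib import Path


def preserve_db(db_path, dest_dir):
    """Copy the db file into dest_dir under a timestamped name.

    Returns the path of the copy.
    """
    src = Path(db_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}"
    # copy2 also keeps timestamps, which helps when correlating with logs.
    shutil.copy2(src, dest)
    return dest
```

For example preserve_db("/var/lib/xenstored/tdb", "/root/xenstored-crash"),
run after the crash and before anything restarts xenstored. The copy can
then be put back in place before the new xenstored starts.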

