
Re: [Xen-devel] xenstored crashes with SIGSEGV


On 13.11.2014 10:12, Ian Campbell wrote:
> On Thu, 2014-11-13 at 08:45 +0100, Philipp Hahn wrote:
>> To me this looks like some memory corruption by some unknown code
>> writing into some random memory space, which happens to be the tdb here.
> I wonder if running xenstored under valgrind would be useful. I think
> you'd want to stop xenstored from starting during normal boot and then
> launch it with:
>         valgrind /usr/local/sbin/xenstored -N
> -N is to stay in the foreground, you might want to do this in a screen
> session or something, alternatively you could investigate the --log-*
> options in the valgrind manpage, together with the various
> --trace-children* in order to follow the process across its
> daemonization.

We enabled tracing and now have the xenstored-trace.log of one crash:
it contains 1.6 billion lines and is 83 GiB in size.
It only shows that xenstored crashed on TRANSACTION_START.

Is there some tool to feed that trace back into a newly launched xenstored?

My hope would be that xenstored crashes again, because then we could use
all those other tools like valgrind more easily.
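
A trace of that size is impractical to open in an editor, but the messages
immediately before the crash can be pulled out without reading the whole
file. The following is only a minimal sketch, assuming the trace is a
plain-text file with one message per line; the function name tail_lines and
its parameters are hypothetical and not part of any Xen tool:

```python
import os


def tail_lines(path, count=1000, block_size=1 << 20):
    """Return the last `count` lines of a file, reading backwards in blocks.

    Assumption: the file is newline-separated text. Only the end of the
    file is read, so the cost does not depend on the total file size.
    """
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        data = b""
        # Keep prepending blocks until more than `count` newlines are
        # buffered (so the first kept line is complete) or the start of
        # the file is reached.
        while pos > 0 and data.count(b"\n") <= count:
            step = min(block_size, pos)
            pos -= step
            fh.seek(pos)
            data = fh.read(step) + data
    lines = data.splitlines()[-count:]
    return [line.decode("utf-8", errors="replace") for line in lines]
```

Calling tail_lines("xenstored-trace.log", 200) would then return the last
200 messages leading up to the crash, which is a much smaller input for
manual inspection or for an attempt at reproducing the problem.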

>> 3. the crash happens rarely and the hosts run fine most of the time. The
>> crash mostly happens around midnight and seems to be guest-triggered, as
>> the logs on the host don't show any activity like starting new or
>> destroying running VMs. So far the problem only showed on hosts running
>> Linux VMs. Other hosts running Windows VMs so far never showed that crash.

Now we also observed a crash on a host running Windows VMs.

> If it is really mostly happening around midnight then it might be worth
> digging into the host and guest configs for cronjobs and the like, e.g.
> log rotation stuff like that which might be tweaking things somehow.
> Does this happen on multiple hosts, or just the one?

Multiple hosts in two different data centers.

> Do you rm the xenstore db on boot? It might have a persistent
> corruption, aiui most folks using C xenstored are doing so or even
> placing it on a tmpfs for performance reasons.

We're using a tmpfs for /var/lib/xenstored/, as we had a severe
performance problem with something updating
/local/domain/0/backend/console/*/0/uuid too often, which put xenstored
in permanent D state.

> If you are running 4.1.x then I think oxenstored isn't an option, but it
> might be something to consider when you upgrade.

Thank you for the hint, I'll have another look at the OCaml version.

Thank you again.
Philipp Hahn

Xen-devel mailing list


