Re: [Xen-devel] xenstored crashes with SIGSEGV
On Fri, 2014-12-12 at 17:58 +0000, Ian Campbell wrote:
> (adding Ian J who knows a bit more about C xenstored than me...)
> 
> On Fri, 2014-12-12 at 18:20 +0100, Philipp Hahn wrote:
> > Hello Ian,
> > 
> > On 12.12.2014 17:56, Ian Campbell wrote:
> > > On Fri, 2014-12-12 at 17:45 +0100, Philipp Hahn wrote:
> > >> On 12.12.2014 17:32, Ian Campbell wrote:
> > >>> On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
> > >>>> We did enable tracing and now have the xenstored-trace.log of one
> > >>>> crash:
> > >>>> It contains 1.6 billion lines and is 83 GiB.
> > >>>> It just shows xenstored to crash on TRANSACTION_START.
> > >>>> 
> > >>>> Is there some tool to feed that trace back into a newly launched
> > >>>> xenstored?
> > >>> 
> > >>> Not that I know of I'm afraid.
> > >> 
> > >> Okay, then I have to continue with my own tool.
> > > 
> > > If you do end up developing a tool to replay a xenstore trace then I
> > > think that'd be something great to have in tree!
> > 
> > I just need to figure out how to talk to xenstored on the wire: for some
> > strange reason xenstored is closing the connection to the UNIX socket on
> > the first write inside a transaction.
> 
> Or switch to /usr/share/pyshared/xen/xend/xenstore/xstransact.py...
> 
> > >>> Do you get a core dump when this happens? You might need to fiddle with
> > >>> ulimits (some distros disable by default). IIRC there is also some /proc
> > >>> knob which controls where core dumps go on the filesystem.
> > >> 
> > >> Not for that specific trace: We first enabled generating core files, but
> > >> only then discovered that this is not enough.
> > > 
> > > How wasn't it enough? You mean you couldn't use gdb to extract a
> > > backtrace from the core file? Or was something else wrong?
> > 
> > The 1st and 2nd trace look like this: ptr in frame #2 looks very bogus.
> > 
> > (gdb) bt full
> > #0  talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116
> >         tc = <value optimized out>
> > #1  0x0000000000407edf in talloc_free (ptr=0xff00000000) at talloc.c:551
> >         tc = <value optimized out>
> > #2  0x000000000040a348 in tdb_open_ex (name=0x1941fb0
> >     "/var/lib/xenstored/tdb.0x1935bb0",
> 
> I've timed out for tonight will try and have another look next week.

I've had another dig, and have instrumented all of the error paths from
this function and I can't see any way for an invalid pointer to be
produced, let alone freed. I've been running under valgrind which should
have caught any uninitialised memory type errors.

> >     hash_size=<value optimized out>, tdb_flags=0, open_flags=<value
> >     optimized out>, mode=<value optimized out>,
> >     log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at
> >     tdb.c:1958

Please can you confirm what is at line 1958 of your copy of tdb.c? I
think it will be tdb->locked, but I'd like to be sure.

You are running a 64-bit dom0, correct? I've only just noticed that
0xff00000000 is >32bits. My testing so far was 32-bit, but I don't think
it should matter wrt use of uninitialised data etc.

I can't help feeling that 0xff00000000 must be some sort of magic
sentinel value to someone. I can't figure out what though.

Have you observed the xenstored processes growing especially large
before this happens? I'm wondering if there might be a leak somewhere
which after a time is resulting a

I'm about to send out a patch which plumbs tdb's logging into
xenstored's logging, in the hopes that next time you see this it might
say something as it dies.

Ian.
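
For anyone wanting to try the replay experiment Philipp describes, the
protocol xenstored speaks over its UNIX socket is small: each request is
a fixed xsd_sockmsg header (type, req_id, tx_id, len) followed by len
payload bytes, and replies use the same framing. The following is only a
rough sketch, assuming the public xs_wire.h header as installed by the
Xen packages, the default /var/run/xenstored/socket path, and a made-up
/local/test node; it is not an existing in-tree tool.

/* Minimal sketch: start a transaction and do one write inside it by
 * speaking the xenstore wire protocol directly over the UNIX socket.
 * Error handling and short reads/writes are glossed over for brevity. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <xen/io/xs_wire.h>   /* struct xsd_sockmsg, XS_TRANSACTION_START, ... */

#define XS_SOCKET "/var/run/xenstored/socket"

static void xs_request(int fd, uint32_t type, uint32_t tx_id,
                       const void *payload, uint32_t len,
                       char *reply, size_t replysz)
{
    struct xsd_sockmsg hdr = { .type = type, .req_id = 0,
                               .tx_id = tx_id, .len = len };

    /* Request: 16-byte header, then 'len' payload bytes. */
    if (write(fd, &hdr, sizeof(hdr)) != sizeof(hdr) ||
        write(fd, payload, len) != (ssize_t)len) {
        perror("write");
        exit(1);
    }

    /* Reply: same header layout, then the payload. */
    if (read(fd, &hdr, sizeof(hdr)) != sizeof(hdr) ||
        hdr.len >= replysz ||
        read(fd, reply, hdr.len) != (ssize_t)hdr.len) {
        perror("read");
        exit(1);
    }
    reply[hdr.len] = '\0';
}

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    char reply[4096];
    uint32_t tx;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    strncpy(addr.sun_path, XS_SOCKET, sizeof(addr.sun_path) - 1);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* TRANSACTION_START takes an empty (single NUL) payload; the reply
     * is the new transaction id as a decimal string. */
    xs_request(fd, XS_TRANSACTION_START, 0, "", 1, reply, sizeof(reply));
    tx = strtoul(reply, NULL, 10);

    /* A write inside the transaction: payload is "<path>\0<value>" and
     * tx_id is the id we were just handed. */
    const char payload[] = "/local/test\0hello";
    xs_request(fd, XS_WRITE, tx, payload, sizeof(payload) - 1,
               reply, sizeof(reply));

    /* Commit ("T") or abort ("F") the transaction. */
    xs_request(fd, XS_TRANSACTION_END, tx, "T", 2, reply, sizeof(reply));
    close(fd);
    return 0;
}

If a client gets the framing wrong (wrong len, missing NUL separator,
stale tx_id), the daemon's reaction can look like the connection simply
going away, so the symptom Philipp sees on the first in-transaction
write may be worth double-checking against the header and payload layout
before suspecting xenstored itself.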
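
On the backtrace itself: talloc keeps a bookkeeping header immediately
in front of every pointer it hands out, and the first thing talloc_free()
does is step back over that header and read it to validate the magic.
With a garbage pointer such as 0xff00000000 that very first read faults,
which is why the crash surfaces inside talloc_chunk_from_ptr() rather
than anywhere more informative. Roughly (a simplified paraphrase with
illustrative field names and magic value, not the exact talloc.c code):

#include <stdlib.h>

struct talloc_chunk {             /* header stored just before the     */
    struct talloc_chunk *parent;  /* pointer returned to the caller;   */
    /* ... child/sibling links, destructor, name, size ... */
    unsigned int flags;           /* holds a magic value while valid   */
};

#define TALLOC_MAGIC 0xe814ec70u  /* illustrative */

struct talloc_chunk *talloc_chunk_from_ptr(void *ptr)
{
    /* Step back over the header that precedes the user pointer... */
    struct talloc_chunk *tc =
        (struct talloc_chunk *)((char *)ptr - sizeof(struct talloc_chunk));

    /* ...and dereference it to check the magic.  With ptr equal to
     * 0xff00000000 this lands in unmapped memory and SIGSEGVs, which
     * matches frame #0 of the reported trace (talloc.c:116). */
    if ((tc->flags & ~0xFu) != TALLOC_MAGIC)
        abort();

    return tc;
}

So the faulting address tells us little beyond the fact that tdb_open_ex()
handed talloc_free() a pointer that was never a live talloc allocation.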
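
As for the patch mentioned at the end: frame #2 shows tdb_open_ex() being
called with log_fn pointing at null_log_fn, i.e. whatever tdb wanted to
say just before dying was being thrown away. Plumbing the logging through
just means passing a real callback instead. A rough sketch of the shape
of such a hook, assuming the printf-style log-function parameter that the
in-tree tdb's tdb_open_ex() takes (the exact typedef can differ between
tdb versions, and this is not the actual xenstored patch):

#include <stdarg.h>
#include <stdio.h>

struct tdb_context;                       /* opaque for this sketch */

/* Forward tdb's internal diagnostics to stderr; a real xenstored patch
 * would route them into xenstored's own log/trace machinery instead. */
void tdb_log_to_stderr(struct tdb_context *tdb, int level,
                       const char *fmt, ...)
{
    va_list ap;

    (void)tdb;
    fprintf(stderr, "tdb (level %d): ", level);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

Passed in place of null_log_fn as the log_fn argument of tdb_open_ex(),
something like this would make tdb report why it is unwinding through
the error path that ends in the bogus talloc_free(), which is the piece
of information the trace above is missing.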
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel