Re: [Xen-devel] xenstored crashes with SIGSEGV
On Mon, 2014-12-15 at 15:19 +0100, Philipp Hahn wrote:
> Hello Ian,
>
> On 15.12.2014 14:17, Ian Campbell wrote:
> > On Fri, 2014-12-12 at 17:58 +0000, Ian Campbell wrote:
> >> On Fri, 2014-12-12 at 18:20 +0100, Philipp Hahn wrote:
> >>> On 12.12.2014 17:56, Ian Campbell wrote:
> >>>> On Fri, 2014-12-12 at 17:45 +0100, Philipp Hahn wrote:
> >>>>> On 12.12.2014 17:32, Ian Campbell wrote:
> >>>>>> On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
> ...
> >>> The 1st and 2nd trace look like this: ptr in frame #2 looks very bogus.
> >>>
> >>> (gdb) bt full
> >>> #0 talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116
> >>> tc = <value optimized out>
> >>> #1 0x0000000000407edf in talloc_free (ptr=0xff00000000) at talloc.c:551
> >>> tc = <value optimized out>
> >>> #2 0x000000000040a348 in tdb_open_ex (name=0x1941fb0
> >>> "/var/lib/xenstored/tdb.0x1935bb0",
>
> I just noticed something strange:
>
> > #3 0x000000000040a684 in tdb_open (name=0xff00000000 <Address
> > 0xff00000000 out of bounds>, hash_size=0,
> > tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
> > #4 0x000000000040a70b in tdb_copy (tdb=0x192e540, outfile=0x1941fb0
> > "/var/lib/xenstored/tdb.0x1935bb0")
>
> Why does gdb-7.0.1 print "name=0xff000000" here for frame 3, but for
> frame 2 and 4 the pointers are correct again?
> Verifying the values with an explicit "print" shows them as correct.

I had just noticed that and was wondering about the same thing. I'm starting to worry that 0xff00000000 might just be a gdb thing, similar to <value optimized out>, but infinitely more misleading.

I've also noticed in https://forge.univention.org/bugzilla/show_bug.cgi?id=35104 that the constant can be 0xff000000, 0xff00000000 or 0xff0000000000 (6, 8 or 10 zeroes).
> >>> hash_size=<value optimized out>, tdb_flags=0, open_flags=<value
> >>> optimized out>, mode=<value optimized out>,
> >>> log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at
> >>> tdb.c:1958
> >
> > Please can you confirm what is at line 1958 of your copy of tdb.c. I
> > think it will be tdb->locked, but I'd like to be sure.
>
> Yes, that's the line:
> # sed -ne 1958p tdb.c
> SAFE_FREE(tdb->locked);

Good, thanks.

> > You are running a 64-bit dom0, correct?
>
> yes: x86_64

Thanks for confirming. I'm resurrecting the 64-bit root partition on my test box (which, it turns out, was still Debian Squeeze!).

> > I've only just noticed that
> > 0xff00000000 is >32bits. My testing so far was 32-bit, I don't think it
> > should matter wrt use of uninitialised data etc.
> >
> > I can't help feeling that 0xff00000000 must be some sort of magic
> > sentinel value to someone. I can't figure out what though.
>
> 0xff is too much for bit flip errors, and also two crashes on different
> machines in the same location very much rules out any HW error for me.
>
> My 2nd idea was that someone decremented 0 one too many, but then that
> would have to be an 8 bit value - reading the code I didn't see anything
> like that.

I was wondering if it was an overflow or sign-extension thing, but it doesn't seem likely: for one thing, not enough high bits are set.

> One more thing we noticed: /var/lib/xenstored/ contained the tdb file
> and two bit-identical copies after the crash, so I would read that as two
> transactions being in progress at the time of the crash. Might be that
> this is important.

It's certainly worth noting, thanks.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel