[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] xenstored crashes with SIGSEGV
On Mon, 2014-12-15 at 23:29 +0100, Philipp Hahn wrote: > Hello Ian, > > On 15.12.2014 18:45, Ian Campbell wrote: > > On Mon, 2014-12-15 at 14:50 +0000, Ian Campbell wrote: > >> On Mon, 2014-12-15 at 15:19 +0100, Philipp Hahn wrote: > >>> I just noticed something strange: > >>> > >>>> #3 0x000000000040a684 in tdb_open (name=0xff00000000 <Address > >>>> 0xff00000000 out of bounds>, hash_size=0, > >>>> tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773 > ... > > I'm reasonably convinced now that this is just a weird artefact of > > running gdb on an optimised binary, probably a shortcoming in the debug > > info leading to gdb getting confused. > > > > Unfortunately this also calls into doubt the parameter to talloc_free, > > perhaps in that context 0xff0000000 is a similar artefact. > > > > Please can you print the entire contents of tdb in the second frame > > ("print *tdb" ought to do it). I'm curious whether it is all sane or > > not. > > (gdb) print *tdb > $1 = {name = 0x0, map_ptr = 0x0, fd = 47, map_size = 65280, read_only = > 16711680, > locked = 0xff0000000000, So it really does seem to be 0xff0000000000 in memory. > flags = 0, > travlocks = { > next = 0xff0000, off = 0, hash = 65280}, next = 0xff0000, > device = 280375465082880, inode = 16711680, log_fn = 0x4093b0 > <null_log_fn>, > hash_fn = 0x4092f0 <default_tdb_hash>, open_flags = 2} And here we can see tdb->{flags,open_flags} == 0 and 2, contrary to what the stack trace says we were called with, which was nonsense. Since 0 and 2 are sensible and correspond to what the caller passes I think the stack trace is just confused. > (gdb) info registers > rax 0x0 0 > rbx 0x16bff70 23854960 > rcx 0xffffffffffffffff -1 > rdx 0x40ecd0 4254928 > rsi 0x0 0 > rdi 0xff0000000000 280375465082880 And here it is in the registers. > rbp 0x7fcaed6c96a8 0x7fcaed6c96a8 > rsp 0x7fff9dc86330 0x7fff9dc86330 > r8 0x7fcaece54c08 140509534571528 > r9 0xff00000000000000 -72057594037927936 > r10 0x7fcaed08c14c 140509536895308 > r11 0x246 582 > r12 0xd 13 > r13 0xff0000000000 280375465082880 And again. > r14 0x4093b0 4232112 > r15 0x167d620 23582240 > rip 0x4075c4 0x4075c4 <talloc_chunk_from_ptr+4> This must be the faulting address. > eflags 0x10206 [ PF IF RF ] > cs 0x33 51 > ss 0x2b 43 > ds 0x0 0 > es 0x0 0 > fs 0x0 0 > gs 0x0 0 > fctrl 0x0 0 > fstat 0x0 0 > ftag 0x0 0 > fiseg 0x0 0 > fioff 0x0 0 > foseg 0x0 0 > fooff 0x0 0 > fop 0x0 0 > mxcsr 0x0 [ ] > > (gdb) disassemble > Dump of assembler code for function talloc_chunk_from_ptr: > 0x00000000004075c0 <talloc_chunk_from_ptr+0>: sub $0x8,%rsp > 0x00000000004075c4 <talloc_chunk_from_ptr+4>: mov -0x8(%rdi),%edx This is the line corresponding to %rip above which is doing a read via % rdi, which is 0xff0000000000. It's reading tc->flags. It's been optimised, tc = pp - SIZE, so it is loading *(pp-SIZE+offsetof(flags)), which is pp-8 (flags is the last field in the struct). So rdi contains pp which == the ptr given as an argument to the function, so ptr was bogus. So it seems we really do have tdb->locked containing 0xff0000000000. This is only allocated in one place which is: tdb->locked = talloc_zero_array(tdb, struct tdb_lock_type, tdb->header.hash_size+1); midway through tdb_open_ex. It might be worth inserting a check+log for this returning 0xff, 0xff00, 0xff0000 ... 0xff0000000000 etc. > 0x00000000004075c7 <talloc_chunk_from_ptr+7>: lea -0x50(%rdi),%rax This is actually calculating tc, ready for return upon success. > 0x00000000004075cb <talloc_chunk_from_ptr+11>: mov %edx,%ecx > 0x00000000004075cd <talloc_chunk_from_ptr+13>: and > $0xfffffffffffffff0,%ecx > 0x00000000004075d0 <talloc_chunk_from_ptr+16>: cmp $0xe814ec70,%ecx > 0x00000000004075d6 <talloc_chunk_from_ptr+22>: jne 0x4075e2 > <talloc_chunk_from_ptr+34> (tc->flags & ~0xF) != TALLOC_MAGIC > 0x00000000004075d8 <talloc_chunk_from_ptr+24>: and $0x1,%edx > 0x00000000004075db <talloc_chunk_from_ptr+27>: jne 0x4075e2 > <talloc_chunk_from_ptr+34> tc->flags & TALLOC_FLAG_FREE > 0x00000000004075dd <talloc_chunk_from_ptr+29>: add $0x8,%rsp > 0x00000000004075e1 <talloc_chunk_from_ptr+33>: retq Success, return. > 0x00000000004075e2 <talloc_chunk_from_ptr+34>: nopw 0x0(%rax,%rax,1) > 0x00000000004075e8 <talloc_chunk_from_ptr+40>: callq 0x401b98 <abort@plt> The two TALLOC_ABORTS both end up here if the checks above fail. > > Can you also "p $_siginfo._sifields._sigfault.si_addr" (in frame 0). > > This ought to be the actual faulting address, which ought to give a hint > > on how much we can trust the parameters in the stack trace. > > Hmm, my gdb refused to access $_siginfo: > (gdb) show convenience > $_siginfo = Unable to read siginfo That's ok, I think I've convinced myself above what the crash is. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |