
Re: [Xen-devel] xenstored crashes with SIGSEGV



2014-12-16 11:06 GMT+00:00 Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
> On Tue, 2014-12-16 at 10:45 +0000, Ian Campbell wrote:
>> On Mon, 2014-12-15 at 23:29 +0100, Philipp Hahn wrote:
>> > > I notice in your bugzilla (for a different occurrence, I think):
>> > >> [2090451.721705] univention-conf[2512]: segfault at ff00000000 ip 
>> > >> 000000000045e238 sp 00007ffff68dfa30 error 6 in python2.6[400000+21e000]
>> > >
>> > > Which appears to have faulted access 0xff000000000 too. It looks like
>> > > this process is a python thing, it's nothing to do with xenstored I
>> > > assume?
>> >
>> > Yes, that's one univention-config, which is completely independent of
>> > xen(stored).
>> >
>> > > It seems rather coincidental that it should be accessing the
>> > > same sort of address and be faulting.
>> >
>> > Yes, good catch. I'll have another look at those core dumps.
>>
>> With this in mind, please can you confirm what model of machines you've
>> seen this on, and in particular whether they are all the same class of
>> machine or whether they are significantly different.
>>
>> The reason being that randomly placed 0xff values in a field of 0x00
>> could possibly indicate hardware (e.g. a GPU) DMAing over the wrong
>> memory pages.
>
> Thanks for giving me access to the core files. This is very suspicious:
> (gdb) frame 2
> #2  0x000000000040a348 in tdb_open_ex (name=0x1941fb0 
> "/var/lib/xenstored/tdb.0x1935bb0", hash_size=<value optimized out>, 
> tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>,
>     log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at 
> tdb.c:1958
> 1958            SAFE_FREE(tdb->locked);
>
> (gdb) x/96x tdb
> 0x1921270:      0x00000000      0x00000000      0x00000000      0x00000000
> 0x1921280:      0x0000001f      0x000000ff      0x0000ff00      0x000000ff
> 0x1921290:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x19212a0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x19212b0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x19212c0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x19212d0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x19212e0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x19212f0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x1921300:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x1921310:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x1921320:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x1921330:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x1921340:      0x00000000      0x00000000      0x0000ff00      0x000000ff
> 0x1921350:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x1921360:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> 0x1921370:      0x004093b0      0x00000000      0x004092f0      0x00000000
> 0x1921380:      0x00000002      0x00000000      0x00000091      0x00000000
> 0x1921390:      0x0193de70      0x00000000      0x01963600      0x00000000
> 0x19213a0:      0x00000000      0x00000000      0x0193fbb0      0x00000000
> 0x19213b0:      0x00000000      0x00000000      0x00000000      0x00000000
> 0x19213c0:      0x00405870      0x00000000      0x0040e3e0      0x00000000
> 0x19213d0:      0x00000038      0x00000000      0xe814ec70      0x6f2f6567
> 0x19213e0:      0x01963650      0x00000000      0x0193dec0      0x00000000
>
> Something has clearly done a number on the ram of this process.
> 0x1921270 through 0x192136f is 256 bytes...
>
> Since it appears to be happening to other processes too I would hazard
> that this is not a xenstored issue.
>
> Ian.
>

Good catch, Ian!

Strange corruption. Probably not related to xenstored, as you
suggested. I would be curious to see what comes before the tdb pointer
and where the corruption starts. I also don't understand where the
"fd = 47" in a previous mail came from: 0x1f is 31, not 47 (which is
0x2f).
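
To locate where the pattern starts and ends, the dump can be scanned
mechanically. The following is a minimal Python sketch, not part of the
original thread. It parses the "x/96x tdb" rows quoted above (only rows
0x1921270 to 0x1921370 are transcribed) and flags every non-zero 32-bit
word made up solely of 0x00 and 0xff bytes. That signature, and the
helper names parse_words, is_ff_pattern and ff_extent, are assumptions
for illustration. All-zero words cannot be told apart from legitimate
zero fields this way.

```python
import re

# Rows 0x1921270-0x1921370 of the gdb "x/96x tdb" output quoted above,
# copied verbatim (the full dump has 24 rows of four 32-bit words).
DUMP = """\
0x1921270:      0x00000000      0x00000000      0x00000000      0x00000000
0x1921280:      0x0000001f      0x000000ff      0x0000ff00      0x000000ff
0x1921290:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x19212a0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x19212b0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x19212c0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x19212d0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x19212e0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x19212f0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x1921300:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x1921310:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x1921320:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x1921330:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x1921340:      0x00000000      0x00000000      0x0000ff00      0x000000ff
0x1921350:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x1921360:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
0x1921370:      0x004093b0      0x00000000      0x004092f0      0x00000000
"""

ROW = re.compile(r"^(0x[0-9a-f]+):\s+(.*)$")


def parse_words(text):
    """Turn gdb 'x/Nx' rows into a list of (address, 32-bit value) pairs."""
    words = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        addr = int(m.group(1), 16)
        for i, tok in enumerate(m.group(2).split()):
            words.append((addr + 4 * i, int(tok, 16)))  # 4 bytes per word
    return words


def is_ff_pattern(value):
    """True for a non-zero word whose bytes are all either 0x00 or 0xff."""
    if value == 0:
        return False
    return all(((value >> s) & 0xff) in (0x00, 0xff) for s in (0, 8, 16, 24))


def ff_extent(words):
    """Return (first, end) addresses of matching words; end is exclusive."""
    hits = [addr for addr, value in words if is_ff_pattern(value)]
    if not hits:
        return None
    return hits[0], hits[-1] + 4


if __name__ == "__main__":
    base = 0x1921270  # value of the tdb pointer in the gdb session
    first, end = ff_extent(parse_words(DUMP))
    print(f"0xff pattern: {first:#x}..{end - 1:#x}, {end - first} bytes, "
          f"starting at offset {first - base} into *tdb")
    print(f"Ian's range: {0x192136f - 0x1921270 + 1} bytes")
    print(f"0x1f = {0x1f}, 0x2f = {0x2f}")
```

On the transcribed rows this flags 0x1921284 through 0x192136f (236
bytes). Ian's 256-byte figure also counts the all-zero row at 0x1921270
and the 0x1f word, which this heuristic cannot classify either way.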

I would not be surprised by a strange bug in libc or the kernel.

Frediano

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

