Xen project Mailing List

Re: [Xen-devel] [PATCH] xenstore: set READ_THREAD_STACKSIZE to a sane value

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

Date: Tue, 11 Mar 2014 16:03:44 +0000

Cc: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx

Delivery-date: Tue, 11 Mar 2014 16:06:11 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Tue, 2014-03-11 at 16:55 +0100, Roger Pau Monné wrote:
> On 11/03/14 15:12, Ian Campbell wrote:
> > On Tue, 2014-03-11 at 14:52 +0100, Roger Pau Monné wrote:
> >> On 11/03/14 14:24, Ian Campbell wrote:
> >>> On Mon, 2014-03-10 at 17:12 +0000, Ian Jackson wrote:
> >>>> Roger Pau Monne writes ("[PATCH] xenstore: set READ_THREAD_STACKSIZE to
> >>>> a sane value"):
> >>>>> On FreeBSD PTHREAD_STACK_MIN is 2048 by default, which is obviously
> >>>>> too low.
> >
> > It occurs to me that 2048 is < PAGE_SIZE. Which makes this seem like an
> > interesting choice of stack min, especially combined with the fact that
> > the failure seems to involve malloc.
> >
> > Perhaps the stack is malloc'd (rather than coming from brk or an anon
> > mmap), so overrunning would cause heap corruption which seems to be what
> > you are seeing.
> >
> >>> How does this manifest itself? (I suppose this may be answered as part
> >>> of answering Ian J)
> >>
> >> Yes, I'm still looking into it, this gdb output:
> >>
> >> Starting program: /usr/local/bin/xenstore-watch /foo
> >> [New LWP 100169]
> >> [New Thread 801406800 (LWP 100182/xenstore-watch)]
> >>
> >> Program received signal SIGSEGV, Segmentation fault.
> >> [Switching to Thread 801406800 (LWP 100182/xenstore-watch)]
> >> 0x0000000800ac1258 in sbrk () from /lib/libc.so.7
> >> (gdb) bt
> >> #0 0x0000000800ac1258 in sbrk () from /lib/libc.so.7
> >> #1 0x0000000800ac110e in sbrk () from /lib/libc.so.7
> >> #2 0x0000000800ac9ee8 in sbrk () from /lib/libc.so.7
> >> #3 0x0000000800ac456b in sbrk () from /lib/libc.so.7
> >> #4 0x0000000800ac447d in sbrk () from /lib/libc.so.7
> >> #5 0x0000000800aaf6ce in syscall () from /lib/libc.so.7
> >> #6 0x0000000800acb37b in malloc () from /lib/libc.so.7
> >> #7 0x00000008008202b9 in read_message (h=0x801417080, nonblocking=0) at
> >> xs.c:313
> >> #8 0x0000000800820a06 in read_thread (arg=0x801417080) at xs.c:313
> >> #9 0x0000000800dc64a4 in pthread_create () from /lib/libthr.so.3
> >> #10 0x0000000000000000 in ?? ()
> >
> > Does
> > frame 1 ; print $sp
> > frame 2 ; print $sp
> > etc
> > tell you anything useful about the stack usage at each level?
>
> Thanks, I've been able to get the stack pointer at each frame, here are
> the results (from frame 0 to frame 10):
>
> 0x7fffffbfcff0
<-PAGE BOUNDARY HERE Hence the segfault I expect...
> 0x7fffffbfd0a0
> 0x7fffffbfd0e0
> 0x7fffffbfd120
> 0x7fffffbfd160
> 0x7fffffbfd1a0
> 0x7fffffbfd1e0
> 0x7fffffbfd6a0
> 0x7fffffbfd7a0
> 0x7fffffbfd7c0
> 0x7fffffbfd800
>
> Doing:
>
> 0x7fffffbfd800 - 0x7fffffbfcff0 = 0x810
>
> Which is 2064 in decimal. The biggest culprit seems to be malloc, which
> is using 1216 bytes of the stack.

Wow!

http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/stdlib/malloc.c?rev=1.54.10.1&content-type=text/x-cvsweb-markup
I suppose? malloc itself looks fairly small, but there's a lot of
inlining in that function... I don't see any large on stack allocations
(e.g. arrays) but I suppose it all adds up.

> >> I've also tried to debug it using valgrind,
> >
> > Under BSD? Did someone wire up the dom0 OS specific bit? If so: Neat!
>
> No, I don't think anyone has wired the Dom0 specific bits, maybe they
> don't show up because this is just the xenstore client, which is not
> using any ioctls?

Oh yes, that makes sense, you'd be using the Unix domain socket.

> >> and here's what I got:
> >>
> >> [root@loki ~/xen/xen]# valgrind xenstore-watch /foo
> >> ==1901== Memcheck, a memory error detector
> >> ==1901== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
> >> ==1901== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
> >> ==1901== Command: xenstore-watch /foo
> >> ==1901==
> >> ==1901== Syscall param socketcall.connect(serv_addr..sa_len) points to
> >> uninitialised byte(s)
> >> ==1901== at 0x152A14A: connect (in /lib/libc.so.7)
> >> ==1901== by 0x1210B46: get_handle (xs.c:205)
> >> ==1901== by 0x1210CEC: xs_open (xs.c:297)
> >> ==1901== by 0x4027B1: main (xenstore_client.c:635)
> >> ==1901== Address 0x7ff000a70 is on thread 1's stack
> >> ==1901==
> >> /foo
> >>
> >> Strangely enough, when running under valgrind it doesn't segfault,
> >
> > valgrind interposes its own malloc and stuff which will change
> > behaviour, and I wouldn't be all that surprised if it were getting its
> > fingers into some of the pthread stuff too.
> >
> >> and
> >> I'm still trying to figure out why valgrind complains.
> >
> > It seems to be an unrelated issue though?
>
> I think so, it seems like valgrind doesn't really like the cast done in
> connect from sockaddr_un to sockaddr.

Not all that surprising I guess, it's a bit of an odd interface!

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
