
Re: [Xen-devel] [PATCH] xenstore: set READ_THREAD_STACKSIZE to a sane value



On Tue, 2014-03-11 at 16:55 +0100, Roger Pau Monné wrote:
> On 11/03/14 15:12, Ian Campbell wrote:
> > On Tue, 2014-03-11 at 14:52 +0100, Roger Pau Monné wrote:
> >> On 11/03/14 14:24, Ian Campbell wrote:
> >>> On Mon, 2014-03-10 at 17:12 +0000, Ian Jackson wrote:
> >>>> Roger Pau Monne writes ("[PATCH] xenstore: set READ_THREAD_STACKSIZE to a sane value"):
> >>>>> On FreeBSD PTHREAD_STACK_MIN is 2048 by default, which is obviously
> >>>>> too low.
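For context, the usual POSIX way to give a thread a specific stack size is pthread_attr_setstacksize() on a pthread_attr_t that is then passed to pthread_create(). The sketch below clamps a requested size so it is never below PTHREAD_STACK_MIN, without relying on PTHREAD_STACK_MIN alone. The 16 KiB value and the helper names read_thread_stacksize() and start_read_thread() are illustrative assumptions, not necessarily what the patch or xs.c uses.

```c
#include <limits.h>   /* PTHREAD_STACK_MIN (POSIX places it here) */
#include <pthread.h>
#include <stddef.h>

/* Illustrative value only: this thread does not quote the patch's value. */
#define READ_THREAD_STACKSIZE (16 * 1024)

/* Never ask for less than the platform minimum, but do not use the
 * minimum itself as the size: on FreeBSD it is only 2048 bytes. */
size_t read_thread_stacksize(void)
{
    size_t size = READ_THREAD_STACKSIZE;
#ifdef PTHREAD_STACK_MIN
    if (size < (size_t)PTHREAD_STACK_MIN)
        size = (size_t)PTHREAD_STACK_MIN;
#endif
    return size;
}

/* Hypothetical helper: start fn(arg) on a thread with the stack size above.
 * Returns 0 or a pthread error number. */
int start_read_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, read_thread_stacksize());
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```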
> > 
> > It occurs to me that 2048 is < PAGE_SIZE. Which makes this seem like an
> > interesting choice of stack min, especially combined with the fact that
> > the failure seems to involve malloc.
> > 
> > Perhaps the stack is malloc'd (rather than coming from brk or an anon
> > mmap), so overrunning would cause heap corruption which seems to be what
> > you are seeing.
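To check Ian's observation on a given system, the page size can be read with sysconf(_SC_PAGESIZE) and compared against PTHREAD_STACK_MIN. A minimal sketch with hypothetical function names; newer glibc may define PTHREAD_STACK_MIN as a run-time expression, so it is evaluated here rather than tested with #if.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* True if a stack of stack_min bytes does not fill even one page. */
bool stack_min_below_page(long stack_min, long page_size)
{
    return page_size > 0 && stack_min < page_size;
}

/* Print this platform's values, e.g. 2048 vs 4096 on FreeBSD/amd64. */
void report_stack_min(void)
{
    long page = sysconf(_SC_PAGESIZE);
#ifdef PTHREAD_STACK_MIN
    long min = (long)PTHREAD_STACK_MIN;
    printf("PTHREAD_STACK_MIN=%ld page=%ld below_page=%d\n",
           min, page, stack_min_below_page(min, page));
#else
    printf("PTHREAD_STACK_MIN not defined; page=%ld\n", page);
#endif
}
```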
> > 
> >>> How does this manifest itself? (I suppose this may be answered as part
> >>> of answering Ian J)
> >>
> >> Yes, I'm still looking into it. Here is the gdb output:
> >>
> >> Starting program: /usr/local/bin/xenstore-watch /foo
> >> [New LWP 100169]
> >> [New Thread 801406800 (LWP 100182/xenstore-watch)]
> >>
> >> Program received signal SIGSEGV, Segmentation fault.
> >> [Switching to Thread 801406800 (LWP 100182/xenstore-watch)]
> >> 0x0000000800ac1258 in sbrk () from /lib/libc.so.7
> >> (gdb) bt
> >> #0  0x0000000800ac1258 in sbrk () from /lib/libc.so.7
> >> #1  0x0000000800ac110e in sbrk () from /lib/libc.so.7
> >> #2  0x0000000800ac9ee8 in sbrk () from /lib/libc.so.7
> >> #3  0x0000000800ac456b in sbrk () from /lib/libc.so.7
> >> #4  0x0000000800ac447d in sbrk () from /lib/libc.so.7
> >> #5  0x0000000800aaf6ce in syscall () from /lib/libc.so.7
> >> #6  0x0000000800acb37b in malloc () from /lib/libc.so.7
> >> #7  0x00000008008202b9 in read_message (h=0x801417080, nonblocking=0) at xs.c:313
> >> #8  0x0000000800820a06 in read_thread (arg=0x801417080) at xs.c:313
> >> #9  0x0000000800dc64a4 in pthread_create () from /lib/libthr.so.3
> >> #10 0x0000000000000000 in ?? ()
> > 
> > Does 
> > frame 1 ; print $sp
> > frame 2 ; print $sp
> > etc
> > tell you anything useful about the stack usage at each level?
> 
> Thanks, I've been able to get the stack pointer at each frame, here are
> the results (from frame 0 to frame 10):
> 
> 0x7fffffbfcff0

<-PAGE BOUNDARY HERE

Hence the segfault, I expect...

> 0x7fffffbfd0a0
> 0x7fffffbfd0e0
> 0x7fffffbfd120
> 0x7fffffbfd160
> 0x7fffffbfd1a0
> 0x7fffffbfd1e0
> 0x7fffffbfd6a0
> 0x7fffffbfd7a0
> 0x7fffffbfd7c0
> 0x7fffffbfd800
> 
> Doing:
> 
> 0x7fffffbfd800 - 0x7fffffbfcff0 = 0x810
> 
> Which is 2064 in decimal. The biggest culprit seems to be malloc, which
> is using 1216 bytes of the stack.
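The arithmetic can be reproduced from the eleven $sp values above. Because the stack grows downward, each frame's usage is roughly the distance from its $sp to its caller's $sp, and the whole chain spans the outermost minus the innermost value. This sketch uses only the numbers quoted in the thread; the helper names are made up for illustration, and 4 KiB pages are assumed for the boundary check.

```c
#include <stddef.h>
#include <stdint.h>

/* $sp for frames 0..10 as printed by gdb earlier in this thread. */
static const uint64_t frame_sp[] = {
    0x7fffffbfcff0, 0x7fffffbfd0a0, 0x7fffffbfd0e0, 0x7fffffbfd120,
    0x7fffffbfd160, 0x7fffffbfd1a0, 0x7fffffbfd1e0, 0x7fffffbfd6a0,
    0x7fffffbfd7a0, 0x7fffffbfd7c0, 0x7fffffbfd800,
};
#define NFRAMES (sizeof frame_sp / sizeof frame_sp[0])

/* Stack grows down: the outermost frame has the highest $sp. */
uint64_t total_stack_used(void)
{
    return frame_sp[NFRAMES - 1] - frame_sp[0];
}

/* Approximate bytes used by frame i: distance up to its caller's $sp.
 * The outermost frame has no caller in the list, so report 0. */
uint64_t frame_usage(size_t i)
{
    return (i + 1 < NFRAMES) ? frame_sp[i + 1] - frame_sp[i] : 0;
}

/* Nonzero if [lo, hi) straddles a page boundary (page: power of two). */
int crosses_page(uint64_t lo, uint64_t hi, uint64_t page)
{
    return (lo & ~(page - 1)) != ((hi - 1) & ~(page - 1));
}
```

total_stack_used() gives 0x810 (2064) and frame_usage(6), the malloc frame, gives 0x4c0 (1216), matching the figures above. If the read thread really gets only 2048 bytes, the listed frames alone already exceed it, and frame 0's $sp lies below the 0x7fffffbfd000 boundary.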

Wow!

http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/stdlib/malloc.c?rev=1.54.10.1&content-type=text/x-cvsweb-markup
I suppose? malloc itself looks fairly small, but there's a lot of inlining in
that function... I don't see any large on-stack allocations (e.g. arrays), but I
suppose it all adds up.
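Whether inlining "all adds up" can also be checked empirically without gdb, using stack painting: fill a caller-supplied thread stack with a known byte, run the code of interest on it via pthread_attr_setstack(), then count how many bytes were overwritten. This is a sketch under stated assumptions (downward-growing stack, a 4 KiB-aligned 256 KiB buffer); the names are hypothetical. The figure is approximate: some C libraries (glibc, for example) store thread bookkeeping such as TLS at the top of a user-supplied stack, which inflates it.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define PAINT_BYTE 0xA5
#define PAINT_STACK_SIZE (256 * 1024)   /* assumption: generous test stack */

/* Example workload: a first malloc() in a fresh thread. The volatile
 * pointer keeps the compiler from eliding the malloc/free pair. */
void *call_malloc(void *arg)
{
    (void)arg;
    void *volatile p = malloc(4096);
    free(p);
    return NULL;
}

/* Run fn(arg) on a painted stack; return approx. bytes touched, or -1. */
long stack_high_water(void *(*fn)(void *), void *arg)
{
    void *stack;
    pthread_attr_t attr;
    pthread_t tid;
    long used = -1;

    if (posix_memalign(&stack, 4096, PAINT_STACK_SIZE) != 0)
        return -1;
    memset(stack, PAINT_BYTE, PAINT_STACK_SIZE);

    if (pthread_attr_init(&attr) == 0) {
        if (pthread_attr_setstack(&attr, stack, PAINT_STACK_SIZE) == 0 &&
            pthread_create(&tid, &attr, fn, arg) == 0) {
            const unsigned char *s = stack;
            size_t untouched = 0;

            pthread_join(tid, NULL);
            /* Stack grows down: surviving paint sits at the low end. */
            while (untouched < PAINT_STACK_SIZE && s[untouched] == PAINT_BYTE)
                untouched++;
            used = (long)(PAINT_STACK_SIZE - untouched);
        }
        pthread_attr_destroy(&attr);
    }
    free(stack);
    return used;
}
```

For example, stack_high_water(call_malloc, NULL) reports roughly how deep a first malloc() in a new thread goes with the local libc; the number is only meaningful on the platform being investigated.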

> >> I've also tried to debug it using valgrind,
> > 
> > Under BSD? Did someone wire up the dom0 OS-specific bit? If so: Neat!
> 
> No, I don't think anyone has wired up the Dom0-specific bits; maybe they
> don't show up because this is just the xenstore client, which is not
> using any ioctls?

Oh yes, that makes sense, you'd be using the Unix domain socket.

> >>  and here's what I got:
> >>
> >> [root@loki ~/xen/xen]# valgrind xenstore-watch /foo
> >> ==1901== Memcheck, a memory error detector
> >> ==1901== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
> >> ==1901== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
> >> ==1901== Command: xenstore-watch /foo
> >> ==1901==
> >> ==1901== Syscall param socketcall.connect(serv_addr..sa_len) points to uninitialised byte(s)
> >> ==1901==    at 0x152A14A: connect (in /lib/libc.so.7)
> >> ==1901==    by 0x1210B46: get_handle (xs.c:205)
> >> ==1901==    by 0x1210CEC: xs_open (xs.c:297)
> >> ==1901==    by 0x4027B1: main (xenstore_client.c:635)
> >> ==1901==  Address 0x7ff000a70 is on thread 1's stack
> >> ==1901==
> >> /foo
> >>
> >> Strangely enough, when running under valgrind it doesn't segfault,
> > 
> > valgrind interposes its own malloc and stuff, which will change
> > behaviour, and I wouldn't be all that surprised if it were getting its
> > fingers into some of the pthread stuff too.
> > 
> >>  and 
> >> I'm still trying to figure out why valgrind complains.
> > 
> > It seems to be an unrelated issue though?
> 
> I think so, it seems like valgrind doesn't really like the cast done in
> connect from sockaddr_un to sockaddr.

Not all that surprising I guess, it's a bit of an odd interface!
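Whatever the root cause, valgrind's message names sa_len, a length field that BSD-derived socket addresses carry and Linux's do not. One way to ensure every byte handed to connect() is defined is to zero the whole sockaddr_un first and, on BSD-like systems, fill in sun_len. This is a hedged sketch, not the actual get_handle() code in xs.c; connect_unix() is a hypothetical name and the platform guard is an assumption.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to the Unix domain socket at path; return the fd, or -1. */
int connect_unix(const char *path)
{
    struct sockaddr_un addr;
    size_t len = strlen(path);
    int fd;

    if (len >= sizeof(addr.sun_path))
        return -1;                      /* path would not fit */

    memset(&addr, 0, sizeof(addr));     /* no uninitialised bytes */
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, len + 1);
#if defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__APPLE__)
    addr.sun_len = (unsigned char)SUN_LEN(&addr);  /* BSD-only field */
#endif

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```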

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
