Re: [Xen-API] Debugging XAPI daemon crash
Thanks! I have merged your patch.

Cheers,
Rob

> -----Original Message-----
> From: Ranjeet R [mailto:rranjeet@xxxxxxxxxxx]
> Sent: 02 April 2014 4:14 AM
> To: Rob Hoes; Dave Scott
> Cc: xen-api@xxxxxxxxxxxxx
> Subject: RE: [Xen-API] Debugging XAPI daemon crash
>
> Thanks, Rob.
>
> I have generated a pull request as you mentioned. Let me know if you
> have any review comments.
>
> -Ranjeet
>
> -----Original Message-----
> From: Rob Hoes [mailto:Rob.Hoes@xxxxxxxxxx]
> Sent: Thursday, March 27, 2014 3:16 AM
> To: Ranjeet R; Dave Scott
> Cc: xen-api@xxxxxxxxxxxxx
> Subject: RE: [Xen-API] Debugging XAPI daemon crash
>
> Hi Ranjeet,
>
> That makes sense.
>
> I guess the same would apply to the ifa_netmask field. The following lines
> would blow up if it is NULL:
>
>     netmask = tmp->ifa_netmask;
>     [...]
>     netmaskstr = alloc_addr(netmask);
>
> because alloc_addr will try to access netmask->sa_family.
>
> So to be on the safe side, I think we should check for this as well. I
> think defensive coding is the right way (I hate segfaults)!
>
> For the purpose of the stub_if_getaddr function, I think it is sufficient
> to wrap the existing if-block with "if (sock && netmask)". This assumes
> that we always want both the address and the netmask, and we ignore the
> interface if either is undefined.
>
> The master branch for this code is here (since we split it off from
> xen-api-libs):
> https://github.com/xapi-project/netdev/blob/master/lib/addr_stubs.c
> If you'd like to submit a pull request there (as well as keeping the fix
> in your development branch on clearwater), that would be great.
>
> Thanks,
> Rob
>
> > -----Original Message-----
> > From: Ranjeet R [mailto:rranjeet@xxxxxxxxxxx]
> > Sent: 26 March 2014 11:08 PM
> > To: Rob Hoes; Dave Scott
> > Cc: xen-api@xxxxxxxxxxxxx
> > Subject: RE: [Xen-API] Debugging XAPI daemon crash
> >
> > Hello Rob/Dave,
> >
> > Thanks for the pointers. I figured out the issue.
> > The reason my C stub was able to list all interfaces without crashing
> > is the following:
> >
> >     if (getifaddrs(&ifaddr) == -1) {
> >         perror("getifaddrs failed");
> >         exit(1);
> >     }
> >
> >     struct ifaddrs *ifa = ifaddr;
> >     for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
> >         if (ifa->ifa_addr != NULL) {    /* <-- check for ifa_addr */
> >             int family = ifa->ifa_addr->sa_family;
> >
> > I was only looking into the ifaddrs structure when the interface
> > address is set.
> >
> > In stub_if_getaddr, the code is as follows:
> >
> >     ret = getifaddrs(&ifaddrs);
> >     if (ret < 0)
> >         caml_failwith("cannot get interface address");
> >
> >     for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
> >         sock = tmp->ifa_addr;           /* <-- assigned here */
> >         netmask = tmp->ifa_netmask;
> >
> >         if (sock->sa_family == AF_INET || sock->sa_family == AF_INET6) {
> >             /* <-- dereferenced here without a NULL check */
> >             name = caml_copy_string(tmp->ifa_name);
> >     <snip>
> >
> > In my case, there were two internal interfaces with no interface
> > address set up, and while iterating through the list, there was a NULL
> > pointer dereference.
> >
> > It might look like defensive coding, but can we ignore the interfaces
> > for which ifa_addr is not set? I can open a bug and fix it if there is
> > consensus that this needs to be fixed.
> >
> > Ranjeet
> >
> > -----Original Message-----
> > From: Rob Hoes [mailto:Rob.Hoes@xxxxxxxxxx]
> > Sent: Wednesday, March 26, 2014 4:44 AM
> > To: Ranjeet R; Dave Scott
> > Cc: xen-api@xxxxxxxxxxxxx
> > Subject: RE: [Xen-API] Debugging XAPI daemon crash
> >
> > Hi Ranjeet,
> >
> > > It seems to be crashing at the same point you had mentioned.
> > > Please find the SEGV backtrace attached.
> > >
> > >     (gdb) c
> > >     Program received signal SIGSEGV, Segmentation fault.
> > >     0x085bc2d6 in stub_if_getaddr ()
> > >     (gdb) bt
> > >     #0  0x085cca90 in segv_handler ()
> > >     #1  <signal handler called>
> > >     #2  0x085bc2d6 in stub_if_getaddr ()
> > >     #3  0x0850ef8c in camlNetdev__get_all_ipv4_1325 ()
> > >
> > > You had mentioned that this could be because of a bad C function
> > > binding. I wrote a small C stub to see whether it works for the xenbr0
> > > interface, and it seems to be working fine. How should I verify the
> > > binding?
> >
> > The function that is failing seems to be this one:
> > https://github.com/xapi-project/xen-api-libs/blob/clearwater/netdev/addr_stubs.c#L74
> >
> > It has:
> >
> >     int ret;
> >     struct ifaddrs *ifaddrs, *tmp;
> >     [...]
> >     ret = getifaddrs(&ifaddrs);
> >     [...]
> >     for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
> >         sock = tmp->ifa_addr;
> >         netmask = tmp->ifa_netmask;
> >     [...]
> >
> > Could it be that the getifaddrs function does not set ifaddrs correctly?
> > You should be able to test this with a small C program. Or is this
> > what you have already done?
> >
> > Cheers,
> > Rob
> >
> > > Appreciate your help.
> > >
> > > -Ranjeet
> > >
> > > -----Original Message-----
> > > From: David Scott [mailto:dave.scott@xxxxxxxxxxxxx]
> > > Sent: Monday, March 24, 2014 3:46 AM
> > > To: Ranjeet R
> > > Cc: xen-api@xxxxxxxxxxxxx
> > > Subject: Re: [Xen-API] Debugging XAPI daemon crash
> > >
> > > On 24/03/14 10:30, Ranjeet R wrote:
> > > > Hello Dave,
> > > >
> > > > The binaries did not have debug symbols, but I managed to rebuild
> > > > them with debug enabled.
> > >
> > > Great.
> > >
> > > > I tried starting the xapi process under gdb in the same way the
> > > > init.d scripts start it. However, in gdb, the xapi process forks
> > > > another process and I am not able to debug it further (I tried
> > > > setting detach-on-fork to off in gdb, but the primary process just
> > > > runs to the end of execution).
> > > >
> > > > I am using the following gdb command to debug:
> > > >
> > > >     gdb --args /usr/sbin/xapi -daemon -writeinitcomplete /var/run/xapi_init_complete.cookie -writereadyfile /var/run/xapi_startup.cookie -onsystemboot
> > > >
> > > > Can you please help me with the steps you use to debug the XAPI
> > > > process?
> > >
> > > Ah, I think xapi forks a "watchdog" process near the start; this is
> > > probably what you're seeing.
> > >
> > > Try adding a "-nowatchdog" option to the command line.
> > >
> > > Dave
> > >
> > > >
> > > > Thanks for your help,
> > > >
> > > > -Ranjeet
> > > >
> > > > -----Original Message-----
> > > > From: Dave Scott [mailto:Dave.Scott@xxxxxxxxxx]
> > > > Sent: Saturday, March 22, 2014 12:36 PM
> > > > To: Ranjeet R
> > > > Cc: xen-api@xxxxxxxxxxxxx
> > > > Subject: Re: [Xen-API] Debugging XAPI daemon crash
> > > >
> > > > Hi,
> > > >
> > > > I suspect the segfault is being caused by a bad C function binding.
> > > > I've seen a similar crash before when querying an interface IP via
> > > > getifaddrs (I think that was the function name). Could you run xapi
> > > > in gdb and reproduce the crash? Printing the call stack would help
> > > > to confirm this hypothesis. Provided the xapi binary still has debug
> > > > symbols (i.e. hasn't been stripped), the OCaml functions (with fairly
> > > > obvious mangled names) should also appear on the stack.
> > > >
> > > > Cheers,
> > > > Dave
> > > >
> > > >> On Mar 22, 2014, at 3:47 AM, "Ranjeet R" <rranjeet@xxxxxxxxxxx> wrote:
> > > >>
> > > >> Hello all,
> > > >>
> > > >> I am trying to bring up a DevCloud setup which has an XCP
> > > >> Kronos-based XAPI daemon. I had changed the underlying network
> > > >> implementation (it is not a bridge, but an openvswitch-like network
> > > >> implementation), and the XAPI daemon crashes during bootup. Please
> > > >> find the XAPI logs below.
> > > >>
> > > >>     starting up database engine D:72969b3eaf8e|redo_log] Flushing database to all active redo-logs
> > > >>     starting up database engine D:72969b3eaf8e|xapi] About to flush database: /var/lib/xcp/state.db
> > > >>     starting up database engine D:72969b3eaf8e|redo_log] Flushing database to all active redo-logs
> > > >>     starting up database engine D:72969b3eaf8e|xapi] Performing initial DB GC
> > > >>     thread_zero|dbsync (update_env) D:fd0aec7399c9|dbsync] Sync: sync_create_localhost
> > > >>     dbsync (update_env) D:fd0aec7399c9|dbsync] creating localhost
> > > >>
> > > >> dmesg logs seem to suggest that xapi is crashing during startup:
> > > >>
> > > >>     [ 9.092377] xapi[2813]: segfault at 0 ip 085bc286 sp bf80ae30 error 4 in xapi[8048000+59f000]
> > > >>     [ 9.869971] xapi[2943]: segfault at 0 ip 085bc286 sp bf8ec450 error 4 in xapi[8048000+59f000]
> > > >>
> > > >> I looked at the XAPI code to see where it fails, and I don't see any
> > > >> logs after the following code point in ocaml/xapi/dbsync_slave.ml:
> > > >>
> > > >>     let create_localhost ~__context info =
> > > >>       let ip = get_my_ip_addr ~__context in
> > > >>
> > > >> I confirmed that "ifconfig xenbr0" shows a valid management IP
> > > >> address, so this should not fail.
> > > >>
> > > >> How do I debug this crash further? Is there any way to look at the
> > > >> stack trace where XAPI crashed? Any pointers to debug this further
> > > >> will be very helpful.
> > > >>
> > > >> -Ranjeet
> > > >>
> > > >> _______________________________________________
> > > >> Xen-api mailing list
> > > >> Xen-api@xxxxxxxxxxxxx
> > > >> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
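Rob's suggested fix (wrapping the existing if-block in stub_if_getaddr with "if (sock && netmask)") can be sketched as a standalone C program, which also doubles as the kind of small getifaddrs() test program Rob mentions. This is an illustration under stated assumptions, not the patch that was merged into netdev: the helper names is_usable_entry, count_usable_addrs and print_usable_interfaces are hypothetical, the sketch only counts and prints entries instead of building OCaml values, and it assumes a system that provides getifaddrs() in <ifaddrs.h> (e.g. Linux with glibc).

```c
/* Sketch only, not the merged netdev patch; all helper names are
 * hypothetical. Assumes getifaddrs()/freeifaddrs() from <ifaddrs.h>. */
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>

/* Returns nonzero only if a getifaddrs() entry has both an address and a
 * netmask, and the address is IPv4 or IPv6. Interfaces with no address
 * configured have ifa_addr == NULL, and ifa_netmask may be NULL too, so
 * both pointers are checked before sa_family is read. */
int is_usable_entry(const struct ifaddrs *ifa)
{
    const struct sockaddr *sock = ifa->ifa_addr;
    const struct sockaddr *netmask = ifa->ifa_netmask;

    if (sock == NULL || netmask == NULL)
        return 0; /* equivalent of "if (sock && netmask)" failing: skip it */
    return sock->sa_family == AF_INET || sock->sa_family == AF_INET6;
}

/* Counts the usable entries in a getifaddrs() linked list. */
int count_usable_addrs(const struct ifaddrs *list)
{
    int n = 0;
    for (const struct ifaddrs *tmp = list; tmp != NULL; tmp = tmp->ifa_next)
        if (is_usable_entry(tmp))
            n++;
    return n;
}

/* Diagnostic: lists this host's interface entries, marking which ones
 * would be skipped. Returns the number of usable entries, or -1 if
 * getifaddrs() fails. */
int print_usable_interfaces(void)
{
    struct ifaddrs *ifaddrs;

    if (getifaddrs(&ifaddrs) == -1) {
        perror("getifaddrs");
        return -1;
    }
    for (const struct ifaddrs *tmp = ifaddrs; tmp != NULL; tmp = tmp->ifa_next)
        printf("%-16s %s\n", tmp->ifa_name,
               is_usable_entry(tmp) ? "usable" : "skipped");

    int n = count_usable_addrs(ifaddrs);
    freeifaddrs(ifaddrs); /* the list is heap-allocated by getifaddrs() */
    return n;
}
```

Calling print_usable_interfaces() from a small main() on the affected host would list each getifaddrs() entry as "usable" or "skipped"; entries without an address, such as the internal interfaces described in the thread, should be reported as "skipped" instead of crashing the loop. Skipping the whole entry when either pointer is NULL follows the trade-off Rob describes: an interface is only reported when both its address and its netmask are known.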