Re: [Xen-API] Debugging XAPI daemon crash
Hello Rob/Dave,

Thanks for the pointers. I figured out the issue. My C stub was able to list all interfaces without crashing because it checks ifa_addr before using it:

    if (getifaddrs(&ifaddr) == -1) {
        print ("getifaddr failed");
        exit(1);
    }
    struct ifaddrs *ifa = ifaddr;
    for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr != NULL) {        <--- Check for ifa_addr
            int family = ifa->ifa_addr->sa_family;

I was looking into the ifaddrs structure only when the interface address is set. The stub_if_getaddr code is as follows:

    ret = getifaddrs(&ifaddrs);
    if (ret < 0)
        caml_failwith("cannot get interface address");
    for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
        sock = tmp->ifa_addr;               <--- Assigned here
        netmask = tmp->ifa_netmask;
        if (sock->sa_family == AF_INET || sock->sa_family == AF_INET6) {   <--- Dereferenced here without checking
            name = caml_copy_string(tmp->ifa_name);
    <snip>

In my case, there were two internal interfaces for which the interface address was not set up, so iterating through the list hit a NULL pointer dereference.

It might look like defensive coding, but can we ignore the interfaces for which ifa_addr is not set? I can open a bug and fix it if there is consensus that this needs to be fixed.

Ranjeet

-----Original Message-----
From: Rob Hoes [mailto:Rob.Hoes@xxxxxxxxxx]
Sent: Wednesday, March 26, 2014 4:44 AM
To: Ranjeet R; Dave Scott
Cc: xen-api@xxxxxxxxxxxxx
Subject: RE: [Xen-API] Debugging XAPI daemon crash

Hi Ranjeet,

> It seems to be crashing at the same point as you had mentioned. Please find the SEGV backtrace attached.
>
> (gdb) c
> Program received signal SIGSEGV, Segmentation fault.
> 0x085bc2d6 in stub_if_getaddr ()
> (gdb) bt
> #0 0x085cca90 in segv_handler ()
> #1 <signal handler called>
> #2 0x085bc2d6 in stub_if_getaddr ()
> #3 0x0850ef8c in camlNetdev__get_all_ipv4_1325 ()
>
> You had mentioned that this could be because of a bad C function binding.
> I wrote a small C stub to see whether it works for the xenbr0 interface and it seems to be working fine. How should I verify the binding?

The function that is failing seems to be this one:

https://github.com/xapi-project/xen-api-libs/blob/clearwater/netdev/addr_stubs.c#L74

It has:

    int ret;
    struct ifaddrs *ifaddrs, *tmp;
    [...]
    ret = getifaddrs(&ifaddrs);
    [...]
    for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
        sock = tmp->ifa_addr;
        netmask = tmp->ifa_netmask;
        [...]

Could it be that the getifaddrs function does not set ifaddrs correctly? You should be able to test this with a small C program. Or is this what you have already done?

Cheers,
Rob

> Appreciate your help.
>
> -Ranjeet
>
> -----Original Message-----
> From: David Scott [mailto:dave.scott@xxxxxxxxxxxxx]
> Sent: Monday, March 24, 2014 3:46 AM
> To: Ranjeet R
> Cc: xen-api@xxxxxxxxxxxxx
> Subject: Re: [Xen-API] Debugging XAPI daemon crash
>
> On 24/03/14 10:30, Ranjeet R wrote:
> > Hello Dave
> >
> > The binaries did not have debug symbols but I managed to rebuild the binaries with debug enabled.
>
> Great.
>
> > I tried starting the xapi process under gdb the same way it is started in the init.d scripts. However, in gdb, the xapi process forks another process and I am not able to debug it further (I tried setting detach-on-fork to off in gdb, but the primary process just runs to the end of execution).
> >
> > I am using the following gdb command to debug:
> >
> > gdb --args /usr/sbin/xapi -daemon -writeinitcomplete /var/run/xapi_init_complete.cookie -writereadyfile /var/run/xapi_startup.cookie -onsystemboot
> >
> > Can you please help me with the steps that you use to debug the XAPI process?
>
> Ah, I think xapi forks a "watchdog" process near the start -- this is probably what you're seeing.
>
> Try adding a "-nowatchdog" option to the command-line.
>
> Dave
>
> >
> > Thanks for your help,
> >
> > -Ranjeet
> >
> > -----Original Message-----
> > From: Dave Scott [mailto:Dave.Scott@xxxxxxxxxx]
> > Sent: Saturday, March 22, 2014 12:36 PM
> > To: Ranjeet R
> > Cc: xen-api@xxxxxxxxxxxxx
> > Subject: Re: [Xen-API] Debugging XAPI daemon crash
> >
> > Hi,
> >
> > I suspect the segfault is being caused by a bad C function binding. I've seen a similar crash before when querying an interface IP via getifaddrs (I think that was the function name). Could you run xapi in gdb and reproduce the crash? Printing the call stack would help to confirm this hypothesis. Provided the xapi binary still has debug symbols (i.e. hasn't been stripped), the ocaml functions (with fairly obvious mangled names) should also be on the stack.
> >
> > Cheers,
> > Dave
> >
> >> On Mar 22, 2014, at 3:47 AM, "Ranjeet R" <rranjeet@xxxxxxxxxxx> wrote:
> >>
> >> Hello all
> >>
> >> I am trying to bring up a DevCloud setup which has an XCP Kronos based XAPI daemon. I had changed the underlying network implementation (it is not a bridge, but an openvswitch-like network implementation) and the XAPI daemon crashes during bootup. Please find the XAPI logs below.
> >>
> >> starting up database engine D:72969b3eaf8e|redo_log] Flushing database to all active redo-logs
> >> starting up database engine D:72969b3eaf8e|xapi] About to flush database: /var/lib/xcp/state.db
> >> starting up database engine D:72969b3eaf8e|redo_log] Flushing database to all active redo-logs
> >> starting up database engine D:72969b3eaf8e|xapi] Performing initial DB GC
> >> thread_zero|dbsync (update_env) D:fd0aec7399c9|dbsync] Sync: sync_create_localhost
> >> dbsync (update_env) D:fd0aec7399c9|dbsync] creating localhost
> >>
> >> dmesg logs seem to suggest that xapi is crashing during startup.
> >>
> >> [ 9.092377] xapi[2813]: segfault at 0 ip 085bc286 sp bf80ae30 error 4 in xapi[8048000+59f000]
> >> [ 9.869971] xapi[2943]: segfault at 0 ip 085bc286 sp bf8ec450 error 4 in xapi[8048000+59f000]
> >>
> >> I looked at the XAPI code to see where it fails and I don't see any logs after the following code point in ocaml / xapi / dbsync_slave.ml
> >>
> >>   let create_localhost ~__context info =
> >>     let ip = get_my_ip_addr ~__context in
> >>
> >> I confirmed that "ifconfig xenbr0" has a valid management IP address, so this should not fail.
> >>
> >> How do I debug this crash further? Are there any ways to look at the stack trace where XAPI crashed? Any pointers to debug this further will be very helpful.
> >>
> >> -Ranjeet
> >>
> >> _______________________________________________
> >> Xen-api mailing list
> >> Xen-api@xxxxxxxxxxxxx
> >> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
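
For reference, the check proposed at the top of this thread (skip any entry whose ifa_addr is NULL before reading sa_family) could look roughly like the sketch below. This is not the actual patch to addr_stubs.c: it is a standalone illustration with no OCaml bindings, and the function names has_ip_address, print_ip_interfaces and list_local_ip_interfaces are hypothetical. The only library facts it relies on are that getifaddrs(3) returns a linked list terminated by a NULL ifa_next, and that the man page documents ifa_addr as possibly NULL.

```c
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if the entry carries an IPv4 or IPv6
 * address, 0 otherwise. ifa_addr is checked for NULL before sa_family
 * is read, which is the step missing from the loop quoted in the thread. */
int has_ip_address(const struct ifaddrs *ifa)
{
    if (ifa == NULL || ifa->ifa_addr == NULL)
        return 0;
    return ifa->ifa_addr->sa_family == AF_INET ||
           ifa->ifa_addr->sa_family == AF_INET6;
}

/* Hypothetical helper: walks a getifaddrs()-style list, prints the name
 * of every entry that has an IP address, and returns how many were seen. */
int print_ip_interfaces(const struct ifaddrs *list, FILE *out)
{
    int count = 0;
    const struct ifaddrs *ifa;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (!has_ip_address(ifa))
            continue; /* e.g. an interface with no address configured */
        fprintf(out, "%s\n", ifa->ifa_name);
        count++;
    }
    return count;
}

/* Hypothetical driver: fetch the real list, print it, free it.
 * Returns the number of IP entries, or -1 if getifaddrs() fails. */
int list_local_ip_interfaces(void)
{
    struct ifaddrs *list = NULL;
    int count;

    if (getifaddrs(&list) == -1) {
        perror("getifaddrs");
        return -1;
    }
    count = print_ip_interfaces(list, stdout);
    freeifaddrs(list);
    return count;
}
```

Calling list_local_ip_interfaces() from a small main() on the affected host should list the addressed interfaces without crashing, even when some internal interfaces have no address set. Whether the stub should skip such entries silently or report them is a design choice for the maintainers.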