Re: [Xen-API] Debugging XAPI daemon crash



Hello Rob/Dave

Thanks for the pointers. I figured out the issue. The reason my C stub was able 
to list all interfaces without crashing is this:

struct ifaddrs *ifaddr, *ifa;

if (getifaddrs(&ifaddr) == -1) {
    perror("getifaddrs failed");
    exit(1);
}

for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
    if (ifa->ifa_addr != NULL) {    /* <-- check for ifa_addr */
        int family = ifa->ifa_addr->sa_family;

My stub looks into the ifaddrs structure only when the interface address is 
set.

The code in stub_if_getaddr, by contrast, is as follows:

ret = getifaddrs(&ifaddrs);
if (ret < 0)
      caml_failwith("cannot get interface address");

for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
      sock = tmp->ifa_addr;          /* <-- assigned here */
      netmask = tmp->ifa_netmask;

      /* <-- sock dereferenced on the next line without a NULL check */
      if (sock->sa_family == AF_INET || sock->sa_family == AF_INET6) {
            name = caml_copy_string(tmp->ifa_name);
<snip>

In my case, there were two internal interfaces for which no interface address 
had been set up, so ifa_addr was NULL for them and iterating through the list 
ended in a NULL pointer dereference.
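
To make the difference concrete, here is a minimal sketch of the guard, 
assuming nothing beyond the standard getifaddrs() list layout. The function 
name count_ip_entries is hypothetical and is not part of xen-api-libs; it 
only illustrates skipping entries whose ifa_addr is NULL before looking at 
sa_family.

```c
#include <ifaddrs.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper (not from xen-api-libs): count the entries in a
 * getifaddrs() list that carry an IPv4 or IPv6 address. Entries with a
 * NULL ifa_addr are skipped instead of being dereferenced. */
static int count_ip_entries(const struct ifaddrs *list)
{
    int count = 0;

    for (const struct ifaddrs *ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL)
            continue;   /* interface has no address set: ignore it */

        int family = ifa->ifa_addr->sa_family;
        if (family == AF_INET || family == AF_INET6)
            count++;
    }
    return count;
}
```

In a real caller the list would come from getifaddrs(&ifaddrs) and be 
released with freeifaddrs(ifaddrs) afterwards; the same "continue on NULL" 
check could sit at the top of the loop in stub_if_getaddr.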

It might look like defensive coding, but can we ignore the interfaces for which 
ifa_addr is not set? I can open a bug and fix it if there is consensus 
that this needs to be fixed.

Ranjeet


-----Original Message-----
From: Rob Hoes [mailto:Rob.Hoes@xxxxxxxxxx] 
Sent: Wednesday, March 26, 2014 4:44 AM
To: Ranjeet R; Dave Scott
Cc: xen-api@xxxxxxxxxxxxx
Subject: RE: [Xen-API] Debugging XAPI daemon crash

Hi Ranjeet,

> It seems to be crashing in the same point as you had mentioned. Please 
> find the SEGV backtrace attached.
> 
> (gdb) c
> Program received signal SIGSEGV, Segmentation fault.
> 0x085bc2d6 in stub_if_getaddr ()
>  (gdb) bt
> #0  0x085cca90 in segv_handler ()
> #1  <signal handler called>
> #2  0x085bc2d6 in stub_if_getaddr ()
> #3  0x0850ef8c in camlNetdev__get_all_ipv4_1325 ()
> 
> You had mentioned that this could be because of a bad C function binding.
> I wrote a small C stub to see whether it works for the xenbr0 
> interface and it seems to be working fine. How should I verify the binding.

The function that is failing seems to be this one: 
https://github.com/xapi-project/xen-api-libs/blob/clearwater/netdev/addr_stubs.c#L74

It has:

    int ret;
    struct ifaddrs *ifaddrs, *tmp;
    [...]
    ret = getifaddrs(&ifaddrs);
    [...]
    for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
        sock = tmp->ifa_addr;
        netmask = tmp->ifa_netmask;
        [...]

Could it be that the getifaddrs function does not set ifaddrs correctly? You 
should be able to test this with a small C program. Or is this what you have 
already done?

Cheers,
Rob

> Appreciate your help.
> 
> -Ranjeet
> 
> -----Original Message-----
> From: David Scott [mailto:dave.scott@xxxxxxxxxxxxx]
> Sent: Monday, March 24, 2014 3:46 AM
> To: Ranjeet R
> Cc: xen-api@xxxxxxxxxxxxx
> Subject: Re: [Xen-API] Debugging XAPI daemon crash
> 
> On 24/03/14 10:30, Ranjeet R wrote:
> > Hello Dave
> >
> > The binaries did not have debug symbols but I managed to rebuild the
> binaries with debug enabled.
> 
> Great.
> 
> > I tried starting the xapi process as it was started in the init.d
> scripts under gdb. However, in gdb, the xapi process forks another 
> process and I am not able to debug it further (I tried setting 
> detach_on_fork to off in gdb, but the primary process just goes to end of 
> execution).
> >
> > I am using the following gdb command to debug
> >
> > gdb --args /usr/sbin/xapi -daemon -writeinitcomplete
> /var/run/xapi_init_complete.cookie -writereadyfile 
> /var/run/xapi_startup.cookie -onsystemboot
> >
> > Can you please help me in the steps that you use in debugging the 
> > XAPI
> process.
> 
> Ah, I think xapi forks a "watchdog" process near the start -- this is 
> probably what you're seeing.
> 
> Try adding a "-nowatchdog" option to the command-line.
> 
> Dave
> 
> >
> > Thanks for your help,
> >
> > -Ranjeet
> >
> >
> >
> > -----Original Message-----
> > From: Dave Scott [mailto:Dave.Scott@xxxxxxxxxx]
> > Sent: Saturday, March 22, 2014 12:36 PM
> > To: Ranjeet R
> > Cc: xen-api@xxxxxxxxxxxxx
> > Subject: Re: [Xen-API] Debugging XAPI daemon crash
> >
> > Hi,
> >
> > I suspect the segfault is being caused by a bad C function binding. 
> > I've
> seen a similar crash before when querying an interface IP via 
> getifaddrs (I think that was the function name) Could you run xapi in 
> gdb and reproduce the crash? Printing the call stack would help to 
> confirm this hypothesis. Provided the xapi binary still has debug 
> symbols (ie hasn't been stripped) the ocaml functions (with fairly 
> obvious mangled names) should also be on the stack too.
> >
> > Cheers,
> > Dave
> >
> >> On Mar 22, 2014, at 3:47 AM, "Ranjeet R" <rranjeet@xxxxxxxxxxx> wrote:
> >>
> >> Hello all
> >>
> >> I am trying to bring a DevCloud setup which has an XCP Kronos based
> XAPI daemon. I had changed the underlying network implementation (it 
> is not a bridge, but an openvswitch-like network implementation) and 
> the XAPI daemon crashes during bootup. Please find the XAPI logs below.
> >>
> >>
> >> starting up database engine D:72969b3eaf8e|redo_log] Flushing 
> >> database to all active redo-logs starting up database engine 
> >> D:72969b3eaf8e|xapi] About to flush database: /var/lib/xcp/state.db 
> >> starting up database engine D:72969b3eaf8e|redo_log] Flushing 
> >> database to all active redo-logs starting up database engine 
> >> D:72969b3eaf8e|xapi] Performing initial DB GC thread_zero|dbsync
> >> (update_env) D:fd0aec7399c9|dbsync] Sync: sync_create_localhost 
> >> dbsync
> >> (update_env) D:fd0aec7399c9|dbsync] creating localhost
> >>
> >> dmesg logs seem to suggest that xapi is crashing during startup.
> >>
> >> [    9.092377] xapi[2813]: segfault at 0 ip 085bc286 sp bf80ae30 error
> 4 in xapi[8048000+59f000]
> >> [    9.869971] xapi[2943]: segfault at 0 ip 085bc286 sp bf8ec450 error
> 4 in xapi[8048000+59f000]
> >>
> >> I looked the XAPI code to see where it fails and I don't see any 
> >> logs after the following code point in ocaml / xapi / 
> >> dbsync_slave.ml
> >>
> >> let create_localhost ~__context info =
> >>    let ip = get_my_ip_addr ~__context in
> >>
> >> I confirmed to see that "ifconfig xenbr0" has a valid management IP
> address and should not fail.
> >>
> >> How do I debug this crash further. Are there any ways to look at 
> >> the
> stack trace where XAPI crashed. Any pointers to debug this further 
> will be very helpful.
> >>
> >> -Ranjeet
> >>
> >>
> >> _______________________________________________
> >> Xen-api mailing list
> >> Xen-api@xxxxxxxxxxxxx
> >> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
> >
> >
> >
> 
> 
> 
> 
> 






 

