
Re: [Xen-API] Debugging XAPI daemon crash



Thanks, Rob.

I have opened a pull request as you suggested. Let me know if you have 
any review comments.

-Ranjeet

-----Original Message-----
From: Rob Hoes [mailto:Rob.Hoes@xxxxxxxxxx] 
Sent: Thursday, March 27, 2014 3:16 AM
To: Ranjeet R; Dave Scott
Cc: xen-api@xxxxxxxxxxxxx
Subject: RE: [Xen-API] Debugging XAPI daemon crash

Hi Ranjeet,

That makes sense.

I guess the same would apply to the ifa_netmask field. The following lines 
would blow up if it is NULL:

        netmask = tmp->ifa_netmask;
        [...]
                netmaskstr = alloc_addr(netmask);

because alloc_addr will try to access netmask->sa_family.

So to be on the safe side, I think we should check for this as well. I think 
defensive coding is the right way (I hate segfaults)!

For the purpose of the stub_if_getaddr function, I think it is sufficient to 
wrap the existing if-block with "if (sock && netmask)". This assumes that we 
always want both the address and the netmask, and we ignore the interface if 
either is undefined.
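
As a minimal sketch of that guard in plain C, outside the OCaml stub: the helper 
names entry_is_usable and count_usable_entries are hypothetical and used only 
for illustration, and this is not the actual addr_stubs.c code. It assumes, as 
above, that an entry is skipped when either the address or the netmask is NULL.

```c
#include <ifaddrs.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper (not part of addr_stubs.c): returns 1 when the entry
 * has both an address and a netmask and the address family is IPv4 or IPv6,
 * and 0 otherwise. The NULL checks come first, so sa_family is never read
 * through a NULL pointer. */
int entry_is_usable(const struct ifaddrs *ifa)
{
    if (ifa == NULL || ifa->ifa_addr == NULL || ifa->ifa_netmask == NULL)
        return 0;
    return ifa->ifa_addr->sa_family == AF_INET ||
           ifa->ifa_addr->sa_family == AF_INET6;
}

/* Hypothetical driver: walks the getifaddrs() list and counts the usable
 * entries. Returns -1 if getifaddrs() itself fails. */
int count_usable_entries(void)
{
    struct ifaddrs *list = NULL;
    struct ifaddrs *tmp;
    int n = 0;

    if (getifaddrs(&list) < 0)
        return -1;

    for (tmp = list; tmp != NULL; tmp = tmp->ifa_next) {
        if (entry_is_usable(tmp))
            n++;
    }

    freeifaddrs(list);
    return n;
}
```

In the real stub, the body of the existing if-block would run only for entries 
where a check like entry_is_usable succeeds; entries without an address or a 
netmask would simply be skipped.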

The master branch for this code is here (since we split it off from 
xen-api-libs): 
https://github.com/xapi-project/netdev/blob/master/lib/addr_stubs.c. If you'd 
like to submit a pull request there (as well as keeping the fix in your 
development branch on clearwater), that would be great.

Thanks,
Rob

> -----Original Message-----
> From: Ranjeet R [mailto:rranjeet@xxxxxxxxxxx]
> Sent: 26 March 2014 11:08 PM
> To: Rob Hoes; Dave Scott
> Cc: xen-api@xxxxxxxxxxxxx
> Subject: RE: [Xen-API] Debugging XAPI daemon crash
> 
> Hello Rob/Dave
> 
> Thanks for the pointers. I figured out the issue. My C stub was able 
> to list all interfaces without crashing because it checks ifa_addr 
> for NULL before using it:
> 
> if (getifaddrs(&ifaddr) == -1) {
>         perror("getifaddrs");
>         exit(1);
> }
> 
>   struct ifaddrs *ifa = ifaddr;
>   for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
>     if (ifa->ifa_addr != NULL) {    /* <-- check for ifa_addr */
>       int family = ifa->ifa_addr->sa_family;
> 
> I was looking into the ifaddrs entry only when the interface 
> address was set.
> 
> The stub_if_getaddr code, by contrast, is as follows:
> 
> ret = getifaddrs(&ifaddrs);
> if (ret < 0)
>       caml_failwith("cannot get interface address");
> 
> for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
>      sock = tmp->ifa_addr;        /* <-- assigned here */
>      netmask = tmp->ifa_netmask;
> 
>      if (sock->sa_family == AF_INET || sock->sa_family == AF_INET6) {
>              /* ^ sock dereferenced above without a NULL check */
>                       name = caml_copy_string(tmp->ifa_name); <snip>
> 
> In my case, there were two internal interfaces for which the interface 
> address was not set up, and iterating through the list hit a NULL 
> pointer dereference on those entries.
> 
> It might look like defensive coding, but can we ignore the interfaces 
> for which ifa_addr is not set? I can open a bug and fix it if there 
> is consensus that this needs to be fixed.
> 
> Ranjeet
> 
> 
> -----Original Message-----
> From: Rob Hoes [mailto:Rob.Hoes@xxxxxxxxxx]
> Sent: Wednesday, March 26, 2014 4:44 AM
> To: Ranjeet R; Dave Scott
> Cc: xen-api@xxxxxxxxxxxxx
> Subject: RE: [Xen-API] Debugging XAPI daemon crash
> 
> Hi Ranjeet,
> 
> > It seems to be crashing in the same point as you had mentioned. 
> > Please find the SEGV backtrace attached.
> >
> > (gdb) c
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x085bc2d6 in stub_if_getaddr ()
> >  (gdb) bt
> > #0  0x085cca90 in segv_handler ()
> > #1  <signal handler called>
> > #2  0x085bc2d6 in stub_if_getaddr ()
> > #3  0x0850ef8c in camlNetdev__get_all_ipv4_1325 ()
> >
> > You had mentioned that this could be because of a bad C function binding.
> > I wrote a small C stub to see whether it works for the xenbr0 
> > interface and it seems to be working fine. How should I verify the
> > binding?
> 
> The function that is failing seems to be this one:
> https://github.com/xapi-project/xen-api-libs/blob/clearwater/netdev/addr_stubs.c#L74
> 
> It has:
> 
>     int ret;
>     struct ifaddrs *ifaddrs, *tmp;
>     [...]
>     ret = getifaddrs(&ifaddrs);
>     [...]
>     for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
>         sock = tmp->ifa_addr;
>         netmask = tmp->ifa_netmask;
>         [...]
> 
> Could it be that the getifaddrs function does not set ifaddrs correctly?
> You should be able to test this with a small C program. Or is this 
> what you have already done?
> 
> Cheers,
> Rob
> 
> > Appreciate your help.
> >
> > -Ranjeet
> >
> > -----Original Message-----
> > From: David Scott [mailto:dave.scott@xxxxxxxxxxxxx]
> > Sent: Monday, March 24, 2014 3:46 AM
> > To: Ranjeet R
> > Cc: xen-api@xxxxxxxxxxxxx
> > Subject: Re: [Xen-API] Debugging XAPI daemon crash
> >
> > On 24/03/14 10:30, Ranjeet R wrote:
> > > Hello Dave
> > >
> > > The binaries did not have debug symbols but I managed to rebuild 
> > > the binaries with debug enabled.
> >
> > Great.
> >
> > > I tried starting the xapi process under gdb in the same way the
> > > init.d scripts start it. However, in gdb, the xapi process forks
> > > another process and I am not able to debug it further (I tried
> > > setting detach-on-fork to off in gdb, but the primary process just
> > > runs to the end of execution).
> > >
> > > I am using the following gdb command to debug
> > >
> > > gdb --args /usr/sbin/xapi -daemon -writeinitcomplete
> > > /var/run/xapi_init_complete.cookie -writereadyfile
> > > /var/run/xapi_startup.cookie -onsystemboot
> > >
> > > Can you please tell me the steps that you use to debug the XAPI
> > > process?
> >
> > Ah, I think xapi forks a "watchdog" process near the start -- this 
> > is probably what you're seeing.
> >
> > Try adding a "-nowatchdog" option to the command-line.
> >
> > Dave
> >
> > >
> > > Thanks for your help,
> > >
> > > -Ranjeet
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Dave Scott [mailto:Dave.Scott@xxxxxxxxxx]
> > > Sent: Saturday, March 22, 2014 12:36 PM
> > > To: Ranjeet R
> > > Cc: xen-api@xxxxxxxxxxxxx
> > > Subject: Re: [Xen-API] Debugging XAPI daemon crash
> > >
> > > Hi,
> > >
> > > I suspect the segfault is being caused by a bad C function binding.
> > > I've seen a similar crash before when querying an interface IP via
> > > getifaddrs (I think that was the function name). Could you run xapi
> > > in gdb and reproduce the crash? Printing the call stack would help
> > > to confirm this hypothesis. Provided the xapi binary still has debug
> > > symbols (i.e. hasn't been stripped), the OCaml functions (with fairly
> > > obvious mangled names) should be on the stack too.
> > >
> > > Cheers,
> > > Dave
> > >
> > >> On Mar 22, 2014, at 3:47 AM, "Ranjeet R" <rranjeet@xxxxxxxxxxx> wrote:
> > >>
> > >> Hello all
> > >>
> > >> I am trying to bring up a DevCloud setup which has an XCP Kronos
> > >> based XAPI daemon. I had changed the underlying network
> > >> implementation (it is not a bridge, but an openvswitch-like network
> > >> implementation) and the XAPI daemon crashes during bootup. Please
> > >> find the XAPI logs below.
> > >>
> > >>
> > >> starting up database engine D:72969b3eaf8e|redo_log] Flushing database to all active redo-logs
> > >> starting up database engine D:72969b3eaf8e|xapi] About to flush database: /var/lib/xcp/state.db
> > >> starting up database engine D:72969b3eaf8e|redo_log] Flushing database to all active redo-logs
> > >> starting up database engine D:72969b3eaf8e|xapi] Performing initial DB GC
> > >> thread_zero|dbsync (update_env) D:fd0aec7399c9|dbsync] Sync: sync_create_localhost
> > >> dbsync (update_env) D:fd0aec7399c9|dbsync] creating localhost
> > >>
> > >> dmesg logs seem to suggest that xapi is crashing during startup.
> > >>
> > >> [    9.092377] xapi[2813]: segfault at 0 ip 085bc286 sp bf80ae30 error 4 in xapi[8048000+59f000]
> > >> [    9.869971] xapi[2943]: segfault at 0 ip 085bc286 sp bf8ec450 error 4 in xapi[8048000+59f000]
> > >>
> > >> I looked at the XAPI code to see where it fails, and I don't see any 
> > >> logs after the following point in ocaml/xapi/dbsync_slave.ml:
> > >>
> > >> let create_localhost ~__context info =
> > >>    let ip = get_my_ip_addr ~__context in
> > >>
> > >> I confirmed that "ifconfig xenbr0" shows a valid management IP
> > >> address, so this call should not fail.
> > >>
> > >> How do I debug this crash further? Is there any way to look at the
> > >> stack trace where XAPI crashed? Any pointers to debug this further
> > >> will be very helpful.
> > >>
> > >> -Ranjeet
> > >>
> > >>
> > >> _______________________________________________
> > >> Xen-api mailing list
> > >> Xen-api@xxxxxxxxxxxxx
> > >> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api