Re: [Xen-API] Debugging XAPI daemon crash


  • To: David Scott <dave.scott@xxxxxxxxxxxxx>
  • From: Ranjeet R <rranjeet@xxxxxxxxxxx>
  • Date: Wed, 26 Mar 2014 04:10:46 +0000
  • Accept-language: en-US
  • Cc: "xen-api@xxxxxxxxxxxxx" <xen-api@xxxxxxxxxxxxx>
  • Delivery-date: Wed, 26 Mar 2014 04:11:16 +0000
  • List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>
  • Thread-topic: [Xen-API] Debugging XAPI daemon crash

Hello Dave

I was able to attach the debugger only when I ran xapi without any 
arguments. The backtrace looks like this:

  0x080693e0 in camlDbsync_slave__create_localhost_3057 ()
(gdb) bt
#0  0x080693e0 in camlDbsync_slave__create_localhost_3057 ()
#1  0x0806a16c in camlDbsync_slave__update_env_4937 ()
#2  0xb7c99f78 in ?? ()
(gdb) n
0x080693a0 in camlDbsync_slave__get_my_ip_addr_3055 ()

It seems to be crashing at the same point you mentioned. Please find the 
SEGV backtrace attached.

(gdb) c
Program received signal SIGSEGV, Segmentation fault.
0x085bc2d6 in stub_if_getaddr ()
(gdb) bt
#0  0x085cca90 in segv_handler ()
#1  <signal handler called>
#2  0x085bc2d6 in stub_if_getaddr ()
#3  0x0850ef8c in camlNetdev__get_all_ipv4_1325 ()

You mentioned that this could be caused by a bad C function binding. I 
wrote a small C stub to check whether the call works for the xenbr0 interface, 
and it seems to work fine. How should I verify the binding?
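
For reference, here is a minimal sketch of the kind of check that may be worth
comparing against the binding. It assumes the crash is a NULL pointer read
(dmesg in the original report shows the fault address as 0) and relies on the
fact that getifaddrs(3) may return entries whose ifa_addr is NULL. The helper
names first_ipv4_of and count_null_addr_entries are hypothetical and are not
taken from xapi's stub_if_getaddr, whose source is not shown in this thread.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <string.h>

/* Hypothetical helper (not xapi code): copy the first IPv4 address of
 * interface `name` into `buf` as dotted-quad text.
 * Returns 0 on success, -1 if getifaddrs fails or no address is found. */
int first_ipv4_of(const char *name, char *buf, size_t buflen)
{
    struct ifaddrs *list = NULL;
    struct ifaddrs *ifa;
    int rc = -1;

    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* ifa_addr may be NULL for some entries; skip them instead of
         * dereferencing, which would fault at address 0. */
        if (ifa->ifa_addr == NULL)
            continue;
        if (ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (strcmp(ifa->ifa_name, name) != 0)
            continue;

        {
            const struct sockaddr_in *sin =
                (const struct sockaddr_in *)ifa->ifa_addr;
            if (inet_ntop(AF_INET, &sin->sin_addr, buf,
                          (socklen_t)buflen) != NULL)
                rc = 0;
        }
        break;
    }

    freeifaddrs(list);
    return rc;
}

/* Hypothetical diagnostic: count entries whose ifa_addr is NULL.
 * Returns the count, or -1 if getifaddrs fails. */
int count_null_addr_entries(void)
{
    struct ifaddrs *list = NULL;
    struct ifaddrs *ifa;
    int n = 0;

    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next)
        if (ifa->ifa_addr == NULL)
            n++;

    freeifaddrs(list);
    return n;
}
```

Calling first_ipv4_of("xenbr0", buf, sizeof buf) with a char buf[INET_ADDRSTRLEN]
mirrors the standalone test described above. A non-zero result from
count_null_addr_entries() on the affected host would be consistent with, though
not proof of, the binding dereferencing a NULL ifa_addr.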

Appreciate your help.

-Ranjeet

-----Original Message-----
From: David Scott [mailto:dave.scott@xxxxxxxxxxxxx] 
Sent: Monday, March 24, 2014 3:46 AM
To: Ranjeet R
Cc: xen-api@xxxxxxxxxxxxx
Subject: Re: [Xen-API] Debugging XAPI daemon crash

On 24/03/14 10:30, Ranjeet R wrote:
> Hello Dave
>
> The binaries did not have debug symbols but I managed to rebuild the binaries 
> with debug enabled.

Great.

> I tried starting the xapi process under gdb the same way the init.d scripts 
> start it. However, xapi forks another process and I am not able to debug it 
> further (I tried "set detach-on-fork off" in gdb, but the primary process 
> just runs to the end of execution).
>
> I am using the following gdb command to debug:
>
> gdb --args /usr/sbin/xapi -daemon -writeinitcomplete 
> /var/run/xapi_init_complete.cookie -writereadyfile 
> /var/run/xapi_startup.cookie -onsystemboot
>
> Could you please describe the steps you use to debug the XAPI process?

Ah, I think xapi forks a "watchdog" process near the start -- this is probably 
what you're seeing.

Try adding a "-nowatchdog" option to the command-line.

Dave

>
> Thanks for your help,
>
> -Ranjeet
>
>
>
> -----Original Message-----
> From: Dave Scott [mailto:Dave.Scott@xxxxxxxxxx]
> Sent: Saturday, March 22, 2014 12:36 PM
> To: Ranjeet R
> Cc: xen-api@xxxxxxxxxxxxx
> Subject: Re: [Xen-API] Debugging XAPI daemon crash
>
> Hi,
>
> I suspect the segfault is being caused by a bad C function binding. I've seen 
> a similar crash before when querying an interface IP via getifaddrs (I think 
> that was the function name). Could you run xapi in gdb and reproduce the 
> crash? Printing the call stack would help to confirm this hypothesis. 
> Provided the xapi binary still has debug symbols (i.e. hasn't been stripped), 
> the OCaml functions (with fairly obvious mangled names) should be on the 
> stack too.
>
> Cheers,
> Dave
>
>> On Mar 22, 2014, at 3:47 AM, "Ranjeet R" <rranjeet@xxxxxxxxxxx> wrote:
>>
>> Hello all
>>
>> I am trying to bring up a DevCloud setup which has an XCP Kronos based XAPI 
>> daemon. I changed the underlying network implementation (it is not a 
>> bridge, but an openvswitch-like network implementation) and the XAPI daemon 
>> now crashes during bootup. Please find the XAPI logs below.
>>
>>
>> starting up database engine D:72969b3eaf8e|redo_log] Flushing database to all active redo-logs
>> starting up database engine D:72969b3eaf8e|xapi] About to flush database: /var/lib/xcp/state.db
>> starting up database engine D:72969b3eaf8e|redo_log] Flushing database to all active redo-logs
>> starting up database engine D:72969b3eaf8e|xapi] Performing initial DB GC
>> thread_zero|dbsync (update_env) D:fd0aec7399c9|dbsync] Sync: sync_create_localhost
>> dbsync (update_env) D:fd0aec7399c9|dbsync] creating localhost
>>
>> dmesg logs seem to suggest that xapi is crashing during startup.
>>
>> [    9.092377] xapi[2813]: segfault at 0 ip 085bc286 sp bf80ae30 error 4 in xapi[8048000+59f000]
>> [    9.869971] xapi[2943]: segfault at 0 ip 085bc286 sp bf8ec450 error 4 in xapi[8048000+59f000]
>>
>> I looked at the XAPI code to see where it fails, and I don't see any logs 
>> after the following point in ocaml/xapi/dbsync_slave.ml:
>>
>> let create_localhost ~__context info =
>>    let ip = get_my_ip_addr ~__context in
>>
>> I confirmed that "ifconfig xenbr0" shows a valid management IP address, so 
>> the lookup should not fail.
>>
>> How do I debug this crash further? Is there any way to look at the stack 
>> trace where XAPI crashed? Any pointers for debugging this further would be 
>> very helpful.
>>
>> -Ranjeet

_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api


 

