
Re: xen master: xl create hangs





On Tue, Jul 19, 2022 at 4:26 AM Mathieu Tarral <mathieu.tarral@xxxxxxxxxxxxxx> wrote:
Using gdb to debug the xl process, I get the following stacktrace:

(gdb) bt
#0  __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=8652, futex_word=0x7f6debd22a50) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=8652, futex_word=0x7f6debd22a50) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f6debd22a50, expected=8652, clockid=clockid@entry=0, abstime=abstime@entry=0x0,
    private=private@entry=128) at ./nptl/futex-internal.c:139
#3  0x00007f6deba736a4 in __pthread_clockjoin_ex (threadid=140110084581248, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0,
    abstime=abstime@entry=0x0, block=block@entry=true) at ./nptl/pthread_join_common.c:105
#4  0x00007f6deba73543 in ___pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at ./nptl/pthread_join.c:24
#5  0x00007f6deb9a144b in xs_daemon_close (h=0x561db3bc5bc0) at xs.c:366
#6  0x00007f6deb9a145f in xs_close (xsh=<optimized out>) at xs.c:386
#7  0x00007f6debc43a36 in libxl_ctx_free (ctx=0x561db3bc52e0) at libxl.c:173
#8  0x0000561db33bf5a3 in xl_ctx_free () at xl.c:370
#9  0x00007f6deba22495 in __run_exit_handlers (status=0, listp=0x7f6debbf6838 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true,
    run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:113
#10 0x00007f6deba22610 in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:143
#11 0x00007f6deba06d97 in __libc_start_call_main (main=main@entry=0x561db33c0425 <main>, argc=argc@entry=4, argv=argv@entry=0x7ffeb2f263d8)
    at ../sysdeps/nptl/libc_start_call_main.h:74
#12 0x00007f6deba06e40 in __libc_start_main_impl (main=0x561db33c0425 <main>, argc=4, argv=0x7ffeb2f263d8, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7ffeb2f263c8) at ../csu/libc-start.c:392
#13 0x0000561db33bf425 in _start ()

Colorized version in a GitHub Gist:
https://gist.github.com/Wenzel/4da1e0a025954fac13a0ee57147cc44f

So it looks like xs_daemon_close is blocked in pthread_join, waiting for a thread to exit:
https://github.com/xen-project/xen/blob/a5fb66f4513c2c2d222dcc3753163b15690bd003/tools/libs/store/xs.c#L366

"A read thread which pulls messages off the comms channel and signals waiters."
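To make frames #3 to #5 concrete, here is a minimal C sketch of joining a thread that is blocked in read(). It is not the libxenstore code: the names reader_main and cancel_and_join_reader are hypothetical, and a pipe stands in for the comms channel. With no timeout (abstime=0x0 in the trace), pthread_join returns only once the target thread has exited, so a reader that never exits and never acts on a cancellation request keeps the caller waiting indefinitely.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for a read thread: it blocks in read() until
 * data arrives. read() is a POSIX cancellation point, so a pending
 * pthread_cancel() request is acted on there. */
static void *reader_main(void *arg)
{
    assert(arg != NULL);
    int fd = *(int *)arg;
    char buf[64];

    for (;;) {
        /* Nothing is ever written to the pipe, so this blocks. */
        if (read(fd, buf, sizeof(buf)) <= 0)
            break;
    }
    return NULL;
}

/* Returns 0 if the reader was cancelled and joined, -1 otherwise. */
static int cancel_and_join_reader(void)
{
    int fds[2];
    pthread_t thr;
    void *result = NULL;
    int rc = -1;

    if (pipe(fds) != 0)
        return -1;

    if (pthread_create(&thr, NULL, reader_main, &fds[0]) == 0) {
        /* Without this request (and with the write end still open),
         * the join below would never return. */
        pthread_cancel(thr);
        if (pthread_join(thr, &result) == 0 && result == PTHREAD_CANCELED)
            rc = 0;
    }

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

In this sketch the join completes because the reader reaches a cancellation point; if the reader had cancellation disabled or were stuck somewhere that is not a cancellation point, the caller would sit in pthread_join the way the backtrace shows.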

Any ideas what could be going wrong?
How can I debug this further?

Unfortunately, because of the way the async code in libxl is implemented, the stack trace won't be particularly useful. Can you run it with the `-vvvvvvvvvv` command-line option? (I don't know exactly how many v's are needed; just put in a lot and you'll get all the debug info.)

The other thing to include would be the output of `xl dmesg`, and the information in /var/log/xen/ pertaining to that domain.

Another way to track this down would be to run `git bisect` between `master` and `RELEASE-4.16.0` to see where things stopped working. Unless you feel particularly motivated, though, that's probably something to try after we've looked at the other information to see if we can figure out what's gone wrong.

 -George 

 

