Re: [Xen-devel] [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.



On Mon, 2014-02-24 at 14:19 +0000, Ian Jackson wrote:
> libxl_postfork_child_noexec would nestedly reaquire the non-recursive
> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
> The result on Linux is that the process always deadlocks before
> returning from this function.
> 
> This is used by xl's console child.  So, the ultimate effect is that
> xl with pygrub does not manage to connect to the pygrub console.
> This beahviour was reported by Michael Young in Xen 4.4.0 RC5.

"behaviour".

Michael reported this earlier against -rc2 as well, but it fell through
the cracks because I failed to appreciate the severity. Sorry.
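
For anyone following along, the nested acquisition described above can be
modelled in a few lines. This is only a sketch, not libxl code: the names
no_forking_init, inner_remove and outer_postfork are hypothetical, and I
am assuming an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so that the
second lock attempt returns EDEADLK rather than hanging the way a default
mutex does on Linux.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for libxl's "no_forking" mutex. */
static pthread_mutex_t no_forking;

/* Initialise as an error-checking mutex so that a relock by the owner
 * fails with EDEADLK instead of blocking forever. */
static int no_forking_init(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!rc) rc = pthread_mutex_init(&no_forking, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Models a helper that takes the lock itself, as the old
 * sigchld_user_remove call path did. */
static int inner_remove(void)
{
    int rc = pthread_mutex_lock(&no_forking);
    if (rc) return rc;   /* EDEADLK if this thread already holds the lock */
    /* ... list manipulation would go here ... */
    pthread_mutex_unlock(&no_forking);
    return 0;
}

/* Models the old postfork path: take the lock, then call the helper
 * which tries to take it again. */
static int outer_postfork(void)
{
    int rc = pthread_mutex_lock(&no_forking);
    assert(rc == 0);
    rc = inner_remove();
    pthread_mutex_unlock(&no_forking);
    return rc;
}
```

After no_forking_init(), calling outer_postfork() reports the relock as an
error, whereas inner_remove() on its own succeeds. With a default
(non-error-checking) mutex the same call sequence is the hang described in
the commit message.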

> Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
> not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
> documented to suffice if called only on one ctx.  So deregistering the
> ctx it's called on is not sufficient.  Instead, we need a new approach
> which discards the whole sigchld_user list and unconditionally removes
> our SIGCHLD handler if we had one.
> 
> Prompted by this, clarify the semantics of
> libxl_postfork_child_noexec.  Specifically, expand on the meaning of
> "quickly" by explaining what operations are not permitted; and
> document the fact that the function doesn't reclaim the resources in
> the ctxs.
> 
> And add a comment in libxl_postfork_child_noexec explaining the
> internal concurrency situation.
> 
> This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> Reported-by: M A Young <m.a.young@xxxxxxxxxxxx>

Acked-by: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

> CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> ---
>  tools/libxl/libxl_event.h |   16 ++++++++++++++++
>  tools/libxl/libxl_fork.c  |   44 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 59 insertions(+), 1 deletion(-)

Impressive considering the real meat is -1/+6 ;-)

Not that I'm going to complain about lots of docs!

> 
> @@ -134,7 +150,33 @@ void libxl_postfork_child_noexec(libxl_ctx *ctx)
>      }
>      LIBXL_LIST_INIT(&carefds);
>  
> -    sigchld_user_remove(ctx);
> +    if (sigchld_installed) {
> +        defer_sigchld();
> +
> +        LIBXL_LIST_INIT(&sigchld_users);
> +        /* After this the ->sigchld_user_registered entries in the
> +         * now-obsolete contexts may be lies.  But that's OK because
> +         * no-one will look at them. */
> +
> +        release_sigchld();
> +        sigchld_removehandler_core();
> +    }
>  
>      atfork_unlock();
>  }
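
To spell out the shape of the new approach for the archives: block
SIGCHLD, throw away the whole user list, unblock, then restore the old
disposition. The sketch below is a simplified model under my own
assumptions, not the libxl implementation; struct user, user_register and
postfork_child_reset are hypothetical names, and plain sigprocmask and
sigaction stand in for defer_sigchld, release_sigchld and
sigchld_removehandler_core.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for libxl's internal state; not the real types. */
struct user { struct user *next; };

static struct user *users;             /* contexts interested in SIGCHLD */
static bool handler_installed;
static struct sigaction saved_action;  /* disposition before ours went in */

static void on_sigchld(int signo) { (void)signo; }

/* Register a user, installing the shared handler on first use. */
static int user_register(struct user *u)
{
    assert(u != NULL);
    if (!handler_installed) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = on_sigchld;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART;
        if (sigaction(SIGCHLD, &sa, &saved_action) != 0)
            return -1;
        handler_installed = true;
    }
    u->next = users;
    users = u;
    return 0;
}

/* Child-side cleanup: forget every user and drop the handler, without
 * taking any lock the caller might already hold. */
static int postfork_child_reset(void)
{
    sigset_t block, old;

    if (!handler_installed)
        return 0;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)  /* defer SIGCHLD */
        return -1;

    users = NULL;  /* discard the whole list, not just one entry */

    if (sigprocmask(SIG_SETMASK, &old, NULL) != 0)  /* release SIGCHLD */
        return -1;
    if (sigaction(SIGCHLD, &saved_action, NULL) != 0)
        return -1;
    handler_installed = false;
    return 0;
}
```

The point of the model is that the reset does not depend on which user it
was called for, which matches the documented promise that calling the
function on one ctx suffices.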
