Re: [Xen-devel] [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
On Mon, Feb 24, 2014 at 3:47 PM, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
> On Mon, Feb 24, 2014 at 3:17 PM, George Dunlap
> <george.dunlap@xxxxxxxxxxxxx> wrote:
>> On 02/24/2014 02:19 PM, Ian Jackson wrote:
>>>
>>> libxl_postfork_child_noexec would nestedly reacquire the non-recursive
>>> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
>>> The result on Linux is that the process always deadlocks before
>>> returning from this function.
>>>
>>> This is used by xl's console child. So, the ultimate effect is that
>>> xl with pygrub does not manage to connect to the pygrub console.
>>> This behaviour was reported by Michael Young in Xen 4.4.0 RC5.
>>>
>>> Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
>>> not correct with SIGCHLD sharing. libxl_postfork_child_noexec is
>>> documented to suffice if called only on one ctx. So deregistering the
>>> ctx it's called on is not sufficient. Instead, we need a new approach
>>> which discards the whole sigchld_user list and unconditionally removes
>>> our SIGCHLD handler if we had one.
>>>
>>> Prompted by this, clarify the semantics of
>>> libxl_postfork_child_noexec. Specifically, expand on the meaning of
>>> "quickly" by explaining what operations are not permitted; and
>>> document the fact that the function doesn't reclaim the resources in
>>> the ctxs.
>>>
>>> And add a comment in libxl_postfork_child_noexec explaining the
>>> internal concurrency situation.
>>>
>>> This is an important bugfix. IMO the bug is a blocker for Xen 4.4.
>>>
>>> Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
>>> Reported-by: M A Young <m.a.young@xxxxxxxxxxxx>
>>> CC: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
>>> CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>>
>>
>> So it looks like this path gets called from a number of other places in xl:
>>
>> libxl_postfork_child_noexec() is called by xl.c:postfork().
>>
>> postfork() is called in xl_cmdimpl.c by autoconnect_vncviewer(),
>> autoconnect_console(), and do_daemonize().
>>
>> do_daemonize() is called during "xl create", and "xl devd".
>>
>> Was this deadlock not triggered for those, or was it triggered and nobody
>> noticed?
>
> In any case, I do think we need to fix this; the main question is, do
> we need to delay the release a bit further to make sure it gets
> sufficient testing?

Also, it would be nice to get a Tested-by: from someone using it with
libvirt (before the release at least, if not before the check-in).

Jim / Dario?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel