Re: [Xen-devel] [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
On Mon, Feb 24, 2014 at 3:17 PM, George Dunlap
<george.dunlap@xxxxxxxxxxxxx> wrote:
> On 02/24/2014 02:19 PM, Ian Jackson wrote:
>>
>> libxl_postfork_child_noexec would nestedly reacquire the non-recursive
>> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
>> The result on Linux is that the process always deadlocks before
>> returning from this function.
>>
>> This is used by xl's console child. So, the ultimate effect is that
>> xl with pygrub does not manage to connect to the pygrub console.
>> This behaviour was reported by Michael Young in Xen 4.4.0 RC5.
>>
>> Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
>> not correct with SIGCHLD sharing. libxl_postfork_child_noexec is
>> documented to suffice if called only on one ctx. So deregistering the
>> ctx it's called on is not sufficient. Instead, we need a new approach
>> which discards the whole sigchld_user list and unconditionally removes
>> our SIGCHLD handler if we had one.
>>
>> Prompted by this, clarify the semantics of
>> libxl_postfork_child_noexec. Specifically, expand on the meaning of
>> "quickly" by explaining what operations are not permitted; and
>> document the fact that the function doesn't reclaim the resources in
>> the ctxs.
>>
>> And add a comment in libxl_postfork_child_noexec explaining the
>> internal concurrency situation.
>>
>> This is an important bugfix. IMO the bug is a blocker for Xen 4.4.
>>
>> Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
>> Reported-by: M A Young <m.a.young@xxxxxxxxxxxx>
>> CC: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
>> CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>
>
> So it looks like this path gets called from a number of other places in xl:
>
> libxl_postfork_child_noexec() is called by xl.c:postfork().
>
> postfork() is called in xl_cmdimpl.c by autoconnect_vncviewer(),
> autoconnect_console(), and do_daemonize().
>
> do_daemonize() is called during "xl create" and "xl devd".
>
> Was this deadlock not triggered for those, or was it triggered and nobody
> noticed?

In any case, I do think we need to fix this; the main question is, do
we need to delay the release a bit further to make sure it gets
sufficient testing?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
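For anyone following the thread without the libxl source to hand, the failure mode in the quoted patch description (a function holding a non-recursive mutex calls a helper that takes the same mutex) can be sketched in isolation. This is only an illustration and not the libxl code: demo_no_forking, demo_inner_remove, demo_nested_acquire and demo_init are hypothetical names, and the sketch assumes an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so that the second lock attempt reports EDEADLK. With a default mutex on Linux the same call pattern would simply hang, as the report describes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a non-recursive "no_forking"-style mutex.
 * This is NOT the real libxl object. */
static pthread_mutex_t demo_no_forking;

/* Initialise the mutex as error-checking, so that a nested lock by the
 * same thread returns EDEADLK instead of blocking forever. */
static int demo_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&demo_no_forking, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical helper that takes the lock itself, in the way the patch
 * description says sigchld_user_remove does. */
static int demo_inner_remove(void)
{
    int rc = pthread_mutex_lock(&demo_no_forking);
    if (rc != 0)
        return rc;              /* nested acquisition detected */
    pthread_mutex_unlock(&demo_no_forking);
    return 0;
}

/* Hypothetical outer function: holds the lock, then calls the helper,
 * which tries to take the same lock again on the same thread. */
static int demo_nested_acquire(void)
{
    int rc = pthread_mutex_lock(&demo_no_forking);
    if (rc != 0)
        return rc;
    rc = demo_inner_remove();
    pthread_mutex_unlock(&demo_no_forking);
    return rc;
}
```

Calling demo_init() and then demo_nested_acquire() from a small main() should show the nested attempt being rejected. The general ways out are either to avoid calling the lock-taking helper while the lock is held, or to split the helper into a variant that expects the caller to hold the lock already.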