[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
On 26/11/13 16:50, Konrad Rzeszutek Wilk wrote: > On Thu, Nov 21, 2013 at 05:52:28PM +0000, David Vrabel wrote: >> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote: >>> The 'read_reply' works with 'process_msg' to read of a reply in XenBus. >>> 'process_msg' is running from within the 'xenbus' thread. Whenever >>> a message shows up in XenBus it is put on a xs_state.reply_list list >>> and 'read_reply' picks it up. >>> >>> The problem is if the backend domain or the xenstored process is killed. >>> In which case 'xenbus' is still awaiting - and 'read_reply' if called - >>> stuck forever waiting for the reply_list to have some contents. >>> >>> This is normally not a problem - as the backend domain can come back >>> or the xenstored process can be restarted. However if the domain >>> is in process of being powered off/restarted/halted - there is no >>> point of waiting on it coming back - as we are effectively being >>> terminated and should not impede the progress. >>> >>> This patch solves this problem by checking the 'system_state' value >>> to see if we are in heading towards death. We also make the wait >>> mechanism a bit more asynchronous. >> >> This seems to be checking the wrong thing conceptually. We should abort >> the wait if xenstored is dead not if our domain is dying. >> >> I think you can consider xenstored as dead if: >> >> a) it's local and we're dying. > > OK. Not sure exactly how to do that but that should be possible. xen_store_domain_type == XS_LOCAL and looking at system_state? >> b) it's remote and the remote domain is dead. > > OK, any idea how to do that? As in check if a remote domain is dead? Let someone who cares about xenstore domains fix this -- this is not the most common use case. I'd be happy to have some thing like: bool xenbus_ok(void) { switch (xen_store_domain_type) { case XS_LOCAL: return system_state != dying; case XS_PV: case XS_HVM; /* FIXME: could check remote domain is alive, but it's normally dom0. */ return true; // ... default: return true; } } >>> Fixes-Bug: http://bugs.xenproject.org/xen/bug/8 >> >> This bug link has no useful information in it. And it now does, thanks Ian. >>> --- a/drivers/xen/xenbus/xenbus_xs.c >>> +++ b/drivers/xen/xenbus/xenbus_xs.c >>> @@ -148,9 +148,24 @@ static void *read_reply(enum xsd_sockmsg_type *type, >>> unsigned int *len) >>> >>> while (list_empty(&xs_state.reply_list)) { >>> spin_unlock(&xs_state.reply_lock); >>> - /* XXX FIXME: Avoid synchronous wait for response here. */ >>> - wait_event(xs_state.reply_waitq, >>> - !list_empty(&xs_state.reply_list)); >>> + wait_event_timeout(xs_state.reply_waitq, >>> + !list_empty(&xs_state.reply_list), >>> + msecs_to_jiffies(500)); >> >> This is still a synchronous wait. Is the removal of the FIXME comment >> correct? > > I thought that the comment was meant in terms of it blocking forever. > But perhaps that was not the intent of the comment? Ok. I don't anticipate a fully async interface here being sensible anyway. David _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |