Re: [Xen-devel] [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running
On Thu, 2013-05-02 at 11:10 +0100, Aurélien Chartier wrote:
> On 02/05/13 10:24, Ian Campbell wrote:
> > On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote:
> >>>>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> >>> On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote:
> >>>> If the xenbus frontend is running in a domain running xenstored or in
> >>>> dom0, the device resume is hanging because it is happening before the
> >>>> process resume. This patch adds extra logic to the resume code to check
> >>>> if we are the domain running xenstored or dom0.
> >>>>
> >>>> The frontend will be reconnected later, when the backend resumes from S3.
> >>>> This logic is working when xenstored is running in dom0, but has not been
> >>>> tested with a xenstore stub domain.
> >>>> ---
> >>>> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++-
> >>>> 1 file changed, 14 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
> >>>> index 3159a37..8583afe 100644
> >>>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
> >>>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
> >>>> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch,
> >>>>  	xenbus_otherend_changed(watch, vec, len, 1);
> >>>>  }
> >>>>
> >>>> +static int xenbus_frontend_dev_resume(struct device *dev)
> >>>> +{
> >>>> +	/*
> >>>> +	 * If xenstored is running in that domain, we cannot access the backend
> >>>> +	 * state at the moment. If we are running in dom0, the domain running
> >>>> +	 * xenstored is still suspended at that point
> >>>> +	 */
> >>>> +	if (xen_initial_domain() || (xen_store_domain == XS_LOCAL))
> >>>> +		return 0;
> >>>> +
> >>>> +	return xenbus_dev_resume(dev);
> >>> When or where does this eventually get called for the init domain or
> >>> XS_LOCAL cases?
> >> I was about to ask the same question.
> >> Plus I don't think the description here or in the overview mail really
> >> makes clear how specifically a deadlock would occur here. That's pretty
> >> relevant to understand in the light that so far we had no indication of
> >> there being any special treatment necessary here, and resume from S3
> >> had been working quite fine without that (at least as long as
> >> xenstored is running in Dom0 and at least with the traditional/
> >> forward-port/non-pvops kernels).
> > I think the unusual feature here is that dom0 has a netfront attached.
> > Netfront resume is therefore hanging because it is trying to talk to the
> > still frozen xenstored process in dom0.
> >
> > Ian.
> >
> Yes, the unusual feature of having a netfront driver in dom0 is
> triggering the S3 issue I described. Ian made me realize this issue
> could also happen in Xenstore stub domains.
>
> The root cause of the issue is the assumption that a xenstored process
> is running in another domain when the xenbus frontend is being resumed
> from S3. This assumption is incorrect if xenstored and the xenbus
> frontend are running in the same domain. As the Linux kernel waits for
> all devices to be resumed before resuming userland tasks, the xenbus
> frontend resume is blocking the userland process resume, waiting for
> xenstored (which cannot run as it is a userland process).
>
> The xenbus_dev_resume function for frontend devices such as netfront will
> not be called at all with that patch. I am relying on the fact that the
> network backend domain will be resumed after dom0 resume is complete.
> When that resume is happening, it will trigger a call to netback_changed
> in dom0 netfront. This call will end up resuming xenbus states in netfront.
>
> That logic is working for a dom0 netfront, as we can safely rely on the
> fact that the network backend domain will be resumed after dom0 resume
> is complete.
> I don't have a Xen configuration with a Xenstore stub domain,
> but it would probably need some extra logic to reconnect the frontend
> after xenstored has been resumed. The main goal of this patch is to fix
> the S3 resume of domains running both a xenbus frontend and xenstored.

Is the assumption that other domains are all suspended over S3 a valid
one in the general case? In principle there is nothing stopping the
toolstack from leaving domains running over S3, is there?

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
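
The resume-ordering problem discussed in this thread can be illustrated with a small userspace model. This is only a sketch and is not kernel code: `struct domain_model`, `unpatched_resume`, `patched_resume` and the `RESUME_*` values are hypothetical names made up for the illustration. The `is_initial_domain` and `store_is_local` fields stand in for the `xen_initial_domain()` and `xen_store_domain == XS_LOCAL` checks in the quoted patch, and `store_reachable` is an assumed flag meaning "xenstored can answer right now".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one domain at device-resume time (not a kernel type). */
struct domain_model {
	bool is_initial_domain; /* stands in for xen_initial_domain() */
	bool store_is_local;    /* stands in for xen_store_domain == XS_LOCAL */
	bool store_reachable;   /* assumption: xenstored can answer right now */
};

/* Hypothetical outcomes of a frontend resume attempt. */
enum resume_result {
	RESUME_DONE,        /* frontend talked to xenstored and resumed */
	RESUME_SKIPPED,     /* resume bypassed, reconnect expected later */
	RESUME_WOULD_BLOCK  /* resume would wait on an unreachable xenstored */
};

/* Without the bypass: the frontend always tries to talk to xenstored. */
enum resume_result unpatched_resume(const struct domain_model *d)
{
	return d->store_reachable ? RESUME_DONE : RESUME_WOULD_BLOCK;
}

/* With the bypass: mirror the condition used in the quoted patch. */
enum resume_result patched_resume(const struct domain_model *d)
{
	if (d->is_initial_domain || d->store_is_local)
		return RESUME_SKIPPED;
	return unpatched_resume(d);
}

/* Print both outcomes for one modelled domain. */
void report(const char *name, const struct domain_model *d)
{
	static const char *const text[] = { "done", "skipped", "would block" };

	printf("%s: unpatched=%s patched=%s\n", name,
	       text[unpatched_resume(d)], text[patched_resume(d)]);
}
```

Calling `report()` on a dom0 that has a frontend and a still-frozen local xenstored shows the unpatched path blocking and the patched path skipping. The model does not cover the question raised above: a skipped frontend only recovers if a later backend state change actually arrives.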