Re: [Xen-devel] [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running
On 02/05/13 11:30, Ian Campbell wrote:
> On Thu, 2013-05-02 at 11:10 +0100, Aurélien Chartier wrote:
>> On 02/05/13 10:24, Ian Campbell wrote:
>>> On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote:
>>>>>>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
>>>>> On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote:
>>>>>> If the xenbus frontend is running in a domain running xenstored or in dom0,
>>>>>> the device resume is hanging because it is happening before the process
>>>>>> resume. This patch adds extra logic to the resume code to check if we are
>>>>>> the domain running xenstored or dom0.
>>>>>>
>>>>>> The frontend will be reconnected later, when the backend resumes from S3.
>>>>>> This logic is working when xenstored is running in dom0, but has not been
>>>>>> tested with a xenstore stub domain.
>>>>>> ---
>>>>>>  drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++-
>>>>>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
>>>>>> index 3159a37..8583afe 100644
>>>>>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
>>>>>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
>>>>>> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch,
>>>>>>      xenbus_otherend_changed(watch, vec, len, 1);
>>>>>>  }
>>>>>>
>>>>>> +static int xenbus_frontend_dev_resume(struct device *dev)
>>>>>> +{
>>>>>> +    /*
>>>>>> +     * If xenstored is running in that domain, we cannot access the backend
>>>>>> +     * state at the moment. If we are running in dom0, the domain running
>>>>>> +     * xenstored is still suspended at that point
>>>>>> +     */
>>>>>> +    if (xen_initial_domain() || (xen_store_domain == XS_LOCAL))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    return xenbus_dev_resume(dev);
>>>>> When or where does this eventually get called for the init domain or
>>>>> XS_LOCAL cases?
>>>> I was about to ask the same question. Plus I don't think the
>>>> description here or in the overview mail really makes clear how
>>>> specifically a deadlock would occur here. That's pretty relevant to
>>>> understand in the light that so far we had no indication of there
>>>> being any special treatment necessary here, and resume from S3
>>>> had been working quite fine without that (at least as long as
>>>> xenstored is running in Dom0 and at least with the traditional/
>>>> forward-port/non-pvops kernels).
>>> I think the unusual feature here is that dom0 has a netfront attached.
>>> Netfront resume is therefore hanging because it is trying to talk to the
>>> still frozen xenstored process in dom0.
>>>
>>> Ian.
>>>
>> Yes, the unusual feature of having a netfront driver in dom0 is
>> triggering the S3 issue I described. Ian made me realize this issue
>> could also happen in Xenstore stub domains.
>>
>> The root cause of the issue is the assumption that a xenstored process
>> is running in another domain when the xenbus frontend is being resumed
>> from S3. This assumption is incorrect if xenstored and the xenbus
>> frontend are running in the same domain. As the Linux kernel waits for
>> all devices to be resumed before resuming userland tasks, the xenbus
>> frontend resume is blocking the userland process resume, waiting for
>> xenstored (which cannot run as it is a userland process).
>>
>> The xenbus_dev_resume function for frontend devices such as netfront will
>> not be called at all with that patch. I am relying on the fact that the
>> network backend domain will be resumed after dom0 resume is complete.
>> When that resume is happening, it will trigger a call to netback_changed
>> in dom0 netfront. This call will end up resuming xenbus states in netfront.
>>
>> That logic is working for a dom0 netfront, as we can safely rely on the
>> fact that the network backend domain will be resumed after dom0 resume
>> is complete.
>> I don't have a Xen configuration with a Xenstore stub domain,
>> but it would probably need some extra logic to reconnect the frontend
>> after xenstored has been resumed. The main goal of this patch is to fix
>> the S3 resume of domains running both a xenbus frontend and xenstored.
> Is the assumption that other domains are all suspended over S3 a valid
> one in the general case?
>
> In principle there is nothing stopping the toolstack from leaving
> domains running over S3, is there?
>
That seems a valid assumption for dom0. It is probably not for Xenstore
stub domains, even if I fail to see a use case for that.

Another solution would be to defer the call to xenbus_dev_resume until
userland processes have been resumed. Any opinion on that?

Aurelien

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
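
For illustration only, the ordering problem and the deferral idea discussed in this message can be modelled in plain user-space C. This is a sketch under stated assumptions, not kernel code and not the submitted patch: `struct model`, `frontend_resume`, `device_resume`, `post_resume`, `simulate_resume` and `MAX_DEFERRED` are all hypothetical names invented for the example. The model assumes that a frontend resume needs xenstored, that a local xenstored only answers once userland has been thawed, and that a xenstored in another domain is always reachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the S3 resume ordering; not kernel code. */
#define MAX_DEFERRED 8

struct model {
    bool   userland_running;       /* have userland tasks been thawed? */
    bool   xenstored_local;        /* does xenstored run in this domain? */
    int    deferred[MAX_DEFERRED]; /* device ids waiting for userland */
    size_t n_deferred;
    int    resumed;                /* frontends that completed resume */
    int    would_hang;             /* resumes that would block forever */
};

/* Stand-in for a frontend resume that has to talk to xenstored. */
static void frontend_resume(struct model *m, int dev_id)
{
    (void)dev_id;
    if (m->xenstored_local && !m->userland_running)
        m->would_hang++;           /* real code would block at this point */
    else
        m->resumed++;
}

/* Device-resume phase: optionally defer when xenstored cannot answer yet. */
static void device_resume(struct model *m, int dev_id, bool defer)
{
    if (defer && m->xenstored_local && !m->userland_running) {
        assert(m->n_deferred < MAX_DEFERRED);
        m->deferred[m->n_deferred++] = dev_id;
        return;
    }
    frontend_resume(m, dev_id);
}

/* Runs after userland has been thawed: replay the deferred resumes. */
static void post_resume(struct model *m)
{
    m->userland_running = true;
    for (size_t i = 0; i < m->n_deferred; i++)
        frontend_resume(m, m->deferred[i]);
    m->n_deferred = 0;
}

/*
 * Simulate one resume cycle for n_devices frontends (at most MAX_DEFERRED).
 * Returns how many resumes would hang; stores the completed count in
 * *resumed_out when it is not NULL.
 */
int simulate_resume(bool xenstored_local, bool defer, int n_devices,
                    int *resumed_out)
{
    struct model m = { .userland_running = false,
                       .xenstored_local = xenstored_local };

    for (int i = 0; i < n_devices; i++)
        device_resume(&m, i, defer);
    post_resume(&m);

    if (resumed_out)
        *resumed_out = m.resumed;
    return m.would_hang;
}
```

Calling `simulate_resume(true, false, 2, &r)` models the hang described in the thread, while `simulate_resume(true, true, 2, &r)` models deferring the frontend resume until userland is running again. How such a deferral would actually be hooked into the kernel is left open here.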