Re: [Xen-devel] libxl: error handling before xenstored runs
On Thursday 10 February 2011 12:43:47 Ian Campbell wrote:
> On Thu, 2011-02-10 at 11:32 +0000, Christoph Egger wrote:
> > On Thursday 10 February 2011 12:24:41 Ian Campbell wrote:
> > > On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote:
> > > > On 10/02/11 08:55, Ian Campbell wrote:
> > > > > That's the underlying bug which the heuristic is trying to avoid...
> > > > >
> > > > > Fundamentally the xs ring protocol is missing any way to tell if
> > > > > someone is listening on the other end so you have no choice but to
> > > > > try communicating and see if anyone responds.
> > > > >
> > > > > It's a pretty straightforward bug that the kernel does the waiting
> > > > > to see if anyone responds bit with an uninterruptible sleep. I took
> > > > > a quick look a little while ago but unfortunately it didn't look
> > > > > straightforward to fix on the kernel side :-( I can't remember why
> > > > > though.
> > > >
> > > > For starter, the protocol requires the messages to sit on the ring
> > > > for a underdetermined amount of time (boot watches).
> > > >
> > > > > It might be simpler to support allowing the userspace client to
> > > > > explicitly specify a timeout. I'm not sure what the impact on the
> > > > > ring is of leaving unconsumed requests on the ring when the other
> > > > > end does show up. Presumably the kernel driver just needs to be
> > > > > prepared to swallow responses whose target has given up and gone
> > > > > home.
> > > >
> > > > No, the simplest thing to do is to use the socket connection
> > > > exclusively. Just how we're doing it in XCP and XCI.
> > >
> > > Right but this approach doesn't work with xenstored in a stubdomain.
> > > Part of the point of using the ring protocol even when this isn't the
> > > case is to help ensure that it is possible and help avoid regressions
> > > etc.
> > >
> > > > The protocol is not design to do async either, so leaving unconsumed
> > > > request, could be pretty disastrous if the other end show up.
> > > > Providing the kernel doesn't detect it (i don't think it does [1]),
> > > > it would imply spurious reply, for example the previous waiting read
> > > > on "/abc/def" could reply to a next read on "/xyz/123".
> > >
> > > The wire protocol includes a req_id which is echoed in the response
> > > which sh/could facilitate multiplexing this sort of thing. The pvops
> > > kernel currently always sets it to zero but that's just an
> > > implementation detail ;-) Currently the kernel does (roughly):
> > >         take_lock
> > >         write_request
> > >         wait_for_reply
> > >         release_lock
> > > instead it should/could be:
> > >         take_lock(timeout)
> > >         write_request (++req_id)
> > >         while read_reply.req_id != req_id && not (timeout)
> > >                 wait some more
> > >         release lock
> >
> > I prefer a userland solution. Fixing Linux Dom0 doesn't help NetBSD Dom0.
>
> Fixing the NetBSD dom0 does though.
>
> Seriously, if kernels are lacking in functionality needed to make the
> system work smoothly and correctly we should fix them, not just default
> to adding hacks in userspace because it seems easier in the short term.
> (Obviously if the userspace solution is the right thing to do and/or
> more correct in its own right then fine lets do that).

Does xl communicate with xenstored through a named socket?
If yes then 'connect()' should check for ECONNREFUSED.

Christoph

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
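
To make the connect() suggestion concrete, here is a minimal sketch of probing a named (AF_UNIX) socket and treating ECONNREFUSED as "nobody is listening". The function name probe_unix_socket and the enum values are hypothetical and exist only for this illustration; they are not part of libxl or libxenstore. Treating ENOENT (no socket file at all) the same way is an additional assumption of the sketch. Note that this only covers the socket transport, so it would not help in the stubdomain case raised earlier in the thread.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not a libxl/libxenstore API. */
enum probe_result {
    PROBE_LISTENING   = 0,  /* connect() succeeded: a daemon accepted us   */
    PROBE_NOT_RUNNING = 1,  /* ECONNREFUSED or ENOENT: nobody is listening */
    PROBE_ERROR       = -1  /* any other failure; errno says why           */
};

/* Try to connect to the named socket at 'path' and classify the outcome. */
enum probe_result probe_unix_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd, saved_errno;

    /* sun_path is a fixed-size buffer; refuse paths that do not fit. */
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return PROBE_ERROR;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return PROBE_ERROR;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);  /* length was checked above */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        close(fd);
        return PROBE_LISTENING;
    }

    /* close() may clobber errno, so keep the connect() error around. */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;

    /* ECONNREFUSED: socket file exists but no process is accepting on it.
     * ENOENT: the socket file was never created (assumed equivalent here). */
    if (saved_errno == ECONNREFUSED || saved_errno == ENOENT)
        return PROBE_NOT_RUNNING;

    return PROBE_ERROR;
}
```

A caller would pass whatever socket path the installation uses and could print a clear "xenstored is not running" message on PROBE_NOT_RUNNING instead of blocking. The path is left as a parameter on purpose, since the sketch makes no claim about where a given system puts the socket.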