Re: [Xen-API] timing loops
Hopefully in the future the whole stack will support cancellation -- so
users can apply their own timeout values in their own code, instead of
us picking a one-size-fits-all value. A lot of the domain-level stuff
can now be cancelled (which may cause the domain to crash if it happens
at a bad time... but it does at least usually cause things to unwind).
Most of the storage interface is uncancellable, which is a big problem
since it involves off-box RPCs. We either need to fix that directly, or
offer users the big red button labeled "driver domain restart" which
will unstick things.

One bad thing about not supporting cancellation is that it encourages
people to close connections and walk away, unaware that a large amount
of resources (and locks) is still being consumed server-side. One good
thing to do would be to send heartbeats to any running CLIs and
auto-cancel when the connection is broken, unless some "--async" option
is given, in which case the call would return immediately with a Task.
(There's a rough sketch of the heartbeat idea in the P.S. below the
quoted thread.)

In the meantime we always tune the timeouts to fail eventually if the
system gets truly stuck under high load. This leads to fairly long
timeouts, which isn't ideal for everyone: there's a tension between
high timeouts for stress testing and low timeouts for user experience
-- we can't do both :(

Cheers,
Dave

> -----Original Message-----
> From: Anil Madhavapeddy [mailto:anil@xxxxxxxxxx]
> Sent: 10 July 2012 15:24
> To: Dave Scott
> Cc: xen-api@xxxxxxxxxxxxx
> Subject: Re: [Xen-API] timing loops
>
> How do you decide on a reasonable value of n, given that real timeouts
> shift so dramatically with dom0 system load? Or rather, what areas of
> xapi aren't fully event-driven and require such timeouts?
>
> I can imagine the device/udev layer being icky in this regard, but a
> good way to wrap all such instances might be to have a single
> event-dispatch daemon which combines all the system events and
> timeouts, and coordinates the remainder of the xapi process cluster
> (which would not need arbitrary timeouts as a result). Or is it just
> too impractical, since there are so many places where such timeouts
> are required?
>
> -anil
>
> On 10 Jul 2012, at 15:18, Dave Scott wrote:
>
> > Hi,
> >
> > With all the recent xapi disaggregation work, are we now more
> > vulnerable to failures induced by moving the system clock around,
> > affecting timeout logic in our async-style interfaces where we wait
> > for 'n' seconds for an event notification?
> >
> > I've recently added 'oclock' as a dependency, which gives us access
> > to a monotonic clock source -- perfect (I believe) for reliably
> > 'timing out'. I started a patch to convert the whole codebase over,
> > but it was getting much too big and hard to test, because sometimes
> > we really do want a calendar date and other times we really want a
> > point in time.
> >
> > Maybe I should make a subset of my patch which fixes all the new
> > timing loops that have been introduced. What do you think? Would you
> > like to confess to having written:
> >
> >   let start = Unix.gettimeofday () in
> >   while not (p ()) && Unix.gettimeofday () -. start < timeout do
> >     Thread.delay 1.
> >   done
> >
> > I've got a nice higher-order function to replace this which does:
> >
> >   let until p timeout interval =
> >     let start = Oclock.gettime Oclock.monotonic in
> >     let elapsed () =
> >       Int64.(to_float (sub (Oclock.gettime Oclock.monotonic) start)) /. 1e9
> >     in
> >     while not (p ()) && elapsed () < timeout do Thread.delay interval done
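> >
> > A call site then looks like, e.g. (made-up predicate -- substitute
> > whatever you're actually polling for):
> >
> >   until (fun () -> Sys.file_exists "/var/run/mydaemon.pid") 30. 1.
> >
> > i.e. poll once a second, for at most 30 seconds, on a clock that
> > doesn't care where the system date is set.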
> >
> > I believe this is one of many things that lwt (and Jane Street's
> > Core) does a nice job of.
> >
> > Cheers,
> > Dave
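
P.S. To make the heartbeat + auto-cancel idea above a bit more concrete,
here's the kind of thing I have in mind. It's only a sketch:
'cancel_task' is a placeholder for a cancellation hook we don't have
yet, and the heartbeat byte assumes the CLI protocol can tolerate one:

  (* Per-connection watchdog: write a heartbeat byte every 5s; if the
     write fails because the client has gone away, run the cancellation
     hook. Assumes SIGPIPE is ignored, so a dead peer surfaces as
     EPIPE/ECONNRESET rather than killing the process. *)
  let watch_connection (fd : Unix.file_descr) (cancel_task : unit -> unit) =
    let rec loop () =
      Thread.delay 5.0;
      match Unix.write fd (Bytes.of_string " ") 0 1 with
      | _ -> loop ()
      | exception Unix.Unix_error ((Unix.EPIPE | Unix.ECONNRESET), _, _) ->
          cancel_task ()
    in
    ignore (Thread.create loop ())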
_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api