
Re: [Xen-devel] FW: Cancelling asynchronous operations in libxl

I've been thinking about this some more and looking at the code.

I have the following sketch of an approach:

 * Somehow the ao_how API is changed to make it possible to return the
   ao to the caller (in the case of an asynchronous ao).

   (NB that there could be exciting races in the application between
   completing the ao and cancelling it; this means that the
   application can only use cancellation if it uses the callback
   mechanism and must make sure that the callback takes a lock and
   then makes changes to its data structures which prevent the ao being
   cancelled.  As an alternative we could invent a separate
   "cancellation handle" which can be detached from the ao but which
   must always be explicitly destroyed by the application.)

   Suggestions for the API welcome.  The most obvious approach simply
   adds a new field to libxl_asyncop_how but that risks lots of
   existing code failing to initialise it.

 * Keep a list of cancellation hooks in the ao.  Anything which is
   using this ao can add itself to that set of hooks.

 * Cancellation involves repeatedly taking the front of that list off,
   and calling the hook.  (After the ao has been cancelled, its
   completion still needs to be awaited by the application, but it
   will hopefully complete earlier and return ERROR_CANCEL.)

 * The timeout registration facility is changed to take an ao and
   register a cancellation hook.  It is changed to provide an rc value
   to its callback, which will be FAIL or CANCEL.

 * The fork machinery is changed to take an ao, and register a
   cancellation hook.  A suitable-for-default-uses cancellation hook
   function is provided which sends SIGKILL to the child and makes a
   note that this has happened.  The child death callback provides an
   rc value (0 for status==0, or FAIL or CANCEL) for the convenience
   of the next layer up.

 * A new version of the xswatch event registration machinery is
   provided which takes an ao, registers itself as a cancellation hook,
   and provides an rc value to its callbacks.  This new facility could
   usefully do an xs_read on a predefined path.  The rc value will be
   OK or CANCEL.  (We need new versions of this because some xswatch
   callers are part of the infrastructure or libxl application event
   generation, not aos.)

 * Anything which uses the fd machinery directly needs to do
   cancellation itself (or ensure that it has a timeout, an xs watch,
   or a child).
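
To make the hook list and the cancellation loop above concrete, here
is a rough sketch in C.  Every name in it (the sketch_* types and
functions) is hypothetical and is not existing libxl API; locking and
the actual SIGKILL are left out.  It only illustrates "take the front
of the list off, then call the hook", which keeps the list consistent
even if a hook deregisters other hooks while it runs.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; none of this is existing libxl API. */
typedef struct sketch_ao sketch_ao;
typedef struct sketch_cancel_hook sketch_cancel_hook;
typedef void sketch_cancel_fn(sketch_ao *ao, sketch_cancel_hook *hook);

struct sketch_cancel_hook {
    sketch_cancel_hook *next;
    sketch_cancel_fn *fn;
    void *user;                 /* owner's private state */
};

struct sketch_ao {
    sketch_cancel_hook *hooks;  /* front of the cancellation hook list */
    int cancelled;
};

/* Anything using the ao adds itself to the ao's set of hooks. */
void sketch_hook_register(sketch_ao *ao, sketch_cancel_hook *hook,
                          sketch_cancel_fn *fn, void *user)
{
    assert(fn != NULL);
    hook->fn = fn;
    hook->user = user;
    hook->next = ao->hooks;
    ao->hooks = hook;
}

/* Called by the owner when its sub-operation finishes normally. */
void sketch_hook_deregister(sketch_ao *ao, sketch_cancel_hook *hook)
{
    sketch_cancel_hook **pp;
    for (pp = &ao->hooks; *pp; pp = &(*pp)->next) {
        if (*pp == hook) {
            *pp = hook->next;
            hook->next = NULL;
            return;
        }
    }
}

/* Cancellation: repeatedly unlink the front hook, then call it.
 * The ao's completion must still be awaited afterwards. */
void sketch_ao_cancel(sketch_ao *ao)
{
    ao->cancelled = 1;
    while (ao->hooks) {
        sketch_cancel_hook *hook = ao->hooks;
        ao->hooks = hook->next;
        hook->next = NULL;
        hook->fn(ao, hook);
    }
}

/* Stand-in for the default fork-machinery hook: a real one would also
 * send SIGKILL to the child; this one only makes the note. */
typedef struct { int cancel_requested; } sketch_child;

void sketch_child_cancel(sketch_ao *ao, sketch_cancel_hook *hook)
{
    sketch_child *child = hook->user;
    (void)ao;
    child->cancel_requested = 1;
}
```

In this sketch the child death callback would look at
cancel_requested to choose between FAIL and CANCEL for its rc.  A hook
which registers a fresh hook during cancellation gets that one called
too, so hooks would need to avoid doing that indefinitely.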

A tricky question arises regarding cleanup.  For example, if
libxl_domain_create_* were cancelled, it would end up in
domcreate_complete with rc==CANCEL.

Should it now run the domain destruction?  How would the caller say
they wanted that cancelled, if that too was taking too long?  Perhaps
there should be a progress callback to say "we have finished
cancelling the first thing and are now cleaning up".

Or perhaps cancelling the operation should simply skip the destruction
and return the domid to the caller.  (But also, fiddly edge case:
consider what happens if a failed creation, which is already being
destroyed, is cancelled.)

