[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 6/9] libxl: Asynchronous/long-running operation infrastructure



On Fri, 2012-01-13 at 19:25 +0000, Ian Jackson wrote:
> Provide a new set of machinery for writing public libxl functions
> which may take a long time.  The application gets to decide whether
> they want the function to be synchronous, or whether they'd prefer to
> get a callback, or an event, when the operation is complete.
> 
> User(s) of this machinery will be introduced in later patch(es).

You've done device removal, do you have a list of other things which
should use this? (perhaps with an associated list of people's names...)

> 
> Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> ---
>  tools/libxl/libxl.h          |   50 ++++++++++++
>  tools/libxl/libxl_event.c    |  183 
> ++++++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_internal.h |  112 ++++++++++++++++++++++++++
>  tools/libxl/libxl_types.idl  |    4 +
>  4 files changed, 349 insertions(+), 0 deletions(-)
> 
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index e32881b..416d6e8 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -235,8 +235,58 @@ enum {
>      ERROR_NOT_READY = -11,
>      ERROR_OSEVENT_REG_FAIL = -12,
>      ERROR_BUFFERFULL = -13,
> +    ERROR_ASYNC_INPROGRESS = -14,
>  };
> 
> +
> +/*
> + * Some libxl operations can take a long time.  These functions take a
> + * parameter to control their concurrency:
> + *     libxl_asyncop_how *ao_how
> + *
> + * If ao_how==NULL, the function will be synchronous.
> + *
> + * If ao_how!=NULL, the function will set the operation going, and
> + * if this is successful will return ERROR_ASYNCH_INPROGRESS.

There's an extra H here compared with the actual symbol name (I think
the symbol is right).

Is there a possibility that libxl might decide that the operation isn't
actually going to take all the long and do things synchronously,
returning normal success (e.g. 0)? Is that the reason for the separate
return code for this "I did what you asked me" case?

Can we drop the ERROR_ prefix? I know that's inconsistent with the other
return codes but those actually are errors.

> + *
> + * If ao_how->callback!=NULL, the callback will be called when the
> + * operation completes.  The same rules as for libxl_event_hooks
> + * apply, including the reentrancy rules and the possibility of

           ^ (see above/below) -- depending on how these comments end up
relative to each other.

> + * "disaster", except that libxl calls ao_how->callback instead of
> + * libxl_event_hooks.event_occurs.
> + *
> + * If ao_how->callback==NULL, a libxl_event will be generated which
> + * can be obtained from libxl_event_wait or libxl_event_check.

Or be delivered via event_occurs?

>   The
> + * event will have type OPERATION_COMPLETE (which is not used
> + * elsewhere).
> + *
> + * Note that it is possible for an asynchronous operation which is to
> + * result in a callback to complete during its initiating function
> + * call.  In this case the initating function will return

                              initiating

> + * ERROR_ASYNCH_INPROGRESS, even though by the time it returns the

Another stray H.

> + * operation is complete and the callback has already happened.
> + *
> + * The application must set and use ao_how->for_event (which will be
> + * copied into libxl_event.for_user) or ao_how->for_callback (passed
> + * to the callback) to determine which operation finished, and it must
> + * of course check the rc value for errors.
> + *
> + * *ao_how does not need to remain valid after the initiating function
> + * returns.
> + *
> + * Callbacks may occur on any thread in which the application calls
> + * libxl.
> + */
> +
> +typedef struct {
> +    void (*callback)(libxl_ctx *ctx, int rc, void *for_callback);
> +    union {
> +        libxl_ev_user for_event; /* used if callback==NULL */
> +        void *for_callback; /* passed to callback */

Why void * for one bit of "closure" but an explicit uint64_t for the
other. I nearly commented on the use of uint64_t previously -- void *,
or perhaps (u)intptr_t is more normal.

> +    } u;
> +} libxl_asyncop_how;
> +
> +
>  #define LIBXL_VERSION 0
> 
>  typedef struct {
> diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
> index 82889f6..b99049a 100644
> --- a/tools/libxl/libxl_event.c
> +++ b/tools/libxl/libxl_event.c
> @@ -771,10 +771,21 @@ static void egc_run_callbacks(libxl__egc *egc)
>  {
>      EGC_GC;
>      libxl_event *ev, *ev_tmp;
> +
>      LIBXL_TAILQ_FOREACH_SAFE(ev, &egc->occurred_for_callback, link, ev_tmp) {
>          LIBXL_TAILQ_REMOVE(&egc->occurred_for_callback, ev, link);
>          CTX->event_hooks->event_occurs(CTX->event_hooks_user, ev);
>      }
> +
> +    libxl__ao *ao, *ao_tmp;
> +    LIBXL_TAILQ_FOREACH_SAFE(ao, &egc->aos_for_callback,
> +                             entry_for_callback, ao_tmp) {
> +        LIBXL_TAILQ_REMOVE(&egc->aos_for_callback, ao, entry_for_callback);
> +        ao->how.callback(CTX, ao->rc, ao->how.u.for_callback);
> +        ao->notified = 1;
> +        if (!ao->in_initiator)
> +            libxl__ao__destroy(CTX, ao);
> +    }
>  }
> 
>  void libxl__egc_cleanup(libxl__egc *egc)
> @@ -1061,6 +1072,178 @@ int libxl_event_wait(libxl_ctx *ctx, libxl_event 
> **event_r,
>      return rc;
>  }
> 
> +
> +
> +/*
> + * The two possible state flow of an ao:
> + *
> + * Completion before initiator return:
> + *
> + *     Initiator thread                       Possible other threads
> + *
> + *   * ao_create allocates memory and
> + *     initialises the struct
> + *
> + *   * the initiator function does its
> + *     work, setting up various internal
> + *     asynchronous operations -----------> * asynchronous operations
> + *                                            start to take place and
> + *                                            might cause ao completion
> + *                                                |
> + *   * initiator calls ao_complete:               |
> + *     - if synchronous, run event loop           |
> + *       until the ao completes                   |
> + *                              - ao completes on some thread
> + *                              - completing thread releases the lock
> + *                     <--------------'
> + *     - ao_complete takes the lock
> + *     - destroy the ao
> + *
> + *
> + * Completion after initiator return (asynch. only):
> + *
> + *
> + *     Initiator thread                       Possible other threads
> + *
> + *   * ao_create allocates memory and
> + *     initialises the struct
> + *
> + *   * the initiator function does its
> + *     work, setting up various internal
> + *     asynchronous operations -----------> * asynchronous operations
> + *                                            start to take place and
> + *                                            might cause ao completion
> + *                                                |
> + *   * initiator calls ao_complete:               |
> + *     - observes event not net done,             |
> + *     - returns to caller                        |
> + *                                                |
> + *                              - ao completes on some thread
> + *                              - generate the event or call the callback
> + *                              - destroy the ao

Where does ao_inprogress fit into these diagrams?

> + */
> +
> +void libxl__ao__destroy(libxl_ctx *ctx, libxl__ao *ao) {

CODING_STYLE wants these braces on the next line (a bunch more follow)

> +    if (!ao) return;
> +    if (ao->poller) libxl__poller_put(ctx, ao->poller);
> +    ao->magic = LIBXL__AO_MAGIC_DESTROYED;
> +    libxl__free_all(&ao->gc);
> +    free(ao);
> +}
> +
> +void libxl__ao_abort(libxl__ao *ao) {
> +    AO_GC;
> +    assert(ao->magic == LIBXL__AO_MAGIC);
> +    assert(ao->in_initiator);
> +    assert(!ao->complete);
> +    libxl__ao__destroy(CTX, ao);
> +}
> +
> +void libxl__ao_complete(libxl__egc *egc, libxl__ao *ao, int rc) {
> +    assert(ao->magic == LIBXL__AO_MAGIC);
> +    assert(!ao->complete);
> +    ao->complete = 1;
> +    ao->rc = rc;
> +
> +    if (ao->poller) {
> +        assert(ao->in_initiator);
> +        libxl__poller_wakeup(egc, ao->poller);
> +    } else if (ao->how.callback) {
> +        LIBXL_TAILQ_INSERT_TAIL(&egc->aos_for_callback, ao, 
> entry_for_callback);
> +    } else {
> +        libxl_event *ev;
> +        ev = NEW_EVENT(egc, OPERATION_COMPLETE, ao->domid);
> +        if (ev) {
> +            ev->for_user = ao->how.u.for_event;
> +            ev->u.operation_complete.rc = ao->rc;
> +            libxl__event_occurred(egc, ev);
> +        }
> +        ao->notified = 1;
> +    }
> +    if (!ao->in_initiator && ao->notified)
> +        libxl__ao__destroy(libxl__gc_owner(&egc->gc), ao);

You added a helper for this libxl__gc_owner(&egc..) construct.

> +}
> +
> +libxl__ao *libxl__ao_create(libxl_ctx *ctx, uint32_t domid,
> +                            const libxl_asyncop_how *how) {
> +    libxl__ao *ao;
> +
> +    ao = calloc(sizeof(*ao),1);

calloc is actually (nmemb, size). I'm sure it doesn't really matter
though.


> +    if (!ao) goto out;
> +
> +    ao->magic = LIBXL__AO_MAGIC;
> +    ao->in_initiator = 1;
> +    ao->poller = 0;
> +    ao->domid = domid;
> +    LIBXL_INIT_GC(ao->gc, ctx);
> +
> +    if (how) {
> +        ao->how = *how;
> +    } else {
> +        ao->poller = libxl__poller_get(ctx);
> +        if (!ao->poller) goto out;
> +    }
> +    return ao;
> +
> + out:
> +    if (ao) libxl__ao__destroy(ctx, ao);
> +    return NULL;
> +}
> +
> +int libxl__ao_inprogress(libxl__ao *ao) {
> +    AO_GC;
> +    int rc;
> +
> +    assert(ao->magic == LIBXL__AO_MAGIC);
> +    assert(ao->in_initiator);
> +
> +    if (ao->poller) {
> +        /* Caller wants it done synchronously. */
> +        /* We use a fresh gc, so that we can free things
> +         * each time round the loop. */
> +        libxl__egc egc;
> +        LIBXL_INIT_EGC(egc,CTX);
> +
> +        for (;;) {
> +            assert(ao->magic == LIBXL__AO_MAGIC);
> +
> +            if (ao->complete) {
> +                rc = ao->rc;
> +                ao->notified = 1;
> +                break;
> +            }
> +
> +            rc = eventloop_iteration(&egc,ao->poller);
> +            if (rc) {
> +                /* Oh dear, this is quite unfortunate. */
> +                LIBXL__LOG(CTX, LIBXL__LOG_ERROR, "Error waiting for"
> +                           " event during long-running operation (rc=%d)", 
> rc);
> +                sleep(1);
> +                /* It's either this or return ERROR_I_DONT_KNOW_WHETHER
> +                 * _THE_THING_YOU_ASKED_FOR_WILL_BE_DONE_LATER_WHEN
> +                 * _YOU_DIDNT_EXPECT_IT, since we don't have any kind of
> +                 * cancellation ability. */

Does this constitute a "disaster" (in the special hook sense)?

> +            }
> +
> +            CTX_UNLOCK;
> +            libxl__egc_cleanup(&egc);
> +            CTX_LOCK;
> +        }
> +    } else {
> +        rc = ERROR_ASYNC_INPROGRESS;
> +    }
> +
> +    ao->in_initiator = 0;
> +
> +    if (ao->notified) {
> +        assert(ao->complete);
> +        libxl__ao__destroy(CTX,ao);
> +    }
> +
> +    return rc;
> +}
> +
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 53d2462..594b9fb 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
[...]
> @@ -1123,6 +1144,97 @@ _hidden void libxl__egc_cleanup(libxl__egc *egc);
> 
> 
>  /*
> + * Machinery for asynchronous operations ("ao")
> + *
> + * All "slow" functions (includes anything that might block on a
> + * guest or an external script) need to use the asynchronous
> + * operation ("ao") machinery.  The function should take a parameter
> + * const libxl_asyncop_how *ao_how and must start with a call to
> + * AO_INITIATOR_ENTRY.  These functions MAY NOT be called from
> + * outside libxl, because they can cause reentrancy callbacks.
> + *
> + * No functions called internally within libxl should ever return
> + * ERROR_ASYNCH_INPROGRESS.

Aitch.

> + *
> + * Lifecycle of an ao:
> + *
> + * - Created by libxl__ao_create (or the AO_CREATE convenience macro).
> + *
> + * - After creation, can be used by code which implements
> + *   the operation as follows:
> + *      - the ao's gc, for allocating memory for the lifetime
> + *        of the operation (possibly with the help of the AO_GC
> + *        macro to introduce the gc into scope)
> + *      - the ao itself may be passed about to sub-functions
> + *        so that they can stash it away etc.
> + *      - in particular, the ao pointer must be stashed in some
> + *        per-operation structure which is also passed as a user
> + *        pointer to the internal event generation request routines
> + *        libxl__evgen_FOO, so that at some point a CALLBACK will be
> + *        made when the operation is complete.
> + *
> + * - If initiation is successful, the initiating function needs
> + *   to run libxl__ao_inprogress right before unlocking and
> + *   returning, and return whatever it returns (AO_INPROGRESS macro).
> + *
> + * - If the initiation is unsuccessful, the initiating function must
> + *   call libxl__ao_abort before unlocking and returning whatever
> + *   error code is appropriate (AO_ABORT macro).
> + *
> + * - Later, some callback function, whose callback has been requested
> + *   directly or indirectly, should call libxl__ao_complete (with the
> + *   ctx locked, as it will generally already be in any event callback
> + *   function).  This must happen exactly once for each ao (and not if
> + *   the ao has been destroyed, obviously), and it may not happen
> + *   until libxl__ao_inprogress has been called on the ao.
> + *
> + * - Note that during callback functions, two gcs are available:
> + *    - The one in egc, whose lifetime is only this callback
> + *    - The one in ao, whose lifetime is the asynchronous operation
> + *   Usually callback function should use GET_CONTAINING_STRUCT

Now called CONTAINER_OF

> + *   to obtain its own structure, containing a pointer to the ao,
> + *   and then use the gc from that ao.
> + */
> +
> +#define AO_CREATE(ctx, domid, ao_how)                           \
> +    libxl__ao *ao = libxl__ao_create(ctx, domid, ao_how);       \
> +    if (!ao) return ERROR_NOMEM;                                \
> +    AO_GC;                                                      \
> +    CTX_LOCK;

Where does the unlock which balances this come from? The only unlock I
see in this patch is the temporary drop in libxl__ao_inprogress which is
matched by another lock.

> +
> +#define AO_INPROGRESS do{                                       \
> +        libxl_ctx *ao__ctx = libxl__gc_owner(&ao->gc);          \
> +        int ao__rc = libxl__ao_inprogress(ao);                  \
> +        libxl__ctx_lock(ao__ctx); /* gc is now invalid */       \

Is this supposed to be unlock answering my question above? Likewise in
ABORT?

> +        return ao__rc;                                          \
> +   }while(0)

Can we arrange for AO_INPROGRESS and AO_ABORT to return the rc? So it
would become
        return AO_INPROGRESS;

Is the ({stuff,stuff,stuff,val}) syntax a gcc-ism?

> +
> +
> +#define AO_ABORT(rc) do{                                        \
> +        libxl_ctx *ao__ctx = libxl__gc_owner(&ao->gc);          \
> +        assert(rc);                                             \
> +        assert(rc != ERROR_ASYNC_INPROGRESS);                   \
> +        libxl__ao_abort(ao);                                    \
> +        libxl__ctx_lock(ao__ctx); /* gc is now invalid */       \
> +        return (rc);                                            \
> +    }while(0)
> +
> +#define AO_GC                                   \
> +    libxl__gc *const gc = &ao->gc
> +
> +
> +/* All of these MUST be called with the ctx locked.

Except libxl__ao_create? at least according to the implementation of
AO_CREATE which takes the lock after.

> + * libxl__ao_inprogress MUST be called with the ctx locked exactly once. */
> +_hidden libxl__ao *libxl__ao_create(libxl_ctx*, uint32_t domid,
> +                                    const libxl_asyncop_how*);
> +_hidden int libxl__ao_inprogress(libxl__ao *ao);
> +_hidden void libxl__ao_abort(libxl__ao *ao);
> +_hidden void libxl__ao_complete(libxl__egc *egc, libxl__ao *ao, int rc);
> +
> +/* For use by ao machinery ONLY */
> +_hidden void libxl__ao__destroy(libxl_ctx*, libxl__ao *ao);
> +
> +/*
>   * Convenience macros.
>   */
> 
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index a6dac79..325bb21 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -395,6 +395,7 @@ libxl_event_type = Enumeration("event_type", [
>      (1, "DOMAIN_SHUTDOWN"),
>      (2, "DOMAIN_DESTROY"),
>      (3, "DISK_EJECT"),
> +    (4, "OPERATION_COMPLETE"),
>      ])
> 
>  libxl_ev_user = UInt(64)
> @@ -418,4 +419,7 @@ libxl_event = Struct("event",[
>                                          ("vdev", string),
>                                          ("disk", libxl_device_disk),
>                                   ])),
> +           ("operation_complete", Struct(None, [
> +                                        ("rc", integer),
> +                                 ])),
>             ]))])
> --
> 1.7.2.5
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.