[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 17/24] libxl: ao: convert libxl__spawn_*



Neither this patch nor the rest of the series seems to handle the long
running nature of xc_domain_restore, is that expected at this stage?

We discussed a similar thing in the context of xc_domain_save, and I
expect the required scaffolding (bumping an op over into a thread) will
be the same on both sides?

On Mon, 2012-04-16 at 18:17 +0100, Ian Jackson wrote:
> libxl__spawn_spawn becomes a callback-style asynchronous function.
> The implementation is now in terms of libxl__ev_* including
> libxl_ev_child.
> 
> All the callers need to be updated.  This includes the device model
> spawning functions libxl__create_device_model and
> libxl__create_stubdom; these are replaced with libxl__spawn_local_dm
> and libxl__spawn_stubdom.  libxl__confirm_device_model_startup is
> abolished; instead the dm spawner calls back.
> 
> (The choice of which kind of device model to create is lifted out of
> what used to be libxl__create_device_model, because that function was
> indirectly recursive.  Recursive callback-style operations are clumsy
> because they require a pointer indirection for the nested states.)
> 
> Waiting for proper device model startup it is no longer notionally
> optional.  Previously the code appeared to tolerate this by passing
> NULL for various libxl__spawner_starting* parameters to device model
> spawners.  However, this was not used anywhere.
> 
> Conversely, the "for_spawn" parameter to libxl__wait_for_offspring is
> no longer supported.  It remains as an unused formal parameter to
> avoid updating, in this patch, all the call sites which pass NULL.
> libxl__wait_for_offspring is in any case itself an obsolete function,
> so this wrinkle will go away when its callers are updated to use the
> event system.  Consequently libxl__spawn_check is also abolished.
> 
> The "console ready" callback also remains unchanged in this patch.
> The API for this needs to be reviewed in the context of the event
> series and its reentrancy restrictions documented.
> 
> Thus their callers need to be updated.  These are the domain creation
> functions libxl_domain_create_new and _restore.  These functions now
> take ao_hows, and have a private state structure.
> 
> However domain creation remains not completely converted to the event
> mechanism; in particular it runs the outward-facing function
> libxl_run_bootloader with a NULL ao_how, which is quite wrong.  As it
> happens in the current code this is not a bug because none of the rest
> of the functionality surrounding the bootloader call will mind if the
> event loop is reentered in the middle of its execution.
> 
> The file-scope function libxl__set_fd_flag which was used by the
> previous spawn arrangements becomes unused and is removed; other
> places in libxl can use libxl_fd_set_nonblock and
> libxl_fd_set_cloexec, which of course remain.
> 
> Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> ---
>  tools/libxl/libxl.h          |   14 ++-
>  tools/libxl/libxl_create.c   |  198 +++++++++++++++++++-----
>  tools/libxl/libxl_dm.c       |  219 +++++++++++++++-----------
>  tools/libxl/libxl_exec.c     |  354 
> +++++++++++++++++++++---------------------
>  tools/libxl/libxl_internal.h |  286 +++++++++++++++++++++++-----------
>  tools/libxl/xl_cmdimpl.c     |    6 +-
>  6 files changed, 677 insertions(+), 400 deletions(-)

(nb: I actually skipped ahead and reviewed the libxl_internal.h change
first so I could read the docs, if only diff could be taught such
things ;-))

> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 477b72a..6f59364 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -465,8 +465,18 @@ int libxl_ctx_free(libxl_ctx *ctx /* 0 is OK */);
> 
>  /* domain related functions */
>  typedef int (*libxl_console_ready)(libxl_ctx *ctx, uint32_t domid, void 
> *priv);
> -int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config, 
> libxl_console_ready cb, void *priv, uint32_t *domid);
> -int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config 
> *d_config, libxl_console_ready cb, void *priv, uint32_t *domid, int 
> restore_fd);
> +  /* fixme-ao   Need to review this API.  If we keep it, the reentrancy
> +   * properties need to be documented but they may turn out to be too
> +   * awkward */

The reentrancy concerns here relate to the "cb" or to something else?

[...]
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 8408c26..09a03a7 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -537,16 +537,40 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t 
> domid,

[...]

> @@ -635,13 +667,13 @@ static int do_domain_create(libxl__gc *gc,
> libxl_domain_config *d_config,
>          libxl_device_vkb_add(ctx, domid, &vkb);
>          libxl_device_vkb_dispose(&vkb);
> [...]
> +        dcs->dmss.guest_domid = domid;

dcs->dmss is part of a union with dcs->sdss. I'd rather this was done
inside the if once we've committed to one or the other, IYSWIM.

Actually the hunk above (which I've trimmed already, sorry) also
initialised dmss unconditionally.

(does sdss really not need guest_domid somewhere too? odd)

> +        if (libxl_defbool_val(d_config->b_info.device_model_stubdomain))
> +            libxl__spawn_stubdom(egc, &dcs->sdss);
> +        else
> +            libxl__spawn_local_dm(egc, &dcs->dmss);
> +        return;
> +
>          break;

return then break? I think the break is redundant.

>      }
>      case LIBXL_DOMAIN_TYPE_PV:
> @@ -669,7 +701,9 @@ static int do_domain_create(libxl__gc *gc, 
> libxl_domain_config *d_config,
>          libxl__device_console_dispose(&console);
> 
>          if (need_qemu) {
> -            libxl__create_xenpv_qemu(gc, domid, d_config, state, 
> &dm_starting);
> +            dcs->dmss.guest_domid = domid;
> +            libxl__spawn_local_dm(egc, &dcs->dmss);
> +            return;
>          }
>          break;
>      }
> @@ -678,17 +712,41 @@ static int do_domain_create(libxl__gc *gc, 
> libxl_domain_config *d_config,
>          goto error_out;
>      }
> 
> -    if (dm_starting) {
> +    assert(!dcs->dmss.guest_domid);
> +    domcreate_devmodel_started(egc, &dcs->dmss, 0);

This is actually logically the "else" of the preceding need_qemu if,
since all the other paths return before they get here. I spent quite a
while figuring out why the problem with the "return; break;" I mentioned
above was an extra break and not the return. Might be clearer to have
this in the else?

> +    return;
> +
> + error_out:
> +    assert(ret);
> +    domcreate_complete(egc, dcs, ret);
> +}
> +
> +static void domcreate_devmodel_started(libxl__egc *egc,
> +                                       libxl__dm_spawn_state *dmss,
> +                                       int ret)
> +{
[...]
> +}
> 
> -    return ret;
> +static void domcreate_complete(libxl__egc *egc,
> +                               libxl__domain_create_state *dcs,
> +                               int rc) {

{ should be on the next line.

> +    STATE_AO_GC(dcs->ao);
> +
> +    if (rc) {
> +        if (dcs->guest_domid) {
> +            int rc2 = libxl_domain_destroy(CTX, dcs->guest_domid);
> +            if (rc2)
> +                LOG(ERROR, "unable to destroy domain %d following"
> +                    " failed creation", dcs->guest_domid);
> +        }
> +        dcs->guest_domid = -1;
> +    }
> +    dcs->callback(egc, dcs, rc, dcs->guest_domid);
> +}
> +
> +/*----- application-facing domain creation interface -----*/
> +
[...]
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 3921e2a..15472a8 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -667,24 +667,28 @@ retry_transaction:
>      return 0;
>  }
> 
> -static int libxl__create_stubdom(libxl__gc *gc,
> -                                 int guest_domid,
> -                                 libxl_domain_config *guest_config,
> -                                 libxl__domain_build_state *d_state,
> -                                 libxl__spawner_starting **starting_r)
> +static void spawn_stubdom_pvqemu_cb(libxl__egc *egc,
> +                                libxl__dm_spawn_state *stubdom_dmss,
> +                                int rc);
> +
> +void libxl__spawn_stubdom(libxl__egc *egc, libxl__stubdom_spawn_state *sdss)
>  {
> +    STATE_AO_GC(sdss->stubdom.spawn.ao);
>      libxl_ctx *ctx = libxl__gc_owner(gc);
>      int i, num_console = STUBDOM_SPECIAL_CONSOLES, ret;
>      libxl__device_console *console;
> -    libxl_domain_config dm_config[1];
>      libxl_device_vfb vfb;
>      libxl_device_vkb vkb;
> -    libxl__domain_build_state stubdom_state[1];
> -    uint32_t dm_domid;
>      char **args;
>      struct xs_permissions perm[2];
>      xs_transaction_t t;
> -    libxl__spawner_starting *dm_starting = 0;
> +
> +    /* convenience aliases */
> +    libxl_domain_config *dm_config = &sdss->stubdom_config;
> +    libxl_domain_config *guest_config = sdss->stubdom.guest_config;
> +    int guest_domid = sdss->stubdom.guest_domid;

const? Just in case someone tries to modify it? (maybe there are similar
cases in some of the previous blocks like this, I didn't notice).

[...]
> diff --git a/tools/libxl/libxl_exec.c b/tools/libxl/libxl_exec.c
> index 2ee2154..d44b702 100644
> --- a/tools/libxl/libxl_exec.c
> +++ b/tools/libxl/libxl_exec.c

[...]
> +/*----- spawn implementation -----*/
[...]
> +int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *ss)
>  {
> +    STATE_AO_GC(ss->ao);
> +    int r;
> +    pid_t child;
>      int status, rc;

> +    libxl__spawn_init(ss);
> +    ss->ssd = libxl__zalloc(0, sizeof(*ss->ssd));
> +    libxl__ev_child_init(&ss->ssd->mid);
> +
> +    rc = libxl__ev_time_register_rel(gc, &ss->timeout,
> +                                     spawn_timeout, ss->timeout_ms);
> +    if (rc) goto out_err;
> 
> +    rc = libxl__ev_xswatch_register(gc, &ss->xswatch,
> +                                    spawn_watch_event, ss->xspath);

IIRC xspath is optional, does libxl__ev_xswatch_register DTWT with NULL?

> +    if (rc) goto out_err;
> +
> +    pid_t middle = libxl__ev_child_fork(gc, &ss->ssd->mid, 
> spawn_middle_death);
> +    if (middle ==-1) { rc = ERROR_FAIL; goto out_err; }
> +
> +    if (middle) {
>          /* parent */
>          return 1;
>      }
[...]

> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index ae71f70..5bab4d6 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h

> @@ -840,75 +840,197 @@ _hidden int libxl__create_pci_backend(libxl__gc *gc, 
> uint32_t domid,
>                                        libxl_device_pci *pcidev, int num);
>  _hidden int libxl__device_pci_destroy_all(libxl__gc *gc, uint32_t domid);
> 
> -/* xl_exec */
> +/*
> + *----- spawn -----
> + *
> + * Higher-level double-fork and separate detach eg as for device models
> + *
> + * Each libxl__spawn_state is in one of the conventional states
> + *    Undefined, Idle, Active

Conventional here means "as per event generation" I assume.

[...]
>  /*
> - * libxl__spawn_spawn - Create a new process
> - * gc: allocation pool
> - * for_spawn: malloc'd pointer to libxl__spawn_starting (optional)
> - * what: string describing the spawned process
> - * intermediate_hook: helper to record pid, such as libxl_spawner_record_pid
> - * hook_data: data to pass to the hook function
> + * libxl__spawn_spawn - Create a new process which will become daemonic
> + * Forks twice, to allow the child to detach entirely from the parent.
> + *
> + * We call the two generated processes the "middle child" (result of
> + * the first fork) and the "inner child" (result of the second fork
> + * which takes place in the middle child).
> + *
> + * The inner child must soon exit or exec.  If must also soon exit or

                                               It?

> + * notify the parent of its successful startup by writing to the
> + * xenstore path xspath (or by other means).  xspath may be 0 to
> + * indicate that only other means are being used.
> + *
> + * The user (in the parent) will be called back (confirm_cb) every
> + * time that xenstore path is modified.
> + *
> + * In both children, the ctx is not fully useable: gc and logging

                                             usable

(apparently, I'm not convinced actually)

> + * operations are OK, but operations on Xen and xenstore are not.
> + * (The restrictions are the same as those which apply to children
> + * made with libxl__ev_child_fork.)
> + *
> + * midproc_cb will be called in the middle child, with the pid of the
> + * inner child; this could for example record the pid.  midproc_cb
> + * should be fast, and should return.  It will be called (reentrantly)
> + * within libxl__spawn_init.
> + *
> + * failure_cb will be called in the parent on failure of the
> + * intermediate or final child; an error message will have been
> + * logged.
> + *
> + * confirm_cb and failure_cb will not be called reentrantly from
> + * within libxl__spawn_spawn.
> + *
> + * what: string describing the spawned process, used for logging
>   *
>   * Logs errors.  A copy of "what" is taken.
>   * Return values:
> - *  < 0   error, for_spawn need not be detached
> - *   +1   caller is the parent, must call detach on *for_spawn eventually
> + *  < 0   error, *spawn is now Idle and need not be detached
> + *   +1   caller is the parent, *spawn is Active and must eventually be 
> detached
>   *    0   caller is now the inner child, should probably call libxl__exec

... or libxl_postfork_child_noexec.

> - * Caller, may pass 0 for for_spawn, in which case no need to detach.
> + *
> + * The spawn state must be Undefined or Idle on entry.
>   */
> -_hidden int libxl__spawn_spawn(libxl__gc *gc,
> -                      libxl__spawn_starting *for_spawn,
> -                      const char *what,
> -                      void (*intermediate_hook)(void *for_spawn, pid_t 
> innerchild),
> -                      void *hook_data);
> +_hidden int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *spawn);
> 
>  /*
> - * libxl_spawner_record_pid - Record given pid in xenstore
> - * for_spawn: malloc'd pointer to libxl__spawn_starting (optional)
> - * innerchild: pid of the child
> + * libxl__spawn_detach - Detaches the daemonic child.
> + *
> + * Works by killing the intermediate process from spawn_spawn.
> + * After this function returns, failures of either child are no
> + * longer repaorted via failure_cb.

             reported

[...]

> +/*----- device model creation -----*/
> +
> +/* First layer; wraps libxl__spawn_spawn. */

The use of "First" here is terrifying to me as a reviewer ;-P

> +
> +typedef struct libxl__dm_spawn_state libxl__dm_spawn_state;
> +
> +typedef void libxl__dm_spawn_cb(libxl__egc *egc, libxl__dm_spawn_state*,
> +                                int rc /* if !0, error was logged */);
> +
> +struct libxl__dm_spawn_state {
> +    /* mixed - ao must be initialised by user; rest is private: */

I see no ao here, you mean spawn.ao?

> +    libxl__spawn_state spawn;
> +    /* filled in by user, must remain valid: */
> +    uint32_t guest_domid; /* domain being served */
> +    libxl_domain_config *guest_config;
> +    libxl__domain_build_state *build_state; /* relates to guest_domid */
> +    libxl__dm_spawn_cb *callback;
> +};
> +
> +_hidden void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state*);


> +
> +/* Stubdoms. */
> +
> +typedef struct {
> +    /* mixed - user must fill in public parts only: */
> +    libxl__dm_spawn_state stubdom; /* will always remain the first member */

why? If you are playing casting tricks then you could use CONTAINER_OF.

> +    libxl__dm_spawn_cb *callback; /* called as callback(,&sdss->stubdom,) */
> +    /* private to libxl__spawn_stubdom: */
> +    libxl_domain_config stubdom_config;
> +    libxl__domain_build_state stubdom_state;
> +    libxl__dm_spawn_state pvqemu;
> +} libxl__stubdom_spawn_state;
> +
> +_hidden void libxl__spawn_stubdom(libxl__egc *egc, 
> libxl__stubdom_spawn_state*);
> +

Ah, I see now what local_dm meant, it's the non-stubdom DM.

stubdom is a bit of an overloaded term (we have grub stubdoms, xenstore
stubdoms etc too). stub_dm would be better.

> 
>  /*
>   * libxl__wait_for_offspring - Wait for child state
> @@ -941,31 +1063,6 @@ _hidden int libxl__wait_for_offspring(libxl__gc *gc,
[...]

> @@ -1566,6 +1650,32 @@ _hidden void 
> libxl__bootloader_init(libxl__bootloader_state *bl);
>  _hidden void libxl__bootloader_run(libxl__egc*, libxl__bootloader_state *st);
> 
> 
> +/*----- Domain creation -----*/
> +
> +typedef struct libxl__domain_create_state libxl__domain_create_state;
> +
> +typedef void libxl__domain_create_cb(libxl__egc *egc,
> +                                     libxl__domain_create_state*,
> +                                     int rc, uint32_t domid);
> +
> +struct libxl__domain_create_state {
> +    /* filled in by user */
> +    libxl__ao *ao;
> +    libxl_domain_config *guest_config;
> +    int restore_fd;
> +    libxl_console_ready console_cb;
> +    void *console_cb_priv;
> +    libxl__domain_create_cb *callback;
> +    /* private to domain_create */
> +    int guest_domid;
> +    libxl__domain_build_state build_state;
> +    union {
> +        libxl__dm_spawn_state dmss;
> +        libxl__stubdom_spawn_state sdss;
> +    };
> +};
> +
> +
>  /*
>   * Convenience macros.
>   */

Right, back to the top to start on the implementation...



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.