[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks



On Fri, Nov 1, 2013 at 11:28 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
Shriram Rajagopalan writes ("[PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks"):
> tools/libxl: Control network buffering in remus callbacks
...
>  static int libxl__remus_domain_suspend_callback(void *data)
>  {
> -    /* REMUS TODO: Issue disk and network checkpoint reqs. */
> -    return libxl__domain_suspend_common_callback(data);
> +    /* REMUS TODO: Issue disk checkpoint reqs. */
> +    libxl__save_helper_state *shs = data;
> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
> +    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
> +    bool is_suspended;
> +    STATE_AO_GC(dss->ao);
> +
> +    is_suspended = !!libxl__domain_suspend_common_callback(data);

I think this function would be less confusing if the return value
variable was called "ok" or something similar.  It's really just an
error flag.

Also AFAICT the !! has no function here.  The return value is
essentially a boolean.

> +    if (!remus_ctx->netbuf_ctx) return is_suspended;
> +
> +    if (is_suspended) {
> +        if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid,
> +                                                remus_ctx))
> +            return !is_suspended;

This construction is rather odd.  Why return !is_suspended when you
know that !!is_suspended ?


If a new network buffer cannot be created (for the next epoch), then
return an error. This ends up terminating the checkpointing process.

 
> @@ -1300,11 +1316,42 @@ static void libxl__remus_domain_checkpoi
>  static void remus_checkpoint_dm_saved(libxl__egc *egc,
>                                        libxl__domain_suspend_state *dss, int rc)
>  {
> -    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
> -    /* REMUS TODO: make this asynchronous */
> -    assert(!rc); /* REMUS TODO handle this error properly */
> -    usleep(dss->remus_ctx->interval * 1000);
> -    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
> +    /*
> +     * REMUS TODO: Wait for disk and explicit memory ack (through restore
> +     * callback from remote) before releasing network buffer.
> +     */
> +    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
> +    struct timespec epoch;
> +    int do_next_iter = 0;

Again, this variable is probably best called "ok".

> +    epoch.tv_sec = remus_ctx->interval / 1000; /* interval is in ms */
> +    epoch.tv_nsec = remus_ctx->interval * 1000L * 1000L;
> +    nanosleep(&epoch, 0);

This is no good, I'm afraid.  You should be using a libxl event loop
timer.  (Also why are you changing your existing usleep to this new
nanosleep which seems functionally very similar?)

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.