[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 2/3] Introduce migration precopy policy



> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
> Jennifer Herbert
> Sent: 19 September 2017 19:06
> To: Ian Jackson <Ian.Jackson@xxxxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>;
> xen-devel@xxxxxxxxxxxxxxxxxxxx; jtotto@xxxxxxxxxxxx
> Cc: Jennifer Herbert <jennifer.herbert@xxxxxxxxxx>
> Subject: [Xen-devel] [PATCH v2 2/3] Introduce migration precopy policy
> 
> This Patch allows a migration precopy policy to be specified.
> 
> The precopy phase of the xc_domain_save() live migration algorithm has
> historically been implemented to run until either a) (almost) no pages
> are dirty or b) some fixed, hard-coded maximum number of precopy
> iterations has been exceeded.  This policy and its implementation are
> less than ideal for a few reasons:
> - the logic of the policy is intertwined with the control flow of the
>   mechanism of the precopy stage
> - it can't take into account facts external to the immediate
>   migration context, such external state transfer state, interactive
>   user input, or the passage of wall-clock time.
> - it does not permit the user to change their mind, over time, about
>   what to do at the end of the precopy (they get an unconditional
>   transition into the stop-and-copy phase of the migration)
> 
> To permit callers to implement arbitrary higher-level policies governing
> when the live migration precopy phase should end, and what should be
> done next:
> - add a precopy_policy() callback to the xc_domain_save() user-supplied
>   callbacks
> - during the precopy phase of live migrations, consult this policy after
>   each batch of pages transmitted and take the dictated action, which
>   may be to a) abort the migration entirely, b) continue with the
>   precopy, or c) proceed to the stop-and-copy phase.
> - provide an implementation of the old policy, used when
>   precopy_policy callback  is not provided.
> 
> Signed-off-by: Jennifer Herbert <Jennifer.Herbert@xxxxxxxxxx>
> Signed-off-by: Joshua Otto <jtotto@xxxxxxxxxxxx>
> 
> ---
> 
> v2:
> 
> Have made a few formatting corrections, added typedef as suggested.
> 
> v1:
> 
> This is updated/modified subset of patch 7/20, part of
> Joshua Otto's "Add postcopy live migration support." patch,
> dated 27th March 2017.  As indicated on the original thread,
> I wish to make use of this this within the XenServer product.
> I hope this will aid Josh in pushing the remainder of his series.
> 
> ---
>  tools/libxc/include/xenguest.h | 31 ++++++++++++--
>  tools/libxc/xc_sr_common.h     |  6 +--
>  tools/libxc/xc_sr_save.c       | 97 +++++++++++++++++++++++++++++--------
> -----
>  3 files changed, 97 insertions(+), 37 deletions(-)
> 
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index 6626f0c..a2a654c 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -39,6 +39,16 @@
>   */
>  struct xenevtchn_handle;
> 
> +/* For save's precopy_policy(). */
> +struct precopy_stats
> +{
> +    unsigned int iteration;
> +    unsigned int total_written;
> +    long dirty_count; /* -1 if unknown */
> +};
> +
> +typedef int (*precopy_policy_t)(struct precopy_stats, void *);
> +
>  /* callbacks provided by xc_domain_save */
>  struct save_callbacks {
>      /* Called after expiration of checkpoint interval,
> @@ -46,7 +56,20 @@ struct save_callbacks {
>       */
>      int (*suspend)(void* data);
> 
> -    /* Called after the guest's dirty pages have been
> +    /*
> +     * Called after every batch of page data sent during the precopy
> +     * phase of a live migration to ask the caller what to do next
> +     * based on the current state of the precopy migration.
> +     */
> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely
> +                                        * and tidy up. */
> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy
> phase. */
> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and
> transmit the
> +                                        * remaining dirty pages. */
> +    precopy_policy_t precopy_policy;
> +
> +    /*
> +     * Called after the guest's dirty pages have been
>       *  copied into an output buffer.
>       * Callback function resumes the guest & the device model,
>       *  returns to xc_domain_save.
> @@ -55,7 +78,8 @@ struct save_callbacks {
>       */
>      int (*postcopy)(void* data);
> 
> -    /* Called after the memory checkpoint has been flushed
> +    /*
> +     * Called after the memory checkpoint has been flushed
>       * out into the network. Typical actions performed in this
>       * callback include:
>       *   (a) send the saved device model state (for HVM guests),
> @@ -65,7 +89,8 @@ struct save_callbacks {
>       *
>       * returns:
>       * 0: terminate checkpointing gracefully
> -     * 1: take another checkpoint */
> +     * 1: take another checkpoint
> +     */
>      int (*checkpoint)(void* data);
> 
>      /*
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index a83f22a..3635704 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -198,12 +198,10 @@ struct xc_sr_context
>              /* Further debugging information in the stream. */
>              bool debug;
> 
> -            /* Parameters for tweaking live migration. */
> -            unsigned max_iterations;
> -            unsigned dirty_threshold;
> -
>              unsigned long p2m_size;
> 
> +            struct precopy_stats stats;
> +
>              xen_pfn_t *batch_pfns;
>              unsigned nr_batch_pfns;
>              unsigned long *deferred_pages;
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 1e7502d..f58c008 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -452,8 +452,7 @@ static int update_progress_string(struct
> xc_sr_context *ctx,
>      xc_interface *xch = ctx->xch;
>      char *new_str = NULL;
> 
> -    if ( asprintf(&new_str, "Frames iteration %u of %u",
> -                  iter, ctx->save.max_iterations) == -1 )
> +    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
>      {
>          PERROR("Unable to allocate new progress string");
>          return -1;
> @@ -467,6 +466,24 @@ static int update_progress_string(struct
> xc_sr_context *ctx,
>  }
> 
>  /*
> + * This is the live migration precopy policy - it's called periodically 
> during
> + * the precopy phase of live migrations, and is responsible for deciding
> when
> + * the precopy phase should terminate and what should be done next.
> + *
> + * The policy implemented here behaves identically to the policy previously
> + * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy
> phase of
> + * the live migration when there are either fewer than 50 dirty pages, or
> more
> + * than 5 precopy rounds have completed.
> + */
> +static int simple_precopy_policy(struct precopy_stats stats, void *user)
> +{
> +    return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
> +            stats.iteration >= 5)
> +        ? XGS_POLICY_STOP_AND_COPY
> +        : XGS_POLICY_CONTINUE_PRECOPY;
> +}
> +
> +/*
>   * Send memory while guest is running.
>   */
>  static int send_memory_live(struct xc_sr_context *ctx)
> @@ -474,21 +491,58 @@ static int send_memory_live(struct xc_sr_context
> *ctx)
>      xc_interface *xch = ctx->xch;
>      xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
>      char *progress_str = NULL;
> -    unsigned x;
> +    unsigned int x = 0;
>      int rc;
> +    int policy_decision;
> +
> +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> +                                    &ctx->save.dirty_bitmap_hbuf);
> +
> +    precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
> +    struct precopy_stats *policy_stats;
> 
>      rc = update_progress_string(ctx, &progress_str, 0);
>      if ( rc )
>          goto out;
> 
> -    rc = send_all_pages(ctx);
> -    if ( rc )
> -        goto out;
> +    ctx->save.stats = (struct precopy_stats)
> +        { .dirty_count   = ctx->save.p2m_size };
> +    policy_stats = &ctx->save.stats;
> +
> +    if ( precopy_policy == NULL )
> +         precopy_policy = simple_precopy_policy;
> +
> +    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
> +
> +    do {
> +        policy_decision = precopy_policy(*policy_stats, data);
> +        x++;
> +
> +        if ( stats.dirty_count > 0 && policy_decision != XGS_POLICY_ABORT )
> +        {
> +            rc = update_progress_string(ctx, &progress_str, x);
> +            if ( rc )
> +                goto out;
> +
> +            rc = send_dirty_pages(ctx, stats.dirty_count);
> +            if ( rc )
> +                goto out;
> +        }
> +
> +        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> +            break;
> +
> +        policy_stats->iteration     = x;
> +        policy_stats->total_written += policy_stats->dirty_count;
> +        policy_stats->dirty_count   = -1;
> +
> +        policy_decision = precopy_policy(*policy_stats, data);
> +
> +        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> +           break;
> 
> -    for ( x = 1;
> -          ((x < ctx->save.max_iterations) &&
> -           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
> -    {
>          if ( xc_shadow_control(
>                   xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
>                   &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
> @@ -499,17 +553,9 @@ static int send_memory_live(struct xc_sr_context
> *ctx)
>              goto out;
>          }
> 
> -        if ( stats.dirty_count == 0 )
> -            break;
> +        policy_stats->dirty_count = stats.dirty_count;
> 
> -        rc = update_progress_string(ctx, &progress_str, x);
> -        if ( rc )
> -            goto out;
> -
> -        rc = send_dirty_pages(ctx, stats.dirty_count);
> -        if ( rc )
> -            goto out;
> -    }
> +    } while ( true );

I'm sure any compiler worth its salt will optimise to an unconditional jump, 
but I tend to prefer using for (;;) for infinite loops. Doesn't really matter 
so...

Reviewed-by: Paul Durrant <paul.durrant@xxxxxxxxxx>

> 
>   out:
>      xc_set_progress_prefix(xch, NULL);
> @@ -601,7 +647,7 @@ static int suspend_and_send_dirty(struct
> xc_sr_context *ctx)
>      if ( ctx->save.live )
>      {
>          rc = update_progress_string(ctx, &progress_str,
> -                                    ctx->save.max_iterations);
> +                                    ctx->save.stats.iteration);
>          if ( rc )
>              goto out;
>      }
> @@ -937,15 +983,6 @@ int xc_domain_save(xc_interface *xch, int io_fd,
> uint32_t dom,
>             stream_type == XC_MIG_STREAM_REMUS ||
>             stream_type == XC_MIG_STREAM_COLO);
> 
> -    /*
> -     * TODO: Find some time to better tweak the live migration algorithm.
> -     *
> -     * These parameters are better than the legacy algorithm especially for
> -     * busy guests.
> -     */
> -    ctx.save.max_iterations = 5;
> -    ctx.save.dirty_threshold = 50;
> -
>      /* Sanity checks for callbacks. */
>      if ( hvm )
>          assert(callbacks->switch_qemu_logdirty);
> --
> 1.8.3.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> https://lists.xen.org/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.