|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH RFC 07/20] migration: defer precopy policy to libxl
On 27/03/17 10:06, Joshua Otto wrote:
> The precopy phase of the xc_domain_save() live migration algorithm has
> historically been implemented to run until either a) (almost) no pages
> are dirty or b) some fixed, hard-coded maximum number of precopy
> iterations has been exceeded. This policy and its implementation are
> less than ideal for a few reasons:
> - the logic of the policy is intertwined with the control flow of the
> mechanism of the precopy stage
> - it can't take into account facts external to the immediate
> migration context, such as interactive user input or the passage of
> wall-clock time
> - it does not permit the user to change their mind, over time, about
> what to do at the end of the precopy (they get an unconditional
> transition into the stop-and-copy phase of the migration)
>
> To permit users to implement arbitrary higher-level policies governing
> when the live migration precopy phase should end, and what should be
> done next:
> - add a precopy_policy() callback to the xc_domain_save() user-supplied
> callbacks
> - during the precopy phase of live migrations, consult this policy after
> each batch of pages transmitted and take the dictated action, which
> may be to a) abort the migration entirely, b) continue with the
> precopy, or c) proceed to the stop-and-copy phase.
> - provide an implementation of the old policy as such a callback in
> libxl and plumb it through the IPC machinery to libxc, effectively
> maintaing the old policy for now
>
> Signed-off-by: Joshua Otto <jtotto@xxxxxxxxxxxx>
This patch should be split into two. One modifying libxc to use struct
precopy_stats, and a second to wire up the RPC call.
> ---
> tools/libxc/include/xenguest.h | 23 ++++-
> tools/libxc/xc_nomigrate.c | 3 +-
> tools/libxc/xc_sr_common.h | 7 +-
> tools/libxc/xc_sr_save.c | 194
> ++++++++++++++++++++++++++-----------
> tools/libxl/libxl_dom_save.c | 20 ++++
> tools/libxl/libxl_save_callout.c | 3 +-
> tools/libxl/libxl_save_helper.c | 7 +-
> tools/libxl/libxl_save_msgs_gen.pl | 4 +-
> 8 files changed, 189 insertions(+), 72 deletions(-)
>
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index aa8cc8b..30ffb6f 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -39,6 +39,14 @@
> */
> struct xenevtchn_handle;
>
> +/* For save's precopy_policy(). */
> +struct precopy_stats
> +{
> + unsigned iteration;
> + unsigned total_written;
> + long dirty_count; /* -1 if unknown */
total_written and dirty_count are liable to be equal, so having them as
different widths of integer clearly can't be correct.
> +};
> +
> /* callbacks provided by xc_domain_save */
> struct save_callbacks {
> /* Called after expiration of checkpoint interval,
> @@ -46,6 +54,17 @@ struct save_callbacks {
> */
> int (*suspend)(void* data);
>
> + /* Called after every batch of page data sent during the precopy phase
> of a
> + * live migration to ask the caller what to do next based on the current
> + * state of the precopy migration.
> + */
> +#define XGS_POLICY_ABORT (-1) /* Abandon the migration entirely and
> + * tidy up. */
> +#define XGS_POLICY_CONTINUE_PRECOPY 0 /* Remain in the precopy phase. */
> +#define XGS_POLICY_STOP_AND_COPY 1 /* Immediately suspend and transmit
> the
> + * remaining dirty pages. */
> + int (*precopy_policy)(struct precopy_stats stats, void *data);
Structures shouldn't be passed by value like this, as the compiler has
to do a lot of memcpy() work to make it happen. You should pass by
const pointer, as (as far as I can tell), they are strictly read-only to
the implementation of this hook?
> +
> /* Called after the guest's dirty pages have been
> * copied into an output buffer.
> * Callback function resumes the guest & the device model,
> @@ -100,8 +119,8 @@ typedef enum {
> * doesn't use checkpointing
> * @return 0 on success, -1 on failure
> */
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t
> max_iters,
> - uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> + uint32_t flags /* XCFLAGS_xxx */,
> struct save_callbacks* callbacks, int hvm,
> xc_migration_stream_t stream_type, int recv_fd);
It would be cleaner for existing callers, and to extend in the future,
to encapsulate all of these parameters in a struct domain_save_params
and pass it by pointer to here.
That way, we'd avoid the situation we currently have where some
information is passed in bitfields in a single parameter, whereas other
booleans are passed as integers.
The hvm parameter specifically is useless, and can be removed by
rearranging the sanity checks until after the xc_domain_getinfo() call.
>
> diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
> index 15c838f..2af64e4 100644
> --- a/tools/libxc/xc_nomigrate.c
> +++ b/tools/libxc/xc_nomigrate.c
> @@ -20,8 +20,7 @@
> #include <xenctrl.h>
> #include <xenguest.h>
>
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t
> max_iters,
> - uint32_t max_factor, uint32_t flags,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t
> flags,
> struct save_callbacks* callbacks, int hvm,
> xc_migration_stream_t stream_type, int recv_fd)
> {
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index b1aa88e..a9160bd 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -198,12 +198,11 @@ struct xc_sr_context
> /* Further debugging information in the stream. */
> bool debug;
>
> - /* Parameters for tweaking live migration. */
> - unsigned max_iterations;
> - unsigned dirty_threshold;
> -
> unsigned long p2m_size;
>
> + struct precopy_stats stats;
> + int policy_decision;
> +
> xen_pfn_t *batch_pfns;
> unsigned nr_batch_pfns;
> unsigned long *deferred_pages;
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 797aec5..eb95334 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -271,13 +271,29 @@ static int write_batch(struct xc_sr_context *ctx)
> }
>
> /*
> + * Test if the batch is full.
> + */
> +static bool batch_full(struct xc_sr_context *ctx)
const struct xc_sr_context *ctx
This is a predicate, after all.
> +{
> + return ctx->save.nr_batch_pfns == MAX_BATCH_SIZE;
> +}
> +
> +/*
> + * Test if the batch is empty.
> + */
> +static bool batch_empty(struct xc_sr_context *ctx)
> +{
> + return ctx->save.nr_batch_pfns == 0;
> +}
> +
> +/*
> * Flush a batch of pfns into the stream.
> */
> static int flush_batch(struct xc_sr_context *ctx)
> {
> int rc = 0;
>
> - if ( ctx->save.nr_batch_pfns == 0 )
> + if ( batch_empty(ctx) )
> return rc;
>
> rc = write_batch(ctx);
> @@ -293,19 +309,12 @@ static int flush_batch(struct xc_sr_context *ctx)
> }
>
> /*
> - * Add a single pfn to the batch, flushing the batch if full.
> + * Add a single pfn to the batch.
> */
> -static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
> +static void add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
> {
> - int rc = 0;
> -
> - if ( ctx->save.nr_batch_pfns == MAX_BATCH_SIZE )
> - rc = flush_batch(ctx);
> -
> - if ( rc == 0 )
> - ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
> -
> - return rc;
> + assert(ctx->save.nr_batch_pfns < MAX_BATCH_SIZE);
> + ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
> }
>
> /*
> @@ -352,10 +361,15 @@ static int suspend_domain(struct xc_sr_context *ctx)
> * Send a subset of pages in the guests p2m, according to the dirty bitmap.
> * Used for each subsequent iteration of the live migration loop.
> *
> + * During the precopy stage of a live migration, test the user-supplied
> + * policy function after each batch of pages and cut off the operation
> + * early if indicated. Unless aborting, the dirty pages remaining in this
> round
> + * are transferred into the deferred_pages bitmap.
Is this actually a sensible thing to do? On iteration 0, this is going
to be a phenomenal number of RPC calls, which are all going to make the
same decision.
> + *
> * Bitmap is bounded by p2m_size.
> */
> static int send_dirty_pages(struct xc_sr_context *ctx,
> - unsigned long entries)
> + unsigned long entries, bool precopy)
Shouldn't this precopy boolean be some kind of state variable in ctx ?
> {
> xc_interface *xch = ctx->xch;
> xen_pfn_t p;
> @@ -364,31 +378,57 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
> DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> &ctx->save.dirty_bitmap_hbuf);
>
> - for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
> + int (*precopy_policy)(struct precopy_stats, void *) =
> + ctx->save.callbacks->precopy_policy;
> + void *data = ctx->save.callbacks->data;
> +
> + assert(batch_empty(ctx));
> + for ( p = 0, written = 0; p < ctx->save.p2m_size; )
This looks suspicious without an increment. Conceptually, it might be
better as a do {} while ( decision == XGS_POLICY_CONTINUE_PRECOPY ); loop?
> {
> - if ( !test_bit(p, dirty_bitmap) )
> - continue;
> + if ( ctx->save.live && precopy )
> + {
> + ctx->save.policy_decision = precopy_policy(ctx->save.stats,
> data);
Newline here please.
> + if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
> + {
Please but a log message here indicating that abort has been requested.
Otherwise, the migration will give up with a failure and no obvious
indication why.
> + return -1;
> + }
> + else if ( ctx->save.policy_decision !=
> XGS_POLICY_CONTINUE_PRECOPY )
> + {
> + /* Any outstanding dirty pages are now deferred until the
> next
> + * phase of the migration. */
/*
* The comment style for multiline comments
* is like this.
*/
> + bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
> + ctx->save.p2m_size);
> + if ( entries > written )
> + ctx->save.nr_deferred_pages += entries - written;
> +
> + goto done;
> + }
> + }
>
> - rc = add_to_batch(ctx, p);
> + for ( ; p < ctx->save.p2m_size && !batch_full(ctx); ++p )
> + {
> + if ( test_and_clear_bit(p, dirty_bitmap) )
> + {
> + add_to_batch(ctx, p);
> + ++written;
> + ++ctx->save.stats.total_written;
> + }
> + }
> +
> + rc = flush_batch(ctx);
> if ( rc )
> return rc;
>
> - /* Update progress every 4MB worth of memory sent. */
> - if ( (written & ((1U << (22 - 12)) - 1)) == 0 )
> - xc_report_progress_step(xch, written, entries);
> -
> - ++written;
> + /* Update progress after every batch (4MB) worth of memory sent. */
> + xc_report_progress_step(xch, written, entries);
> }
>
> - rc = flush_batch(ctx);
> - if ( rc )
> - return rc;
> -
> if ( written > entries )
> DPRINTF("Bitmap contained more entries than expected...");
>
> xc_report_progress_step(xch, entries, entries);
>
> + done:
> return ctx->save.ops.check_vm_state(ctx);
> }
>
> @@ -396,14 +436,14 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
> * Send all pages in the guests p2m. Used as the first iteration of the live
> * migration loop, and for a non-live save.
> */
> -static int send_all_pages(struct xc_sr_context *ctx)
> +static int send_all_pages(struct xc_sr_context *ctx, bool precopy)
> {
> DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> &ctx->save.dirty_bitmap_hbuf);
>
> bitmap_set(dirty_bitmap, ctx->save.p2m_size);
>
> - return send_dirty_pages(ctx, ctx->save.p2m_size);
> + return send_dirty_pages(ctx, ctx->save.p2m_size, precopy);
> }
>
> static int enable_logdirty(struct xc_sr_context *ctx)
> @@ -446,8 +486,7 @@ static int update_progress_string(struct xc_sr_context
> *ctx,
> xc_interface *xch = ctx->xch;
> char *new_str = NULL;
>
> - if ( asprintf(&new_str, "Frames iteration %u of %u",
> - iter, ctx->save.max_iterations) == -1 )
> + if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
> {
> PERROR("Unable to allocate new progress string");
> return -1;
> @@ -468,20 +507,47 @@ static int send_memory_live(struct xc_sr_context *ctx)
> xc_interface *xch = ctx->xch;
> xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
> char *progress_str = NULL;
> - unsigned x;
> int rc;
>
> + DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> + &ctx->save.dirty_bitmap_hbuf);
> +
> + int (*precopy_policy)(struct precopy_stats, void *) =
> + ctx->save.callbacks->precopy_policy;
> + void *data = ctx->save.callbacks->data;
> +
> rc = update_progress_string(ctx, &progress_str, 0);
> if ( rc )
> goto out;
>
> - rc = send_all_pages(ctx);
> +#define CONSULT_POLICY
> \
:(
The reason this code is readable and (hopefully) easy to follow, is due
in large part to a lack of macros like this trying to hide what is
actually going on.
> + do {
> \
> + if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
> \
> + {
> \
> + rc = -1;
> \
> + goto out;
> \
> + }
> \
> + else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> \
> + {
> \
> + rc = 0;
> \
> + goto out;
> \
> + }
> \
> + } while (0)
> +
> + ctx->save.stats = (struct precopy_stats)
> + {
> + .iteration = 0,
> + .total_written = 0,
> + .dirty_count = -1
> + };
> + rc = send_all_pages(ctx, /* precopy */ true);
> if ( rc )
> goto out;
>
> - for ( x = 1;
> - ((x < ctx->save.max_iterations) &&
> - (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
> + /* send_all_pages() has updated the stats */
> + CONSULT_POLICY;
> +
> + for ( ctx->save.stats.iteration = 1; ; ++ctx->save.stats.iteration )
Again, without an exit condition, this looks suspicious.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |