
Re: [Xen-devel] [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save

On Wed, Jul 9, 2014 at 3:47 AM, Yang Hongyang <yanghy@xxxxxxxxxxxxxx> wrote:
implement remus checkpoint in v2 save

Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
---
 tools/libxc/saverestore/common.h |  1 +
 tools/libxc/saverestore/save.c   | 88 ++++++++++++++++++++++++----------------
 2 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
index 24ba95b..1dd9f51 100644
--- a/tools/libxc/saverestore/common.h
+++ b/tools/libxc/saverestore/common.h
@@ -153,6 +153,7 @@ struct xc_sr_context

     xc_dominfo_t dominfo;
     bool checkpointed;
+    bool firsttime;

     union
     {
diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
index d2fa8a6..98a5c2f 100644
--- a/tools/libxc/saverestore/save.c
+++ b/tools/libxc/saverestore/save.c
@@ -375,6 +375,8 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
         goto out;
     }

+    if ( ctx->checkpointed && !ctx->firsttime )
+        goto lastiter;

nit: last_iter

I suggest adding a comment before this code block explaining what has happened so
far and why we are jumping to the last_iter block, skipping the rest of the code below.

I also suggest maintaining a stats structure somewhere (number of dirty pages,
time when the checkpoint was initiated, etc.). These could simply be debug prints
that can be enabled on demand.

A better alternative would be to pass this stats structure back to libxl, so that
the user can enable or disable stats printing via the xl remus command, for
example.

     /* This juggling is required if logdirty is already on, e.g. VRAM tracking */
     if ( xc_shadow_control(xch, ctx->domid,
                            XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
@@ -436,6 +438,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
             break;
     }

+lastiter:
     rc = suspend_domain(ctx);
     if ( rc )
         goto out;

I hope the postsuspend callbacks are called somewhere?
@@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
     if ( rc )
         goto err;

-    rc = ctx->save.ops.start_of_stream(ctx);
-    if ( rc )
-        goto err;
+    do {
+        rc = ctx->save.ops.start_of_stream(ctx);
+        if ( rc )
+            goto err;

-    if ( ctx->save.live )
-    {
-        DPRINTF("Starting live migrate");
-        rc = send_domain_memory_live(ctx);
-    }
-    else
-    {
-        DPRINTF("Starting nonlive save");
-        rc = send_domain_memory_nonlive(ctx);
-    }
+        if ( ctx->save.live )
+        {
+            DPRINTF("Starting live migrate");

I suggest checking the ctx->firsttime bool before this printf, so that when
debug printing is enabled under Remus the console is not flooded with a
statement that makes no sense past the first iteration.

+            rc = send_domain_memory_live(ctx);
+        }
+        else
+        {
+            DPRINTF("Starting nonlive save");
+            rc = send_domain_memory_nonlive(ctx);
+        }

-    if ( rc )
-        goto err;
+        if ( rc )
+            goto err;

-    /* Refresh domain information now it has paused. */
-    if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
-         (ctx->dominfo.domid != ctx->domid) )
-    {
-        PERROR("Unable to refresh domain information");
-        rc = -1;
-        goto err;
-    }
-    else if ( (!ctx->dominfo.shutdown ||
-               ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
-              !ctx->dominfo.paused )
-    {
-        ERROR("Domain has not been suspended");
-        rc = -1;
-        goto err;
-    }
+        /* Refresh domain information now it has paused. */
+        if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
+             (ctx->dominfo.domid != ctx->domid) )
+        {
+            PERROR("Unable to refresh domain information");
+            rc = -1;
+            goto err;
+        }
+        else if ( (!ctx->dominfo.shutdown ||
+                   ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
+                  !ctx->dominfo.paused )
+        {
+            ERROR("Domain has not been suspended");
+            rc = -1;
+            goto err;
+        }


A silly question: shouldn't this check for 'suspended' status come before the
call to send_domain_memory_live() [under Remus]? I am assuming that
send_domain_memory_live() is the function that sends the dirty page data out
on the wire.


-    rc = ctx->save.ops.end_of_stream(ctx);
-    if ( rc )
-        goto err;
+        rc = ctx->save.ops.end_of_stream(ctx);
+        if ( rc )
+            goto err;
+
+        if ( ctx->checkpointed ) {
+            if ( ctx->firsttime )
+                ctx->firsttime = false;
+
+            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+
+            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+            if ( rc > 0 ) {
+                IPRINTF("Next checkpoint\n");

I suggest maintaining checkpoint numbers instead; that would be much more
helpful. You could even add a gettimeofday() call to print the time at which
the new checkpoint starts. Once again, it is useful to be able to control this
printout from libxl.

+            } else {
+                ctx->checkpointed = false;
+            }
+        }
+    } while ( ctx->checkpointed );

     rc = write_end_record(ctx);
     if ( rc )
@@ -653,6 +672,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite
     ctx.save.live  = !!(flags & XCFLAGS_LIVE);
     ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
     ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
+    ctx.firsttime = true;

     if ( ctx.checkpointed ) {
         /* This is a checkpointed save, we need these callbacks */
--
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.