[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On Mon, Sep 6, 2010 at 11:03 AM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote: > # HG changeset patch > # User Ian Campbell <ian.campbell@xxxxxxxxxx> > # Date 1283766891 -3600 > # Node ID bdf8ce09160d715451e1204babe5f80886ea6183 > # Parent 5f96f36feebdb87eaadbbcab0399f32eda86f735 > libxc: provide notification of final checkpoint to restore end > > When the restore code sees this notification it will restore the > currently in-progress checkpoint when it completes. > > This allows the restore end to finish up without waiting for a > spurious timeout on the receive fd and thereby avoids unnecessary > error logging in the case of a successful migration or restore. > > In the normal migration or restore case the first checkpoint is always > the last. For a rolling checkpoint (such as Remus) the notification is > currently unused but could be used in the future for example to > provide a controlled failover for reasons other than error Unfortunatley, this introduces a backwards-compatibility problem, since older versions of the tools never send LAST_CHECKPOINT. Would it make sense to invert the logic on this -- i.e., default to only one checkpoint, and send "MORE_CHECKPOINTS" if it expects this *not* to be the last (either by expecting the checkpoint() function to do it, or by sending it from xc_domain_save)? Now that 4.1 is out, we have to consider version compatibilities. But we should be able to make it so that there would only be if: 1) You're running REMUS 2) One of your toolstacks is 4.1.0, and one is not. That seems like it shouldn't be too bad. Thoughts? -George > > Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx> > Acked-by: Brendan Cully <brendan@xxxxxxxxx> > > diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_restore.c > --- a/tools/libxc/xc_domain_restore.c Mon Sep 06 10:54:51 2010 +0100 > +++ b/tools/libxc/xc_domain_restore.c Mon Sep 06 10:54:51 2010 +0100 > @@ -42,6 +42,7 @@ struct restore_ctx { > xen_pfn_t *p2m; /* A table mapping each PFN to its new MFN. */ > xen_pfn_t *p2m_batch; /* A table of P2M mappings in the current region. > */ > int completed; /* Set when a consistent image is available */ > + int last_checkpoint; /* Set when we should commit to the current > checkpoint when it completes. */ > struct domain_info_context dinfo; > }; > > @@ -765,6 +766,11 @@ static int pagebuf_get_one(xc_interface > // DPRINTF("console pfn location: %llx\n", buf->console_pfn); > return pagebuf_get_one(xch, ctx, buf, fd, dom); > > + case XC_SAVE_ID_LAST_CHECKPOINT: > + ctx->last_checkpoint = 1; > + // DPRINTF("last checkpoint indication received"); > + return pagebuf_get_one(xch, ctx, buf, fd, dom); > + > default: > if ( (count > MAX_BATCH_SIZE) || (count < 0) ) { > ERROR("Max batch size exceeded (%d). Giving up.", count); > @@ -1296,10 +1302,23 @@ int xc_domain_restore(xc_interface *xch, > goto out; > } > ctx->completed = 1; > - /* shift into nonblocking mode for the remainder */ > - if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 ) > - flags = 0; > - fcntl(io_fd, F_SETFL, flags | O_NONBLOCK); > + > + /* > + * If more checkpoints are expected then shift into > + * nonblocking mode for the remainder. > + */ > + if ( !ctx->last_checkpoint ) > + { > + if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 ) > + flags = 0; > + fcntl(io_fd, F_SETFL, flags | O_NONBLOCK); > + } > + } > + > + if ( ctx->last_checkpoint ) > + { > + // DPRINTF("Last checkpoint, finishing\n"); > + goto finish; > } > > // DPRINTF("Buffered checkpoint\n"); > diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_save.c > --- a/tools/libxc/xc_domain_save.c Mon Sep 06 10:54:51 2010 +0100 > +++ b/tools/libxc/xc_domain_save.c Mon Sep 06 10:54:51 2010 +0100 > @@ -1616,6 +1616,20 @@ int xc_domain_save(xc_interface *xch, in > } > } > > + if ( !callbacks->checkpoint ) > + { > + /* > + * If this is not a checkpointed save then this must be the first and > + * last checkpoint. > + */ > + i = XC_SAVE_ID_LAST_CHECKPOINT; > + if ( wrexact(io_fd, &i, sizeof(int)) ) > + { > + PERROR("Error when writing last checkpoint chunk"); > + goto out; > + } > + } > + > /* Zero terminate */ > i = 0; > if ( wrexact(io_fd, &i, sizeof(int)) ) > diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xg_save_restore.h > --- a/tools/libxc/xg_save_restore.h Mon Sep 06 10:54:51 2010 +0100 > +++ b/tools/libxc/xg_save_restore.h Mon Sep 06 10:54:51 2010 +0100 > @@ -131,6 +131,7 @@ > #define XC_SAVE_ID_TMEM_EXTRA -6 > #define XC_SAVE_ID_TSC_INFO -7 > #define XC_SAVE_ID_HVM_CONSOLE_PFN -8 /* (HVM-only) */ > +#define XC_SAVE_ID_LAST_CHECKPOINT -9 /* Commit to restoring after > completion of current iteration. */ > > /* > ** We process save/restore/migrate in batches of pages; the below > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxxxxxxxx > http://lists.xensource.com/xen-devel > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |