Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)
On 2011-04-01, at 6:58 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx> wrote:

> On Fri, 2011-04-01 at 12:02 +0100, George Dunlap wrote:
>> On Mon, Sep 6, 2010 at 11:03 AM, Ian Campbell <ian.campbell@xxxxxxxxxx>
>> wrote:
>>> # HG changeset patch
>>> # User Ian Campbell <ian.campbell@xxxxxxxxxx>
>>> # Date 1283766891 -3600
>>> # Node ID bdf8ce09160d715451e1204babe5f80886ea6183
>>> # Parent 5f96f36feebdb87eaadbbcab0399f32eda86f735
>>> libxc: provide notification of final checkpoint to restore end
>>>
>>> When the restore code sees this notification it will restore the
>>> currently in-progress checkpoint when it completes.
>>>
>>> This allows the restore end to finish up without waiting for a
>>> spurious timeout on the receive fd and thereby avoids unnecessary
>>> error logging in the case of a successful migration or restore.
>>>
>>> In the normal migration or restore case the first checkpoint is always
>>> the last. For a rolling checkpoint (such as Remus) the notification is
>>> currently unused but could be used in the future, for example to
>>> provide a controlled failover for reasons other than error.
>>
>> Unfortunately, this introduces a backwards-compatibility problem,
>> since older versions of the tools never send LAST_CHECKPOINT.
>
> Out of interest, what actually happens? (I think you mentioned something
> about failing when restoring from a file?)
>
> My intention at the time was that in the absence of a LAST_CHECKPOINT
> things would continue to behave as before, e.g. you would see the
> spurious timeout on the receive fd but still restore correctly.
>
> Perhaps the associated changes to the O_NONBLOCK'ness of the fd at
> various stages are what broke this for the non-Remus migrations?
>
>> Would it make sense to invert the logic on this -- i.e., default to
>> only one checkpoint, and send "MORE_CHECKPOINTS" if it expects this
>> *not* to be the last (either by expecting the checkpoint() function to
>> do it, or by sending it from xc_domain_save)?
>
> I'd quite like to understand what I broke and investigate the
> trivialness (or otherwise) of a fix, but this change would probably
> independently make sense too.
>
>> Now that 4.1 is out, we have to consider version compatibilities. But
>> we should be able to make it so that there would only be a problem if:
>> 1) You're running REMUS
>
> Is it the case that Remus only cares about checkpointing between like
> versions of the toolstack?
>

Can you please elaborate on this statement?

Shriram

>> 2) One of your toolstacks is 4.1.0, and one is not.
>>
>> That seems like it shouldn't be too bad.
>>
>> Thoughts?
>>
>>  -George
>>
>>>
>>> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
>>> Acked-by: Brendan Cully <brendan@xxxxxxxxx>
>>>
>>> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_restore.c
>>> --- a/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
>>> +++ b/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
>>> @@ -42,6 +42,7 @@ struct restore_ctx {
>>>      xen_pfn_t *p2m; /* A table mapping each PFN to its new MFN. */
>>>      xen_pfn_t *p2m_batch; /* A table of P2M mappings in the current region. */
>>>      int completed; /* Set when a consistent image is available */
>>> +    int last_checkpoint; /* Set when we should commit to the current checkpoint when it completes. */
>>>      struct domain_info_context dinfo;
>>> };
>>>
>>> @@ -765,6 +766,11 @@ static int pagebuf_get_one(xc_interface
>>>          // DPRINTF("console pfn location: %llx\n", buf->console_pfn);
>>>          return pagebuf_get_one(xch, ctx, buf, fd, dom);
>>>
>>> +    case XC_SAVE_ID_LAST_CHECKPOINT:
>>> +        ctx->last_checkpoint = 1;
>>> +        // DPRINTF("last checkpoint indication received");
>>> +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
>>> +
>>>      default:
>>>          if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
>>>              ERROR("Max batch size exceeded (%d). Giving up.", count);
>>> @@ -1296,10 +1302,23 @@ int xc_domain_restore(xc_interface *xch,
>>>              goto out;
>>>          }
>>>          ctx->completed = 1;
>>> -        /* shift into nonblocking mode for the remainder */
>>> -        if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
>>> -            flags = 0;
>>> -        fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
>>> +
>>> +        /*
>>> +         * If more checkpoints are expected then shift into
>>> +         * nonblocking mode for the remainder.
>>> +         */
>>> +        if ( !ctx->last_checkpoint )
>>> +        {
>>> +            if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
>>> +                flags = 0;
>>> +            fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
>>> +        }
>>> +    }
>>> +
>>> +    if ( ctx->last_checkpoint )
>>> +    {
>>> +        // DPRINTF("Last checkpoint, finishing\n");
>>> +        goto finish;
>>> +    }
>>>
>>>      // DPRINTF("Buffered checkpoint\n");
>>> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_save.c
>>> --- a/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
>>> +++ b/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
>>> @@ -1616,6 +1616,20 @@ int xc_domain_save(xc_interface *xch, in
>>>          }
>>>      }
>>>
>>> +    if ( !callbacks->checkpoint )
>>> +    {
>>> +        /*
>>> +         * If this is not a checkpointed save then this must be the first and
>>> +         * last checkpoint.
>>> +         */
>>> +        i = XC_SAVE_ID_LAST_CHECKPOINT;
>>> +        if ( wrexact(io_fd, &i, sizeof(int)) )
>>> +        {
>>> +            PERROR("Error when writing last checkpoint chunk");
>>> +            goto out;
>>> +        }
>>> +    }
>>> +
>>>      /* Zero terminate */
>>>      i = 0;
>>>      if ( wrexact(io_fd, &i, sizeof(int)) )
>>> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xg_save_restore.h
>>> --- a/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
>>> +++ b/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
>>> @@ -131,6 +131,7 @@
>>>  #define XC_SAVE_ID_TMEM_EXTRA        -6
>>>  #define XC_SAVE_ID_TSC_INFO          -7
>>>  #define XC_SAVE_ID_HVM_CONSOLE_PFN   -8 /* (HVM-only) */
>>> +#define XC_SAVE_ID_LAST_CHECKPOINT   -9 /* Commit to restoring after completion of current iteration. */
>>>
>>>  /*
>>>  ** We process save/restore/migrate in batches of pages; the below
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel
>>>
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
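
For reference, the inversion George suggests above (send a marker only when *more* checkpoints will follow, rather than when the current one is the last) might look roughly like the sketch below. This is not code from the thread: the chunk name XC_SAVE_ID_MORE_CHECKPOINTS and its value are made up for illustration, and the fragments are written against the same xc_domain_save() / pagebuf_get_one() paths that the quoted patch touches.

    /*
     * Sketch only -- not from the thread.  XC_SAVE_ID_MORE_CHECKPOINTS is a
     * hypothetical chunk ID; -10 is illustrative and would need to be
     * allocated for real in tools/libxc/xg_save_restore.h.
     */
    #define XC_SAVE_ID_MORE_CHECKPOINTS  -10

    /* Save side, tail of xc_domain_save(): announce further checkpoints
     * only when a checkpoint callback is registered (i.e. a Remus-style
     * rolling checkpoint).  A plain save or migrate then sends nothing
     * extra, exactly as pre-4.1 tools did. */
    if ( callbacks->checkpoint )
    {
        i = XC_SAVE_ID_MORE_CHECKPOINTS;
        if ( wrexact(io_fd, &i, sizeof(int)) )
        {
            PERROR("Error when writing more-checkpoints chunk");
            goto out;
        }
    }

    /* Restore side: with ctx->last_checkpoint initialised to 1 rather than
     * 0, pagebuf_get_one() clears it when the chunk arrives, so a stream
     * from an older sender (which never sends the chunk) defaults to "this
     * is the only checkpoint" and finishes without the spurious timeout. */
    case XC_SAVE_ID_MORE_CHECKPOINTS:
        ctx->last_checkpoint = 0;
        return pagebuf_get_one(xch, ctx, buf, fd, dom);

With this shape an old sender, which emits neither chunk, is interpreted by a new receiver as a plain single-checkpoint stream, so ordinary save/restore and live migration keep working across versions; only a rolling checkpoint between mismatched toolstacks would notice the difference, which matches the compatibility window George describes.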