Re: [Xen-devel] [PATCH v2] fix Remus failover regression
On 28/07/14 05:03, Yang Hongyang wrote:
> commit: c2ba706c
> tools/libxc: goto correct label on error paths by Andrew Cooper
> broke Remus in Xen 4.4 or earlier versions that has this commit
> backported.

My apologies for breaking Remus (it just goes to show how fragile this
code is).

>
> With Remus, this jump essentially discards the current incomplete
> checkpoint received by the backup and restore backup from the
> last complete checkpoint.
> This is required for Remus to work and this does not break live
> migration.
> It has been around since Xen 4.0.

However, it is a genuine bugfix for regular migration, so simply
reverting it as this patch does is not appropriate.

For regular migration, you absolutely have to "goto out;" on a failure,
otherwise the finish code will run and declare the migration a success
despite only having half a domain restored.

You need something like:

    if ( !checkpointed_stream )
        goto err;

    /* Remus comment */
    goto finish;

to deal with the different error handling requirements of Remus and
regular streams.

~Andrew

>
> CC: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> CC: Ian Campbell <ian.campbell@xxxxxxxxxx>
> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: Shriram Rajagopalan <rshriram@xxxxxxxxx>
> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> ---
>  tools/libxc/xc_domain_restore.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index e73e0a2..b9a56d5 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1783,20 +1783,29 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>
>          if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>              PERROR("error when buffering batch, finishing");
> -            goto out;
> +            /*
> +             * Remus: discard the current incomplete checkpoint and restore
> +             * backup from the last complete checkpoint.
> +             */
> +            goto finish;
>          }
>          memset(&tmptail, 0, sizeof(tmptail));
>          tmptail.ishvm = hvm;
>          if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>                           ext_vcpucontext, vcpuextstate_size) < 0 ) {
>              ERROR ("error buffering image tail, finishing");
> -            goto out;
> +            /*
> +             * Remus: discard the current incomplete checkpoint and restore
> +             * backup from the last complete checkpoint.
> +             */
> +            goto finish;
>          }
>          tailbuf_free(&tailbuf);
>          memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>
>          goto loadpages;
>
> +    /* With Remus: restore from last complete checkpoint */
>   finish:
>      if ( hvm )
>          goto finish_hvm;

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
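
The split error handling suggested in the reply can be illustrated with a small standalone sketch. This is not libxc code: restore_sketch(), total_batches and fail_at are hypothetical names invented for the illustration, and the "buffering failure" is simulated by a counter rather than by pagebuf_get(). The label names loadpages, finish and out mirror those in the quoted diff, and the failure label is written as out, as in the lines the patch removes. It assumes only what the thread states: a regular stream must fail on a buffering error, while a checkpointed (Remus) stream falls through to finish and resumes from the last complete checkpoint.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the restore loop, reduced to its control flow.
 *   checkpointed_stream: true for a Remus stream, false for regular migration.
 *   total_batches:       number of batches in a clean stream.
 *   fail_at:             batch index at which buffering "fails", or -1 for none.
 * Returns 0 when the finish path runs, -1 when the restore is aborted.
 */
int restore_sketch(bool checkpointed_stream, int total_batches, int fail_at)
{
    int rc = -1;
    int batch = 0;

    assert(total_batches >= 0);

 loadpages:
    if ( batch == total_batches )
        goto finish;            /* stream ended cleanly */

    if ( batch == fail_at )     /* simulated buffering error */
    {
        /* Regular migration: only part of the domain arrived, so fail. */
        if ( !checkpointed_stream )
            goto out;

        /*
         * Remus: discard the incomplete checkpoint and resume the backup
         * from the last complete checkpoint.
         */
        goto finish;
    }

    batch++;
    goto loadpages;

 finish:
    rc = 0;                     /* the finish code would run here */
 out:
    return rc;
}
```

With these assumptions, a mid-stream failure returns -1 for a regular stream and 0 for a checkpointed one, which is the distinction the reply asks the patch to preserve.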