Re: [Xen-devel] [PATCH v2] fix Remus failover regression
On 28/07/14 05:03, Yang Hongyang wrote:
> commit: c2ba706c
> tools/libxc: goto correct label on error paths by Andrew Cooper
> broke Remus in Xen 4.4, and in earlier versions that have this commit
> backported.
My apologies for breaking Remus (it just goes to show how fragile this
code is).
>
> With Remus, this jump essentially discards the current incomplete
> checkpoint received by the backup and restores the backup from the
> last complete checkpoint.
> This is required for Remus to work and this does not break live
> migration.
> It has been around since Xen 4.0.
However, it is a genuine bugfix for regular migration, so simply
reverting it as this patch does is not appropriate.
For regular migration, you absolutely have to goto out on a failure;
otherwise the finish code will run and declare the migration a success
despite only having half a domain restored.
You need something like:

    if ( !checkpointed_stream )
        goto err;

    /* Remus comment */
    goto finish;

to deal with the different error handling requirements of Remus and
regular streams.
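
To make the two paths concrete, here is a minimal standalone sketch of that
control flow. It is a toy model, not the libxc code: restore_sketch(), the
sketch_result values and the finish_ran flag are all hypothetical names
invented for illustration, and the failure label is called "out" to match the
label used in the quoted diff. It also ignores the case where a Remus stream
fails before any complete checkpoint has been received.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; not libxc's return values. */
enum sketch_result {
    SKETCH_FAILED = -1,   /* restore aborted, domain must not be resumed */
    SKETCH_RESTORED = 0   /* finish code ran and declared success */
};

/*
 * Toy model of the error path discussed above.
 *   checkpointed_stream: true for a Remus (checkpointed) stream
 *   buffering_failed:    simulates a failure while buffering a batch or tail
 *   finish_ran:          set to true only if the "finish" code was reached
 */
static int restore_sketch(bool checkpointed_stream, bool buffering_failed,
                          bool *finish_ran)
{
    int rc = SKETCH_FAILED;

    assert(finish_ran != NULL);
    *finish_ran = false;

    if ( buffering_failed )
    {
        /* Regular migration: half a domain is a failure, skip finish. */
        if ( !checkpointed_stream )
            goto out;

        /*
         * Remus: discard the incomplete checkpoint and fall back to the
         * last complete one by running the finish code.
         */
        goto finish;
    }

    /* Stream ended normally: fall through to the finish code. */

 finish:
    *finish_ran = true;
    rc = SKETCH_RESTORED;

 out:
    return rc;
}
```

Calling restore_sketch(false, true, &ran) models a plain migration whose
stream breaks: it returns SKETCH_FAILED without touching the finish code.
With checkpointed_stream set to true, the same failure reaches finish, which
is the failover behaviour Remus depends on.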
~Andrew
>
> CC: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> CC: Ian Campbell <ian.campbell@xxxxxxxxxx>
> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: Shriram Rajagopalan <rshriram@xxxxxxxxx>
> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> ---
> tools/libxc/xc_domain_restore.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index e73e0a2..b9a56d5 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1783,20 +1783,29 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>
>          if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>              PERROR("error when buffering batch, finishing");
> -            goto out;
> +            /*
> +             * Remus: discard the current incomplete checkpoint and restore
> +             * backup from the last complete checkpoint.
> +             */
> +            goto finish;
>          }
>          memset(&tmptail, 0, sizeof(tmptail));
>          tmptail.ishvm = hvm;
>          if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>                           ext_vcpucontext, vcpuextstate_size) < 0 ) {
>              ERROR ("error buffering image tail, finishing");
> -            goto out;
> +            /*
> +             * Remus: discard the current incomplete checkpoint and restore
> +             * backup from the last complete checkpoint.
> +             */
> +            goto finish;
>          }
>          tailbuf_free(&tailbuf);
>          memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>
>          goto loadpages;
>
> +    /* With Remus: restore from last complete checkpoint */
>    finish:
>      if ( hvm )
>          goto finish_hvm;
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel