Re: [Xen-devel] [PATCH v2] fix Remus failover regression
Hi Andrew,

On 07/28/2014 05:24 PM, Andrew Cooper wrote:
> On 28/07/14 05:03, Yang Hongyang wrote:
>> Commit c2ba706c ("tools/libxc: goto correct label on error paths") by
>> Andrew Cooper broke Remus in Xen 4.4, and in earlier versions that have
>> this commit backported.
>
> My apologies for breaking Remus. (It just goes to show how fragile this
> code is.)
>
>> With Remus, this jump essentially discards the current incomplete
>> checkpoint received by the backup and restores the backup from the last
>> complete checkpoint. This is required for Remus to work, and it does not
>> break live migration. It has been around since Xen 4.0.
>
> However, it is a genuine bugfix for regular migration, so simply reverting
> it, as this patch does, is not appropriate. For regular migration, you
> absolutely have to goto out on a failure; otherwise the finish code will
> run and declare the migration a success despite only half a domain having
> been restored.

I think regular migration shouldn't run into this path (see my comment on
v1), but I agree that adding a check would be better.

> You need something like:
>
>     if ( !checkpointed_stream )
>         goto err;
>     /* Remus comment */
>     goto finish;
>
> to deal with the different error-handling requirements of Remus and
> regular streams.
>
> ~Andrew
>
>> CC: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>> CC: Ian Campbell <ian.campbell@xxxxxxxxxx>
>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> CC: Shriram Rajagopalan <rshriram@xxxxxxxxx>
>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
>> ---
>>   tools/libxc/xc_domain_restore.c | 13 +++++++++++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
>> index e73e0a2..b9a56d5 100644
>> --- a/tools/libxc/xc_domain_restore.c
>> +++ b/tools/libxc/xc_domain_restore.c
>> @@ -1783,20 +1783,29 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>>
>>      if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>>          PERROR("error when buffering batch, finishing");
>> -        goto out;
>> +        /*
>> +         * Remus: discard the current incomplete checkpoint and restore
>> +         * backup from the last complete checkpoint.
>> +         */
>> +        goto finish;
>>      }
>>      memset(&tmptail, 0, sizeof(tmptail));
>>      tmptail.ishvm = hvm;
>>      if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>>                       ext_vcpucontext, vcpuextstate_size) < 0 ) {
>>          ERROR ("error buffering image tail, finishing");
>> -        goto out;
>> +        /*
>> +         * Remus: discard the current incomplete checkpoint and restore
>> +         * backup from the last complete checkpoint.
>> +         */
>> +        goto finish;
>>      }
>>      tailbuf_free(&tailbuf);
>>      memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>>
>>      goto loadpages;
>>
>> +    /* With Remus: restore from last complete checkpoint */
>>    finish:
>>      if ( hvm )
>>          goto finish_hvm;

-- 
Thanks,
Yang.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
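A minimal, self-contained C sketch of the error-handling split discussed in this thread follows. It is a toy model, not libxc code: struct checkpoint, receive_checkpoint(), restore_stream() and the checkpointed flag (standing in for checkpointed_stream in Andrew's suggestion) are all hypothetical, and the incoming stream is simulated by an array of integers. It assumes, as the quoted diff does with tmptail and tailbuf, that each checkpoint is buffered separately and only committed once it has been received completely; the finish and out labels play roughly the same roles as in xc_domain_restore().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one buffered checkpoint (loosely analogous to
 * tailbuf/tmptail in the quoted diff). Only a sequence number is modelled. */
struct checkpoint {
    int seq;
};

/* Hypothetical receiver. The incoming stream is simulated by an array of
 * sequence numbers; a negative entry, or running off the end, stands for
 * the connection dropping in the middle of a checkpoint. */
static int receive_checkpoint(const int *stream, size_t n, size_t *pos,
                              struct checkpoint *cp)
{
    assert(pos != NULL && cp != NULL);
    if (*pos >= n || stream[*pos] < 0)
        return -1;
    cp->seq = stream[(*pos)++];
    return 0;
}

/* Returns the sequence number of the checkpoint that was restored, or -1
 * if the restore has to be reported as a failure. */
int restore_stream(const int *stream, size_t n, bool checkpointed)
{
    struct checkpoint committed = { .seq = -1 }; /* last complete checkpoint */
    struct checkpoint tmp;                       /* checkpoint being received */
    size_t pos = 0;

    for (;;) {
        memset(&tmp, 0, sizeof(tmp));
        if (receive_checkpoint(stream, n, &pos, &tmp) < 0) {
            if (!checkpointed)
                goto out;  /* regular migration: a partial image is an error */
            /* Remus: drop the incomplete checkpoint in tmp and fall back to
             * the last complete one. */
            goto finish;
        }
        /* Commit only once the whole checkpoint has arrived. */
        memcpy(&committed, &tmp, sizeof(committed));
        if (!checkpointed)
            goto finish;   /* this model sends exactly one image for a regular migration */
    }

finish:
    if (committed.seq < 0)
        goto out;          /* no complete checkpoint was ever received */
    return committed.seq;

out:
    return -1;
}
```

For example, with const int remus[] = { 1, 2, 3, -1 }, restore_stream(remus, 4, true) returns 3, the last complete checkpoint, whereas the same break on a regular stream, restore_stream((const int[]){ -1 }, 1, false), returns -1 instead of reporting success. Keeping the in-flight checkpoint in a separate buffer means the Remus fallback never has to undo partially received state.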