
Re: [Xen-devel] [PATCH] xl: Always use "fast" migration resume protocol



On Mon, 2014-01-13 at 18:15 +0000, Ian Jackson wrote:
> As Ian Campbell writes:

"...in http://bugs.xenproject.org/xen/bug/30" would be useful here (can
add on commit, no need to resend just for this IMHO).

>   There are two mechanisms by which a suspend can be aborted and the
>   original domain resumed.
> 
>   The older method is that the toolstack resets a bunch of state (see
>   tools/python/xen/xend/XendDomainInfo.py resumeDomain) and then
>   restarts the domain. The domain will see HYPERVISOR_suspend return 0
>   and will continue without any realisation that it is actually
>   running in the original domain and not in a new one. This method is
>   supposed to be implemented by libxl_domain_resume(suspend_cancel=0)
>   but it is not.
> 
>   The other method is newer and in this case the toolstack arranges
>   that HYPERVISOR_suspend returns SUSPEND_CANCEL and restarts it. The
>   domain will observe this and realise that it has been restarted in
>   the same domain and will behave accordingly. This method is
>   implemented, correctly AFAIK, by
>   libxl_domain_resume(suspend_cancel=1).
> 
> Attempting to use the old method without doing all of the work simply
> causes the guest to crash.  Implementing the work required for the old
> method, or for checking that domains actually support the new method,
> is not feasible at this stage of the 4.4 release.
> 
> So, always use the new method, without regard to the declarations of
> support by the guest.  This is a strict improvement: guests which do
> in fact support the new method will work, whereas ones which don't are
> no worse off.

I agree with this rationale.

> There are two call sites of libxl_domain_resume that need fixing, both
> in the migration error path.
> 
> With this change I observe a correct and successful resumption of a
> Debian wheezy guest with a Linux 3.4.70 kernel after a migration
> attempt which I arranged to fail by nobbling the block hotplug script.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>

Acked-by: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

I think you have at least a partial patch ready for 4.5?

> CC: konrad.wilk@xxxxxxxxxx
> CC: David Vrabel <david.vrabel@xxxxxxxxxx>
> CC: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> ---
>  tools/libxl/xl_cmdimpl.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index c30f495..d93e01b 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -3734,7 +3734,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
>          if (common_domname) {
>              libxl_domain_rename(ctx, domid, away_domname, common_domname);
>          }
> -        rc = libxl_domain_resume(ctx, domid, 0, 0);
> +        rc = libxl_domain_resume(ctx, domid, 1, 0);
>          if (!rc) fprintf(stderr, "migration sender: Resumed OK.\n");
>  
>          fprintf(stderr, "Migration failed due to problems at target.\n");
> @@ -3756,7 +3756,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
>      close(send_fd);
>      migration_child_report(recv_fd);
>      fprintf(stderr, "Migration failed, resuming at sender.\n");
> -    libxl_domain_resume(ctx, domid, 0, 0);
> +    libxl_domain_resume(ctx, domid, 1, 0);
>      exit(-ERROR_FAIL);
>  
>   failed_badly:



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

