|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 08/19] libxl: wait for qemu to acknowledge logdirty command
On Fri, 2012-06-08 at 18:34 +0100, Ian Jackson wrote:
> The current migration code in libxl instructs qemu to start or stop
> logdirty, but it does not wait for an acknowledgement from qemu before
> continuing. This might lead to memory corruption (!)
>
> Fix this by waiting for qemu to acknowledge the command.
>
> Unfortunately the necessary ao arrangements for waiting for this
> command are unique because qemu has a special protocol for this
> particular operation.
>
> Also, this change means that the switch_qemu_logdirty callback
> implementation in libxl can no longer synchronously produce its return
> value, as it now needs to wait for xenstore. So we tell the
> marshalling code generator that it is a message which does not need a
> reply. This turns the callback function called by the marshaller into
> one which returns void; the callback function arranges to later
> explicitly sends the reply to the helper, when the xs watch triggers
> and the appropriate value is read from xenstore.
>
> Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> ---
> tools/libxl/libxl_dom.c | 176
> +++++++++++++++++++++++++++++++++---
> tools/libxl/libxl_internal.h | 18 ++++-
> tools/libxl/libxl_save_callout.c | 8 ++
> tools/libxl/libxl_save_msgs_gen.pl | 7 +-
> 4 files changed, 193 insertions(+), 16 deletions(-)
>
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index a597627..d5ac79f 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -528,30 +528,180 @@ int libxl__toolstack_restore(uint32_t domid, const
> uint8_t *buf,
> static void domain_suspend_done(libxl__egc *egc,
> libxl__domain_suspend_state *dss, int rc);
>
> -/*----- callbacks, called by xc_domain_save -----*/
> +/*----- complicated callback, called by xc_domain_save -----*/
> +
> +/*
> + * We implement the other end of protocol for controlling qemu-dm's
> + * logdirty. There is no documentation for this protocol, but our
> + * counterparty's implementation is in
> + * qemu-xen-traditional.git:xenstore.c in the function
> + * xenstore_process_logdirty_event
> + */
> +
> +static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
> + const struct timeval *requested_abs);
> +static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
> + const char *watch_path, const char *event_path);
> +static void switch_logdirty_done(libxl__egc *egc,
> + libxl__domain_suspend_state *dss, int ok);
>
> -int libxl__domain_suspend_common_switch_qemu_logdirty
> +static void logdirty_init(libxl__logdirty_switch *lds)
> +{
> + lds->cmd_path = 0;
> + libxl__ev_xswatch_init(&lds->watch);
> + libxl__ev_time_init(&lds->timeout);
I meant to mention this once before when reviewing some patch or other
but I'm not sure I actually did, so at the risk of repeating myself:
This watch with timeout pattern seems to be reasonably common (in fact,
I'm not sure we have any watches without a timeout). It might be a
candidate for a specific helper routine in the future?
[...]
> +static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
> + const struct timeval *requested_abs)
> +{
> + libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss,
> logdirty.timeout);
> + STATE_AO_GC(dss->ao);
> + LOG(ERROR,"logdirty switch: wait for device model timed out");
> + switch_logdirty_done(egc,dss,0);
> }
>
> +static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch
> *watch,
> + const char *watch_path, const char *event_path)
> +{
> + libxl__domain_suspend_state *dss =
> + CONTAINER_OF(watch, *dss, logdirty.watch);
> + libxl__logdirty_switch *lds = &dss->logdirty;
> + STATE_AO_GC(dss->ao);
> + const char *got;
> + xs_transaction_t t = 0;
> + int rc;
> +
> + for (;;) {
> + rc = libxl__xs_transaction_start(gc, &t);
> + if (rc) goto out;
> +
> + rc = libxl__xs_read_checked(gc, t, lds->ret_path, &got);
> + if (rc) goto out;
> +
> + if (!got) {
> + rc = +1;
> + goto out;
> + }
> +
> + if (strcmp(got, lds->cmd)) {
> + LOG(ERROR,"logdirty switch: sent command `%s' but got reply `%s'"
> + " (xenstore paths `%s' / `%s')", lds->cmd, got,
> + lds->cmd_path, lds->ret_path);
> + rc = ERROR_FAIL;
> + goto out;
> + }
> +
> + rc = libxl__xs_rm_checked(gc, t, lds->cmd_path);
> + if (rc) goto out;
> +
> + rc = libxl__xs_rm_checked(gc, t, lds->ret_path);
> + if (rc) goto out;
> +
> + rc = libxl__xs_transaction_commit(gc, &t);
> + if (!rc) break;
> + if (rc<0) goto out;
> + }
> +
> + out:
> + /* rc < 0: error
> + * rc == 0: ok, we are done
> + * rc == +1: need to keep waiting
> + */
> + libxl__xs_transaction_abort(gc, &t);
> +
> + if (!rc) {
> + switch_logdirty_done(egc,dss,1);
> + } else if (rc < 0) {
> + LOG(ERROR,"logdirty switch: response read failed (rc=%d)",rc);
Is it only "response read failed" which can cause us to end up here,
looks like the read, rm etc could do it?
Anyway, I've got no substantive comments so:
Acked-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> + switch_logdirty_done(egc,dss,0);
> + }
> +}
[...]
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |