[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 08/19] libxl: wait for qemu to acknowledge logdirty command
On Fri, 2012-06-08 at 18:34 +0100, Ian Jackson wrote: > The current migration code in libxl instructs qemu to start or stop > logdirty, but it does not wait for an acknowledgement from qemu before > continuing. This might lead to memory corruption (!) > > Fix this by waiting for qemu to acknowledge the command. > > Unfortunately the necessary ao arrangements for waiting for this > command are unique because qemu has a special protocol for this > particular operation. > > Also, this change means that the switch_qemu_logdirty callback > implementation in libxl can no longer synchronously produce its return > value, as it now needs to wait for xenstore. So we tell the > marshalling code generator that it is a message which does not need a > reply. This turns the callback function called by the marshaller into > one which returns void; the callback function arranges to later > explicitly sends the reply to the helper, when the xs watch triggers > and the appropriate value is read from xenstore. > > Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx> > --- > tools/libxl/libxl_dom.c | 176 > +++++++++++++++++++++++++++++++++--- > tools/libxl/libxl_internal.h | 18 ++++- > tools/libxl/libxl_save_callout.c | 8 ++ > tools/libxl/libxl_save_msgs_gen.pl | 7 +- > 4 files changed, 193 insertions(+), 16 deletions(-) > > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > index a597627..d5ac79f 100644 > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -528,30 +528,180 @@ int libxl__toolstack_restore(uint32_t domid, const > uint8_t *buf, > static void domain_suspend_done(libxl__egc *egc, > libxl__domain_suspend_state *dss, int rc); > > -/*----- callbacks, called by xc_domain_save -----*/ > +/*----- complicated callback, called by xc_domain_save -----*/ > + > +/* > + * We implement the other end of protocol for controlling qemu-dm's > + * logdirty. There is no documentation for this protocol, but our > + * counterparty's implementation is in > + * qemu-xen-traditional.git:xenstore.c in the function > + * xenstore_process_logdirty_event > + */ > + > +static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev, > + const struct timeval *requested_abs); > +static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*, > + const char *watch_path, const char *event_path); > +static void switch_logdirty_done(libxl__egc *egc, > + libxl__domain_suspend_state *dss, int ok); > > -int libxl__domain_suspend_common_switch_qemu_logdirty > +static void logdirty_init(libxl__logdirty_switch *lds) > +{ > + lds->cmd_path = 0; > + libxl__ev_xswatch_init(&lds->watch); > + libxl__ev_time_init(&lds->timeout); I meant to mention this once before when reviewing some patch or other but I'm not sure I actually did, so at the risk of repeating myself: This watch with timeout pattern seems to be reasonably common (in fact, I'm not sure we have any watches without a timeout). It might be a candidate for a specific helper routine in the future? [...] > +static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev, > + const struct timeval *requested_abs) > +{ > + libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, > logdirty.timeout); > + STATE_AO_GC(dss->ao); > + LOG(ERROR,"logdirty switch: wait for device model timed out"); > + switch_logdirty_done(egc,dss,0); > } > > +static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch > *watch, > + const char *watch_path, const char *event_path) > +{ > + libxl__domain_suspend_state *dss = > + CONTAINER_OF(watch, *dss, logdirty.watch); > + libxl__logdirty_switch *lds = &dss->logdirty; > + STATE_AO_GC(dss->ao); > + const char *got; > + xs_transaction_t t = 0; > + int rc; > + > + for (;;) { > + rc = libxl__xs_transaction_start(gc, &t); > + if (rc) goto out; > + > + rc = libxl__xs_read_checked(gc, t, lds->ret_path, &got); > + if (rc) goto out; > + > + if (!got) { > + rc = +1; > + goto out; > + } > + > + if (strcmp(got, lds->cmd)) { > + LOG(ERROR,"logdirty switch: sent command `%s' but got reply `%s'" > + " (xenstore paths `%s' / `%s')", lds->cmd, got, > + lds->cmd_path, lds->ret_path); > + rc = ERROR_FAIL; > + goto out; > + } > + > + rc = libxl__xs_rm_checked(gc, t, lds->cmd_path); > + if (rc) goto out; > + > + rc = libxl__xs_rm_checked(gc, t, lds->ret_path); > + if (rc) goto out; > + > + rc = libxl__xs_transaction_commit(gc, &t); > + if (!rc) break; > + if (rc<0) goto out; > + } > + > + out: > + /* rc < 0: error > + * rc == 0: ok, we are done > + * rc == +1: need to keep waiting > + */ > + libxl__xs_transaction_abort(gc, &t); > + > + if (!rc) { > + switch_logdirty_done(egc,dss,1); > + } else if (rc < 0) { > + LOG(ERROR,"logdirty switch: response read failed (rc=%d)",rc); Is it only "response read failed" which can cause us to end up here, looks like the read, rm etc could do it? Anyway, I've got no substantive comments so: Acked-by: Ian Campbell <ian.campbell@xxxxxxxxxx> > + switch_logdirty_done(egc,dss,0); > + } > +} [...] _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |