[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v16 2/7] remus: introduce remus device

On 07/16/2014 02:28 AM, Ian Jackson wrote:
Yang Hongyang writes ("[PATCH v16 2/7] remus: introduce remus device"):
introduce remus device, an abstract layer of remus devices(nic, disk,
etc).It provides the following APIs for libxl:

Thanks.  I've gone into a lot more detail here.  Many of my changes
are wording changes to comments, since it's probably easier for me to
simply provide you with the better wording.  Please feel free to cut
and paste them.  If they seem inaccurate then I have misunderstood
something :-).

For the comments,
   Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

Thank you very much!
I've copy/paste your commnets and added your Signed-off-by.

--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
+ * If this is defined, then libxl supports remus.
+ */

It occurs to me to ask: would this be better put in at the end, given
that the code after (only) this patch has gone in is not really
useable because of the lack of proper device handling ?

I will make a separate patch to update this at the end, thanks!

+static void remus_device_preresume_cb(libxl__egc *egc,
+                                      libxl__remus_device_state *rds,
+                                      int rc)
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+    if (!rc) {
+        /* Resumes the domain and the device model */
+        if (!libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
+            ok = 1;
+    }
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);

Can we please use the standard `goto out' error handling style ?  Both
here and in remus_device_postsuspend_cb.

The right approach is this: At the top of the function, test rc, and
if nonzero log if necessary, and `goto out'.  The success path should
set `ok=1'.

Ok, thanks.

You have it right in remus_checkpoint_dm_saved.

+static void remus_device_commit_cb(libxl__egc *egc,
+                                   libxl__remus_device_state *rds,
+                                   int rc)
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+    if (rc) {
+        LOG(ERROR, "Failed to do device commit op."
+            " Terminating Remus..");
+        goto out;
+    } else {

This shouldn't be an else; that results in every-increasing nesting

Ok, thanks.

+        /* Set checkpoint interval timeout */
+        rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
+                                         remus_next_checkpoint,
+                                         dss->interval);
+        if (rc) {
+            LOG(ERROR, "unable to register timeout for next epoch."
+                " Terminating Remus..");
+            goto out;

+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+    libxl__domain_suspend_state *dss =
+                            CONTAINER_OF(ev, *dss, checkpoint_timeout);
+    STATE_AO_GC(dss->ao);
+    libxl__ev_time_deregister(gc, &dss->checkpoint_timeout);

This is not necessary.  libxl's timers are one-shot.

      libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);

But this is a bit mystifying to me.  It seems that when you have
finished committing to the checkpoint, you delay by the
inter-checkpoint delay before returning.  Is that right ?  I think
this could at the very least do with some commentary explaining why...

This delay time is the interval between checkpoints. Remus doing checkpoints

@@ -1752,6 +1849,34 @@ static void domain_suspend_done(libxl__egc *egc,
          xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                             dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);

+    if (dss->remus) {

The usual libxl idiom for this kind of conditional processing can be
seen in libxl_create.c.  Search, for example, for
"domcreate_bootloader_done".  You'll see that what we do is

         if (restore_fd >= 0) {
             LOG(DEBUG, "restoring, not running bootloader\n");
                     domcreate_bootloader_done(egc, &dcs->bl, 0);

So I think what you should do here is:

       if (!dss->remus) {

If it's not a remus suspend, there's no rds...so we can not call
I may keep the original code the next version.


and then put the meat of the function outside the conditional.

The effect of this is that there is still only one place where
dss->callback is called.

+        /*
+         * With Remus, if we reach this point, it means either
+         * backup died or some network error occurred preventing us
+         * from sending checkpoints. Teardown the network buffers and
+         * release netlink resources.  This is an async op.
+         */
+        LOGE(WARN, "Remus: Domain suspend terminated with rc %d,"
+             " teardown Remus devices...", rc);
+        dss->rds.callback = libxl__remus_teardown_done;

I don't understand why libxl__remus_teardown_done has an
inter-file-visible name.  Isn't it just a callback chaining function ?
If so I would call it remus_teardown_done or something.

I will rename it to remus_teardown_done, thanks.

+typedef enum libxl__remus_device_kind {
+} libxl__remus_device_kind;

AFAICT this enum is used only for filtering which subkind ops vtables
we attempt setup() on.  Maybe it would be better to remove this enum
and replace its uses in libxl__remus_devices_setup with references to
(lists of) vtables ?


    extern const libxl__remus_device_subkind_ops libxl__remus_subkinds_nic[];
    extern const libxl__remus_device_subkind_ops libxl__remus_subkinds_disk[];
    const libxl__remus_device_subkind_ops *libxl__remus_all_subkinds[] = {

and then

     for (i = 0; i < rds->num_nics; i++) {
             libxl__remus_device_init(egc, rds,

or something.

Thanks for the detailed comments. this enum also used for match() and
other uses, so I will keep it.

+struct libxl__remus_device_subkind_ops {
+    /* the device kind this ops belongs to... */
+    libxl__remus_device_kind kind;
+    /*
+     * init() and destroy() APIs are produced by a device subkind and
+     * consumed by the main remus code, a device subkind must implement

We would normally write, in English, that an API is "provided" and
"called", rather than "produced" and "consumed".  Objects may be
produced and consumed, but not functions.

Also this whole API has the property there that you attribute to these
functions.  So I would say:

         * These operations are provided by a device subkind and
         * called by the main remus code.  Functions must be
         * implemented unless otherwise stated.

at the top of the structure.

I would also move the replicated text about async functions to a
common section:

         * Many of these functions are asynchronous.  They call
         * dev->callback when done.  The actual implementations may be
         * synchronously and call dev->callback directly (as the last
         * thing they do).

+     * init() and destroy() APIs are produced by a device subkind and
+     * consumed by the main remus code, a device subkind must implement
+     * these two APIs.

What you really mean is that these init and destroy a subkind's data.
So I would rename `destroy' to `cleanup' and say:

         * init() and cleanup() relate to the subkind-specific state in
         * the libxl ctx, not to any specific device.

Although, now I have read later in this file, it seems that this
subkind-specific state is stored in the per-remus-operation state
structure, not in the libxl_ctx.

I see from (for example) that libxl_netbuffer.c uses a netlink socket
per remus operation.  Would it be possible to share a netlink socket
between operations ?  I seem to remember this was a static global
variable in a previous version of the series, which suggests it would
be possible.

If it is, why not put it in the libxl_ctx ?

It is in the libxl_ctx. init() will only be called one time per remus

And of course:

         * Synchronous.  cleanup() cannot fail.

+    /*
+     * checkpoint callbacks, these are async ops, . These function pointers 
may be NULL, means the op is
+     * not implemented, and it will do nothing when checkpoint.
+     * The callers of these APIs must check the function pointer first.
+     *

      * Checkpoint operations.  May be NULL, meaning the op is not
      * implemented and the caller should treat them as a no-op (and do
      * nothing when checkpointing).

And of course:

      * Asynchronous.

+    void (*postsuspend)(libxl__remus_device *dev);
+    void (*preresume)(libxl__remus_device *dev);
+    void (*commit)(libxl__remus_device *dev);

+    /*
+     * This API determines whether the subkind matchs the specific
+     * device. In the implementation, we first init all device
+     * subkind, for example, NIC, DRBD disk... Then we will find
+     * out the libxl devices, and match the device with the
+     * subkind, if the device is a drbd disk, then it will be
+     * matched with DRBD subkind, and the further ops(such as
+     * checkpoint etc.)  of this device will using DRBD subkind
+     * ops. This API is mainly for disks, because we must use an
+     * external script to determine whether a libxl_disk is a DRBD
+     * disk.  This function pointer may be NULL. That means this
+     * *kind* of device's ops is always matched with the *kind* of
+     * device.  It's an async op and must be implemented
+     * asynchronously, call dev->callback when done.
+     */

      * Determines whether the subkind matches the specific device.
      * init() will have been called for the subkind.  If match()
      * succeeds, the device will then be managed with this set of
      * subkind operations.  Asynchronous.  Yields 0 if the device
      * matches, or REMUS_DEVOPS_NOT_MATCH if it does not; any other rc
      * indicates failure.  May be NULL, meaning this subkind matches
      * any device.

+    void (*match)(libxl__remus_device *dev);

+    /*
+     * setup() and teardown() are refer to the actual remus device,
+     * a device subkind must implement these two APIs. They are async
+     * ops, and call dev->callback when done.
+     * These callbacks can be implemented synchronously, call
+     * dev->callback at last directly.
+     */

Having dealt with the boilerplatey bits we can simply write:

      * setup() and teardown() are refer to the actual remus device.
      * Asynchronous.

I seem to remember asking you to document that:

      * teardown is called even if setup fails.

+ * This structure is for remus device layer, it records remus devices
+ * that have been set up.

I don't think this struct is entirely for the remus device layer,
since only part of it is "private".

     * State associated with a remus invocation, including parameters
     * passed to the remus abstract device layer by the generic
     * (non-remus) save/restore machinery.

+ */
+struct libxl__remus_device_state {
+    /* must set by caller of libxl__remus_device_(setup|teardown) */

You mean
       must be set by [etc]

+    libxl__ao *ao;
+    libxl__egc *egc;
+    uint32_t domid;
+    libxl__remus_callback *callback;
+    /* the last ops must be NULL */
+    const libxl__remus_device_subkind_ops **ops;

Surely it shouldn't be the non-remus code's responsibility to provide
the vtable list ?  In fact I don't really understand why this needs to
be a parameter at all.  Isn't it always the same ?

We make remus device abstract layer more generic, because COLO will reuse remus
device layer, so it will has diffrent ops.

+    /* private */
+    /* devices that have been set up */
+    int saved_rc;
+    libxl__remus_device **dev;

Please make the sectional comments like "private" and "must be set..."
stand out more.  I seem to remember asking this before (about a
different struct perhaps?)

The 2nd comment there can't be right.  Perhaps it refers to dev.

But if dev is the set of devices that have been set up, what form is
it in ?  I think it's an array and that the counters num_devices and
num_set_up are associated with it.

If so you need to bring dev and these sizes/counts together, and write
much more informative commentary.  To give a specific example:

+    /* for counting devices that have been set up */
+    int num_set_up;

The variable is called "num_set_up".  And everything here is devices.
So the variable must be a "num[ber]" of devices which have been "set
up".  That's the same as "counting" the devices which have been set

So the comment tells me nothing that the variable name doesn't.

But together these comments leave many important questions unanswered.
What is the size of the devs array ?  What proportion of the devs
array is initialised at any one time ?  May the devs array contain
null pointers and what do they mean ?  etc.

I have made it more clear in the next version, please refer to v17.

+typedef void libxl__remus_device_callback(libxl__egc *,
+                                          libxl__remus_device *,
+                                          int rc);
+ * This structure is init and setup by remus device abstruct layer,
+ * and pass to remus device ops


      Information about a single device being handled by remus.
      Allocated by the remus abstract layer.

It's not all "init" by the abstract layer - your more detailed
comments inside the struct explain that nicely.

+ */
+struct libxl__remus_device {
+    /*----- shared between abstract and concrete layers -----*/
+    /* set by remus device abstruct layer */

I would say
         * all set by remus device abstruct layer */
and add a blank line after it.

That makes it clear that the 2nd comment line applies to the whole
section, not just the single next line.

+    /* libxl__device_* which this remus device related to */
+    const void *backend_dev;
+    libxl__remus_device_kind kind;
+    /*----- private for abstract layer only -----*/
+    /*
+     * we must go through all device ops until we find a matched ops
+     * for the device.
+     */

This last comment belongs in the code, if anywhere, but I suspect it
is obvious.  Perhaps what you really mean is

         * Control and state variables for the asynchronous callback
         * based loops which iterate over device subkinds, and over
         * individual devices.



+    int ops_index;
+    const libxl__remus_device_subkind_ops *ops;
+    libxl__remus_device_callback *callback;
+    libxl__remus_device_state *rds;

+    /*----- private for concrete (device-specific) layer -----*/
+    /* *kind* of device's private data */
+    void *data;

Again, the /*----- -----*/ comment would benefit from a following
blank line.

It would be better to change the variable name than write the
information in the comment.  Is there some reason why data shouldn't
be `concrete_data' or something ?

I will rename it to concrete_data. and remove *kind*

I'm not sure why "*kind*" is emphasized in the comment.
+    /* for calling scripts, eg. setup|teardown|match scripts */
+    libxl__async_exec_state aes;
+    /*
+     * for async func calls, in the implementation of device ops, we
+     * may use fork to do async ops. this is owned by device-specific
+     * ops methods
+     */
+    libxl__ev_child child;

I still think these variables would be better in the subkind private
data structs.  Having them here is a layering violation.  The fact
that each of the subkinds needs these is a coincidence, not part of
the design.  So it is fine to write them out twice for that reason.

And I'm not sure how helpful the comments are.  It's fairly obvious
that a libxl__async_exec_state is for running scripts, for example.
But maybe if these were in the concrete code it would be possible to
make the comments more specific.

If we move these 2 variables into device specific structs, that is in
concrete_data, there will be some problems:
  concrete_data will be init when setup() is called.
  match() is called before setup(), but in match() we may call hotplug
scripts or doing async func calls, so this two vars may be needed.

--- /dev/null
+++ b/tools/libxl/libxl_remus_device.c
@@ -0,0 +1,339 @@
+ * Copyright (C) 2014
+ * Author: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>

I think you need to specify who owns the copyright.  Probably
Fujitsu ?  Although this will depend on your contractual situation.

Will update, thanks!

+/*----- helper functions -----*/
+static int init_device_subkind(libxl__remus_device_state *rds)
+    int rc;
+    const libxl__remus_device_subkind_ops **ops;
+    for (ops = rds->ops; *ops; ops++) {
+        rc = (*ops)->init(rds);

Why not call init immediately before match() ?  That would do away
with this loop entirely.

Furthermore, this seems to imply that init is idempotent.  This isn't
spelled out in your API comments.

If init is idempotent, and is going to be called before match() every
time, why not fold it into match ?

a ops will be called many times in match(). but init()/destroy() only
need to be called once. so I think it's better to separate it. and also
if a subkind init failed, we can exit immediately, and not need to wait
for match complete.

+        if (rc) {
+            goto out;
+        }

If you keep this, there is no need for the braces here.  Indeed you
can put `goto out' on the same line.

Ok, thanks.

+/* remus device setup and teardown */
+static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
+                                     libxl__remus_device_state *rds,
+                                     libxl__remus_device_kind kind,
+                                     void *libxl_dev);
+void libxl__remus_devices_setup(libxl__egc *egc, libxl__remus_device_state 

Missing blank line between the two declarations.

If libxl__remus_device_init is static, why does it have a
globally-visible name ?

Will rename it, thanks!

+    STATE_AO_GC(rds->ao);
+    if (!rds->ops[0])
+        goto out;

Why the special case ?  AFAICT init_device_subkind would do the right
thing if called anyway.

Ok, will delete this case, thanks.

+static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
+                                     libxl__remus_device_state *rds,
+                                     libxl__remus_device_kind kind,
+                                     void *libxl_dev)
+    libxl__remus_device *dev = NULL;
+    STATE_AO_GC(rds->ao);
+    GCNEW(dev);
+    dev->backend_dev = libxl_dev;
+    dev->kind = kind;
+    dev->rds = rds;
+    libxl__async_exec_init(&dev->aes);
+    libxl__ev_child_init(&dev->child);
+    /* match the ops begin */
+    dev->ops_index = 0;
+    dev->ops = rds->ops[dev->ops_index];
+    for (; dev->ops; dev->ops = rds->ops[++dev->ops_index]) {
+        if (dev->ops->kind == dev->kind) {
+            if (dev->ops->match) {
+                dev->callback = device_match_cb;
+                dev->ops->match(dev);
+            } else {
+                /*
+                 * This devops do not have match() implementation.
+                 * That means this *kind* of device's ops is always
+                 * matched with the *kind* of device.
+                 */
+                dev->callback = device_setup_cb;
+                dev->ops->setup(dev);
+            }

This would be much less confusing if you had this !match case call
device_match_cb with rc=0 instead of directly calling setup.

+static void device_match_cb(libxl__egc *egc,
+                            libxl__remus_device *dev,
+                            int rc)
+    libxl__remus_device_state *const rds = dev->rds;
+    STATE_AO_GC(rds->ao);
+    if (rds->saved_rc) {
+        /* there's already an error happened, we do not need to continue */
+        rds->num_devices++;
+        if (all_devices_handled(rds))
+            rds->callback(egc, rds, rds->saved_rc);
+        return;
+    }
+    if (rc) {
+        /* the ops does not match, try next ops */

But what if rc==ERROR_MY_HEAD_EXPLODED ?

this case is treated as fail, see below:
  if (!dev->ops || rc != ERROR_REMUS_DEVOPS_NOT_MATCH) {

+        dev->ops = rds->ops[++dev->ops_index];
+        if (!dev->ops || rc != ERROR_REMUS_DEVOPS_NOT_MATCH) {
+            /* the device can not be matched */
+            rds->num_devices++;
+            rds->saved_rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
+            if (all_devices_handled(rds))
+                rds->callback(egc, rds, rds->saved_rc);
+            return;
+        }
+        for ( ; dev->ops; dev->ops = rds->ops[++dev->ops_index]) {
+            if (dev->ops->kind == dev->kind) {

The logic here is tangled and duplicated.  Please arrange to have only
one instance of this loop.

The easiest way is probably to have libxl__remus_device_init set
ops_index to -1 or something and call device_match_cb.  Then
device_match_cb can do all the work.

Ok, I will refactor this, thanks.

+static void device_setup_cb(libxl__egc *egc,
+                            libxl__remus_device *dev,
+                            int rc)
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = dev->rds;
+    STATE_AO_GC(rds->ao);
+    rds->num_devices++;
+    /*
+     * the netbuf script was designed as below:
+     * 1. when setup failed, the script won't teardown the device itself.
+     * 2. the teardown op is ok to be executed many times.

This information needs to be in the API commentary, and can then be
deleted here.

Ok, thanks.

+     * we add devices that have been set up to the array no matter
+     * the setup process succeed or failed because we need to ensure
+     * the device been teardown while setup failed. If any of the
+     * device setup failed, we will quit remus, but before we exit,
+     * we will teardown the devices that have been added to **dev
+     */

This information would be better expressed in the commentary on the
state variables, as invariant(s) on the state.

Ok, thanks.

+    if (all_devices_handled(rds))
+        rds->callback(egc, rds, rds->saved_rc);

This little snippet of code occurs many times.  Perhaps
all_devices_handled should include calling the callback and be called
"check_all_devices_handled" or "check_make_callback" or something.

I will refactor this, thanks!

+void libxl__remus_devices_teardown(libxl__egc *egc, libxl__remus_device_state 
+    int i, num_set_up;
+    libxl__remus_device *dev;
+    STATE_AO_GC(rds->ao);
+    rds->saved_rc = 0;
+    if (rds->num_set_up == 0) {
+        destroy_device_subkind(rds);
+        goto out;
+    }

Again, you need to arrange for this unfolded loop to contain only one
copy of each bit of logic.

+#define define_remus_device_checkpoint_api(api)                             \
+void libxl__remus_devices_##api(libxl__egc *egc,                            \
+                                libxl__remus_device_state *rds)             \

Can you please make the lines shorter ?  When I quote this, it comes

Ok, thanks.

out looking like this:

+    for (i = 0; i < rds->num_set_up; i++) {                        \
+        dev = rds->dev[i];                                         \
+        dev->callback = device_checkpoint_cb;                      \
+        if (dev->ops->api) {                                       \
+            dev->ops->api(dev);                                    \

which is hard to read.

To comment on a slightly reformatted version:

+void libxl__remus_devices_##api(libxl__egc *egc,                  \
+                                libxl__remus_device_state *rds)   \
+{                                                                 \
+    int i;                                                        \
+    libxl__remus_device *dev;                                     \
+                                                                  \
+    STATE_AO_GC(rds->ao);                                         \
+                                                                  \
+    rds->num_devices = 0;                                         \
+    rds->saved_rc = 0;                                            \
+                                                                  \
+    if (rds->num_set_up == 0)                                     \
+        goto out;                                                 \

I think if you did as I suggest with check_all_devices_handled, you
would end up calling check_all_devices_handled on exit from this
function and therefore wouldn't need the special case.

This special case is needed because if rds->num_set_up == 0, rds->callback
will no be called in below loop.

+                                                                  \
+    for (i = 0; i < rds->num_set_up; i++) {                       \
+        dev = rds->dev[i];                                        \
+        dev->callback = device_checkpoint_cb;                     \
+        if (dev->ops->api) {                                      \
+            dev->ops->api(dev);                                   \
+        } else {                                                  \
+            rds->num_devices++;                                   \
+            if (rds->num_devices == rds->num_set_up)              \
+                rds->callback(egc, rds, rds->saved_rc);           \
+        }                                                         \
+    }                                                             \
+                                                                  \
+    return;                                                       \
+                                                                  \
+out:                                                              \
+    rds->callback(egc, rds, rds->saved_rc);                       \

This function is strikingly similar to the previous unfolded async

And it's also similar to libxl__ao_device and libxl__multidev_*.

Can any of this be factored out ?  Can we in fact reuse
libxl__ao_device and libxl__multidev_* ?

I don't have a good solution to reuse libxl__ao_device and libxl__multidev_*
currently, so I will keep the original code for now, but will refactor it.



Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.