[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v15 2/7] remus: introduce remus device
On 07/10/2014 01:26 AM, Ian Jackson wrote: Yang Hongyang writes ("[PATCH v15 2/7] remus: introduce remus device"):introduce remus device, an abstract layer of remus devices(nic, disk, etc).It provides the following APIs for libxl: >libxl__remus_device_setup setup remus devices, like attach qdisc, enable disk buffering, etc >libxl__remus_device_teardown teardown devices >libxl__remus_device_postsuspend >libxl__remus_device_preresume >libxl__remus_device_commit above three are for checkpoint.Sorry to mention this now, but I think it would be clearer if all these libxl__remus_device_* functions which manipulate _all_ the devices for a domain used the plural of "device", ie libxl__remus_devices_setup, etc. Ok diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 39f1c28..bcbd02b 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -733,6 +733,31 @@ out: static void remus_failover_cb(libxl__egc *egc, libxl__domain_suspend_state *dss, int rc); +static void libxl__remus_setup_failed(libxl__egc *egc, + libxl__remus_state *rs, int rc) +{ + STATE_AO_GC(rs->ao); + libxl__ao_complete(egc, ao, rc); +} + +static void libxl__remus_setup_done(libxl__egc *egc, + libxl__remus_state *rs, int rc)This callback appears after the code which sets it up, which naturally executes first. Things should be in chronological order, as I said.+{ + libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs); + STATE_AO_GC(rs->ao); + + if (!rc) { + libxl__domain_suspend(egc, dss); + return; + } + + LOG(ERROR, "Remus: failed to setup device for guest with domid %u", + dss->domid);This log message should report the rc.diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 83eb29a..8cc90dc 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -1454,6 +1454,17 @@ static void libxl__domain_suspend_callback(void *data) domain_suspend_callback_common(egc, dss); } +static void remus_device_postsuspend_cb(libxl__egc *egc, + libxl__remus_state *rs, int rc) +{I can't quite tell, but isn't this code not in chronological order ?diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index f8441f6..8ef20a0 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h...+/* + * This structure is for remus device layer, it records remus devices + * that have been setuped. + */Still lots of uses of "setuped". Correct would be "devices that have been set up".+/*----- checkpointing APIs -----*/ + +/* callbacks */ + +static void device_common_cb(libxl__egc *egc, + libxl__remus_device *dev, + int rc) +{I asked for things to be in chronological order in the file. I think this function therefore ought to be below these checkpoint functions. I used to thought that the definitions of callbacks should be before the functions that set them, sorry for the misunderstanding, I will correct them. Also its name isn't very suggestive. Perhaps it should have "checkpoint" in the name ?+/* API implementations */ + +void libxl__remus_device_postsuspend(libxl__egc *egc, libxl__remus_state *rs)...+void libxl__remus_device_preresume(libxl__egc *egc, libxl__remus_state *rs)...+void libxl__remus_device_commit(libxl__egc *egc, libxl__remus_state *rs)These three functions are nearly identical. They should be combined somehow. Perhaps there should be a single function which takes the offset of the per-device operation vtable member as a parameter. Or perhaps the three functions should be generated by a macro. Currently I'm leaning towards the former. Good suggestions + if(rds->num_set_up == 0)^ Coding style.+ for (i = 0; i < rds->num_set_up; i++) { + dev = rds->dev[i]; + dev->callback = device_common_cb; + if (dev->ops->commit) { + dev->ops->commit(dev); + } else { + rds->num_devices++; + if (rds->num_devices == rds->num_set_up) + rs->callback(egc, rs, rs->saved_rc);Why not have this call device_common_cb rather than open-coding it ? Sorry for the poor named function, device_common_cb should be used at checkpoint. will add checkpoint to the function name. +/*----- main flow of control -----*/This is for setup, right ? Or teardown ? Or both ?+/* callbacks */ + +static void device_setup_cb(libxl__egc *egc, + libxl__remus_device *dev, + int rc); +static void device_match_cb(libxl__egc *egc, + libxl__remus_device *dev, + int rc) +{This is still not in chronological order. You must put callbacks after the things that set them up, since obviously the callback happens after it is set up. Seeing things in the wrong order makes it hard to see what's going on. In the interests of trying to reduce the number of iterations of this series I'm going to do a kind of peephole review of what I see here; you should bear in mind that I'm not always trying to understand what the purpose of these functions is so I may have structural comments on your next version. Ok, just don't quite follow the meaning of chronological order, but it's more clearer for me now. + libxl__remus_device_state *const rds = dev->rds; + libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state); + + STATE_AO_GC(rs->ao); + + if (rc) { + if (++dev->ops_index >= ARRAY_SIZE(dev_ops) ||This use of ++ looks confusing to me.+ rc != ERROR_REMUS_DEVOPS_NOT_MATCH) { + /* the device can not be matched */ + rds->num_devices++; + rs->saved_rc = ERROR_FAIL;Please can we not return more ERROR_FAIL. Instead, why not return ERROR_REMUS_DEVICE_NOT_SUPPORTED as I suggested before ? I will add this error code. + rds->num_devices++; + /* + * we add devices that have been setuped to the array no matter + * the setup process succeed or failed because we need to ensure + * the device been teardown while setup failed. If any of the + * device setup failed, we will quit remus, but before we exit, + * we will teardown the devices that have been added to **dev + */ + rds->dev[rds->num_set_up++] = dev; + if (rc) { + /* setup failed */ + rs->saved_rc = ERROR_FAIL; + }This doesn't look right. If the setup fails, presumably we shouldn't put the device in the array ? If we do it will presumably be torn down later. the netbuf script was designed as below: 1. when setup failed, the script won't teardown the device itself. 2. the teardown op is ok to be excute many times. In the remus device layer, if one device setup failed(whether script exit or the script get killed or something), we will quit remus, but before that, we will teardown all device that has been set up, whether it's correctly set up or not. So if we don't put the device in the array, we will get a leaked device, that has not been teardown. +static void destroy_device_ops(libxl__remus_state *rs); +static void device_teardown_cb(libxl__egc *egc, + libxl__remus_device *dev, + int rc) +{ + libxl__remus_device_state *const rds = dev->rds; + libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state); + + STATE_AO_GC(rs->ao); + + /* ignore teardown errors to teardown as many devs as possible*/ + rds->num_set_up--;We should at least preserve the rc so that we don't pretend that everything worked if it didn't. You may need to invent a new teardown rc variable, and/or add logging, so that you get the right error code out, and the right log messages, if a setup fails and causes a teardown which also fails. Ok, thanks. +static int init_device_ops(libxl__remus_state *rs) +{ + int i, rc; + const libxl__remus_device_ops *ops; + + for (i = 0; i < ARRAY_SIZE(dev_ops); i++) { + ops = dev_ops[i]; + if (ops->init(ops, rs)) { + rc = ERROR_FAIL;In general, you should preserve the rc, not turn everything into ERROR_FAIL. Ok, thanks. I still don't understand why libxl__remus_device_state is not part of libxl__remus_state. libxl__remus_device_state is part of libxl__remus_state: +struct libxl__remus_state { + /* must set by caller of libxl__remus_device_(setup|teardown) */ + libxl__ao *ao; + uint32_t domid; + libxl__remus_callback *callback; + + /* private */ + int saved_rc; + /* context containing device related stuff */ + libxl__remus_device_state dev_state; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + libxl__ev_time timeout; /* used for checkpoint */ +}; I wonder if you need a different name for "the abstract thing which is represented by a libxl__remus_device_ops". Since each ops only deals with one kind (but perhaps a kind may have several ops), I suggest calling them "subkind"s or "devsubkinds" or something. At the moment you seem to be calling them "ops". As a result I think this function is poorly named. It doesn't initialise any ops. It initialises each subkind. Perhaps it should be called init_device_subkind. You might consider whether libxl__remus_device_ops should be libxl__remus_subkind_ops perhaps or libxl__remus_devsubkind_ops or libxl__remus_device_subkind_ops. Ok, thanks. +void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs) +{ + STATE_AO_GC(rs->ao); + + /* Convenience aliases */ + libxl__remus_device_state *const rds = &rs->dev_state; + + if (ARRAY_SIZE(dev_ops) == 0) + goto out;Why is this special case necessary ?diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index de25f42..e56567f 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -44,6 +44,7 @@ libxl_error = Enumeration("error", [ (-12, "OSEVENT_REG_FAIL"), (-13, "BUFFERFULL"), (-14, "UNKNOWN_CHILD"), + (-15, "REMUS_DEVOPS_NOT_MATCH"),I don't object to this name for this exact case and it's fine to have a specific error code. But if you are going to make the whole machinery return ERROR_REMUS_DEVICE_NOT_SUPPORTED if no subkind matches, then you can reuse that error code for the subkind methods. Will add ERROR_REMUS_DEVICE_NOT_SUPPORTED error code. Thanks, Ian. . -- Thanks, Yang. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |