[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v13 4/7] remus netbuffer: implement remus network buffering for nic devices

Hi Ian,

  Thank you very much for the review. will address your comments in the
next version.

On 06/28/2014 01:54 AM, Ian Jackson wrote:
Yang Hongyang writes ("[PATCH v13 4/7] remus netbuffer: implement remus network 
buffering for nic devices"):
1.Add two members in libxl_domain_remus_info:


I'm going to make mostly general comments on this.

+ *
+ * If this is defined, then the libxl_domain_remus_info structure will
+ * have a boolean field (netbuf) and a string field (netbufscript).
+ *
+ * netbuf, if true, indicates that network buffering should be enabled.
+ *
+ * netbufscript, if set, indicates the path to the hotplug script to
+ * setup or teardown network buffers.
+ */

This seems to imply that the default at the API level is for network
buffering to be turned off.  I don't think that's right, is it ?

+/* If the device has a vifname, then use that instead of
+ * the vifX.Y format.

I don't think it is right in general to look in the backend directory
for the vifname.  If we are using driver domains, the backend xenstore
directory is writeable by the driver domain.

But I think Remus is not at all compatible with driver domains right
now - am I right ?

If so then at the very least this function should have something in
the comment saying that it must ONLY be used for remus because if
driver domains were in use it would constitute a security

+    ret = nl_cache_refill(netbuf_state->nlsock, netbuf_state->qdisc_cache);
+    if (ret < 0) {
+        LOG(ERROR, "cannot refill qdisc cache");

Do these functions return error codes or set errno or something ?  I
think you should print the actual error that was encountered.  That
might be as simple as using LOGE instead but it depends on what they

+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc) {
+        rc = ERROR_FAIL;

I think you can just "goto out" without setting rc again.

+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status)

As I said in my other mail, can you reorganise this so that the
functions are the file in the same order as the control flow, with
appropriate comments dividing the different flows up ?

+static void async_call_done(libxl__egc *egc,
+                            libxl__ev_child *child,
+                            pid_t pid, int status)
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    libxl__remus_device_state *rds = dev->rds;
+    STATE_AO_GC(rds->ao);
+    if (WIFEXITED(status)) {
+        dev->callback(egc, dev, -WEXITSTATUS(status));
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }

This is really quite odd.  These nonzero statuses are surely always
errors ?  Why leave it to the next layer to interpret ?  Returning
either a positive ERROR_* or a -WEXITSTATUS is particularly odd.

And if you are going to do this you need to make a crystal-clear doc
comment about it.

+static void nic_match_async(const libxl__remus_device_ops *self,
+                            libxl__remus_device *dev)
+    if (dev->kind == LIBXL__REMUS_DEVICE_NIC)
+        _exit(0);
+    _exit(-ERROR_NOT_MATCH);
+static void nic_match(libxl__remus_device_ops *self,
+                      libxl__remus_device *dev)
+    int pid = -1;
+    STATE_AO_GC(dev->rds->ao);
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, async_call_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+    if (!pid) {
+        /* child */
+        nic_match_async(self, dev);
+        /* notreached */
+        abort();

This is very odd.  Why are you spawning a subprocess to
test dev->kind ?

Because we will continue to match the next ops if this ops does
not match the device, so this must be an async call to avoid
the callstack going too deep. But if we do the check in the
device abstract layer as you suggested below, this should not
be necessary.

+static void nic_setup(libxl__remus_device *dev)
+    libxl__remus_device_nic *remus_nic;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    const libxl_device_nic *nic = dev->backend_dev;
+    STATE_AO_GC(ns->ao);
+    GCNEW(remus_nic);
+    dev->data = remus_nic;
+    remus_nic->vif = get_vifname(dev, nic);
+    if (!remus_nic->vif)
+        goto out;

I think you should set rc here rather than (implicitly) in out.

+    setup_async_exec(&dev->aes, "setup", dev);
+    if (libxl__async_exec_start(gc, &dev->aes)) {

And you should propagate the rc from libxl__async_exec_start.

+ * Note: This function will be called in the same gc context as
+ * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start
+ * API call.
+ */
+static void nic_teardown(libxl__remus_device *dev)

This comment is interesting but I don't understand the implications.
Also isn't this a feature of the abstract/concrete API which should be
documented there ?

+static void nic_match_async(const libxl__remus_device_ops *self,
+                            libxl__remus_device *dev)
+    if (dev->kind == LIBXL__REMUS_DEVICE_NIC)
+        _exit(-ERROR_FAIL);
+    _exit(-ERROR_NOT_MATCH);

Perhaps each libxl__remus_device_ops could mention the kind it
supports, so that the abstract code could check that ?  That would
avoid duplicating this check in every concrete kind's code.

Good idea, that should make life easier, thanks.

+extern libxl__remus_device_ops remus_device_nic;

This needs to be const.

I'm afraid I'm going to leave the drbd patch for another day...



Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.