[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3 3/5] libxl: call hotplug scripts from libxl for vbd
Thanks for this mammoth patch. My comments are below. Roger Pau Monne writes ("[Xen-devel] [PATCH v3 3/5] libxl: call hotplug scripts from libxl for vbd"): > This is a rather big change, that adds the necessary machinery to perform > hotplug script calling from libxl for Linux. This patch launches the necessary > scripts to attach and detach PHY and TAP backend types (Qemu doesn't > use hotplug scripts). > > Here's a list of the major changes introduced: Can you please wrap your commit messages to no more than 75 characters ? Some vcs's indent them. And of course they get indented when we reply leading to wrap damage. > * libxl__devices_destroy no longer calls libxl__device_destroy, and instead > calls libxl__initiate_device_remove, so we can disconnect the device and > execute the necessary hotplug scripts instead of just deleting the backend > entries. So libxl__devices_destroy now uses the event library, and so does > libxl_domain_destroy. So we no longer have any function which will synchronously and immediately destroy a domain and throw away everything to do with it. Is it actually the case that you think such a function cannot be provided ? > * Added a check in xen-hotplug-common.sh, so scripts are only executed from > udev when using xend. xl adds a disable_udev=y to xenstore private > directory > and with the modification of the udev rules every call from udev gets > HOTPLUG_GATE passed, that points to disable_udev. If HOTPLUG_GATE is passed > and points to a value, the hotplug script is not executed. WDYM "points to a value"? Did you just mean "is set to a nonempty value" ? I'm not convinced by the name of this variable. Shouldn't it have Xen in the name, and specify its own sense ? Eg, XEN_HOTPLUG_DISABLE or something ? > +SUBSYSTEM=="xen-backend", KERNEL=="tap*", > ENV{HOTPLUG_GATE}="/libxl/$env{XENBUS_PATH}/disable_udev", > RUN+="/etc/xen/scripts/blktap $env{ACTION}" ... > +# Hack to prevent the execution of hotplug scripts from udev if the domain > +# has been launched from libxl > +if [ -n "${HOTPLUG_GATE}" ] && \ > + `xenstore-read "$HOTPLUG_GATE" >/dev/null 2>&1`; then > + exit 0 > +fi Oh I see. Hmm. Why should this string not be fixed ? I'm not sure I like the idea of being able to pass some random other xenstore path. Also: this xenstore path should be a relative path, ie one relative to the xenstore home of domain running this part of the tools. That way the scripts can be easily and automatically disabled for dom0 and enabled in driver domains. > + /* compatibility addon to keep udev rules */ > + flexarray_append(private, "disable_udev"); > + flexarray_append(private, "y"); We would normally use "1" for true in xenstore. > libxl__device_generic_add(gc, &device, > - libxl__xs_kvs_of_flexarray(gc, back, > back->count), > - libxl__xs_kvs_of_flexarray(gc, front, > front->count)); > + libxl__xs_kvs_of_flexarray(gc, back, back->count), > + libxl__xs_kvs_of_flexarray(gc, front, front->count), > + libxl__xs_kvs_of_flexarray(gc, private, private->count)); > + > + switch(disk->backend) { > + case LIBXL_DISK_BACKEND_PHY: > + case LIBXL_DISK_BACKEND_TAP: > + rc = libxl__initiate_device_add(egc, ao, &device); > + if (rc) goto out_free; > + break; > + case LIBXL_DISK_BACKEND_QDISK: > + default: > + libxl__ao_complete(egc, ao, 0); > + break; > + } Does this really need no confirmation from qemu ? > out_free: > + flexarray_free(private); > +out_free_b: > flexarray_free(back); > +out_free_f: > flexarray_free(front); It would be much better to allocate these from the gc somehow. Failing that, perhaps we could initialise them to NULL and free them iff necessary on the exit path: out: ... if (back) flexarray_free(back); if (front) flexarray_free(front); etc. > int libxl_device_disk_remove(libxl_ctx *ctx, uint32_t domid, > @@ -1442,7 +1484,18 @@ int libxl_device_disk_remove(libxl_ctx *ctx, uint32_t > domid, > if (rc != 0) goto out; > > rc = libxl__initiate_device_remove(egc, ao, &device); > - if (rc) goto out; > + switch(rc) { > + case 1: > + /* event added */ I think this terminology "event added" is confusingly wrong. What you mean is "event queued", I think. You should change this everywhere. But before you do that, please see what I write below about your approach to libxl__initiate_device_remove. > @@ -1666,11 +1719,11 @@ int libxl_cdrom_insert(libxl_ctx *ctx, uint32_t > domid, libxl_device_disk *disk) > ret = 0; > > libxl_device_disk_remove(ctx, domid, disks + i, 0); > - libxl_device_disk_add(ctx, domid, disk); > + libxl_device_disk_add(ctx, domid, disk, 0); This needs a /* fixme-ao */ because inside-libxl callers are not allowed to invent an ao_how*, even NULL. You need to mark every occurrence of these that way. > @@ -1884,8 +1937,9 @@ int libxl_device_nic_add(libxl_ctx *ctx, uint32_t > domid, libxl_device_nic *nic) > flexarray_append(front, libxl__sprintf(gc, > LIBXL_MAC_FMT, > LIBXL_MAC_BYTES(nic->mac))); > libxl__device_generic_add(gc, &device, > - libxl__xs_kvs_of_flexarray(gc, back, > back->count), > - libxl__xs_kvs_of_flexarray(gc, front, > front->count)); > + libxl__xs_kvs_of_flexarray(gc, back, > back->count), > + libxl__xs_kvs_of_flexarray(gc, front, > front->count), > + NULL); Likewise. Also unintentional indentation change ? (In several more places, too.) > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index 3675afe..de598ad 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -598,7 +598,7 @@ static int do_domain_create(libxl__gc *gc, > libxl_domain_config *d_config, > store_libxl_entry(gc, domid, &d_config->b_info); > > for (i = 0; i < d_config->num_disks; i++) { > - ret = libxl_device_disk_add(ctx, domid, &d_config->disks[i]); > + ret = libxl_device_disk_add(ctx, domid, &d_config->disks[i], 0); Here in xl simply passing 0 is correct. > +char *libxl__device_private_path(libxl__gc *gc, libxl__device *device) > +{ > + return libxl__sprintf(gc, "/libxl/backend/%s/%u/%d", > + libxl__device_kind_to_string(device->backend_kind), > + device->domid, device->devid); > +} Did you want to use GCSPRINTF ? > + rc = libxl__device_hotplug(gc, aorm->dev, DISCONNECT); This is not correct. You need to do something more with the exit status of the hotplug script than simply throwing it away as you do in hotplug_exited. libxl__device_hotplug needs to take a callback from the caller so that it can notify the caller when the hotplug script has exited. You need to not allow the ao to complete until the hotplug script has exited. Otherwise you will enter hotplug_exited after the libxl__hotplug_state has been destroyed with the ao. If it's too slow and you need to time out, you need to send the hotplug script a signal, and then wait for it to exit. If necessary that signal could be SIGKILL; SIGKILL can be relied on to cause the hotplug script to actually terminate and thus the hotplug_exited callback to occur. > +/* > + * Return codes: > + * 1 Success and event(s) HAVE BEEN added > + * 0 Success and NO events were added > + * < 0 Error > + * > + * Since this function can be called several times, and several events > + * can be added to the same context, the user has to call > + * libxl__ao_complete when necessary (if no events are added). > + */ > int libxl__initiate_device_remove(libxl__egc *egc, libxl__ao *ao, > libxl__device *dev) This comment should be in the header file near the declaration. And indeed if you had put it there you would have discovered another comment already there describing the old behaviour, which you have now left behind as an erroneous comment. And I think the new interface is entirely wrong. How does the caller know when to complete the ao if libxl__initiate_device_remove never calls back ? You need to split this into two functions: one with the current interface which is a simple wrapper, used for all the existing call sites, and one which never completes any ao but which always makes a callback when the operation is complete. That callback should be used by the caller of libxl__initiate_device_remove to do whatever it needs to, which might include completing the ao. At the moment, AFAICT from reading your code, the caller's ao will be completed by the first nontrivial device remove to complete, ie a bug. > -int libxl__device_destroy(libxl__gc *gc, libxl__device *dev) > +int libxl__initiate_device_add(libxl__egc *egc, libxl__ao *ao, > + libxl__device *dev) ... > + rc = libxl__ev_devstate_wait(gc, &aorm->ds, device_add_callback, > + state_path, XenbusStateInitWait, > + LIBXL_INIT_TIMEOUT * 1000); > + if (rc) { > + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "unable to initialize device %" > + PRIu32, dev->devid); > + libxl__ev_devstate_cancel(gc, &aorm->ds); This cancel is not necessary, because device_add_cleanup will do this. (Or at least it should do; I haven't checked.) > + goto out_fail; > + } > + > + return 0; > + > +out_fail: > + assert(rc); > + device_add_cleanup(gc, aorm); > + return rc; > +out_ok: ^ blank line needed after the return. > + rc = libxl__device_hotplug(gc, dev, CONNECT); > + libxl__ao_complete(egc, ao, 0); > + return rc; > +} Some of my comments about initiate_device_remove's callback probably apply here too. In particular, I have found it is often better to make a callback-queueing function always call the callback right away, recursively, on the error path. The result is that the function can return void, which simplifies callers considerably. Whichever way you do it you do need to document your choices about reentrancy. > +static void hotplug_exited(libxl__egc *egc, libxl__ev_child *child, > + pid_t pid, int status) > +{ > + libxl__hotplug_state *hs = CONTAINER_OF(child, *hs, child); > + EGC_GC; > + > + if (status) { > + libxl_report_child_exitstatus(CTX, hs->rc ? LIBXL__LOG_ERROR > + : LIBXL__LOG_WARNING, > + "hotplug child", pid, status); > + } As I say, this needs to make a callback. > +int libxl__hotplug_exec(libxl__gc *gc, char *arg0, char **args, char **env) > +{ It would be better IMO to call this function "launch" or something, since it doesn't actually call exec in the parent (which is what I think a function called "exec" should do). > + libxl__hotplug_state *hs; > + > + hs = libxl__zalloc(gc, sizeof(*hs)); How about GCNEW(hs) ? > + if (!pid) { > + /* child */ > + signal(SIGCHLD, SIG_DFL); What is this for ? Is the problem that children of libxl otherwise inherit a weird SIGCHLD disposition ? If so you need to fix this in libxl__exec, probably in a separate patch - and you probably need to sort out sigprocmask too in case the libxl caller has set up something weird. This is something that ought to go in a new patch. > + if (libxl__ev_child_inuse(&hs->child)) { > + hs->rc = rc; Under what circumstances will rc!=0 here ? Surely that is the success path? It might be easier simply to have "out:" be only the error path. The success path always has _inuse and the failure path never does. > +/* Action to pass to hotplug caller functions */ > +typedef enum { > + CONNECT = 1, > + DISCONNECT = 2 > +} libxl__hotplug_action; These names are rather too generic, I think. > +typedef struct libxl__hotplug_state libxl__hotplug_state; > + > +struct libxl__hotplug_state { > + /* public, result, caller may only read in callback */ > + int rc; > + /* private for implementation */ > + libxl__ev_child child; > +}; And this is where the callback member ought to go, in the public part, with a note saying it should be set by the caller. Is there any provision for timeout here ? Would here be a good place to do the timeout, rather than higher up the stack ? Also you might find it better to move the args and env and so forth into the hotplug_state. That way, for example, you can actually produce a useful message when the script fails. > +/* > + * libxl__hotplug_exec is an internal function that should not be used > + * directly, all hotplug script calls have to implement and use > + * libxl__device_hotplug. > + */ > +_hidden int libxl__device_hotplug(libxl__gc *gc, libxl__device *dev, > + libxl__hotplug_action action); > +_hidden int libxl__hotplug_exec(libxl__gc *gc, char *arg0, char **args, > + char **env); Why does libxl__hotplug_exec need to be declared here at all then ? Could it be static in libxl_hotplug.c ? > +/* Hotplug scripts helpers */ > +static char **get_hotplug_env(libxl__gc *gc, libxl__device *dev) > +{ > + libxl_ctx *ctx = libxl__gc_owner(gc); Why not use CTX ? For that matter, if you use my new LOG macros you don't need to refer to CTX at all. > + script = libxl__xs_read(gc, XBT_NULL, > + libxl__sprintf(gc, "%s/%s", be_path, "script")); > + if (!script) { > + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "unable to read script from %s", > + be_path); You need to log the errno. > + f_env = flexarray_make(9, 1); > + if (!f_env) { > + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, > + "unable to create environment array"); > + return NULL; > + } Isn't this an allocation failure which should be dealt with by libxl__alloc_failed ? > + flexarray_set(f_env, nr++, "XENBUS_TYPE"); > + flexarray_set(f_env, nr++, (char *) > + libxl__device_kind_to_string(dev->backend_kind)); Why is the cast needed ? > + flexarray_set(f_env, nr++, "XENBUS_PATH"); > + flexarray_set(f_env, nr++, > + libxl__sprintf(gc, "backend/%s/%u/%d", > + libxl__device_kind_to_string(dev->backend_kind), > + dev->domid, dev->devid)); GCSPRINTF might help shorten this ? For that matter, you've called libxl__device_kind_to_string twice. Calling it only once would make this easier to read. I won't comment any more on places where I think GCSPRINTF, LOG/LOGE/LOGEV and CTX might usefully replace what you have written. > + env = get_hotplug_env(gc, dev); > + if (!env) > + return -1; Surely this should never fail as it just results in allocation failure ? This function seems to be supposed to return an rc value, so return -1 is wrong. > + args = (char **) flexarray_contents(f_args); Why is the cast needed ? > + what = libxl__sprintf(gc, "%s %s", args[0], > + action == CONNECT ? "connect" : "disconnect"); Surely action == CONNECT ? "connect" : action == DISCONNECT ? "disconnect" : abort() ? > + rc = libxl__hotplug_exec(gc, args[0], args, env); > + if (rc) { > + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "unable execute hotplug scripts > for " > + "device %"PRIu32, dev->devid); libxl__hotplug_exec ought to do all the logging. That means that all the information needed to do so should be passed to it, including the domid (which you don't currently log) and the device id. That should probably be filled in by its caller in the libxl__hotplug_state struct. > diff --git a/tools/python/xen/lowlevel/xl/xl.c > b/tools/python/xen/lowlevel/xl/xl.c > index c4f7c52..ce7a2b2 100644 > --- a/tools/python/xen/lowlevel/xl/xl.c > +++ b/tools/python/xen/lowlevel/xl/xl.c > @@ -447,7 +447,7 @@ static PyObject *pyxl_domain_destroy(XlObject *self, > PyObject *args) > int domid; > if ( !PyArg_ParseTuple(args, "i", &domid) ) > return NULL; > - if ( libxl_domain_destroy(self->ctx, domid) ) { > + if ( libxl_domain_destroy(self->ctx, domid, NULL) ) { I think this is correct. Or, at least, we don't currently provide any asynchronous capability in the Python code and I don't mind not doing so. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |