Xen project Mailing List

Re: [Xen-devel] [PATCH v2 08/10] libxl: Kill QEMU by uid when possible

To: George Dunlap <george.dunlap@xxxxxxxxxx>

From: Ian Jackson <ian.jackson@xxxxxxxxxx>

Date: Wed, 12 Dec 2018 16:17:17 +0000

Cc: Anthony Perard <anthony.perard@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Wei Liu <wei.liu2@xxxxxxxxxx>

Delivery-date: Wed, 12 Dec 2018 16:17:27 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

George Dunlap writes ("[PATCH v2 08/10] libxl: Kill QEMU by uid when possible"): > The privcmd fd that a dm_restrict'ed QEMU has gives it permission to > one specific domain ID. This domain ID will probably eventually be > used again. It is therefore necessary to make absolutely sure that a > rogue QEMU process cannot hang around after its domain has exited. Thanks. Detailed comments mostly on error handling follow... > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > index f9e0bf6578..cd3208f4b8 100644 > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -1135,7 +1135,7 @@ typedef struct { > const char *shim_cmdline; > const char *pv_cmdline; > > - char *dm_runas; > + char *dm_runas, *dm_uid; I think `dm_uid' is misnamed. It is only set if the dm has a *dedicated* uid. If the dm is run as uid 0 or as a shared qemu uid, it is set to NULL. > @@ -3706,6 +3706,8 @@ struct libxl__destroy_devicemodel_state { > uint32_t domid; > libxl__devicemodel_destroy_cb *callback; /* May be called re-entrantly */ > /* private to implementation */ > + libxl__ev_child destroyer; > + int rc; /* Accumulated return value for the destroy operation */ Excellent. > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index 7f9c6a62fe..53fdf8daf7 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -129,6 +129,8 @@ static int libxl__domain_get_device_model_uid(libxl__gc > *gc, > int rc; > char *user; > uid_t intended_uid; > + bool kill_by_uid; > + I guess this new blank line is deliberate. I think it is probably a good idea. > @@ -148,8 +150,10 @@ static int libxl__domain_get_device_model_uid(libxl__gc > *gc, > LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s", > user); > rc = ERROR_INVAL; > - } else > + } else { > intended_uid = user_base->pw_uid; > + kill_by_uid = true; > + } > > goto out; > } I think your changes to the out block newly imply that all `goto out' with `rc=0' must also set kill_by_uid. I have not (in this revision) applied this patch and searched for that because I think there will have to be a respin to consistently apply the rules I previously identified. (I mention this here partly so that when I review v3 I know that my former self wanted to double check that.) > @@ -226,6 +234,8 @@ out: > } > > state->dm_runas = user; > + if (kill_by_uid) > + state->dm_uid = GCSPRINTF("%ld", (long)intended_uid); > } More stuff is accumulating in this post-0-check success path. Perhaps this means my suggestion of if (!rc) { ... } if (!rc) { ... } will be most convenient. > + /* > + * If we're starting the dm with a non-root UID, save the UID so > + * that we can reliably kill it and any subprocesses > + */ > + if (state->dm_uid) > + libxl__xs_printf(gc, XBT_NULL, > + GCSPRINTF("%s/image/device-model-uid", dom_path), My comment about the misnamed libxl variable applies to device-model-uid too I think. > +#define PROPAGATE_RC if(!ddms->rc) ddms->rc = rc ^ I like this macro. But (i) it needs proper macro hygiene - either a do{ }while(0) block, or refactoring into an expression (and then putting in parens). And (ii) you missed out a space. (The references to ddms and rc are anaphoric, not macro parameters, so do not need parens.) > + path = DEVICE_MODEL_XS_PATH(gc, LIBXL_TOOLSTACK_DOMID, domid, ""); > + rc = libxl__xs_rm_checked(gc, XBT_NULL, path); > + if (rc) { > + PROPAGATE_RC; > LOGD(ERROR, domid, "xs_rm failed for %s", path); > + } Your new macro makes this a nicely transparent pasttern. > + /* > + * See if we should try to kill by uid > + */ > + path = GCSPRINTF("/local/domain/%d/image/device-model-uid", domid); > + rc = libxl__xs_read_checked(gc, XBT_NULL, path, &dm_uid_str); > + > + /* > + * If there was an error here, accumulate the error and fall back > + * to killing by pid. > + */ > + if (rc) { > + PROPAGATE_RC; > + LOGD(ERROR, domid, "Reading dm UID path failed for %s", path); > + } From the comment for libxl__xs_read_checked: | * On error, *result_out is undefined. Arguably this is a bear trap. Maybe you would like to fix it there rather than by setting dm_uid_str to 0 here. > + reaper_pid = libxl__ev_child_fork(gc, &ddms->destroyer, > + kill_device_model_uid_cb); > + if (reaper_pid < 0) { > + rc = ERROR_FAIL; > + PROPAGATE_RC; > + /* > + * Note that if this fails, we still don't kill by pid, to > + * make sure that an untrusted DM has not "maliciously" > + * exited (potentially causing us to kill an unrelated > + * process which happened to get the same pid). > + */ > + goto out; > + } > + > + if (!reaper_pid) { /* child */ > + rc = kill_device_model_uid_child(ddms, dm_uid_str); > + _exit(rc); > + } You cannot _exit(rc). See my comments below... Personally I like to put the child process block right after the fork, especially when (like here) it is very short because the meat has been lifted elsewhere. Just a suggestion; you can leave it here if you prefer. > - /* We should try to destroy the device model anyway. */ > - rc = kill_device_model(gc, > - GCSPRINTF("/local/domain/%d/image/device-model-pid", domid)); > + /* > + * No uid to kill; attept to kill by pid. > + */ > + LOGD(DEBUG, domid, "Didn't find dm UID; destroying by pid"); > + > + path = GCSPRINTF("/local/domain/%d/image/device-model-pid", domid); > + rc = kill_device_model(gc, path); > + > + if (rc) { > + PROPAGATE_RC; > + LOGD(ERROR, domid, "Killing device model pid from path %s", path); > + } > + > +out: I would prefer the apparently-redundant: + rc = 0; > +out: This avoids introducing a bug if, just before the success exit path, new code gets added which doesn't happen to leave rc==0. > + /* > + * NB that we always return '0' here for the "status of exited > + * process"; since there is no process, it always "succeeds". I had to read this three times to figure out what this meant. Perhaps I'm being dense but would you mind writing + * NB that we always pass '0' here for the "status of exited ^^^^ > +/* > + * Destroy all processes of the given uid by setresuid to the > + * specified uid and kill(-1). NB this MUST BE CALLED FROM A SEPARATE > + * PROCESS from the normal libxl process. Returns a libxl-style error > + * code. > + */ > +static int kill_device_model_uid_child(libxl__destroy_devicemodel_state > *ddms, > + const char *dm_uid_str) { > + STATE_AO_GC(ddms->ao); > + int domid = ddms->domid; > + int r, rc = 0; > + uid_t dm_uid = atoi(dm_uid_str); > + > + /* > + * FIXME: the second uid needs to be distinct to avoid being > + * killed by a potential rogue process > + */ > + > + /* > + * Should never happen; but if it does, better to have the > + * toolstack crash with an error than nuking dom0. > + */ > + assert(dm_uid); > + > + LOGD(DEBUG, domid, "DM reaper: calling setresuid(%d, %d, 0)", > + dm_uid, dm_uid); > + r = setresuid(dm_uid, dm_uid, 0); > + if (r) { > + LOGED(ERROR, domid, "setresuid to (%d, %d, 0)", dm_uid, dm_uid); > + rc = ERROR_FAIL; > + goto out; > + } > + > + /* > + * And kill everyone but me. > + * > + * NB that it's not clear from either POSIX or the Linux man page > + * that ESRCH would be returned with a pid value of -1, but it > + * doesn't hurt to check. > + */ > + r = kill(-1, 9); > + if (r && errno != ESRCH) { > + LOGED(ERROR, domid, "kill(-1,9)"); > + rc = ERROR_FAIL; > + } > + > +out: > + return rc; > +} > + > +static void kill_device_model_uid_cb(libxl__egc *egc, > + libxl__ev_child *destroyer, > + pid_t pid, int status) > +{ > + libxl__destroy_devicemodel_state *ddms = CONTAINER_OF(destroyer, *ddms, > destroyer); > + STATE_AO_GC(ddms->ao); > + > + if (status) { > + int rc = ERROR_FAIL;; > + > + if (WIFEXITED(status)) > + rc = WEXITSTATUS(status) - 128; This can't be right. Where does this 128 come from ? I don't see a comment anywhere about your encoding of the libxl error value in the exit status. A libxl error code does not necessarily fit in an exit status since an exit status is just one byte. Your protocol reserves the exit status 0 for success. The C implementation typically reserves -1 (255) and sometimes also 127. By convention the exit status is normally regarded as positive and libxl error codes are negative. I suggest one of the following strategies: - Give up on the idea of distinguishing these error codes at all and simply _exit(!!rc). (After all the real error is logged and the function only ever returns FAIL.) - Say that the child function may only return one of a limited subset of libxl error codes (since only FAIL is currently needed), and assert that -125 <= rc <= -1, and _exit(-rc). Then the exit status can be recovered with -WEXITSTATUS(status). - Do something more subtle involving turning out-of-range rc values into -126 or some such. This seems like overkill. > - ddms->callback(egc, ddms, rc); > + ddms->callback(egc, ddms, ddms->rc); > } > +#undef PROPAGATE_RC I am tempted to suggest replacing each call PROPAGATE_RC; with ACCUMULATE_RC(ddms); and put the definition in libxl_internal.h for use elsewhere. I think we would probably already have some other call sites and if we go and fix the destroy stuff probably we will want a lot more of it. Up to you. Like this is certainly fine for now. Thanks, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.