Xen project Mailing List

Re: [Xen-devel] [PATCH v5 03/15] libxl_qmp: Implement fd callback and read data [and 1 more messages]

To: Anthony PERARD <anthony.perard@xxxxxxxxxx>

From: Ian Jackson <ian.jackson@xxxxxxxxxx>

Date: Mon, 15 Oct 2018 17:35:36 +0100

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Wei Liu <wei.liu2@xxxxxxxxxx>

Delivery-date: Mon, 15 Oct 2018 16:36:07 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

As discussed, I squwashed several of these patches together and I'm now doing my code review on the result. I've gone through it in some detail. Again, I asked for more internal documentation about the legal states etc. I will have to read it in detail again I'm afraid after that is done. The reason I ask for this is that this is a complicated and substantial pile of code. It's not sensible for anyone to try to hold it in their head at once - we will make mistakes. And it should be possbile to modify it without reading all of it. So, it should be possible for anyone to: * Look only at the summary internal architecture state machine comments and so on, and confirm to themselves that this is a sensible design and that all the possible states are represented and that the interlocking states of the detailed variables are both sufficient, and of manageable complexity. * Read any small fragment of code and see that transforms legal states into other legal states, by reference to the above. Right now this is not really possible. I look at things like this: > + ev->id = ev->last_id_used; > + ev->tx_buf = buf; > + ev->tx_buf_len = len; > + > + libxl__ev_fd_modify(gc, &ev->qmp_efd, ev->qmp_efd.events | POLLOUT); in qmp_ev_prepare_cmd, and I need to guess whether the surrounding code has all the needed error checks and whetehr it does all that is needed. I can't be sure, because the only way to infer the rules about what combinations of values are legal in the variables is to read the entire program. If I somehow peer at it and with great perspicacity discover a bug, it won't be clear whether the bug is that the variable is set wrong, or used wrong, or that there is a design error. Now on to the details. Firstly, a small apology. I misread your squashing instructions and merged in libxl_qmp: Separate QMP message generation from qmp_send_prepare when I shouldn't have done. FAOD I like it separated out, so please don't squash it in even though my review comments below do include it. In your next respin could usefully buble it up I think. Ian Jackson writes ("[PATCH 2/8] libxl_qmp: PORTMANTEAU starting with: Connect to QMP socket"): > +typedef enum { > + /* initial state */ > + qmp_state_disconnected = 1, > + /* connected to QMP socket, waiting for greeting message */ > + qmp_state_connecting, > + /* greeting message received */ > + qmp_state_greeting, > + /* qmp_capabilities command sent, waiting for reply */ > + qmp_state_capability_negotiation, > + /* ready to send commands */ > + qmp_state_connected, > +} libxl__qmp_state; In an earlier email in response to this same v5 I wrote: IWBN to relate these private states to the outward-facing API states like `Connected'. I often write a table of legal field values - see for example, libxl_exec.c near l.241 and libxl_stream_read.c near l.35. But maybe this is not complicated enough to need that. Seeing this in context with all the new fields in libxl__ev_qmp I do think this would help. (I appreciate you've not had a chance to do that yet.) Ie ideally, if you can make it fit, a table whose columns are the qmp_state_* values and whose rows are: the private variables in libxl__ev_qmp; the outward-facing state; the qmp-facing protocol state. (I suggest that way round because there are only 5 states and more proposed rows.) > int id; > + libxl__carefd *qmp_cfd; > + libxl__ev_fd qmp_efd; > + libxl__qmp_state qmp_state; > + unsigned int last_id_used; Why is id signed but last_id_used unsigned ? This seems odd. > -static char *qmp_send_prepare(libxl__gc *gc, libxl__qmp_handler *qmp, > - const char *cmd, libxl__json_object *args, > - qmp_callback_t callback, void *opaque, > - qmp_request_context *context) > +static char *qmp_prepare_cmd(libxl__gc *gc, const char *cmd, > + const libxl__json_object *args, > + int id, size_t *len_r) ... > + const unsigned char *buf; Would be useful to write here that the memory for buf is owned by hand. And therefore to have hand first. > + libxl_yajl_length len; > yajl_gen_status s; > yajl_gen hand; hand should perhaps be initialised to NULL in case new code is added before it is allocated ? > + ret = libxl__malloc(NOGC, len + 3); > + strncpy(ret, (const char *)buf, len + 3); > + strncpy(ret + len, "\r\n", 3); > + len += 2; I'm not sure why this can't be done with libxl_sprintf and a suitable %<something>n. This open-coding of +3 etc. is quite fiddly. > +/* ------------ Implementation of libxl__ev_qmp ---------------- */ > + > +/* hard coded message ID used for capability negotiation */ > +#define QMP_CAPABILITY_NEGOTIATION_MSGID 1 > + > +static int qmp_ev_prepare_cmd(libxl__gc *gc, > + libxl__ev_qmp *ev, > + const char *cmd, > + const libxl__json_object *args) > +{ > + char *buf = NULL; > + size_t len; > + > + buf = qmp_prepare_cmd(gc, cmd, args, ++ev->last_id_used, &len); > + if (!buf) > + return ERROR_FAIL; > + > + ev->id = ev->last_id_used; This duplicate reference to last_id_used is a bit odd. Perhaps this should go above and then qmp_prepare_cmd could reference it ? > + ev->tx_buf = buf; > + ev->tx_buf_len = len; I think it would be wise to assert, earlier in this function, that the buffer is empty. > +static int qmp_error_class_to_libxl_error_code(const libxl__qmp_error_class > c) > +{ > + switch (c) { > + case LIBXL__QMP_ERROR_CLASS_GENERICERROR: > + return ERROR_QMP_GENERIC_ERROR; > + case LIBXL__QMP_ERROR_CLASS_COMMANDNOTFOUND: > + return ERROR_QMP_COMMAND_NOT_FOUND; > + case LIBXL__QMP_ERROR_CLASS_DEVICENOTFOUND: > + return ERROR_QMP_DEVICE_NOT_FOUND; Urgh. The slightly different names means that this can't be macro-ified. Without that, it would be really easy for someone in the future to accidentally write something like this: > + case LIBXL__QMP_ERROR_CLASS_GENERICERROR: > + return ERROR_QMP_GENERIC_ERROR; > + case LIBXL__QMP_ERROR_CLASS_COMMANDNOTFOUND: > + return ERROR_QMP_GENERIC_ERROR; > + case LIBXL__QMP_ERROR_CLASS_DEVICENOTFOUND: > + return ERROR_QMP_DEVICE_NOT_FOUND; and it's hard to spot. What are LIBXL__QMP_ERROR_CLASSes and why are they even different from ERROR_* values ? Maybe one of them is the QMP integer values and one of them is the corresponding libxl integer values ? Anyway, can we not make this less open-coded somehow. > + default: > + abort(); But what happens if qemu invents a new error code ? I don't think aborting is likely to be right. > +/* return 1 when a user callback as been called */ > +static int qmp_ev_handle_response(libxl__egc *egc, > + libxl__ev_qmp *ev, > + const libxl__json_object *resp, > + libxl__qmp_message_type type) > +{ > + EGC_GC; > + const libxl__json_object *response; > + const libxl__json_object *o; > + int rc; > + int id; > + > + o = libxl__json_map_get("id", resp, JSON_INTEGER); > + if (!o) { > + const char *error_desc; > + > + /* unexpected message, attempt to find an error description */ > + o = libxl__json_map_get("error", resp, JSON_MAP); > + o = libxl__json_map_get("desc", o, JSON_STRING); What is the dead store from "error" doing ? > + id = libxl__json_object_get_integer(o); > + if (id != ev->id && id != QMP_CAPABILITY_NEGOTIATION_MSGID) > + return 0; Err, is this an expected situation ? ISTM that if qemu sends us messages with uknown ids we should declare it gone wrong. > + switch (type) { > + case LIBXL__QMP_MESSAGE_TYPE_RETURN: { > + response = libxl__json_map_get("return", resp, JSON_ANY); > + rc = 0; > + break; > + } > + case LIBXL__QMP_MESSAGE_TYPE_ERROR: { > + const char *s; > + const libxl__json_object *err; > + libxl__qmp_error_class error_class; > + > + rc = ERROR_FAIL; > + response = NULL; > + > + err = libxl__json_map_get("error", resp, JSON_MAP); > + o = libxl__json_map_get("class", err, JSON_STRING); > + s = libxl__json_object_get_string(o); > + if (s && !libxl__qmp_error_class_from_string(s, &error_class)) > + rc = qmp_error_class_to_libxl_error_code(error_class); > + > + o = libxl__json_map_get("desc", err, JSON_STRING); > + s = libxl__json_object_get_string(o); > + if (s) > + LOGD(ERROR, ev->domid, "%s", s); > + > + break; Surely this should print more debugging (or maybe even error log messages) if the error json document was not in the expected form ? > + /* > + * Even if the current state is capability_negotiation and the correct ID > + * as been received, call the callback on error. > + */ > + if (ev->qmp_state == qmp_state_capability_negotiation && > + id == QMP_CAPABILITY_NEGOTIATION_MSGID && > + rc == 0) { > + ev->qmp_state = qmp_state_connected; > + libxl__ev_fd_modify(gc, &ev->qmp_efd, ev->qmp_efd.events | POLLOUT); > + return 0; I don't much like the open-coding of libx__ev_fd_modify here, although the alternative might be a very small function. Consider that, though ? Or a comment ? > + } else { > + ev->id = -1; > + ev->callback(egc, ev, response, rc); /* must be last */ > + return 1; Surely this should set the internal state to a new state ? > + } > +} > + > +/* return 1 when a user callback as been called */ > +static int qmp_ev_handle_message(libxl__egc *egc, > + libxl__ev_qmp *ev, > + const libxl__json_object *resp) If you have it like this, this explanation of the return value needs to be clearer. There is only one call site, so it might be better to fold it into the fd callback. > +{ > + EGC_GC; > + libxl__qmp_message_type type = qmp_response_type(resp); > + > + switch (type) { > + case LIBXL__QMP_MESSAGE_TYPE_QMP: > + /* greeting message */ > + assert(ev->qmp_state == qmp_state_connecting); This assert is wrong. A hostile qemu could crash libxl. I think you need a `declare qemu has gone wrong' function which cleanly tears the thing down and makes a suitable call to the callback. > + ev->qmp_state = qmp_state_greeting; > + /* Allow qmp_ev_callback_writable to be called in order to send > + * qmp_capabilities */ > + libxl__ev_fd_modify(gc, &ev->qmp_efd, ev->qmp_efd.events | POLLOUT); See my comments about writeable, above. Another possible approach to this would be to have a function called something like qmp_ev_ensure_reading_writing which looks at ev->qmp_state and makes an appropriate call to libxl__ev_fd_modify. That would make it easy to verify the correctness against the table of possible states, which I asked for right at the start of this review. > +static int qmp_ev_callback_readable(libxl__egc *egc, libxl__ev_qmp *ev, int > fd) > +{ > + EGC_GC; > + ssize_t r; > + char *end = NULL; ... I think I already commented on some of this so I won't repeat myself. > + /* workaround strstr limitation */ > + ev->rx_buf[ev->buf_used] = '\0'; It's possible that you wanted `memmem' which is in glibc and FreeBSD's libc, at least ? > + /* > + * Search for the end of a QMP message: "\r\n" in the newly received > + * bytes + the last byte on the previous read() if any This is to avoid rescanning the buffer pointlessly ? > + * end: This will point to the byte right after \r\n > + */ > + end = strstr(ev->rx_buf + ev->buf_used - r > + + (ev->buf_used - r == 0 ? 0 : -1), > + "\r\n"); This is really quite fiddly. I presume that there should be no bare \n in this stream. How about searching for just \n instead, and then checking for and stripping the \r ? (Instead of my memmem suggestion.) > + while (end) { > + libxl__json_object *o; > + /* Start parsing from s */ > + char *s = ev->rx_buf + ev->buf_consumed; This complicated buffer management would really benefit from some comments the stating invariants between rx_buf, buf_consumed, and so on. Please provide that, and then I will have to re-review it. > + /* Findout how much can be parsed */ > + size_t len = end - s; > + > + LOG_QMP("parsing %luB: '%.*s'", len, (int)len, s); > + > + /* Replace \n by \0 so that libxl__json_parse can use strlen */ > + s[len - 1] = '\0'; Doesn't this feed the \r to libxl__json_parse ? Also, why does this have to be a loop ? Does qemu really send multiple json documents end to end, and only sometimes with intervening \r\n ? Does it ever send a json document without \r\n at the end and then stop speaking for a while ? > + ev->buf_consumed += len; > + > + if (ev->buf_consumed >= ev->buf_used) { > + free(ev->rx_buf); Surely buf_consumed <= buf_used ? Maybe you should assert that. > + /* check if there is another message received at the same time */ > + if (ev->rx_buf) { > + end = strstr(ev->rx_buf + ev->buf_consumed, "\r\n"); Wait, you have two calls to strstr ? Now I am confused about the code structure. You definitely need to write this so that the searching for delimiters is only done once. > + /* Must be last and return when the user callback is called */ > + if (qmp_ev_handle_message(egc, ev, o) == 1) > + return 0; There seems to be nothing in any of this that handles the case where a malicious or buggy qemu sends an unsolicited response. I think that will probably make a wild callback ? > +static int qmp_ev_callback_writable(libxl__gc *gc, libxl__ev_qmp *ev, int fd) > +{ > + int rc; > + char *buf; > + size_t len; > + int send_fd = -1; > + > + /* No need to call qmp_ev_callback_writable again, everything that needs > to > + * be send for now will be in this call. */ So you guarantee never to send any message which is too large to fit into a socket buffer ? Do you know what that length is ? > + case qmp_state_connected: > + if (!ev->tx_buf) > + return 0; > + > + buf = ev->tx_buf; > + len = ev->tx_buf_len; > + send_fd = libxl__carefd_fd(ev->cfd); > + > + ev->tx_buf = NULL; I don't like the way you leave tx_buf_len set to a nonzero value. A sensible invariant would be that tx_buf_len is always valid and that therefore if tx_buf is 0, so is tx_buf_len. Where is tx_buf freed ? Indeed, what is tx_buf's memory management ? That needs to be written down. > +static void qmp_ev_callback_error(libxl__egc *egc, libxl__ev_qmp *ev) > +{ > + EGC_GC; > + > + LOGD(ERROR, ev->domid, "Error happend with the QMP connection to QEMU"); > + > + /* On error, deallocate all private ressources */ > + libxl__ev_qmp_dispose(gc, ev); This surely needs to set the state. Presumably that should be done in libxl__ev_qmp_dispose but AFAICT it isn't. > +static void qmp_ev_fd_callback(libxl__egc *egc, libxl__ev_fd *ev_fd, > + int fd, short events, short revents) > +{ > + EGC_GC; > + int rc; You could implement the retry loop here. You could assign a positive value return code from qmp_ev_callback_writable / qmp_ev_callback_readable to indicate that it is done. > + libxl__ev_qmp *ev = CONTAINER_OF(ev_fd, *ev, qmp_efd); > + > + if (revents & (POLLHUP)) { > + LOGD(DEBUG, ev->domid, "received POLLHUP from QMP socket"); Surely these should be LOGD(ERROR,...). This is not expected, is it, and it is probably going to cause something else to fail. > + rc = ERROR_FAIL; I think it might be a good idea to invent a new ERROR_QEMU_QMP_PROTOCOL_ERROR or something (look at the other codes to pick a good name...). > +static int qmp_ev_connect(libxl__gc *gc, libxl__ev_qmp *ev) > +{ > + int rc, r; > + struct sockaddr_un un; > + const char *qmp_socket_path; > + > + if (ev->qmp_state != qmp_state_disconnected) > + return 0; > + > + qmp_socket_path = libxl__qemu_qmp_path(gc, ev->domid); > + > + LOGD(DEBUG, ev->domid, "Connecting to %s", qmp_socket_path); > + > + libxl__carefd_begin(); > + ev->qmp_cfd = libxl__carefd_opened(CTX, socket(AF_UNIX, SOCK_STREAM, 0)); I find this socket call nested in the arguments to libxl__carefd_opened rather un-plain. > + r = connect(libxl__carefd_fd(ev->qmp_cfd), > + (struct sockaddr *) &un, sizeof(un)); > + if (r) { I commented on the error handling here already I think. > +void libxl__ev_qmp_init(libxl__ev_qmp *ev) > +{ > + ev->id = -1; > + > + ev->qmp_cfd = NULL; > + libxl__ev_fd_init(&ev->qmp_efd); > + ev->qmp_state = qmp_state_disconnected; > + ev->last_id_used = QMP_CAPABILITY_NEGOTIATION_MSGID + 1; Going back a bit, earlier we had: > +#define QMP_CAPABILITY_NEGOTIATION_MSGID 1 I recommend using a different value. Can we safely let this wrap ? If so maybe we could use 0x786c7100 which spells "xlq\0" or something. This can make it easier to see where rogue values are coming from, to distinguish arguments in debug output, etc. > +int libxl__ev_qmp_send(libxl__gc *gc, libxl__ev_qmp *ev, > + const char *cmd, libxl__json_object *args) > +{ > + int rc; > + > + LOGD(DEBUG, ev->domid, " ev %p, cmd '%s'", ev, cmd); Needs an assert about the state. > + /* Connect to QEMU if not already connected */ > + rc = qmp_ev_connect(gc, ev); > + if (rc) > + goto out; > + > + rc = qmp_ev_prepare_cmd(gc, ev, cmd, args); > + if (rc) > + goto out; > + > +out: > + if (rc) > + libxl__ev_qmp_dispose(gc, ev); I think you need to write in your API documents that if callback was called indicating an error, the connection may be made Idle. Thanks, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.