[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v5 03/15] libxl_qmp: Implement fd callback and read data [and 1 more messages]



As discussed, I squwashed several of these patches together and I'm
now doing my code review on the result.  I've gone through it in some
detail.


Again, I asked for more internal documentation about the legal states
etc.  I will have to read it in detail again I'm afraid after that is
done.

The reason I ask for this is that this is a complicated and
substantial pile of code.  It's not sensible for anyone to try to hold
it in their head at once - we will make mistakes.  And it should be
possbile to modify it without reading all of it.

So, it should be possible for anyone to:

 * Look only at the summary internal architecture state machine
   comments and so on, and confirm to themselves that this is a
   sensible design and that all the possible states are represented
   and that the interlocking states of the detailed variables are
   both sufficient, and of manageable complexity.

 * Read any small fragment of code and see that transforms legal
   states into other legal states, by reference to the above.

Right now this is not really possible.  I look at things like this:

> +    ev->id = ev->last_id_used;
> +    ev->tx_buf = buf;
> +    ev->tx_buf_len = len;
> +
> +    libxl__ev_fd_modify(gc, &ev->qmp_efd, ev->qmp_efd.events | POLLOUT);

in qmp_ev_prepare_cmd, and I need to guess whether the surrounding
code has all the needed error checks and whetehr it does all that is
needed.

I can't be sure, because the only way to infer the rules about what
combinations of values are legal in the variables is to read the
entire program.  If I somehow peer at it and with great perspicacity
discover a bug, it won't be clear whether the bug is that the variable
is set wrong, or used wrong, or that there is a design error.


Now on to the details.

Firstly, a small apology.  I misread your squashing instructions
and merged in
  libxl_qmp: Separate QMP message generation from qmp_send_prepare
when I shouldn't have done.  FAOD I like it separated out, so please
don't squash it in even though my review comments below do include it.

In your next respin could usefully buble it up I think.


Ian Jackson writes ("[PATCH 2/8] libxl_qmp: PORTMANTEAU starting with: Connect 
to QMP socket"):
> +typedef enum {
> +    /* initial state */
> +    qmp_state_disconnected = 1,
> +    /* connected to QMP socket, waiting for greeting message */
> +    qmp_state_connecting,
> +    /* greeting message received */
> +    qmp_state_greeting,
> +    /* qmp_capabilities command sent, waiting for reply */
> +    qmp_state_capability_negotiation,
> +    /* ready to send commands */
> +    qmp_state_connected,
> +} libxl__qmp_state;

In an earlier email in response to this same v5 I wrote:

  IWBN to relate these private states to the outward-facing API states
  like `Connected'.

  I often write a table of legal field values - see for example,
  libxl_exec.c near l.241 and libxl_stream_read.c near l.35.  But maybe
  this is not complicated enough to need that.

Seeing this in context with all the new fields in libxl__ev_qmp I
do think this would help.  (I appreciate you've not had a chance to do
that yet.)

Ie ideally, if you can make it fit, a table whose columns are the
qmp_state_* values and whose rows are: the private variables in
libxl__ev_qmp; the outward-facing state; the qmp-facing protocol
state.  (I suggest that way round because there are only 5 states and
more proposed rows.)

>      int id;
> +    libxl__carefd *qmp_cfd;
> +    libxl__ev_fd qmp_efd;
> +    libxl__qmp_state qmp_state;
> +    unsigned int last_id_used;

Why is id signed but last_id_used unsigned ?  This seems odd.

> -static char *qmp_send_prepare(libxl__gc *gc, libxl__qmp_handler *qmp,
> -                              const char *cmd, libxl__json_object *args,
> -                              qmp_callback_t callback, void *opaque,
> -                              qmp_request_context *context)
> +static char *qmp_prepare_cmd(libxl__gc *gc, const char *cmd,
> +                             const libxl__json_object *args,
> +                             int id, size_t *len_r)
...
> +    const unsigned char *buf;

Would be useful to write here that the memory for buf is owned by
hand.  And therefore to have hand first.

> +    libxl_yajl_length len;
>      yajl_gen_status s;
>      yajl_gen hand;

hand should perhaps be initialised to NULL in case new code is added
before it is allocated ?

> +    ret = libxl__malloc(NOGC, len + 3);
> +    strncpy(ret, (const char *)buf, len + 3);
> +    strncpy(ret + len, "\r\n", 3);
> +    len += 2;

I'm not sure why this can't be done with libxl_sprintf and a suitable
%<something>n.  This open-coding of +3 etc. is quite fiddly.

> +/* ------------ Implementation of libxl__ev_qmp ---------------- */
> +
> +/* hard coded message ID used for capability negotiation */
> +#define QMP_CAPABILITY_NEGOTIATION_MSGID 1
> +
> +static int qmp_ev_prepare_cmd(libxl__gc *gc,
> +                              libxl__ev_qmp *ev,
> +                              const char *cmd,
> +                              const libxl__json_object *args)
> +{
> +    char *buf = NULL;
> +    size_t len;
> +
> +    buf = qmp_prepare_cmd(gc, cmd, args, ++ev->last_id_used, &len);
> +    if (!buf)
> +        return ERROR_FAIL;
> +
> +    ev->id = ev->last_id_used;

This duplicate reference to last_id_used is a bit odd.  Perhaps this
should go above and then qmp_prepare_cmd could reference it ?

> +    ev->tx_buf = buf;
> +    ev->tx_buf_len = len;

I think it would be wise to assert, earlier in this function, that
the buffer is empty.

> +static int qmp_error_class_to_libxl_error_code(const libxl__qmp_error_class 
> c)
> +{
> +    switch (c) {
> +    case LIBXL__QMP_ERROR_CLASS_GENERICERROR:
> +        return ERROR_QMP_GENERIC_ERROR;
> +    case LIBXL__QMP_ERROR_CLASS_COMMANDNOTFOUND:
> +        return ERROR_QMP_COMMAND_NOT_FOUND;
> +    case LIBXL__QMP_ERROR_CLASS_DEVICENOTFOUND:
> +        return ERROR_QMP_DEVICE_NOT_FOUND;

Urgh.  The slightly different names means that this can't be
macro-ified.  Without that, it would be really easy for someone in the
future to accidentally write something like this:

> +    case LIBXL__QMP_ERROR_CLASS_GENERICERROR:
> +        return ERROR_QMP_GENERIC_ERROR;
> +    case LIBXL__QMP_ERROR_CLASS_COMMANDNOTFOUND:
> +        return ERROR_QMP_GENERIC_ERROR;
> +    case LIBXL__QMP_ERROR_CLASS_DEVICENOTFOUND:
> +        return ERROR_QMP_DEVICE_NOT_FOUND;

and it's hard to spot.

What are LIBXL__QMP_ERROR_CLASSes and why are they even different from
ERROR_* values ?  Maybe one of them is the QMP integer values and one
of them is the corresponding libxl integer values ?

Anyway, can we not make this less open-coded somehow.

> +    default:
> +        abort();

But what happens if qemu invents a new error code ?  I don't think
aborting is likely to be right.

> +/* return 1 when a user callback as been called */
> +static int qmp_ev_handle_response(libxl__egc *egc,
> +                                  libxl__ev_qmp *ev,
> +                                  const libxl__json_object *resp,
> +                                  libxl__qmp_message_type type)
> +{
> +    EGC_GC;
> +    const libxl__json_object *response;
> +    const libxl__json_object *o;
> +    int rc;
> +    int id;
> +
> +    o = libxl__json_map_get("id", resp, JSON_INTEGER);
> +    if (!o) {
> +        const char *error_desc;
> +
> +        /* unexpected message, attempt to find an error description */
> +        o = libxl__json_map_get("error", resp, JSON_MAP);
> +        o = libxl__json_map_get("desc", o, JSON_STRING);

What is the dead store from "error" doing ?

> +    id = libxl__json_object_get_integer(o);
> +    if (id != ev->id && id != QMP_CAPABILITY_NEGOTIATION_MSGID)
> +        return 0;

Err, is this an expected situation ?  ISTM that if qemu sends us
messages with uknown ids we should declare it gone wrong.

> +    switch (type) {
> +    case LIBXL__QMP_MESSAGE_TYPE_RETURN: {
> +        response = libxl__json_map_get("return", resp, JSON_ANY);
> +        rc = 0;
> +        break;
> +    }
> +    case LIBXL__QMP_MESSAGE_TYPE_ERROR: {
> +        const char *s;
> +        const libxl__json_object *err;
> +        libxl__qmp_error_class error_class;
> +
> +        rc = ERROR_FAIL;
> +        response = NULL;
> +
> +        err = libxl__json_map_get("error", resp, JSON_MAP);
> +        o = libxl__json_map_get("class", err, JSON_STRING);
> +        s = libxl__json_object_get_string(o);
> +        if (s && !libxl__qmp_error_class_from_string(s, &error_class))
> +            rc = qmp_error_class_to_libxl_error_code(error_class);
> +
> +        o = libxl__json_map_get("desc", err, JSON_STRING);
> +        s = libxl__json_object_get_string(o);
> +        if (s)
> +            LOGD(ERROR, ev->domid, "%s", s);
> +
> +        break;

Surely this should print more debugging (or maybe even error log
messages) if the error json document was not in the expected form ?

> +    /*
> +     * Even if the current state is capability_negotiation and the correct ID
> +     * as been received, call the callback on error.
> +     */
> +    if (ev->qmp_state == qmp_state_capability_negotiation &&
> +        id == QMP_CAPABILITY_NEGOTIATION_MSGID &&
> +        rc == 0) {
> +        ev->qmp_state = qmp_state_connected;
> +        libxl__ev_fd_modify(gc, &ev->qmp_efd, ev->qmp_efd.events | POLLOUT);
> +        return 0;

I don't much like the open-coding of libx__ev_fd_modify here, although
the alternative might be a very small function.  Consider that,
though ?  Or a comment ?

> +    } else {
> +        ev->id = -1;
> +        ev->callback(egc, ev, response, rc); /* must be last */
> +        return 1;

Surely this should set the internal state to a new state ?

> +    }
> +}
> +
> +/* return 1 when a user callback as been called */
> +static int qmp_ev_handle_message(libxl__egc *egc,
> +                                 libxl__ev_qmp *ev,
> +                                 const libxl__json_object *resp)

If you have it like this, this explanation of the return value needs
to be clearer.  There is only one call site, so it might be better to
fold it into the fd callback.

> +{
> +    EGC_GC;
> +    libxl__qmp_message_type type = qmp_response_type(resp);
> +
> +    switch (type) {
> +    case LIBXL__QMP_MESSAGE_TYPE_QMP:
> +        /* greeting message */
> +        assert(ev->qmp_state == qmp_state_connecting);

This assert is wrong.  A hostile qemu could crash libxl.

I think you need a `declare qemu has gone wrong' function which
cleanly tears the thing down and makes a suitable call to the
callback.

> +        ev->qmp_state = qmp_state_greeting;
> +        /* Allow qmp_ev_callback_writable to be called in order to send
> +         * qmp_capabilities */
> +        libxl__ev_fd_modify(gc, &ev->qmp_efd, ev->qmp_efd.events | POLLOUT);

See my comments about writeable, above.

Another possible approach to this would be to have a function called
something like
   qmp_ev_ensure_reading_writing
which looks at ev->qmp_state and makes an appropriate call to
libxl__ev_fd_modify.

That would make it easy to verify the correctness against the table of
possible states, which I asked for right at the start of this review.

> +static int qmp_ev_callback_readable(libxl__egc *egc, libxl__ev_qmp *ev, int 
> fd)
> +{
> +    EGC_GC;
> +    ssize_t r;
> +    char *end = NULL;
...

I think I already commented on some of this so I won't repeat myself.

> +    /* workaround strstr limitation */
> +    ev->rx_buf[ev->buf_used] = '\0';

It's possible that you wanted `memmem' which is in glibc and FreeBSD's
libc, at least ?

> +    /*
> +     * Search for the end of a QMP message: "\r\n" in the newly received
> +     * bytes + the last byte on the previous read() if any

This is to avoid rescanning the buffer pointlessly ?

> +     * end: This will point to the byte right after \r\n
> +     */
> +    end = strstr(ev->rx_buf + ev->buf_used - r
> +                 + (ev->buf_used - r == 0 ? 0 : -1),
> +                 "\r\n");

This is really quite fiddly.  I presume that there should be no bare
\n in this stream.  How about searching for just \n instead, and then
checking for and stripping the \r ?  (Instead of my memmem
suggestion.)

> +    while (end) {
> +        libxl__json_object *o;
> +        /* Start parsing from s */
> +        char *s = ev->rx_buf + ev->buf_consumed;

This complicated buffer management would really benefit from some
comments the stating invariants between rx_buf, buf_consumed, and so
on.  Please provide that, and then I will have to re-review it.

> +        /* Findout how much can be parsed */
> +        size_t len = end - s;
> +
> +        LOG_QMP("parsing %luB: '%.*s'", len, (int)len, s);
> +
> +        /* Replace \n by \0 so that libxl__json_parse can use strlen */
> +        s[len - 1] = '\0';

Doesn't this feed the \r to libxl__json_parse ?

Also, why does this have to be a loop ?  Does qemu really send
multiple json documents end to end, and only sometimes with
intervening \r\n ?  Does it ever send a json document without \r\n at
the end and then stop speaking for a while ?

> +        ev->buf_consumed += len;
> +
> +        if (ev->buf_consumed >= ev->buf_used) {
> +            free(ev->rx_buf);

Surely buf_consumed <= buf_used ?  Maybe you should assert that.

> +        /* check if there is another message received at the same time */
> +        if (ev->rx_buf) {
> +            end = strstr(ev->rx_buf + ev->buf_consumed, "\r\n");

Wait, you have two calls to strstr ?  Now I am confused about the code
structure.  You definitely need to write this so that the searching
for delimiters is only done once.

> +        /* Must be last and return when the user callback is called */
> +        if (qmp_ev_handle_message(egc, ev, o) == 1)
> +            return 0;

There seems to be nothing in any of this that handles the case where
a malicious or buggy qemu sends an unsolicited response.  I think that
will probably make a wild callback ?

> +static int qmp_ev_callback_writable(libxl__gc *gc, libxl__ev_qmp *ev, int fd)
> +{
> +    int rc;
> +    char *buf;
> +    size_t len;
> +    int send_fd = -1;
> +
> +    /* No need to call qmp_ev_callback_writable again, everything that needs 
> to
> +     * be send for now will be in this call. */

So you guarantee never to send any message which is too large to fit
into a socket buffer ?  Do you know what that length is ?

> +    case qmp_state_connected:
> +        if (!ev->tx_buf)
> +            return 0;
> +
> +        buf = ev->tx_buf;
> +        len = ev->tx_buf_len;
> +        send_fd = libxl__carefd_fd(ev->cfd);
> +
> +        ev->tx_buf = NULL;

I don't like the way you leave tx_buf_len set to a nonzero value.
A sensible invariant would be that tx_buf_len is always valid and that
therefore if tx_buf is 0, so is tx_buf_len.

Where is tx_buf freed ?  Indeed, what is tx_buf's memory management ?
That needs to be written down.

> +static void qmp_ev_callback_error(libxl__egc *egc, libxl__ev_qmp *ev)
> +{
> +    EGC_GC;
> +
> +    LOGD(ERROR, ev->domid, "Error happend with the QMP connection to QEMU");
> +
> +    /* On error, deallocate all private ressources */
> +    libxl__ev_qmp_dispose(gc, ev);

This surely needs to set the state.  Presumably that should be done in
libxl__ev_qmp_dispose but AFAICT it isn't.

> +static void qmp_ev_fd_callback(libxl__egc *egc, libxl__ev_fd *ev_fd,
> +                               int fd, short events, short revents)
> +{
> +    EGC_GC;
> +    int rc;

You could implement the retry loop here.  You could assign a positive
value return code from qmp_ev_callback_writable /
qmp_ev_callback_readable to indicate that it is done.

> +    libxl__ev_qmp *ev = CONTAINER_OF(ev_fd, *ev, qmp_efd);
> +
> +    if (revents & (POLLHUP)) {
> +        LOGD(DEBUG, ev->domid, "received POLLHUP from QMP socket");

Surely these should be LOGD(ERROR,...).  This is not expected, is it,
and it is probably going to cause something else to fail.

> +        rc = ERROR_FAIL;

I think it might be a good idea to invent a new
ERROR_QEMU_QMP_PROTOCOL_ERROR or something (look at the other codes to
pick a good name...).

> +static int qmp_ev_connect(libxl__gc *gc, libxl__ev_qmp *ev)
> +{
> +    int rc, r;
> +    struct sockaddr_un un;
> +    const char *qmp_socket_path;
> +
> +    if (ev->qmp_state != qmp_state_disconnected)
> +        return 0;
> +
> +    qmp_socket_path = libxl__qemu_qmp_path(gc, ev->domid);
> +
> +    LOGD(DEBUG, ev->domid, "Connecting to %s", qmp_socket_path);
> +
> +    libxl__carefd_begin();
> +    ev->qmp_cfd = libxl__carefd_opened(CTX, socket(AF_UNIX, SOCK_STREAM, 0));

I find this socket call nested in the arguments to
libxl__carefd_opened rather un-plain.

> +    r = connect(libxl__carefd_fd(ev->qmp_cfd),
> +                (struct sockaddr *) &un, sizeof(un));
> +    if (r) {

I commented on the error handling here already I think.

> +void libxl__ev_qmp_init(libxl__ev_qmp *ev)
> +{
> +    ev->id = -1;
> +
> +    ev->qmp_cfd = NULL;
> +    libxl__ev_fd_init(&ev->qmp_efd);
> +    ev->qmp_state = qmp_state_disconnected;
> +    ev->last_id_used = QMP_CAPABILITY_NEGOTIATION_MSGID + 1;

Going back a bit, earlier we had:

> +#define QMP_CAPABILITY_NEGOTIATION_MSGID 1

I recommend using a different value.  Can we safely let this wrap ?
If so maybe we could use 0x786c7100 which spells "xlq\0" or something.
This can make it easier to see where rogue values are coming from, to
distinguish arguments in debug output, etc.

> +int libxl__ev_qmp_send(libxl__gc *gc, libxl__ev_qmp *ev,
> +                       const char *cmd, libxl__json_object *args)
> +{
> +    int rc;
> +
> +    LOGD(DEBUG, ev->domid, " ev %p, cmd '%s'", ev, cmd);

Needs an assert about the state.

> +    /* Connect to QEMU if not already connected */
> +    rc = qmp_ev_connect(gc, ev);
> +    if (rc)
> +        goto out;
> +
> +    rc = qmp_ev_prepare_cmd(gc, ev, cmd, args);
> +    if (rc)
> +        goto out;
> +
> +out:
> +    if (rc)
> +        libxl__ev_qmp_dispose(gc, ev);

I think you need to write in your API documents that if callback was
called indicating an error, the connection may be made Idle.


Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.