
Re: [Xen-devel] [PATCH v2 16/27] tools/libxl: Infrastructure for reading a libxl migration v2 stream



On Thu, 2015-07-09 at 19:26 +0100, Andrew Cooper wrote:
> From: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
> 
> This contains the event machinary and state machines to read an act on a

"machinery"

[...]


> Large quantities of the logic here are completely overhauled since v1, mostly
> as part of fixing the checkpoint buffering bug which was the cause of the
> broken Remus failover.  The result is actually more simple overall;

I agree, it looks much nicer, thanks!

> +struct libxl__stream_read_state {
> +    /* filled by the user */
> +    libxl__ao *ao;
> +    int fd;
> +    void (*completion_callback)(libxl__egc *egc,
> +                                libxl__stream_read_state *srs,
> +                                int rc);
> +    /* Private */
> +    int rc;
> +    bool running;
[...]
> +void libxl__stream_read_start(libxl__egc *egc,
> +                              libxl__stream_read_state *stream)
> +{
> +    libxl__datacopier_state *dc = &stream->dc;
> +    int ret = 0;
> +
> +    /* State initialisation. */
> +    assert(!stream->running);

Since running is declared private and there is no _init function (I
think _start is effectively filling that role), I'm not sure that the
caller can necessarily be expected to have initialised anything other
than the ao, fd and callback fields.

You might choose to handle this as a request for a doc comment ("must
call LIBXL_FILLZERO on it to init"), to add a separate init function
containing the memset, or to do away with this check. I've not gotten to
the caller yet, so I don't know which you will prefer.
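For the separate-init option, something along these lines is what I
have in mind. It is only a sketch: the cut-down struct and the
example_stream_* names are hypothetical stand-ins for
libxl__stream_read_state and friends, and a plain memset() stands in
for LIBXL_FILLZERO.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for libxl__stream_read_state. */
typedef struct example_stream_state {
    /* filled by the user */
    int fd;
    void (*completion_callback)(struct example_stream_state *srs, int rc);
    /* private */
    int rc;
    bool running;
} example_stream_state;

/* Hypothetical init: zero everything so the private fields are defined
 * before the caller fills in fd and the callback. */
static void example_stream_init(example_stream_state *stream)
{
    memset(stream, 0, sizeof(*stream));
}

/* Hypothetical start: the assert is only meaningful because init()
 * has guaranteed running == false. */
static int example_stream_start(example_stream_state *stream)
{
    assert(!stream->running);
    stream->running = true;
    return 0;
}
```

The caller would call example_stream_init(), fill in fd and the
callback, and only then call example_stream_start(), so both the assert
in _start and the running check on the failure path see a defined value.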

> +
> +    memset(dc, 0, sizeof(*dc));
> +    dc->ao = stream->ao;
> +    dc->readfd = stream->fd;
> +    dc->writefd = -1;
> +
> +    /* Start reading the stream header. */
> +    ret = setup_read(stream, "stream header",
> +                     &stream->hdr, sizeof(stream->hdr),
> +                     stream_header_done);
> +    if (ret)
> +        goto err;
> +
> +    stream->running = true;
> +    stream->phase = SRS_PHASE_NORMAL;
> +    LIBXL_STAILQ_INIT(&stream->record_queue);
> +    stream->recursion_guard = 0;
> +
> +    assert(!ret);
> +    return;
> +
> + err:
> +    assert(ret);
> +    stream_failed(egc, stream, ret);

stream_failed() looks at stream->running, which, due to the above, might
also be uninitialised here.

> +static void stream_done(libxl__egc *egc,
> +                        libxl__stream_read_state *stream)
> +{
> +    libxl__sr_record_buf *rec, *trec;
> +
> +    assert(stream->running);
> +    stream->running = false;
> +
> +    if (stream->emu_carefd)
> +        libxl__carefd_close(stream->emu_carefd);
> +
> +    LIBXL_STAILQ_FOREACH_SAFE(rec, &stream->record_queue, entry, trec) {
> +        free(rec->body);
> +        free(rec);
> +    }

Am I right in thinking that we should only get here with a non-empty
queue on failure? If so, then perhaps:
        assert(LIBXL_STAILQ_EMPTY(...) || stream->rc);
        
?
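A standalone sketch of that teardown, with the assertion placed before
the records are freed. The record and queue types are hypothetical
simplifications of libxl__sr_record_buf and the LIBXL_STAILQ
record_queue, using a hand-rolled list rather than the STAILQ macros.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for libxl__sr_record_buf and the record queue. */
typedef struct example_record {
    void *body;
    struct example_record *next;
} example_record;

typedef struct {
    example_record *queue_head;
    int rc;
} example_stream;

/* Frees any queued records and returns how many were freed.
 * The assert encodes the suggested invariant: a non-empty queue
 * at teardown is only expected on the failure path (rc != 0). */
static int example_stream_done(example_stream *stream)
{
    example_record *rec, *next;
    int freed = 0;

    assert(stream->queue_head == NULL || stream->rc);

    for (rec = stream->queue_head; rec; rec = next) {
        next = rec->next; /* saved before free(), as _FOREACH_SAFE does */
        free(rec->body);
        free(rec);
        freed++;
    }
    stream->queue_head = NULL;
    return freed;
}
```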

> +
> +    stream->completion_callback(egc, stream, stream->rc);
> +}
> +
> +static void stream_continue(libxl__egc *egc,
> +                            libxl__stream_read_state *stream)
> +{
> +    STATE_AO_GC(stream->ao);
> +
> +    /* Must not mutually recurse with process_record() */
> +    assert(stream->recursion_guard == false);
> +    stream->recursion_guard = true;

This smells a bit like it ought to be a SRS_PHASE_PROCESSING or some
such, but let's leave that alone...

> +
> +    switch (stream->phase) {
> +    case SRS_PHASE_NORMAL:
> +        /*
> +         * Normal phase of the stream.  We arrive here in several senarios.

"scenarios"

> +static void stream_header_done(libxl__egc *egc,
> +                               libxl__datacopier_state *dc,
> +                               int rc, int onwrite, int errnoval)
> +{
> +    libxl__stream_read_state *stream = CONTAINER_OF(dc, *stream, dc);
> +    libxl__sr_hdr *hdr = &stream->hdr;
> +    STATE_AO_GC(dc->ao);
> +    int ret = 0;
> +
> +    if (rc || onwrite || errnoval) {
> +        ret = ERROR_FAIL;
> +        LOG(ERROR, "rc %d, onwrite %d, errnoval %d", rc, onwrite, errnoval);

Could use LOGEV(ERROR, errnoval, "rc %d, onwrite %d", rc, onwrite);
(for all cases, I think).

Actually, doesn't dc guarantee that it has always already logged on
failure? The comments in libxl_internal.h suggest so, apart from the
abort case, so I think you can probably avoid logging explicitly here.
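Roughly, the point of LOGEV() is to render errnoval as text rather than
as a bare integer. The helper below approximates that in plain C;
format_dc_error() is hypothetical, and the exact message libxl produces
may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical approximation of an errno-aware log line: appends the
 * strerror() text when errnoval is non-zero, instead of printing the
 * raw number. Returns snprintf()'s result. */
static int format_dc_error(char *buf, size_t len,
                           int rc, int onwrite, int errnoval)
{
    assert(buf && len > 0);
    if (errnoval)
        return snprintf(buf, len, "rc %d, onwrite %d: %s",
                        rc, onwrite, strerror(errnoval));
    return snprintf(buf, len, "rc %d, onwrite %d", rc, onwrite);
}
```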

> +        goto err;
> +    }
> +
> +    hdr->ident   = be64toh(hdr->ident);
> +    hdr->version = be32toh(hdr->version);
> +    hdr->options = be32toh(hdr->options);
> +
> +    if (hdr->ident != RESTORE_STREAM_IDENT) {
> +        ret = ERROR_FAIL;

Eventually I suspect the xapi people would like to see a more specific
error code, at least for the general "SRS header fail" case if not for
each of the individual reasons.
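As an illustration of "more specific", each header check could map to
its own result code, which the caller could then translate into
something richer than ERROR_FAIL. The enum, the function and the
ident/version/option values are placeholders, not the definitions from
the libxl stream format header; the sketch also assumes glibc's
<endian.h> for be64toh()/be32toh() and their htobe counterparts.

```c
/* Expose be64toh() and friends from glibc's <endian.h>. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <endian.h>
#include <stdint.h>

/* Placeholder values for illustration only. */
#define EXAMPLE_STREAM_IDENT    UINT64_C(0x0123456789abcdef)
#define EXAMPLE_STREAM_VERSION  2u
#define EXAMPLE_OPT_BIG_ENDIAN  (1u << 0)

/* Hypothetical on-wire header; fields arrive big endian. */
typedef struct {
    uint64_t ident;
    uint32_t version;
    uint32_t options;
} example_hdr;

/* Hypothetical distinct failure reasons instead of a single ERROR_FAIL. */
typedef enum {
    HDR_OK = 0,
    HDR_BAD_IDENT,
    HDR_BAD_VERSION,
    HDR_BIG_ENDIAN_UNSUPPORTED,
} example_hdr_result;

/* Converts the header to host order in place, then validates it. */
static example_hdr_result example_check_header(example_hdr *hdr)
{
    assert(hdr);
    hdr->ident   = be64toh(hdr->ident);
    hdr->version = be32toh(hdr->version);
    hdr->options = be32toh(hdr->options);

    if (hdr->ident != EXAMPLE_STREAM_IDENT)
        return HDR_BAD_IDENT;
    if (hdr->version != EXAMPLE_STREAM_VERSION)
        return HDR_BAD_VERSION;
    if (hdr->options & EXAMPLE_OPT_BIG_ENDIAN)
        return HDR_BIG_ENDIAN_UNSUPPORTED;
    return HDR_OK;
}
```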

> +        LOG(ERROR,
> +            "Invalid ident: expected 0x%016"PRIx64", got 0x%016"PRIx64,
> +            RESTORE_STREAM_IDENT, hdr->ident);
> +        goto err;
> +    }
> +    if (hdr->version != RESTORE_STREAM_VERSION) {
> +        ret = ERROR_FAIL;
> +        LOG(ERROR, "Unexpected Version: expected %u, got %u",

hdr->version is a uint32_t, so PRIu32 would be more appropriate.

> +            RESTORE_STREAM_VERSION, hdr->version);
> +        goto err;
> +    }
> +    if (hdr->options & RESTORE_OPT_BIG_ENDIAN) {
> +        ret = ERROR_FAIL;
> +        LOG(ERROR, "Unable to handle big endian streams");
> +        goto err;
> +    }
> +
> +    LOG(DEBUG, "Stream v%u%s", hdr->version,

and again.

Actually, looking around, since you've used uintXX_t throughout the format
structs, I think you need a lot more PRI[ux]FOO around the place.

_If_ you've compile-tested this for both 32- and 64-bit and it works,
we could perhaps leave that audit until later.
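For reference, the pattern I mean is the <inttypes.h> macros, shown here
on a hypothetical helper that formats two of the header fields. uint32_t
is unsigned int on the usual platforms, so %u happens to work there, but
uint64_t is unsigned long on 64-bit Linux and unsigned long long on
32-bit, which is exactly where a hardcoded %lx or %llx breaks.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: formats fixed-width header fields with the
 * <inttypes.h> macros so the specifiers match the types on both
 * 32- and 64-bit builds. Returns snprintf()'s result. */
static int example_describe(char *buf, size_t len,
                            uint64_t ident, uint32_t version)
{
    assert(buf && len > 0);
    return snprintf(buf, len, "ident 0x%016" PRIx64 ", version %" PRIu32,
                    ident, version);
}
```

For example, example_describe(buf, sizeof(buf), UINT64_C(0xabc), 2)
produces "ident 0x0000000000000abc, version 2".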

> +static void setup_read_record(libxl__egc *egc,
> +                              libxl__stream_read_state *stream)
> +{
> +    STATE_AO_GC(stream->ao);
> +    libxl__sr_record_buf *rec = NULL;
> +    int ret;
> +
> +    assert(stream->incoming_record == NULL);
> +
> +    stream->incoming_record = rec = libxl__zalloc(NOGC, sizeof(*rec));

I recall Ian J and you discussing NOGC allocations on IRC. Was the
conclusion that it was OK, or that it could be fixed later, or that it
should be fixed now via a nested ao or something similar?

Unless the answer is "fixed now", I think the reason for the NOGC should
be in either the commit log or a comment (in the header, around about
the definition of the allocated data structure).

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

