[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2 16/27] tools/libxl: Infrastructure for reading a libxl migration v2 stream
Andrew Cooper writes ("[PATCH v2 16/27] tools/libxl: Infrastructure for reading a libxl migration v2 stream"): > From: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx> > > This contains the event machinary and state machines to read an act on a > non-checkpointed migration v2 stream (with the exception of the > xc_domain_restore() handling which is spliced later in a bisectable way). > > It also contains some boilerplate to help support checkpointed streams, which > shall be introduced in a later patch. ... > +/* State for manipulating a libxl migration v2 stream */ > +typedef struct libxl__stream_read_state libxl__stream_read_state; ... > +struct libxl__stream_read_state { > + /* filled by the user */ > + libxl__ao *ao; > + int fd; > + void (*completion_callback)(libxl__egc *egc, > + libxl__stream_read_state *srs, > + int rc); > + /* Private */ > + int rc; > + bool running; > + > + /* Main stream-reading data */ > + libxl__datacopier_state dc; > + libxl__sr_hdr hdr; > + libxl__sr_record_buf *incoming_record; > + LIBXL_STAILQ_HEAD(, libxl__sr_record_buf) record_queue; This comes from malloc, not from the gc. That needs a comment. (Is the same true of incoming_record ? I think it is.) > + enum { > + SRS_PHASE_NORMAL, > + } phase; > + bool recursion_guard; > + > + /* Emulator blob handling */ > + libxl__datacopier_state emu_dc; > + libxl__carefd *emu_carefd; > +}; ... The comments in libxl_stream_read.c are good but I'm afraid I'm going to ask for some commentary about the legal states and/or the invariants in the data structure. That is, you've written a state machine but the states are not explicitly listed. As far as I can tell, from the outside, this machinery has the usual Undefined/Idle/Active states. But does it not have more states on the inside ? PHASE at least needs explaining. See also comments about the record queue, below. For an example of the kind of thing I mean, see libxl_spawn. Particularly, libxl_internal.h: * Each libxl__spawn_state is in one of these states * Undefined, Idle, Attached, Detaching and the much more detailed state and data structure table in libxl_exec.c * Full set of possible states of a libxl__spawn_state and its _detachable: I think you probably don't need to write down the external states for your srs machinery because the states are Undefined, Idle and Active like most things. (Assuming you do something about the anomalous initialisation and checking of `running'.) What I'm missing is the internal description. > +/* > + * Infrastructure for reading and acting on the contents of a libxl migration > + * stream. There are a lot of moving parts here. (Can you wrap to 70 or at least 75?) Thanks. > +void libxl__stream_read_start(libxl__egc *egc, > + libxl__stream_read_state *stream) > +{ > + libxl__datacopier_state *dc = &stream->dc; > + int ret = 0; > + > + /* State initialisation. */ > + assert(!stream->running); This is being discussed apropos of Ian C's review. > + memset(dc, 0, sizeof(*dc)); Should be FILLZERO. > + /* Start reading the stream header. */ > + ret = setup_read(stream, "stream header", > + &stream->hdr, sizeof(stream->hdr), > + stream_header_done); > + if (ret) > + goto err; `ret' should be `rc'. This needs to be changed throughout. > + stream->running = true; > + stream->phase = SRS_PHASE_NORMAL; > + LIBXL_STAILQ_INIT(&stream->record_queue); > + stream->recursion_guard = 0; These initialisations-to-empty need to be moved to before any part of this function which might fail. As it is, if setup_read fails I think you end up reading from the uninitialised stream->record_queue, in stream_done. (In your actual code this does not matter because the caller did a FILLZERO but you should not be relying on that.) > +void libxl__stream_read_abort(libxl__egc *egc, > + libxl__stream_read_state *stream, int rc) > +{ > + stream_failed(egc, stream, rc); This has the same signature as stream_failed. Perhaps stream_failed could be eliminated and replaced with this ? Or maybe... > +static void stream_success(libxl__egc *egc, libxl__stream_read_state *stream) > +{ > + stream->rc = 0; > + stream_done(egc, stream); This function has one call site. As an alternative to abolishing stream_failed, maybe stream_failed should be renamed to `stream_complete' and then the call site for stream_success could simply say `stream_complete(egc, stream, 0)' ? > +static void stream_failed(libxl__egc *egc, > + libxl__stream_read_state *stream, int rc) > +{ > + assert(rc); > + stream->rc = rc; > + > + if (stream->running) { > + stream_done(egc, stream); Is the check for running necessary here ? Not checking, and instead relying on all the individual parts to be properly initialised, makes the correctness clearer. > +static void stream_header_done(libxl__egc *egc, > + libxl__datacopier_state *dc, > + int rc, int onwrite, int errnoval) I think it would be clearer if this function was immediately after libxl_stream_read_start. Probably moving libxl_stream_read_start is the right answer. Also, putting stream_continue after stream_header_done might help. I appreciate that these functions form a loop so can't be strictly in execution order, but it is much easier to read if the first iteration is in the right order. As CODING_STYLE says: The code for asynchronous operations should be laid out in chronological order. That is, where there is a chain of callback functions, each subsequent function should be, textually, the next function in the file. This will normally involve predeclaring the callback functions. Synchronous helper functions should be separated out into a section preceding the main callback chain. AFAICT there are a few other out-of-order functions. > + int ret = 0; > + > + if (rc || onwrite || errnoval) { > + ret = ERROR_FAIL; > + LOG(ERROR, "rc %d, onwrite %d, errnoval %d", rc, onwrite, errnoval); > + goto err; > + } > + > + hdr->ident = be64toh(hdr->ident); > + hdr->version = be32toh(hdr->version); > + hdr->options = be32toh(hdr->options); > + > + if (hdr->ident != RESTORE_STREAM_IDENT) { > + ret = ERROR_FAIL; > + LOG(ERROR, > + "Invalid ident: expected 0x%016"PRIx64", got 0x%016"PRIx64, > + RESTORE_STREAM_IDENT, hdr->ident); > + goto err; > + } > + if (hdr->version != RESTORE_STREAM_VERSION) { > + ret = ERROR_FAIL; > + LOG(ERROR, "Unexpected Version: expected %u, got %u", > + RESTORE_STREAM_VERSION, hdr->version); > + goto err; > + } > + if (hdr->options & RESTORE_OPT_BIG_ENDIAN) { > + ret = ERROR_FAIL; > + LOG(ERROR, "Unable to handle big endian streams"); > + goto err; > + } > + > + LOG(DEBUG, "Stream v%u%s", hdr->version, > + hdr->options & RESTORE_OPT_LEGACY ? " (from legacy)" : ""); > + > + stream_continue(egc, stream); > + return; > + > + err: > + assert(ret); > + stream_failed(egc, stream, ret); > +} > + > +static void setup_read_record(libxl__egc *egc, > + libxl__stream_read_state *stream) > +{ > + STATE_AO_GC(stream->ao); > + libxl__sr_record_buf *rec = NULL; > + int ret; > + > + assert(stream->incoming_record == NULL); > + > + stream->incoming_record = rec = libxl__zalloc(NOGC, sizeof(*rec)); > + ret = setup_read(stream, "record header", > + &rec->hdr, sizeof(rec->hdr), > + record_header_done); > + if (ret) > + goto err; > + > + return; > + > + err: > + assert(ret); > + stream_failed(egc, stream, ret); > +} > + > +static void record_header_done(libxl__egc *egc, > + libxl__datacopier_state *dc, > + int rc, int onwrite, int errnoval) > +{ > + libxl__stream_read_state *stream = CONTAINER_OF(dc, *stream, dc); > + libxl__sr_record_buf *rec = stream->incoming_record; > + STATE_AO_GC(dc->ao); > + int ret = 0; > + > + if (rc || onwrite || errnoval) { > + ret = ERROR_FAIL; > + LOG(ERROR, "rc %d, onwrite %d, errnoval %d", rc, onwrite, errnoval); > + goto err; > + } > + > + /* No body? All done. */ > + if (rec->hdr.length == 0) { > + record_body_done(egc, dc, 0, 0, 0); > + return; > + } > + > + size_t bytes_to_read = ROUNDUP(rec->hdr.length, REC_ALIGN_ORDER); > + rec->body = libxl__malloc(NOGC, bytes_to_read); > + > + ret = setup_read(stream, "record body", > + rec->body, bytes_to_read, > + record_body_done); > + if (ret) > + goto err; > + > + return; > + > + err: > + assert(ret); > + stream_failed(egc, stream, ret); > +} > + > +static void record_body_done(libxl__egc *egc, > + libxl__datacopier_state *dc, > + int rc, int onwrite, int errnoval) > +{ > + libxl__stream_read_state *stream = CONTAINER_OF(dc, *stream, dc); > + libxl__sr_record_buf *rec = stream->incoming_record; > + STATE_AO_GC(dc->ao); > + int ret = 0; > + > + if (rc || onwrite || errnoval) { > + ret = ERROR_FAIL; > + LOG(ERROR, "rc %d, onwrite %d, errnoval %d", rc, onwrite, errnoval); > + goto err; > + } > + > + LIBXL_STAILQ_INSERT_TAIL(&stream->record_queue, rec, entry); > + stream->incoming_record = NULL; > + > + stream_continue(egc, stream); > + return; > + > + err: > + assert(ret); > + stream_failed(egc, stream, ret); > +} > + > +/* > + * Returns a boolean indicating whether a further action should be set up by > + * the caller. This is needed to prevent mutual recursion with > + * stream_continue(). > + */ I don't understand why this mutual recursion would be a problem. AFAICT stream_continue called from process_record ought only to call process_record if record_queue had more than two items on it. But if record_queue has more than two items on it, stream_continue does the wrong thing already. So err, why is stream_continue not a loop; or, why is record_queue a queue ? If the state machine had better doc comments, stating the invariants, I would perhaps not be asking these foolish questions... > +static void write_emulator_blob(libxl__egc *egc, > + libxl__stream_read_state *stream, > + libxl__sr_record_buf *rec) > +{ ... > + > + if ( rec->hdr.length < sizeof(*emu_hdr) ) { Coding style (whitespace) > + sprintf(path, XC_DEVICE_MODEL_RESTORE_FILE".%u", dcs->guest_domid); > + > + libxl__carefd_begin(); > + writefd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600); Does this need to be a carefd ? This is going to be a small amount of data, and the worst that happens is some other process inherits an fd onto a file which might otherwise be deleted sooner. If it does, then please use libxl__carefd_opened. (Best would be to libxl__carefd_opened to save and restore errno and then you can call it unconditionally after the open.) Thanks, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |