[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
On 03/09/14 17:25, David Scott wrote: > Hvmloader uses the xenstore ring and then tries to reset it back > to its initial state before booting the guest. Occasionally xenstored > will read the ring while it is being zeroed and conclude it has > been corrupted. This prevents PV drivers from loading in the guest. > > This patch updates the xenstore ring protocol definition, enabling > a server to advertise additional features to the guest. One such feature > is defined: the ability to cleanly reset the ring (but not the > higher-level protocol, for which we already have RESET_WATCHES). > > This patch implements the ring reconnection features in oxenstored > and hvmloader, fixing the bug. > > This patch also defines an 'invalid' xenstore packet type and uses this > to poison the ring over a reconnect. This will make diagnosing this > bug much easier in future. > > Signed-off-by: David Scott <dave.scott@xxxxxxxxxx> > > --- > > Updates since v3 > * hvmloader: signal the event channel after requesting a reconnect > (thanks to Zheng Li for diagnosing this) > * switch to using feature flags/bits instead of version numbers > * rename the 'closing' field to 'connection state' > * define an invalid packet type (0xffff, since 0x0 was already taken) > * on ring connection/reset, fill the input and output buffers with > the invalid packet > * ocaml xs_ring_stubs.c: collapse both {read,write} functions into one, > remove #define ERROR_FOO > * doc: write from the guest's point of view, weaken guarantees made by > server (e.g. producer = consumer != 0 is valid after reconnect) > * doc: relate reconnection to the RESET_WATCHES command in the higher > level protocol > * doc: be more specific about who must write what, when > > Updates since v2 > * hvmloader: use volatile for read of closing flag > * style improvements > * remove xenstore version #defines > > Updates since v1 > * remove unused client version and associated dead code > * treat 'closing' as a flag by using "!=0" rather than "==1" > * hvmloader: move function up and remove forward decl > * document the existing xenstore ring and the extention in misc/ > --- > docs/misc/xenstore-ring.txt | 112 > +++++++++++++++++++++++++++++++++++ > tools/firmware/hvmloader/xenbus.c | 48 +++++++++------ > tools/ocaml/libs/xb/xb.ml | 27 ++++++++- > tools/ocaml/libs/xb/xb.mli | 1 + > tools/ocaml/libs/xb/xs_ring.ml | 10 ++++ > tools/ocaml/libs/xb/xs_ring_stubs.c | 109 ++++++++++++++++++++++++---------- > xen/include/public/io/xs_wire.h | 13 +++- > 7 files changed, 269 insertions(+), 51 deletions(-) > create mode 100644 docs/misc/xenstore-ring.txt > > diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt > new file mode 100644 > index 0000000..6d18556 > --- /dev/null > +++ b/docs/misc/xenstore-ring.txt > @@ -0,0 +1,112 @@ > +The xenstore ring is a datastructure stored within a single 4KiB page > +shared between the xenstore server and the guest. The ring contains > +two queues of bytes -- one in each direction -- and some signalling > +information. The [xenstore protocol](xenstore.txt) is layered on top of > +the byte streams. > + > +The xenstore ring datastructure > +=============================== > + > +The following table describes the ring structure where > + - offsets and lengths are in bytes; > + - "Input" is used to describe the data sent to the server; and > + - "Output" is used to describe the data sent to the domain. > + > +Offset Length Description > +----------------------------------------------------------------- > +0 1024 Input data > +1024 1024 Output data > +2048 4 Input consumer offset > +2052 4 Input producer offset > +2056 4 Output consumer offset > +2060 4 Output producer offset > +2064 4 Server feature bitmap > +2068 4 Connection state > + > +The Input data and Output data are circular buffers. Each buffer is > +associated with a pair of free-running offsets labelled "consumer" and > +"producer". > + > +A "producer" offset is the offset in the byte stream of the next byte > +to be written modulo 2^32. A "consumer" offset is the offset in the byte > +stream of the next byte to be read modulo 2^32. Implementations must > +take care to handle wraparound properly when performing arithmetic with > +these values. > + > +The byte at offset 'x' in the byte stream will be stored at offset > +'x modulo 1024' in the circular buffer. > + > +Implementations may only overwrite previously-written data if it has > +been marked as 'consumed' by the relevant consumer pointer. > + > +When the guest domain is created, there is no outstanding Input or Output > +data. However > + > + - guests must not assume that producer or consumer pointers start > + at zero; and > + - guests must not assume that unused bytes in either the Input or > + Output data buffers has any particular value. > + > +A xenstore ring is always associated with an event channel. Whenever the > +ring structure is updated the event channel must be signalled. The > +guest and server are free to inspect the contents of the ring at any > +time, not only in response to an event channel event. This implies that > +updates must be ordered carefully to ensure consistency. > + > +The xenstore server may decide to advertise some features via the > +"Server feature bitmap". The server can start advertising features > +at any time by setting bits but it will never stop advertising features > +i.e. bits will never be cleared. The guest is not permitted to write to > +the server feature bitmap. The server features are offered to the guest; > +it is up to the guest whether to use them or not. The guest should ignore > +any unknown feature bits. > + > +The following features are defined: > + > +Mask Description > +----------------------------------------------------------------- > +1 Ring reconnection (see the ring reconnection feature below) > + > +The "Connection state" field is used to request a ring close and reconnect. > +The "Connection state" field only contains valid data if the server has > +advertised the ring reconnection feature. If the feature has been advertised > +then the "Connection state" may take the following values: > + > +Value Description > +----------------------------------------------------------------- > +0 Ring is connected > +1 Ring close and reconnect is in progress (see the "ring > + reconnection feature" described below) > + > +The ring reconnection feature > +============================= > + > +The ring reconnection feature allows the guest to ask the server to > +reset the ring to a valid initial state i.e. where the Input and Output > +queues contain no data. The feature operates at the ring-level only; > +it does not affect the higher-level protocol state machine at all. > +To reset theh higher-level protocol, please read about the 'RESET_WATCHES' > +command in the [xenstore protocol](xenstore.txt) specification. > + > +The ring reconnection feature is only available if the 'Ring reconnection' > +feature bit has been set by the server in the "Server feature bitmap". > + > +Assuming the server has advertised the feature, the guest can initiate > +a reconnection by setting the the Connection state to 1 ("Ring close > +and reconnect is in progress") and signalling the event channel. > +The guest must now ignore all fields except the Connection state and > +wait for it to be set to 0 ("Ring is connected") > + > +The server will guarantee to > + > + - drop any partially read or written higher-level > + [xenstore protocol](xenstore.txt) packets it may have; > + - empty the Input and Output queues in the xenstore ring; > + - set the Connection state to 0 ("Ring is connected"); and > + - signal the event channel. > + > +From the point of view of the guest, the connection has been reset on a > +packet boundary. > + > +Note that only the guest may set the Connection state to 1 and only the > +server may set it back to 0. > diff --git a/tools/firmware/hvmloader/xenbus.c > b/tools/firmware/hvmloader/xenbus.c > index 64c2176..f900a1e 100644 > --- a/tools/firmware/hvmloader/xenbus.c > +++ b/tools/firmware/hvmloader/xenbus.c > @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared > ring with dom0 */ > static evtchn_port_t event; /* Event-channel to dom0 */ > static char payload[XENSTORE_PAYLOAD_MAX + 1]; /* Unmarshalling area */ > > +static void ring_wait(void) > +{ > + struct shared_info *shinfo = get_shared_info(); > + struct sched_poll poll; > + > + memset(&poll, 0, sizeof(poll)); > + set_xen_guest_handle(poll.ports, &event); > + poll.nr_ports = 1; > + > + while ( !test_and_clear_bit(event, shinfo->evtchn_pending) ) > + hypercall_sched_op(SCHEDOP_poll, &poll); > +} > + > /* Connect our xenbus client to the backend. > * Call once, before any other xenbus actions. */ > void xenbus_setup(void) > @@ -61,14 +74,26 @@ void xenbus_setup(void) > void xenbus_shutdown(void) > { > struct shared_info *shinfo = get_shared_info(); > + evtchn_send_t send; > > ASSERT(rings != NULL); > > - /* We zero out the whole ring -- the backend can handle this, and it's > - * not going to surprise any frontends since it's equivalent to never > - * having used the rings. */ > - memset(rings, 0, sizeof *rings); > - > + if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) { > + rings->connection = XENSTORE_RECONNECT; > + send.port = event; > + hypercall_event_channel_op(EVTCHNOP_send, &send); > + while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT) > + ring_wait (); > + } else { > + /* If the backend reads the state while we're erasing it then the > + * ring state will become corrupted, preventing guest frontends from > + * connecting. This is rare. To help diagnose the failure, we fill > + * the ring with XS_INVALID packets. */ > + memset(rings->req, 0xff, XENSTORE_RING_SIZE); > + memset(rings->rsp, 0xff, XENSTORE_RING_SIZE); > + rings->req_cons = rings->req_prod = 0; > + rings->rsp_cons = rings->rsp_prod = 0; > + } > /* Clear the event-channel state too. */ > memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info)); > memset(shinfo->evtchn_pending, 0, sizeof(shinfo->evtchn_pending)); > @@ -77,19 +102,6 @@ void xenbus_shutdown(void) > rings = NULL; > } > > -static void ring_wait(void) > -{ > - struct shared_info *shinfo = get_shared_info(); > - struct sched_poll poll; > - > - memset(&poll, 0, sizeof(poll)); > - set_xen_guest_handle(poll.ports, &event); > - poll.nr_ports = 1; > - > - while ( !test_and_clear_bit(event, shinfo->evtchn_pending) ) > - hypercall_sched_op(SCHEDOP_poll, &poll); > -} > - > /* Helper functions: copy data in and out of the ring */ > static void ring_write(const char *data, uint32_t len) > { > diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml > index 29d354d..6332433 100644 > --- a/tools/ocaml/libs/xb/xb.ml > +++ b/tools/ocaml/libs/xb/xb.ml > @@ -84,7 +84,26 @@ let write con s len = > | Fd backfd -> write_fd backfd con s len > | Xenmmap backmmap -> write_mmap backmmap con s len > > -let output con = > +(* If a function throws Xs_ring.Reconnect, then clear the ring state > + and serve the ring again. *) > +let rec handle_reconnect f con = > + match (try Some (f con) with Xs_ring.Reconnect -> None) with > + | Some x -> x > + | None -> > + begin match con.backend with > + | Fd _ -> raise Xs_ring.Reconnect (* should never happen, but > just in case *) > + | Xenmmap backend -> > + Xs_ring.close backend.mmap; > + backend.eventchn_notify (); > + (* Clear our old connection state *) > + Queue.clear con.pkt_in; > + Queue.clear con.pkt_out; > + con.partial_in <- init_partial_in (); > + con.partial_out <- ""; > + handle_reconnect f con > + end > + > +let output = handle_reconnect (fun con -> > (* get the output string from a string_of(packet) or partial_out *) > let s = if String.length con.partial_out > 0 then > con.partial_out > @@ -101,8 +120,9 @@ let output con = > ); > (* after sending one packet, partial is empty *) > con.partial_out = "" > +) > > -let input con = > +let input = handle_reconnect (fun con -> > let newpacket = ref false in > let to_read = > match con.partial_in with > @@ -133,6 +153,7 @@ let input con = > HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf) > ); > !newpacket > +) > > let newcon backend = { > backend = backend; > @@ -145,6 +166,8 @@ let newcon backend = { > let open_fd fd = newcon (Fd { fd = fd; }) > > let open_mmap mmap notifyfct = > + (* Advertise XENSTORE_SERVER_FEATURE_RECONNECTION *) > + Xs_ring.set_server_features mmap 1; Minor nit here: the hard-coded 1 looks bad. > newcon (Xenmmap { > mmap = mmap; > eventchn_notify = notifyfct; > diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli > index 58234ae..d47d869 100644 > --- a/tools/ocaml/libs/xb/xb.mli > +++ b/tools/ocaml/libs/xb/xb.mli > @@ -80,6 +80,7 @@ val read : t -> string -> int -> int > val write_fd : backend_fd -> 'a -> string -> int -> int > val write_mmap : backend_mmap -> 'a -> string -> int -> int > val write : t -> string -> int -> int > +val handle_reconnect : (t -> 'a) -> t -> 'a > val output : t -> bool > val input : t -> bool > val newcon : backend -> t > diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml > index 9469609..2c4eeab 100644 > --- a/tools/ocaml/libs/xb/xs_ring.ml > +++ b/tools/ocaml/libs/xb/xs_ring.ml > @@ -14,5 +14,15 @@ > * GNU Lesser General Public License for more details. > *) > > +exception Reconnect > + > +let _ = > + Callback.register_exception "Xs_ring.Reconnect" Reconnect > + > external read: Xenmmap.mmap_interface -> string -> int -> int = > "ml_interface_read" > external write: Xenmmap.mmap_interface -> string -> int -> int = > "ml_interface_write" > + > +external set_server_features: Xenmmap.mmap_interface -> int -> unit = > "ml_interface_set_server_features" "noalloc" > +external get_server_features: Xenmmap.mmap_interface -> int = > "ml_interface_get_server_features" "noalloc" > + > +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" > "noalloc" > diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c > b/tools/ocaml/libs/xb/xs_ring_stubs.c > index 8bd1047..c728a01 100644 > --- a/tools/ocaml/libs/xb/xs_ring_stubs.c > +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c > @@ -35,22 +35,39 @@ > > #define GET_C_STRUCT(a) ((struct mmap_interface *) a) > > -static int xs_ring_read(struct mmap_interface *interface, > - char *buffer, int len) > +CAMLprim value ml_interface_read(value ml_interface, > + value ml_buffer, > + value ml_len) > { > + CAMLparam3(ml_interface, ml_buffer, ml_len); > + CAMLlocal1(ml_result); > + > + struct mmap_interface *interface = GET_C_STRUCT(ml_interface); > + char *buffer = String_val(ml_buffer); > + int len = Int_val(ml_len); > + int result; > + > struct xenstore_domain_interface *intf = interface->addr; > XENSTORE_RING_IDX cons, prod; /* offsets only */ > int to_read; > + uint32_t connection; > > cons = *(volatile uint32*)&intf->req_cons; > prod = *(volatile uint32*)&intf->req_prod; > + connection = *(volatile uint32*)&intf->connection; > + > + if (connection != XENSTORE_CONNECTED) > + caml_raise_constant(*caml_named_value("Xs_ring.Reconnect")); > + > xen_mb(); > > if ((prod - cons) > XENSTORE_RING_SIZE) > - return -1; > + caml_failwith("bad connection"); > > - if (prod == cons) > - return 0; > + if (prod == cons) { > + result = 0; > + goto exit; > + } > cons = MASK_XENSTORE_IDX(cons); > prod = MASK_XENSTORE_IDX(prod); > if (prod > cons) > @@ -62,21 +79,41 @@ static int xs_ring_read(struct mmap_interface *interface, > memcpy(buffer, intf->req + cons, len); > xen_mb(); > intf->req_cons += len; > - return len; > + result = len; > +exit: > + ml_result = Val_int(result); > + CAMLreturn(ml_result); > } > > -static int xs_ring_write(struct mmap_interface *interface, > - char *buffer, int len) > +CAMLprim value ml_interface_write(value ml_interface, > + value ml_buffer, > + value ml_len) > { > + CAMLparam3(ml_interface, ml_buffer, ml_len); > + CAMLlocal1(ml_result); > + > + struct mmap_interface *interface = GET_C_STRUCT(ml_interface); > + char *buffer = String_val(ml_buffer); > + int len = Int_val(ml_len); > + int result; > + > struct xenstore_domain_interface *intf = interface->addr; > XENSTORE_RING_IDX cons, prod; > int can_write; > + uint32_t connection; > > cons = *(volatile uint32*)&intf->rsp_cons; > prod = *(volatile uint32*)&intf->rsp_prod; > + connection = *(volatile uint32*)&intf->connection; > + > + if (connection != XENSTORE_CONNECTED) > + caml_raise_constant(*caml_named_value("Xs_ring.Reconnect")); > + > xen_mb(); > - if ( (prod - cons) >= XENSTORE_RING_SIZE ) > - return 0; > + if ( (prod - cons) >= XENSTORE_RING_SIZE ) { > + result = 0; > + goto exit; > + } > if (MASK_XENSTORE_IDX(prod) >= MASK_XENSTORE_IDX(cons)) > can_write = XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(prod); > else > @@ -86,31 +123,43 @@ static int xs_ring_write(struct mmap_interface > *interface, > memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, len); > xen_mb(); > intf->rsp_prod += len; > - return len; > + result = len; > +exit: > + ml_result = Val_int(result); > + CAMLreturn(ml_result); > } > > -CAMLprim value ml_interface_read(value interface, value buffer, value len) > +CAMLprim value ml_interface_set_server_features(value interface, value v) > { > - CAMLparam3(interface, buffer, len); > - CAMLlocal1(result); > - int res; > + CAMLparam2(interface, v); > + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr; > > - res = xs_ring_read(GET_C_STRUCT(interface), > - String_val(buffer), Int_val(len)); > - if (res == -1) > - caml_failwith("bad connection"); > - result = Val_int(res); > - CAMLreturn(result); > + intf->server_features = Int_val(v); > + > + CAMLreturn(Val_unit); > +} > + > +CAMLprim value ml_interface_get_server_features(value interface) > +{ > + CAMLparam1(interface); > + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr; > + > + CAMLreturn(Val_int (intf->server_features)); > } > > -CAMLprim value ml_interface_write(value interface, value buffer, value len) > +CAMLprim value ml_interface_close(value interface) > { > - CAMLparam3(interface, buffer, len); > - CAMLlocal1(result); > - int res; > - > - res = xs_ring_write(GET_C_STRUCT(interface), > - String_val(buffer), Int_val(len)); > - result = Val_int(res); > - CAMLreturn(result); > + CAMLparam1(interface); > + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr; > + int i; > + > + intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0; > + /* Ensure the unused space is full of invalid xenstore packets. */ > + for (i = 0; i < XENSTORE_RING_SIZE; i++) { > + intf->req[i] = 0xff; /* XS_INVALID = 0xffff */ > + intf->rsp[i] = 0xff; > + } > + xen_mb (); > + intf->connection = XENSTORE_CONNECTED; > + CAMLreturn(Val_unit); > } > diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h > index 585f0c8..0a0cdbc 100644 > --- a/xen/include/public/io/xs_wire.h > +++ b/xen/include/public/io/xs_wire.h > @@ -49,7 +49,9 @@ enum xsd_sockmsg_type > XS_RESUME, > XS_SET_TARGET, > XS_RESTRICT, > - XS_RESET_WATCHES > + XS_RESET_WATCHES, > + > + XS_INVALID = 0xffff /* Guaranteed to remain an invalid type */ > }; > > #define XS_WRITE_NONE "NONE" > @@ -116,6 +118,8 @@ struct xenstore_domain_interface { > char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */ > XENSTORE_RING_IDX req_cons, req_prod; > XENSTORE_RING_IDX rsp_cons, rsp_prod; > + uint32_t server_features; /* Bitmap of features supported by the server > */ > + uint32_t connection; > }; > > /* Violating this is very bad. See docs/misc/xenstore.txt. */ > @@ -125,6 +129,13 @@ struct xenstore_domain_interface { > #define XENSTORE_ABS_PATH_MAX 3072 > #define XENSTORE_REL_PATH_MAX 2048 > > +/* The ability to reconnect a ring */ > +#define XENSTORE_SERVER_FEATURE_RECONNECTION 1 > + > +/* Valid values for the connection field */ > +#define XENSTORE_CONNECTED 0 /* the steady-state */ > +#define XENSTORE_RECONNECT 1 /* guest has initiated a reconnect */ > + > #endif /* _XS_WIRE_H */ > > /* Other than that, it looks good to me. Reviewed-by: Jon Ludlam <jonathan.ludlam@xxxxxxxxxx> _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |