[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal



Hvmloader uses the xenstore ring and then tries to reset it back
to its initial state before booting the guest. Occasionally xenstored
will read the ring while it is being zeroed and conclude it has
been corrupted. This prevents PV drivers from loading in the guest.

This patch updates the xenstore ring protocol definition, enabling
a server to advertise additional features to the guest. One such feature
is defined: the ability to cleanly reset the ring (but not the
higher-level protocol, for which we already have RESET_WATCHES).

This patch implements the ring reconnection features in oxenstored
and hvmloader, fixing the bug.

This patch also defines an 'invalid' xenstore packet type and uses this
to poison the ring over a reconnect. This will make diagnosing this
bug much easier in future.

Signed-off-by: David Scott <dave.scott@xxxxxxxxxx>

---

Updates since v3
* hvmloader: signal the event channel after requesting a reconnect
  (thanks to Zheng Li for diagnosing this)
* switch to using feature flags/bits instead of version numbers
* rename the 'closing' field to 'connection state'
* define an invalid packet type (0xffff, since 0x0 was already taken)
* on ring connection/reset, fill the input and output buffers with
  the invalid packet
* ocaml xs_ring_stubs.c: collapse both {read,write} functions into one,
  remove #define ERROR_FOO
* doc: write from the guest's point of view, weaken guarantees made by
  server (e.g. producer = consumer != 0 is valid after reconnect)
* doc: relate reconnection to the RESET_WATCHES command in the higher
  level protocol
* doc: be more specific about who must write what, when

Updates since v2
* hvmloader: use volatile for read of closing flag
* style improvements
* remove xenstore version #defines

Updates since v1
* remove unused client version and associated dead code
* treat 'closing' as a flag by using "!=0" rather than "==1"
* hvmloader: move function up and remove forward decl
* document the existing xenstore ring and the extention in misc/
---
 docs/misc/xenstore-ring.txt         |  112 +++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/xenbus.c   |   48 +++++++++------
 tools/ocaml/libs/xb/xb.ml           |   27 ++++++++-
 tools/ocaml/libs/xb/xb.mli          |    1 +
 tools/ocaml/libs/xb/xs_ring.ml      |   10 ++++
 tools/ocaml/libs/xb/xs_ring_stubs.c |  109 ++++++++++++++++++++++++----------
 xen/include/public/io/xs_wire.h     |   13 +++-
 7 files changed, 269 insertions(+), 51 deletions(-)
 create mode 100644 docs/misc/xenstore-ring.txt

diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
new file mode 100644
index 0000000..6d18556
--- /dev/null
+++ b/docs/misc/xenstore-ring.txt
@@ -0,0 +1,112 @@
+The xenstore ring is a datastructure stored within a single 4KiB page
+shared between the xenstore server and the guest. The ring contains
+two queues of bytes -- one in each direction -- and some signalling
+information. The [xenstore protocol](xenstore.txt) is layered on top of
+the byte streams.
+
+The xenstore ring datastructure
+===============================
+
+The following table describes the ring structure where
+  - offsets and lengths are in bytes;
+  - "Input" is used to describe the data sent to the server; and
+  - "Output" is used to describe the data sent to the domain.
+
+Offset  Length  Description
+-----------------------------------------------------------------
+0       1024    Input data
+1024    1024    Output data
+2048    4       Input consumer offset
+2052    4       Input producer offset
+2056    4       Output consumer offset
+2060    4       Output producer offset
+2064    4       Server feature bitmap
+2068    4       Connection state
+
+The Input data and Output data are circular buffers. Each buffer is
+associated with a pair of free-running offsets labelled "consumer" and
+"producer".
+
+A "producer" offset is the offset in the byte stream of the next byte
+to be written modulo 2^32. A "consumer" offset is the offset in the byte
+stream of the next byte to be read modulo 2^32. Implementations must
+take care to handle wraparound properly when performing arithmetic with
+these values.
+
+The byte at offset 'x' in the byte stream will be stored at offset
+'x modulo 1024' in the circular buffer.
+
+Implementations may only overwrite previously-written data if it has
+been marked as 'consumed' by the relevant consumer pointer.
+
+When the guest domain is created, there is no outstanding Input or Output
+data. However
+
+  - guests must not assume that producer or consumer pointers start
+    at zero; and
+  - guests must not assume that unused bytes in either the Input or
+    Output data buffers has any particular value.
+
+A xenstore ring is always associated with an event channel. Whenever the
+ring structure is updated the event channel must be signalled. The
+guest and server are free to inspect the contents of the ring at any
+time, not only in response to an event channel event. This implies that
+updates must be ordered carefully to ensure consistency.
+
+The xenstore server may decide to advertise some features via the
+"Server feature bitmap". The server can start advertising features
+at any time by setting bits but it will never stop advertising features
+i.e. bits will never be cleared. The guest is not permitted to write to
+the server feature bitmap. The server features are offered to the guest;
+it is up to the guest whether to use them or not. The guest should ignore
+any unknown feature bits.
+
+The following features are defined:
+
+Mask    Description
+-----------------------------------------------------------------
+1       Ring reconnection (see the ring reconnection feature below)
+
+The "Connection state" field is used to request a ring close and reconnect.
+The "Connection state" field only contains valid data if the server has
+advertised the ring reconnection feature. If the feature has been advertised
+then the "Connection state" may take the following values:
+
+Value   Description
+-----------------------------------------------------------------
+0       Ring is connected
+1       Ring close and reconnect is in progress (see the "ring
+        reconnection feature" described below)
+
+The ring reconnection feature
+=============================
+
+The ring reconnection feature allows the guest to ask the server to
+reset the ring to a valid initial state i.e. where the Input and Output
+queues contain no data. The feature operates at the ring-level only;
+it does not affect the higher-level protocol state machine at all.
+To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
+command in the [xenstore protocol](xenstore.txt) specification.
+
+The ring reconnection feature is only available if the 'Ring reconnection'
+feature bit has been set by the server in the "Server feature bitmap".
+
+Assuming the server has advertised the feature, the guest can initiate
+a reconnection by setting the the Connection state to 1 ("Ring close
+and reconnect is in progress") and signalling the event channel.
+The guest must now ignore all fields except the Connection state and
+wait for it to be set to 0 ("Ring is connected")
+
+The server will guarantee to
+
+  - drop any partially read or written higher-level
+    [xenstore protocol](xenstore.txt) packets it may have;
+  - empty the Input and Output queues in the xenstore ring;
+  - set the Connection state to 0 ("Ring is connected"); and
+  - signal the event channel.
+
+From the point of view of the guest, the connection has been reset on a
+packet boundary.
+
+Note that only the guest may set the Connection state to 1 and only the
+server may set it back to 0.
diff --git a/tools/firmware/hvmloader/xenbus.c 
b/tools/firmware/hvmloader/xenbus.c
index 64c2176..f900a1e 100644
--- a/tools/firmware/hvmloader/xenbus.c
+++ b/tools/firmware/hvmloader/xenbus.c
@@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared 
ring with dom0 */
 static evtchn_port_t event;                     /* Event-channel to dom0 */
 static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
 
+static void ring_wait(void)
+{
+    struct shared_info *shinfo = get_shared_info();
+    struct sched_poll poll;
+
+    memset(&poll, 0, sizeof(poll));
+    set_xen_guest_handle(poll.ports, &event);
+    poll.nr_ports = 1;
+
+    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
+        hypercall_sched_op(SCHEDOP_poll, &poll);
+}
+
 /* Connect our xenbus client to the backend.
  * Call once, before any other xenbus actions. */
 void xenbus_setup(void)
@@ -61,14 +74,26 @@ void xenbus_setup(void)
 void xenbus_shutdown(void)
 {
     struct shared_info *shinfo = get_shared_info();
+    evtchn_send_t send;
 
     ASSERT(rings != NULL);
 
-    /* We zero out the whole ring -- the backend can handle this, and it's 
-     * not going to surprise any frontends since it's equivalent to never 
-     * having used the rings. */
-    memset(rings, 0, sizeof *rings);
-
+    if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) {
+        rings->connection = XENSTORE_RECONNECT;
+        send.port = event;
+        hypercall_event_channel_op(EVTCHNOP_send, &send);
+        while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT)
+            ring_wait ();
+    } else {
+        /* If the backend reads the state while we're erasing it then the
+         * ring state will become corrupted, preventing guest frontends from
+         * connecting. This is rare. To help diagnose the failure, we fill
+         * the ring with XS_INVALID packets. */
+        memset(rings->req, 0xff, XENSTORE_RING_SIZE);
+        memset(rings->rsp, 0xff, XENSTORE_RING_SIZE);
+        rings->req_cons = rings->req_prod = 0;
+        rings->rsp_cons = rings->rsp_prod = 0;
+    }
     /* Clear the event-channel state too. */
     memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
     memset(shinfo->evtchn_pending, 0, sizeof(shinfo->evtchn_pending));
@@ -77,19 +102,6 @@ void xenbus_shutdown(void)
     rings = NULL;
 }
 
-static void ring_wait(void)
-{
-    struct shared_info *shinfo = get_shared_info();
-    struct sched_poll poll;
-
-    memset(&poll, 0, sizeof(poll));
-    set_xen_guest_handle(poll.ports, &event);
-    poll.nr_ports = 1;
-
-    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
-        hypercall_sched_op(SCHEDOP_poll, &poll);
-}
-
 /* Helper functions: copy data in and out of the ring */
 static void ring_write(const char *data, uint32_t len)
 {
diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
index 29d354d..6332433 100644
--- a/tools/ocaml/libs/xb/xb.ml
+++ b/tools/ocaml/libs/xb/xb.ml
@@ -84,7 +84,26 @@ let write con s len =
        | Fd backfd     -> write_fd backfd con s len
        | Xenmmap backmmap -> write_mmap backmmap con s len
 
-let output con =
+(* If a function throws Xs_ring.Reconnect, then clear the ring state
+   and serve the ring again. *)
+let rec handle_reconnect f con =
+       match (try Some (f con) with Xs_ring.Reconnect -> None) with
+       | Some x -> x
+       | None ->
+               begin match con.backend with
+               | Fd _ -> raise Xs_ring.Reconnect (* should never happen, but 
just in case *)
+               | Xenmmap backend ->
+                       Xs_ring.close backend.mmap;
+                       backend.eventchn_notify ();
+                       (* Clear our old connection state *)
+                       Queue.clear con.pkt_in;
+                       Queue.clear con.pkt_out;
+                       con.partial_in <- init_partial_in ();
+                       con.partial_out <- "";
+                       handle_reconnect f con
+               end
+
+let output = handle_reconnect (fun con ->
        (* get the output string from a string_of(packet) or partial_out *)
        let s = if String.length con.partial_out > 0 then
                        con.partial_out
@@ -101,8 +120,9 @@ let output con =
        );
        (* after sending one packet, partial is empty *)
        con.partial_out = ""
+)
 
-let input con =
+let input = handle_reconnect (fun con ->
        let newpacket = ref false in
        let to_read =
                match con.partial_in with
@@ -133,6 +153,7 @@ let input con =
                        HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
        );
        !newpacket
+)
 
 let newcon backend = {
        backend = backend;
@@ -145,6 +166,8 @@ let newcon backend = {
 let open_fd fd = newcon (Fd { fd = fd; })
 
 let open_mmap mmap notifyfct =
+       (* Advertise XENSTORE_SERVER_FEATURE_RECONNECTION *)
+       Xs_ring.set_server_features mmap 1;
        newcon (Xenmmap {
                mmap = mmap;
                eventchn_notify = notifyfct;
diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
index 58234ae..d47d869 100644
--- a/tools/ocaml/libs/xb/xb.mli
+++ b/tools/ocaml/libs/xb/xb.mli
@@ -80,6 +80,7 @@ val read : t -> string -> int -> int
 val write_fd : backend_fd -> 'a -> string -> int -> int
 val write_mmap : backend_mmap -> 'a -> string -> int -> int
 val write : t -> string -> int -> int
+val handle_reconnect : (t -> 'a) -> t -> 'a
 val output : t -> bool
 val input : t -> bool
 val newcon : backend -> t
diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
index 9469609..2c4eeab 100644
--- a/tools/ocaml/libs/xb/xs_ring.ml
+++ b/tools/ocaml/libs/xb/xs_ring.ml
@@ -14,5 +14,15 @@
  * GNU Lesser General Public License for more details.
  *)
 
+exception Reconnect
+
+let _ =
+  Callback.register_exception "Xs_ring.Reconnect" Reconnect
+
 external read: Xenmmap.mmap_interface -> string -> int -> int = 
"ml_interface_read"
 external write: Xenmmap.mmap_interface -> string -> int -> int = 
"ml_interface_write"
+
+external set_server_features: Xenmmap.mmap_interface -> int -> unit = 
"ml_interface_set_server_features" "noalloc"
+external get_server_features: Xenmmap.mmap_interface -> int = 
"ml_interface_get_server_features" "noalloc"
+
+external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" "noalloc"
diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c 
b/tools/ocaml/libs/xb/xs_ring_stubs.c
index 8bd1047..c728a01 100644
--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
@@ -35,22 +35,39 @@
 
 #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
 
-static int xs_ring_read(struct mmap_interface *interface,
-                             char *buffer, int len)
+CAMLprim value ml_interface_read(value ml_interface,
+                                 value ml_buffer,
+                                 value ml_len)
 {
+       CAMLparam3(ml_interface, ml_buffer, ml_len);
+       CAMLlocal1(ml_result);
+
+       struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
+       char *buffer = String_val(ml_buffer);
+       int len = Int_val(ml_len);
+       int result;
+
        struct xenstore_domain_interface *intf = interface->addr;
        XENSTORE_RING_IDX cons, prod; /* offsets only */
        int to_read;
+       uint32_t connection;
 
        cons = *(volatile uint32*)&intf->req_cons;
        prod = *(volatile uint32*)&intf->req_prod;
+       connection = *(volatile uint32*)&intf->connection;
+
+       if (connection != XENSTORE_CONNECTED)
+               caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
+
        xen_mb();
 
        if ((prod - cons) > XENSTORE_RING_SIZE)
-           return -1;
+               caml_failwith("bad connection");
 
-       if (prod == cons)
-               return 0;
+       if (prod == cons) {
+               result = 0;
+               goto exit;
+       }
        cons = MASK_XENSTORE_IDX(cons);
        prod = MASK_XENSTORE_IDX(prod);
        if (prod > cons)
@@ -62,21 +79,41 @@ static int xs_ring_read(struct mmap_interface *interface,
        memcpy(buffer, intf->req + cons, len);
        xen_mb();
        intf->req_cons += len;
-       return len;
+       result = len;
+exit:
+       ml_result = Val_int(result);
+       CAMLreturn(ml_result);
 }
 
-static int xs_ring_write(struct mmap_interface *interface,
-                              char *buffer, int len)
+CAMLprim value ml_interface_write(value ml_interface,
+                                  value ml_buffer,
+                                  value ml_len)
 {
+       CAMLparam3(ml_interface, ml_buffer, ml_len);
+       CAMLlocal1(ml_result);
+
+       struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
+       char *buffer = String_val(ml_buffer);
+       int len = Int_val(ml_len);
+       int result;
+
        struct xenstore_domain_interface *intf = interface->addr;
        XENSTORE_RING_IDX cons, prod;
        int can_write;
+       uint32_t connection;
 
        cons = *(volatile uint32*)&intf->rsp_cons;
        prod = *(volatile uint32*)&intf->rsp_prod;
+       connection = *(volatile uint32*)&intf->connection;
+
+       if (connection != XENSTORE_CONNECTED)
+               caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
+
        xen_mb();
-       if ( (prod - cons) >= XENSTORE_RING_SIZE )
-               return 0;
+       if ( (prod - cons) >= XENSTORE_RING_SIZE ) {
+               result = 0;
+               goto exit;
+       }
        if (MASK_XENSTORE_IDX(prod) >= MASK_XENSTORE_IDX(cons))
                can_write = XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(prod);
        else 
@@ -86,31 +123,43 @@ static int xs_ring_write(struct mmap_interface *interface,
        memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, len);
        xen_mb();
        intf->rsp_prod += len;
-       return len;
+       result = len;
+exit:
+       ml_result = Val_int(result);
+       CAMLreturn(ml_result);
 }
 
-CAMLprim value ml_interface_read(value interface, value buffer, value len)
+CAMLprim value ml_interface_set_server_features(value interface, value v)
 {
-       CAMLparam3(interface, buffer, len);
-       CAMLlocal1(result);
-       int res;
+       CAMLparam2(interface, v);
+       struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
 
-       res = xs_ring_read(GET_C_STRUCT(interface),
-                          String_val(buffer), Int_val(len));
-       if (res == -1)
-               caml_failwith("bad connection");
-       result = Val_int(res);
-       CAMLreturn(result);
+       intf->server_features = Int_val(v);
+
+       CAMLreturn(Val_unit);
+}
+
+CAMLprim value ml_interface_get_server_features(value interface)
+{
+       CAMLparam1(interface);
+       struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+       CAMLreturn(Val_int (intf->server_features));
 }
 
-CAMLprim value ml_interface_write(value interface, value buffer, value len)
+CAMLprim value ml_interface_close(value interface)
 {
-       CAMLparam3(interface, buffer, len);
-       CAMLlocal1(result);
-       int res;
-
-       res = xs_ring_write(GET_C_STRUCT(interface),
-                           String_val(buffer), Int_val(len));
-       result = Val_int(res);
-       CAMLreturn(result);
+       CAMLparam1(interface);
+       struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+       int i;
+
+       intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
+       /* Ensure the unused space is full of invalid xenstore packets. */
+       for (i = 0; i < XENSTORE_RING_SIZE; i++) {
+               intf->req[i] = 0xff; /* XS_INVALID = 0xffff */
+               intf->rsp[i] = 0xff;
+       }
+       xen_mb ();
+       intf->connection = XENSTORE_CONNECTED;
+       CAMLreturn(Val_unit);
 }
diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
index 585f0c8..0a0cdbc 100644
--- a/xen/include/public/io/xs_wire.h
+++ b/xen/include/public/io/xs_wire.h
@@ -49,7 +49,9 @@ enum xsd_sockmsg_type
     XS_RESUME,
     XS_SET_TARGET,
     XS_RESTRICT,
-    XS_RESET_WATCHES
+    XS_RESET_WATCHES,
+
+    XS_INVALID = 0xffff /* Guaranteed to remain an invalid type */
 };
 
 #define XS_WRITE_NONE "NONE"
@@ -116,6 +118,8 @@ struct xenstore_domain_interface {
     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
     XENSTORE_RING_IDX req_cons, req_prod;
     XENSTORE_RING_IDX rsp_cons, rsp_prod;
+    uint32_t server_features; /* Bitmap of features supported by the server */
+    uint32_t connection;
 };
 
 /* Violating this is very bad.  See docs/misc/xenstore.txt. */
@@ -125,6 +129,13 @@ struct xenstore_domain_interface {
 #define XENSTORE_ABS_PATH_MAX 3072
 #define XENSTORE_REL_PATH_MAX 2048
 
+/* The ability to reconnect a ring */
+#define XENSTORE_SERVER_FEATURE_RECONNECTION 1
+
+/* Valid values for the connection field */
+#define XENSTORE_CONNECTED 0 /* the steady-state */
+#define XENSTORE_RECONNECT 1 /* guest has initiated a reconnect */
+
 #endif /* _XS_WIRE_H */
 
 /*
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.