During some discussions and handwaving, including discussions with
some experts on the Xenserver/XCP storage architecture, we came up
with what we think might be a plausible proposal for an architecture
for communication between toolstack and driver domain, for storage at

I offered to write it up.  The abstract proposal is as I understand
the consensus from our conversation.  The concrete protocol is my own

Please comment.  After a round of review here we should consider
whether some of the assumptions need review from the communities
involved in "other" backends (particularly, the BSDs).

(FAOD the implementation of something like this is 4.3 material,
but it may inform some API decisions etc. we take in 4.2.)




    Might be the toolstack domain, or an (intended) guest vm.

 driver domain
    Responsible for providing the disk service to guests.
    Consists, internally, of (at least):
       control plane
    but we avoid exposing this internal implementation detail.

    We permit different driver domains on a single host, serving
    different guests or the same guests.  

    The toolstack is expected to know the domid of the driver domain.

 driver domain kind
    We permit different "kinds" of driver domain, perhaps implemented
    by completely different code, which support different facilities.

    Each driver domain kind needs to document what targets (see
    below) are valid and how they are specified, and what preparatory
    steps may need to be taken eg at system boot.

    Driver domain kinds do not have a formal presence in the API.


 target
     A kind of name.

     Combination of a physical location and data format plus all other
     information needed by the underlying mechanisms, or relating to
     the data format, needed to access it.

     These names are assigned by the driver domain kind; the names may
     be an open class; no facility is provided via this API to
     enumerate them.

     Syntactically, these are key/value pairs, mapping short string
     keys to shortish string values, suitable for storage in a
     xenstore directory.
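
     As a rough illustration of the syntax only, a target could be
     flattened into xenstore-style nodes as sketched below.  The keys
     ("type", "path", "format"), the length limits and the helper name
     are hypothetical; the real keys are whatever each driver domain
     kind documents.

```python
import re

# Assumed constraints, not part of the proposal: short ASCII keys and a
# bound on what counts as a "shortish" value.
KEY_RE = re.compile(r"[a-z0-9_-]{1,32}")
MAX_VALUE_LEN = 256


def target_to_nodes(vdi_path, target):
    """Return {xenstore path: value} for a target stored under <vdi_path>/t/."""
    nodes = {}
    for key, value in sorted(target.items()):
        if not KEY_RE.fullmatch(key):
            raise ValueError("bad target key: %r" % (key,))
        if not isinstance(value, str) or len(value) > MAX_VALUE_LEN:
            raise ValueError("bad target value for key %r" % (key,))
        nodes["%s/t/%s" % (vdi_path, key)] = value
    return nodes


# Hypothetical target for a hypothetical driver domain kind.
example = target_to_nodes(
    ".../backendctrl/vdi/guest1-root",
    {"type": "lvm", "path": "/dev/vg0/guest1", "format": "raw"},
)
```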

 vdi
     This host's intent to access a specific target.
     Non-persistent, created on request by toolstack, enumerable.
     Possible states: inactive/active.
     Abstract operations: prepare, activate, deactivate, unprepare.

     (We call the "create" operation for this object "prepare" to
     avoid confusion with other kinds of "create".)

     The toolstack promises that no two vdis for the same target
     will simultaneously be active, even if the two vdis are on
     different hosts.

 vbd
     Provision of a facility for a guest to access a particular target
     via a particular vdi.  There may be zero or more of these at any
     point for a particular vdi.

     Non-persistent, created on request by toolstack, enumerable.
     Abstract operations: plug, unplug.

     (We call the "create" operation for this object "plug" to avoid
     confusion with other kinds of "create".)

     vbds may be created/destroyed, and the underlying vdi
     activated/deactivated, in any order.  However IO is only possible
     to a vbd when the corresponding vdi is active.  The reason for
     requiring activation as a separate step is to allow as much of
     the setup for an incoming migration domain's storage to be done
     before committing to the migration and entering the "domain is
     down" stage, during which access is switched from the old to the
     new host.
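
     The intended ordering for a migration might look roughly like the
     in-memory model below.  This is only a sketch: the Vdi class, the
     host names and the activate() helper are invented for
     illustration, and the exclusivity check stands in for the promise
     the toolstack is expected to keep.

```python
class Vdi:
    """One host's intent to access a target (hypothetical in-memory model)."""

    def __init__(self, host, target):
        self.host = host
        self.target = target
        self.active = False
        self.vbds = set()

    def plug(self, vbd):
        # Allowed whether or not the vdi is currently active.
        self.vbds.add(vbd)

    def can_do_io(self, vbd):
        # IO needs both a plugged vbd and an active vdi.
        return vbd in self.vbds and self.active


def activate(vdi, all_vdis):
    """Activate vdi, refusing if another vdi for the same target is active."""
    for other in all_vdis:
        if other is not vdi and other.target == vdi.target and other.active:
            raise RuntimeError("target already active on %s" % other.host)
    vdi.active = True


# Old host: vdi active, guest running.
src = Vdi("hostA", "disk0")
src.plug("vbd-src")
activate(src, [src])

# New host: do the setup before committing to the migration.
dst = Vdi("hostB", "disk0")
dst.plug("vbd-dst")                  # the vbd exists, but IO is not possible
io_before = dst.can_do_io("vbd-dst")

# "Domain is down" stage: switch access from the old host to the new one.
src.active = False                   # deactivate on the old host
activate(dst, [src, dst])            # activate on the new host
io_after = dst.can_do_io("vbd-dst")
```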

     We will consider here the case of a vbd which provides
     service as a Xen vbd backend.  Other cases (eg, the driver domain
     is the same as the toolstack domain and the vbd provides a block
     device in the toolstack domain) can be regarded as

Concrete protocol

 The toolstack gives instructions to the driver domain, and receives
 results, via xenstore, in the path:
 Both driver domain and toolstack have write access to the whole of
 this area.

 Each vdi which has been requested and/or exists, corresponds to a
 path .../backendctrl/vdi/<vdi> where <vdi> is a string (of
 alphanumerics, hyphens and underscores) chosen by the toolstack.
 Inside this, there are the following nodes:

   state       The current state.  Values are "inactive", "active",
               or ENOENT meaning the vdi does not exist.
               Set by the driver domain in response to requests.

   request     Operation requested by the toolstack and currently
               being performed.  Created by the toolstack, but may
               then not be modified by the toolstack.  Deleted
               by the driver domain when the operation has completed.

               The values of "request" are:
                 prepare
                 activate
                 deactivate
                 unprepare
                 plug <vbd>
                 unplug <vbd>
               <vbd> is an id chosen by the toolstack like <vdi>

   result      errno value (in decimal, Xen error number) best
               describing the results of the most recently completed
               operation; 0 means success.  Created or set by the
               driver domain in the same transaction as it deletes
               request.  The toolstack may delete this.

   result_msg  Optional UTF-8 string explaining any error; does not
               exist when result is "0".  Created or deleted by the
               driver domain whenever the driver domain sets result.
               The toolstack may delete this.

   t/*         The target name.  Must be written by the toolstack,
               and may not be removed or changed while either
               state or request exists.

   vbd/<vbd>/state
               The state of a vbd, "ok" or ENOENT.
               Set or deleted by the driver domain in response to
               requests.

   vbd/<vbd>/frontend
               The frontend path (complete path in xenstore) which the
               xen vbd should be servicing.  Set by the toolstack
               with the plug request and not modified until after
               completion of unplug.

   vbd/<vbd>/backend
               The backend path (complete path in xenstore) which the
               driver domain has chosen for the vbd.  Set by the
               driver domain in response to a plug request.

   vbd/<vbd>/b-copy/*
               The driver domain may request, in response to plug,
               that the toolstack copy these values to the specified
               backend directory, in the same transaction as it
               creates the frontend.  Set by the driver domain in
               response to a plug request; may be deleted by the
               toolstack.  DEPRECATED, see below.
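
 A driver domain has to decode the "request" node and check the ids
 before acting on it.  A minimal sketch follows, assuming ASCII-only
 ids and a single space between the operation and <vbd>; the function
 and constant names are made up for this example.

```python
import re

# Ids are strings of alphanumerics, hyphens and underscores (ASCII assumed).
ID_RE = re.compile(r"[A-Za-z0-9_-]+")
VDI_OPS = {"prepare", "activate", "deactivate", "unprepare"}
VBD_OPS = {"plug", "unplug"}


def parse_request(value):
    """Split a "request" node value into (operation, vbd id or None)."""
    parts = value.split(" ")
    if len(parts) == 1 and parts[0] in VDI_OPS:
        return parts[0], None
    if len(parts) == 2 and parts[0] in VBD_OPS and ID_RE.fullmatch(parts[1]):
        return parts[0], parts[1]
    raise ValueError("malformed request: %r" % (value,))
```

 For example, parse_request("plug vbd-1") gives ("plug", "vbd-1"),
 while "plug a/b" or a bare "plug" is rejected.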

The operations:

 prepare
        Creates a vdi from a target.
        Preconditions:
            state ENOENT
            request ENOENT
        Request (xenstore writes by toolstack):
            request = "prepare"
            t/* as appropriate
        Results on success (xenstore writes by driver domain):
            request ENOENT    } applies to success from all operations,
            result = "0"      }  will not be restated below
            state = "inactive"
        Results on error (applies to all operations):  }
            request ENOENT                             }  applies
            result = some decimal integer errno value  }   to all
            result_msg = ENOENT or a string            }    failures

 activate
        Preconditions:
            state = "inactive"
            request ENOENT
        Request:
            request = "activate"
        Results on success:
            state = "active"

 deactivate
        Preconditions:
            state = "active"
            request ENOENT
        Request:
            request = "deactivate"
        Results on success:
            state = "inactive"

 unprepare
        Preconditions:
            state != ENOENT
            request ENOENT
        Request:
            request = "unprepare"
        Results on success:
            state = ENOENT
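
 The four vdi operations above can be modelled on the driver domain
 side roughly as follows.  This is a sketch, not a real backend: a
 plain dict stands in for the vdi directory, a missing key stands in
 for ENOENT, and the choice of EINVAL (taken from the host's errno
 module rather than Xen's error numbers) is an assumption.  A real
 implementation would make all of these changes in one xenstore
 transaction.

```python
import errno

# Allowed current state -> new state per operation; None stands for ENOENT.
TRANSITIONS = {
    "prepare":    {None: "inactive"},
    "activate":   {"inactive": "active"},
    "deactivate": {"active": "inactive"},
    "unprepare":  {"inactive": None, "active": None},
}


def handle_vdi_request(vdi):
    """Process vdi["request"]; vdi is a dict standing in for the vdi directory."""
    op = vdi.pop("request")          # request is deleted when the op completes
    vdi.pop("result_msg", None)      # result_msg only exists after an error
    state = vdi.get("state")
    table = TRANSITIONS.get(op)
    if table is None or state not in table:
        vdi["result"] = str(errno.EINVAL)
        vdi["result_msg"] = "cannot %s in state %s" % (op, state or "ENOENT")
        return
    new_state = table[state]
    if new_state is None:
        vdi.pop("state", None)
    else:
        vdi["state"] = new_state
    vdi["result"] = "0"


# Hypothetical usage: prepare, then activate.
demo = {"request": "prepare", "t/path": "/dev/vg0/guest1"}
handle_vdi_request(demo)
demo["request"] = "activate"
handle_vdi_request(demo)
```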

 removal, modification, etc. of an unprepared vdi:
        Preconditions:
            state ENOENT
            request ENOENT
        Request:
            any changes to <vdi> directory which do
             not create "state" or "request"
        Results:
            ignored - no response from driver domain

 plug <vbd>
        Preconditions:
            state != ENOENT
            request ENOENT
            vbd/<vbd>/state ENOENT
            <frontend> ENOENT
        Request:
            request = "plug <vbd>"
            vbd/<vbd>/frontend = <frontend> ("/local/domain/<guest>/...")
        Results on success:
            vbd/<vbd>/state = "ok"
            vbd/<vbd>/backend = <rel-backend>
                (<rel-backend> is the backend path relative to the
                 driver domain's home directory in xenstore)
            vbd/<vbd>/b-copy/*  may be created    } at least one of these
            <backend>/*  may come into existence  }  must happen
        Next step (xenstore write) by toolstack:
            <frontend>  created and populated, specifically
            <frontend>/backend = <backend>
            <backend>    created if necessary
            <backend>/*  copied from  vbd/<vbd>/b-copy/*  if any
            <backend>/frontend = <frontend>  unless already set
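
 The toolstack's next step after a successful plug might be sketched
 like this.  A dict mapping xenstore paths to values stands in for
 xenstore, the function name is invented, and the caller is assumed
 to have already turned <rel-backend> into a complete <backend> path.
 In a real toolstack these writes would share one transaction.

```python
def toolstack_after_plug(store, vbd_dir, frontend, backend):
    """Apply the toolstack's post-plug writes to store (a path -> value dict).

    vbd_dir is the path of the vbd/<vbd> directory inside the vdi.
    """
    # <frontend> created and populated; only the backend pointer is shown.
    store[frontend + "/backend"] = backend

    # <backend>/* copied from vbd/<vbd>/b-copy/*, if any (deprecated feature).
    prefix = vbd_dir + "/b-copy/"
    for path in [p for p in store if p.startswith(prefix)]:
        store[backend + "/" + path[len(prefix):]] = store[path]

    # <backend>/frontend = <frontend>, unless the driver domain already set it.
    store.setdefault(backend + "/frontend", frontend)
```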

 unplug <vbd>
        Preconditions:
            state != ENOENT
            request ENOENT
            vbd/<vbd>/state "ok"
        Request:
            request = "unplug <vbd>"
            <frontend> ENOENT
        Results on success:
            vbd/<vbd>/state ENOENT
            <backend> ENOENT

 The toolstack and driver domains should not store state of their own,
 not required for these communication purposes, in the backendctrl/
 directory in xenstore.  If the driver domain wishes to make records
 for its own use in xenstore, it should do so in a different directory
 of its choice (eg, /local/domain/<driverdomid>/private/<something>).

Notes regarding driver domains whose block backend implementation is
controlled from the actual xenstore backend directory:

 The b-copy/* feature exists for compatibility with some of these.  If
 such a backend cannot cope with the backend directory coming into
 existence before the corresponding frontend directory, then it is
 necessary to create and populate the backend in the same xenstore
 transaction as the creation of the frontend.  However, such backends
 should be fixed; the b-copy/* feature is deprecated and will be
 withdrawn at some point.

 Note that a vbd may be created with the vdi inactive.  In this case
 the frontend and backend directories will exist, but the information
 needed to start up the backend properly may be lacking until the vdi
 is activated.  For example, if the existence of a suitable block
 device in the driver domain depends on vdi activation, the block
 device id cannot be made known to the backend until after the backend
 directory has already been created and perhaps has existed for some
 time.  It is believed that existing backends cope with this, because
 they use a "hotplug script" approach - where the backend directory is
 created without specifying the device node, and this backend directory
 creation causes the invocation of machinery which establishes the
 device node, which is subsequently written to xenstore.


 What about network interfaces and other kinds of backend ?

