
[Xen-API] the xapi storage interface

  • To: "xen-api@xxxxxxxxxxxxx" <xen-api@xxxxxxxxxxxxx>
  • From: Dave Scott <Dave.Scott@xxxxxxxxxxxxx>
  • Date: Fri, 15 Jun 2012 16:31:06 +0100
  • Accept-language: en-US
  • Delivery-date: Fri, 15 Jun 2012 15:32:19 +0000
  • List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>
  • Thread-index: Ac1LC+Gxe16tiMILQ2qf4vonjxuVHg==
  • Thread-topic: the xapi storage interface

Over the last few months the storage interface has become a lot bigger,
with support for disk mirroring, cross-SR disk copies, task cancellation
and event notification.

It's become clear that the interface isn't well-factored: a driver
domain will only want to implement some basic subset of the
functionality, and have the other parts layered on top.

The code in xapi in particular is quite confusing, with several badly
named modules and lots of 'undefined' functions (let foo _ = assert false).

I propose that we split the API into the following chunks ( = modules):

core: basic stuff that all storage implementations must support
  Query.query: return version and capability information
  Query.diagnostics: stuff to be recorded in a bug report
  SR.create: create/format a new SR
  SR.destroy: destroy/erase an existing SR
  SR.attach: establish a connection with the storage substrate
  SR.detach: shut down the connection with the storage substrate
  SR.scan: return a list of all the VDIs in an SR
  VDI.create: make an empty disk
  VDI.snapshot: make a read-only snapshot of a disk
  VDI.clone: make a read/write fast clone of a disk
  VDI.destroy: destroy a disk (or clone or snapshot)
  VDI.stat: query the current state of a disk
  VDI.set_persistent: set whether updates are preserved over VM reboot
  VDI.set_content_id: declare a disk to have particular contents (for 
  VDI.epoch_begin: declare that we are about to hotplug a disk or plug into a 
rebooted/starting VM
  VDI.epoch_end: declare that we are about to hotunplug a disk or unplug from a 
rebooting/shutting down VM
  VDI.attach: connect to a particular disk (all I/O is paused)
  VDI.activate: allow I/O to reach an attached disk
  VDI.deactivate: flush outstanding I/O to an activated disk and leave it 
attached
  VDI.detach: disconnect from a particular disk

host: manage the state of SRs and VDIs in a 'Core' implementation
  -- this includes all the "reference counting" and connection-tracking
  -- currently done by xapi
  SR.reset: declare that a 'Core' implementation has restarted
  SR.list: list the currently-attached SRs
  DP.get_attach_info: return the original result of VDI.attach
  DP.stat_vdi: return the state of this VDI from the PoV of one user

disk_differencing: operations that make sense on differencing disks (eg vhd)
  VDI.similar_content: make a list of disks with similar data blocks
  VDI.compose: layer one disk on top of another (or "reparent" a vhd)

load_balancing: decides which 'Core' implementation shall be used for a given 
user (eg VM)
  -- this will allow multiple driver domains per-host per-SR
  lookup: return the address of the 'Core' implementation to use
  list: return all registered implementations

task_tracking: allows long-running operations to be tracked
  Task.cancel: cancel a long-running operation
  Task.destroy: forget about a finished task
  Task.list: list currently-known tasks
  Task.stat: query the state of a particular task
  Updates.get: block waiting for updates, with timeout

data_transfer: data copying and mirroring 
  Data.copy_into: copy a VDI's contents into a remote URL
  Data.copy: copy a VDI's contents into a fresh VDI in a remote SR
  Mirror.start: establish a disk mirror
  Mirror.stop: shut down an active disk mirror
  Mirror.stat: query the state of a disk mirror
  Mirror.receive_stop: part of the Mirror.stop protocol
  Mirror.receive_finalize: part of the Mirror.stop protocol
  Mirror.receive_cancel: part of the Mirror.stop protocol
  Mirror.list: list active mirrors
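
To make the factoring concrete, here is a rough OCaml sketch of the split
(all names and signatures hypothetical; the real definitions would live in
rpc-light.idl files). A minimal driver domain would implement CORE alone,
with the optional chunks expressed as separate signatures:

```ocaml
(* Hypothetical sketch: each proposed chunk becomes its own signature.
   Only a few operations are shown, to keep the shape visible. *)

type sr = string   (* SR identifier *)
type vdi = string  (* VDI identifier *)

module type CORE = sig
  val query : unit -> string           (* version/capability information *)
  val sr_attach : sr -> unit
  val sr_detach : sr -> unit
  val vdi_create : sr -> int64 -> vdi  (* size in bytes -> new disk *)
  val vdi_attach : sr -> vdi -> string (* -> block device path *)
  val vdi_detach : sr -> vdi -> unit
end

module type DATA_TRANSFER = sig
  val copy_into : sr -> vdi -> string -> unit (* copy contents to a URL *)
end

(* A dummy backend, just to show that CORE stands alone: *)
module Dummy : CORE = struct
  let query () = "dummy 1.0"
  let sr_attach _ = ()
  let sr_detach _ = ()
  let vdi_create _ _ = "vdi0"
  let vdi_attach _ vdi = "/dev/" ^ vdi
  let vdi_detach _ _ = ()
end
```

A backend that only does the basics implements CORE; richer backends
additionally implement DATA_TRANSFER and friends.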

I propose that we keep these as separate rpc-light.idl interface definitions
until we get round to deploying a new interface generator (later). We could
have a directory structure like:
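
One possibility (file names hypothetical, taken from the chunk names above):

```text
idl/
  core.idl
  host.idl
  disk_differencing.idl
  load_balancing.idl
  task_tracking.idl
  data_transfer.idl
```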


I think we can tidy up some of our existing code by:

 * rename Storage_impl.Wrapper to something like "Disk state tracker"
 * make Storage_access.SMAPIv1 its own first-class module
 * remove the Storage_mux
 * bind a separate unix domain socket (for easier testing) for each backend
   implementation, which would be implemented by a functor application:

   Server(Disk state tracker(SMAPIv1))
      -- for all the legacy implementations
   Server(Disk state tracker(Proxy))
      -- for the SMAPIv2 implementations
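
The stacking above can be sketched as OCaml functors (a hypothetical
sketch; the module names and signatures are illustrative, not the real
xapi code):

```ocaml
(* Hypothetical sketch: Disk_state_tracker wraps any CORE backend with
   reference counting; Server (elided) would expose the result over a
   unix domain socket. *)

module type CORE = sig
  val vdi_attach : string -> string (* vdi -> block device path *)
  val vdi_detach : string -> unit
end

module Disk_state_tracker (B : CORE) : CORE = struct
  (* vdi -> (attach count, block device path): a second attach is
     idempotent, and only the last detach reaches the backend. *)
  let tbl : (string, int * string) Hashtbl.t = Hashtbl.create 16

  let vdi_attach vdi =
    match Hashtbl.find_opt tbl vdi with
    | Some (n, path) -> Hashtbl.replace tbl vdi (n + 1, path); path
    | None ->
      let path = B.vdi_attach vdi in
      Hashtbl.replace tbl vdi (1, path);
      path

  let vdi_detach vdi =
    match Hashtbl.find_opt tbl vdi with
    | Some (n, path) when n > 1 -> Hashtbl.replace tbl vdi (n - 1, path)
    | _ -> Hashtbl.remove tbl vdi; B.vdi_detach vdi
end

(* A stand-in for the legacy backend, instrumented for illustration: *)
module SMAPIv1 = struct
  let attached = ref 0
  let vdi_attach vdi = incr attached; "/dev/sm/" ^ vdi
  let vdi_detach _ = decr attached
end

(* "Disk state tracker(SMAPIv1)" from the proposal: *)
module Stacked = Disk_state_tracker (SMAPIv1)
```

Because the functor result has the same signature as its argument, the
state tracker can wrap either SMAPIv1 or the Proxy unchanged.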

 * use a convention where multiple rpc-light.idl interfaces can
   be handled by the same HTTP server, using the URI to disambiguate:
         HTTP POST /core Query.diagnostics
           could be different from
         HTTP POST /data_transfer Query.diagnostics
   This would allow backends to implement multiple interfaces and allows
   us to keep the namespaces separate.
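
A minimal sketch of that dispatch convention (hypothetical; the real
server would use the rpc-light machinery rather than this toy table):

```ocaml
(* Dispatch on (URI, method name) so that two interfaces can both
   define a method with the same name without clashing. *)

let handlers : (string * string, string -> string) Hashtbl.t =
  Hashtbl.create 16

let register ~uri ~meth f = Hashtbl.replace handlers (uri, meth) f

let dispatch ~uri ~meth body =
  match Hashtbl.find_opt handlers (uri, meth) with
  | Some f -> f body
  | None -> failwith (Printf.sprintf "no handler for POST %s %s" uri meth)

let () =
  (* The same method name, registered under two interface URIs: *)
  register ~uri:"/core" ~meth:"Query.diagnostics"
    (fun _ -> "core diagnostics");
  register ~uri:"/data_transfer" ~meth:"Query.diagnostics"
    (fun _ -> "data_transfer diagnostics")
```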

 * when xenopsd wants to attach a disk, it will first look up the address
   of the backend implementation to use, which will be either an internal
   IP address or a unix domain socket (for now; we can use a leaner
   interdomain transport later). It will then forward the request to the
   correct backend.
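
The lookup-then-forward step could look roughly like this (a toy sketch;
the address types and the registry stand in for the load_balancing
interface, and the "forwarding" just returns a string for illustration):

```ocaml
(* Hypothetical sketch: resolve an SR to a backend address, then send
   the attach request there. *)

type address = Unix_socket of string | Internal_ip of string

(* A toy registry in place of the load_balancing lookup call: *)
let registry : (string, address) Hashtbl.t = Hashtbl.create 16

let lookup sr =
  match Hashtbl.find_opt registry sr with
  | Some addr -> addr
  | None -> failwith ("no backend registered for SR " ^ sr)

let attach ~sr ~vdi =
  match lookup sr with
  | Unix_socket path -> Printf.sprintf "POST unix:%s VDI.attach %s" path vdi
  | Internal_ip ip -> Printf.sprintf "POST http://%s/ VDI.attach %s" ip vdi

let () =
  Hashtbl.replace registry "sr1" (Unix_socket "/var/xapi/sm/sr1")
```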

Comments welcome!

