Re: [Xen-devel] [DRAFT v5] PV Calls protocol design document (former XenSock)
Ping?
On Thu, 4 Aug 2016, Stefano Stabellini wrote:
> Hi all,
>
> This is the design document of the PV Calls protocol. You can find
> prototypes of the Linux frontend and backend drivers here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git pvcalls-5
>
> To use them, make sure to enable CONFIG_PVCALLS in your kernel config
> and add "pvcalls=1" to the command line of your DomU Linux kernel. You
> also need the toolstack to create the initial xenstore nodes for the
> protocol. To do that, please apply the attached patch to libxl (the
> patch is based on Xen 4.7.0-rc3) and add "pvcalls=1" to your DomU config
> file.
>
> Note that previous versions of the protocol were named XenSock. It has
> been renamed for clarity of scope and to avoid confusion with hv_sock
> and vsock, which are used for inter-VM communication.
>
> Cheers,
>
> Stefano
>
> Changes in v5:
> - clarify text
> - rename id to req_id
> - rename sockid to id
> - move id to request and response specific fields
> - add version node to xenstore
>
> Changes in v4:
> - rename xensock to pvcalls
>
> Changes in v3:
> - add a dummy element to struct xen_xensock_request to make sure the
> size of the struct is the same on both x86_32 and x86_64
>
> Changes in v2:
> - add max-dataring-page-order
> - add "Publish backend features and transport parameters" to backend
> xenbus workflow
> - update new cmd values
> - update xen_xensock_request
> - add backlog parameter to listen and binary layout
> - add description of new data ring format (interface+data)
> - modify connect and accept to reflect new data ring format
> - add link to POSIX docs
> - add error numbers
> - add address format section and relevant numeric definitions
> - add explicit mention of unimplemented commands
> - add protocol node name
> - add xenbus shutdown diagram
> - add socket operation
>
> ---
>
> # PV Calls Protocol version 1
>
> ## Rationale
>
> PV Calls is a paravirtualized protocol that allows the implementation of
> a set of POSIX functions in a different domain. The PV Calls frontend
> sends POSIX function calls to the backend, which implements them and
> returns a value to the frontend.
>
> This version of the document covers networking function calls, such as
> connect, accept, bind, release, listen, poll, recvmsg and sendmsg; but
> the protocol is meant to be easily extended to cover different sets of
> calls. Unimplemented commands return ENOTSUPP.
>
> PV Calls provides the following benefits:
> * full visibility of the guest behavior on the backend domain, allowing
> for inexpensive filtering and manipulation of any guest calls
> * excellent performance
>
> Specifically, PV Calls for networking offers these advantages:
> * guest networking works out of the box with VPNs, wireless networks and
> any other complex configurations on the host
> * guest services listen on ports bound directly to the backend domain IP
> addresses
> * localhost becomes a secure namespace for inter-VM communication
>
>
> ## Design
>
> ### Xenstore
>
> The frontend and the backend connect to each other by exchanging information
> via xenstore. The toolstack creates front and back nodes with state
> XenbusStateInitialising. The protocol node name is **pvcalls**. There can
> only be one PV Calls frontend per domain.
>
> #### Frontend XenBus Nodes
>
>     port
>          Values:         <uint32_t>
>
>          The identifier of the Xen event channel used to signal activity
>          in the ring buffer.
>
>     ring-ref
>          Values:         <uint32_t>
>
>          The Xen grant reference granting permission for the backend to map
>          the sole page in a single page sized ring buffer.
>
> #### Backend XenBus Nodes
>
>     version
>          Values:         <uint32_t>
>
>          Protocol version supported by the backend.
>
>     max-dataring-page-order
>          Values:         <uint32_t>
>
>          The maximum supported size of the data ring in units of lb(machine
>          pages), e.g. 0 == 1 page, 1 == 2 pages, 2 == 4 pages, etc.
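As a worked example of the order encoding, the C sketch below converts an order into a page count and a byte size, and checks an order chosen by the frontend against the value the backend advertised in **max-dataring-page-order**. The helper names are hypothetical and 4096-byte pages are an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PVCALLS_PAGE_SIZE 4096u /* assumption: 4 KiB machine pages */

/* Number of pages for a given order: 0 -> 1, 1 -> 2, 2 -> 4, ... */
static uint32_t pvcalls_order_to_pages(uint32_t order)
{
    assert(order < 32); /* keep the shift well defined */
    return 1u << order;
}

/* Total size in bytes of a ring of the given order. */
static uint64_t pvcalls_order_to_bytes(uint32_t order)
{
    return (uint64_t)pvcalls_order_to_pages(order) * PVCALLS_PAGE_SIZE;
}

/* A frontend may only use an order the backend advertised as supported. */
static bool pvcalls_order_is_valid(uint32_t order, uint32_t max_order)
{
    return order <= max_order;
}
```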
>
> #### State Machine
>
> Initialization:
>
>     *Front*                               *Back*
>     XenbusStateInitialising               XenbusStateInitialising
>     - Query virtual device                - Query backend device
>       properties.                           identification data.
>     - Setup OS device instance.           - Publish backend features
>     - Allocate and initialize the           and transport parameters
>       request ring.                                      |
>     - Publish transport parameters                       |
>       that will be in effect during                      V
>       this connection.                            XenbusStateInitWait
>                  |
>                  |
>                  V
>        XenbusStateInitialised
>
>                                           - Query frontend transport
>                                             parameters.
>                                           - Connect to the request ring and
>                                             event channel.
>                                                          |
>                                                          |
>                                                          V
>                                                  XenbusStateConnected
>
>     - Query backend device properties.
>     - Finalize OS virtual device
>       instance.
>                  |
>                  |
>                  V
>        XenbusStateConnected
>
> Once frontend and backend are connected, they have a shared page, which
> is used to exchange messages over a ring, and an event channel,
> which is used to send notifications.
>
> Shutdown:
>
>     *Front*                               *Back*
>     XenbusStateConnected                  XenbusStateConnected
>                  |
>                  |
>                  V
>        XenbusStateClosing
>
>                                           - Unmap grants
>                                           - Unbind evtchns
>                                                          |
>                                                          |
>                                                          V
>                                                  XenbusStateClosing
>
>     - Unbind evtchns
>     - Free rings
>     - Free data structures
>                  |
>                  |
>                  V
>        XenbusStateClosed
>
>                                           - Free remaining data structures
>                                                          |
>                                                          |
>                                                          V
>                                                  XenbusStateClosed
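The initialization and shutdown diagrams can be condensed into a backend-side transition function. This is a sketch, not code from the prototype drivers: `pvcalls_back_next_state` is a hypothetical name, while the enum values are those of `enum xenbus_state` in Xen's public `io/xenbus.h`. It assumes, as the diagrams show, that the frontend initiates shutdown.

```c
#include <assert.h>

/* Values of enum xenbus_state from Xen's public io/xenbus.h. */
enum xenbus_state {
    XenbusStateUnknown       = 0,
    XenbusStateInitialising  = 1,
    XenbusStateInitWait      = 2,
    XenbusStateInitialised   = 3,
    XenbusStateConnected     = 4,
    XenbusStateClosing       = 5,
    XenbusStateClosed        = 6,
    XenbusStateReconfiguring = 7,
    XenbusStateReconfigured  = 8,
};

/* Hypothetical helper: the state the backend should switch to, given its
 * own state and the frontend's. Returns `back` unchanged when nothing
 * needs to happen. */
static enum xenbus_state pvcalls_back_next_state(enum xenbus_state back,
                                                 enum xenbus_state front)
{
    assert(back <= XenbusStateReconfigured);

    /* After publishing version and max-dataring-page-order. */
    if (back == XenbusStateInitialising)
        return XenbusStateInitWait;

    switch (front) {
    case XenbusStateInitialised:
        /* Read ring-ref and port, map the ring, bind the event channel. */
        if (back == XenbusStateInitWait)
            return XenbusStateConnected;
        break;
    case XenbusStateClosing:
        /* Unmap grants and unbind event channels. */
        if (back == XenbusStateConnected)
            return XenbusStateClosing;
        break;
    case XenbusStateClosed:
        /* Free the remaining data structures. */
        if (back == XenbusStateClosing)
            return XenbusStateClosed;
        break;
    default:
        break;
    }
    return back;
}
```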
>
>
> ### Commands Ring
>
> The shared ring is used by the frontend to forward POSIX function calls to the
> backend. I'll refer to this ring as **commands ring** to distinguish it from
> other rings which can be created later in the lifecycle of the protocol (data
> rings). The ring format is defined using the familiar `DEFINE_RING_TYPES` macro
> (`xen/include/public/io/ring.h`). Frontend requests are allocated on the ring
> using the `RING_GET_REQUEST` macro.
>
> The format is defined as follows:
>
>     #define PVCALLS_SOCKET         0
>     #define PVCALLS_CONNECT        1
>     #define PVCALLS_RELEASE        2
>     #define PVCALLS_BIND           3
>     #define PVCALLS_LISTEN         4
>     #define PVCALLS_ACCEPT         5
>     #define PVCALLS_POLL           6
>
>     struct xen_pvcalls_request {
>         uint32_t req_id; /* private to guest, echoed in response */
>         uint32_t cmd;    /* command to execute */
>         union {
>             struct xen_pvcalls_socket {
>                 uint64_t id;
>                 uint32_t domain;
>                 uint32_t type;
>                 uint32_t protocol;
>             } socket;
>             struct xen_pvcalls_connect {
>                 uint64_t id;
>                 uint8_t addr[28];
>                 uint32_t len;
>                 uint32_t flags;
>                 grant_ref_t ref;
>                 uint32_t evtchn;
>             } connect;
>             struct xen_pvcalls_release {
>                 uint64_t id;
>             } release;
>             struct xen_pvcalls_bind {
>                 uint64_t id;
>                 uint8_t addr[28];
>                 uint32_t len;
>             } bind;
>             struct xen_pvcalls_listen {
>                 uint64_t id;
>                 uint32_t backlog;
>             } listen;
>             struct xen_pvcalls_accept {
>                 uint64_t id;
>                 uint64_t id_new;
>                 grant_ref_t ref;
>                 uint32_t evtchn;
>             } accept;
>             struct xen_pvcalls_poll {
>                 uint64_t id;
>             } poll;
>             /* dummy member to force sizeof(struct xen_pvcalls_request) to
>              * match across archs */
>             struct xen_pvcalls_dummy {
>                 uint8_t dummy[56];
>             } dummy;
>         } u;
>     };
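The offsets in the request binary layout diagrams can be checked at compile time. The sketch below restates the request structure with untagged inner structs to stay self-contained, and assumes `grant_ref_t` is a `uint32_t` as in Xen's public `grant_table.h`. It also shows why the dummy member exists: without it the union would be 52 bytes on x86_32, where `uint64_t` is only 4-byte aligned inside structs, but 56 bytes on x86_64; with it, the request is 64 bytes on both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t grant_ref_t; /* assumption: matches Xen's grant_table.h */

struct xen_pvcalls_request {
    uint32_t req_id;
    uint32_t cmd;
    union {
        struct { uint64_t id; uint32_t domain, type, protocol; } socket;
        struct {
            uint64_t id;
            uint8_t addr[28];
            uint32_t len, flags;
            grant_ref_t ref;
            uint32_t evtchn;
        } connect;
        struct { uint64_t id; } release;
        struct { uint64_t id; uint8_t addr[28]; uint32_t len; } bind;
        struct { uint64_t id; uint32_t backlog; } listen;
        struct {
            uint64_t id;
            uint64_t id_new;
            grant_ref_t ref;
            uint32_t evtchn;
        } accept;
        struct { uint64_t id; } poll;
        struct { uint8_t dummy[56]; } dummy; /* fixes the size across archs */
    } u;
};

/* Offsets taken from the binary layout diagrams in this document. */
_Static_assert(sizeof(struct xen_pvcalls_request) == 64, "request size");
_Static_assert(offsetof(struct xen_pvcalls_request, u) == 8, "union offset");
_Static_assert(offsetof(struct xen_pvcalls_request, u.socket.protocol) == 24,
               "socket.protocol");
_Static_assert(offsetof(struct xen_pvcalls_request, u.connect.len) == 44,
               "connect.len");
_Static_assert(offsetof(struct xen_pvcalls_request, u.connect.evtchn) == 56,
               "connect.evtchn");
_Static_assert(offsetof(struct xen_pvcalls_request, u.listen.backlog) == 16,
               "listen.backlog");
_Static_assert(offsetof(struct xen_pvcalls_request, u.accept.ref) == 24,
               "accept.ref");
```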
>
> The first two fields are common for every command. Their binary layout
> is:
>
>     0       4       8
>     +-------+-------+
>     |req_id |  cmd  |
>     +-------+-------+
>
> - **req_id** is generated by the frontend and identifies one specific request
> - **cmd** is the command requested by the frontend:
>
>     - `PVCALLS_SOCKET`:  0
>     - `PVCALLS_CONNECT`: 1
>     - `PVCALLS_RELEASE`: 2
>     - `PVCALLS_BIND`:    3
>     - `PVCALLS_LISTEN`:  4
>     - `PVCALLS_ACCEPT`:  5
>     - `PVCALLS_POLL`:    6
>
> Both fields are echoed back by the backend.
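Because **req_id** and **cmd** are both echoed back, a frontend can match each response to the request it answers; this matters because an **accept** request is answered only once a connection arrives, so responses can complete out of order. A minimal sketch, assuming a hypothetical fixed-size table whose slot index doubles as the **req_id**; the table size and helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PVCALLS_MAX_INFLIGHT 32 /* hypothetical limit on outstanding requests */

/* One slot per outstanding request; the slot index is used as req_id. */
struct pvcalls_pending {
    bool in_use;
    uint32_t cmd; /* command that was sent, checked against the echo */
};

static struct pvcalls_pending pending[PVCALLS_MAX_INFLIGHT];

/* Reserve a req_id for a new request; returns -1 if all slots are busy. */
static int64_t pvcalls_alloc_req_id(uint32_t cmd)
{
    for (uint32_t i = 0; i < PVCALLS_MAX_INFLIGHT; i++) {
        if (!pending[i].in_use) {
            pending[i].in_use = true;
            pending[i].cmd = cmd;
            return i;
        }
    }
    return -1;
}

/* Match a response using the echoed req_id and cmd; frees the slot on
 * success and returns false for unknown or inconsistent responses. */
static bool pvcalls_complete_req(uint32_t req_id, uint32_t cmd)
{
    if (req_id >= PVCALLS_MAX_INFLIGHT || !pending[req_id].in_use ||
        pending[req_id].cmd != cmd)
        return false;
    pending[req_id].in_use = false;
    return true;
}
```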
>
> As for the other Xen ring based protocols, after writing a request to the
> ring, the frontend calls `RING_PUSH_REQUESTS_AND_CHECK_NOTIFY` and issues an
> event channel notification when a notification is required.
>
> Backend responses are allocated on the ring using the `RING_GET_RESPONSE`
> macro. The format is the following:
>
>     struct xen_pvcalls_response {
>         uint32_t req_id;
>         uint32_t cmd;
>         int32_t ret;
>         uint32_t pad;
>         union {
>             struct _xen_pvcalls_socket {
>                 uint64_t id;
>             } socket;
>             struct _xen_pvcalls_connect {
>                 uint64_t id;
>             } connect;
>             struct _xen_pvcalls_release {
>                 uint64_t id;
>             } release;
>             struct _xen_pvcalls_bind {
>                 uint64_t id;
>             } bind;
>             struct _xen_pvcalls_listen {
>                 uint64_t id;
>             } listen;
>             struct _xen_pvcalls_accept {
>                 uint64_t id;
>             } accept;
>             struct _xen_pvcalls_poll {
>                 uint64_t id;
>             } poll;
>             struct _xen_pvcalls_dummy {
>                 uint8_t dummy[8];
>             } dummy;
>         } u;
>     };
>
> The first four fields are common for every response. Their binary layout
> is:
>
>     0       4       8       12      16
>     +-------+-------+-------+-------+
>     |req_id |  cmd  |  ret  |  pad  |
>     +-------+-------+-------+-------+
>
> - **req_id**: echoed back from request
> - **cmd**: echoed back from request
> - **ret**: return value, identifies success (0) or failure (see error numbers
> below). If the **cmd** is not supported by the backend, ret is ENOTSUPP.
> - **pad**: padding
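The response header and union can be checked against the diagram in the same way. The sketch below restates the response with untagged inner structs and adds a hypothetical classifier for **ret**; it assumes, following the error number table in this document, that failures are carried as negative numbers and that ENOTSUPP is -524.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xen_pvcalls_response {
    uint32_t req_id;
    uint32_t cmd;
    int32_t ret;
    uint32_t pad;
    union {
        struct { uint64_t id; } socket;
        struct { uint64_t id; } connect;
        struct { uint64_t id; } release;
        struct { uint64_t id; } bind;
        struct { uint64_t id; } listen;
        struct { uint64_t id; } accept;
        struct { uint64_t id; } poll;
        struct { uint8_t dummy[8]; } dummy;
    } u;
};

_Static_assert(sizeof(struct xen_pvcalls_response) == 24, "response size");
_Static_assert(offsetof(struct xen_pvcalls_response, ret) == 8, "ret");
_Static_assert(offsetof(struct xen_pvcalls_response, u) == 16, "union");

#define PVCALLS_ENOTSUPP (-524) /* from the error number table */

/* Hypothetical helper: 0 on success, 1 if the backend does not implement
 * the command, -1 for any other error number. */
static int pvcalls_response_status(const struct xen_pvcalls_response *rsp)
{
    if (rsp->ret == 0)
        return 0;
    if (rsp->ret == PVCALLS_ENOTSUPP)
        return 1;
    return -1;
}
```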
>
> After calling `RING_PUSH_RESPONSES_AND_CHECK_NOTIFY`, the backend checks
> whether it needs to notify the frontend and does so via event channel.
>
> A description of each command and its additional request and response
> fields follows.
>
>
> #### Socket
>
> The **socket** operation corresponds to the POSIX [socket][socket] function.
> It creates a new socket of the specified family, type and protocol. **id** is
> freely chosen by the frontend and references this specific socket from this
> point forward. See "Socket families and address format" below.
>
> Request fields:
>
> - **cmd** value: 0
> - additional fields:
>   - **id**: generated by the frontend, it identifies the new socket
>   - **domain**: the communication domain
>   - **type**: the socket type
>   - **protocol**: the particular protocol to be used with the socket, usually 0
>
> Request binary layout:
>
>     8       12      16      20      24      28
>     +-------+-------+-------+-------+-------+
>     |       id      |domain | type  |protoco|
>     +-------+-------+-------+-------+-------+
>
> Response additional fields:
>
> - **id**: echoed back from request
>
> Response binary layout:
>
>     16      20      24
>     +-------+-------+
>     |       id      |
>     +-------+-------+
>
> Return value:
>
> - 0 on success
> - See the [POSIX socket function][socket] for error names; the corresponding
>   error numbers are specified later in this document.
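Since only `AF_INET`, `SOCK_STREAM` and protocol `0` are supported by this version of the spec (see "Socket families and address format"), a backend has to reject every other combination. A sketch of that check: the helper name is hypothetical, and reporting EINVAL for an unsupported type or protocol is an assumption, not something this document specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Numeric values from this document. */
#define PVCALLS_AF_INET      2
#define PVCALLS_SOCK_STREAM  1
#define PVCALLS_EINVAL       (-22)
#define PVCALLS_EAFNOSUPPORT (-97)

/* Hypothetical backend check for a socket request; returns the value to
 * place in ret. The choice of EINVAL for type/protocol is an assumption. */
static int32_t pvcalls_check_socket_args(uint32_t domain, uint32_t type,
                                         uint32_t protocol)
{
    if (domain != PVCALLS_AF_INET)
        return PVCALLS_EAFNOSUPPORT;
    if (type != PVCALLS_SOCK_STREAM || protocol != 0)
        return PVCALLS_EINVAL;
    return 0;
}
```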
>
> #### Connect
>
> The **connect** operation corresponds to the POSIX [connect][connect] function.
> It connects a previously created socket (identified by **id**) to the
> specified address.
>
> The connect operation creates a new shared ring, which we'll call **data
> ring**. The data ring is used to send and receive data from the socket.
> The connect operation passes two additional parameters which are
> utilized to set up the new ring: **evtchn** and **ref**. **evtchn** is the
> port number of a new event channel which will be used for notifications
> of activity on the data ring. **ref** is the grant reference of a page
> which contains shared pointers to write and read data from the data ring
> and the full array of grant references for the ring buffers. It will be
> described in more detail later. The data ring is unmapped and freed upon
> issuing a **release** command on the active socket identified by **id**.
>
> When the frontend issues a **connect** command, the backend:
> - finds its own internal socket corresponding to **id**
> - connects the socket to **addr**
> - maps the grant reference **ref**; the shared page contains the data
>   ring interface (`struct pvcalls_data_intf`)
> - maps all the grant references listed in `struct pvcalls_data_intf` and
>   uses them as shared memory for the ring buffers
> - binds the **evtchn**
> - replies to the frontend
>
> The data ring format will be described in the following section.
>
> Request fields:
>
> - **cmd** value: 1
> - additional fields:
>   - **id**: identifies the socket
>   - **addr**: address to connect to, see the address format section for more
>     information
>   - **len**: address length
>   - **flags**: flags for the connection, reserved for future usage
>   - **ref**: grant reference of the page containing `struct pvcalls_data_intf`
>   - **evtchn**: port number of the evtchn to signal activity on the data ring
>
> Request binary layout:
>
>     8       12      16      20      24      28      32      36      40      44
>     +-------+-------+-------+-------+-------+-------+-------+-------+-------+
>     |       id      |                         addr                          |
>     +-------+-------+-------+-------+-------+-------+-------+-------+-------+
>     |  len  | flags |  ref  |evtchn |
>     +-------+-------+-------+-------+
>
> Response additional fields:
>
> - **id**: echoed back from request
>
> Response binary layout:
>
>     16      20      24
>     +-------+-------+
>     |       id      |
>     +-------+-------+
>
> Return value:
>
> - 0 on success
> - See the [POSIX connect function][connect] for error names; the corresponding
>   error numbers are specified later in this document.
>
> #### Release
>
> The **release** operation closes an existing active or passive socket.
>
> When a release command is issued on a passive socket, the backend releases it
> and frees its internal mappings. When a release command is issued for an
> active socket, the data ring is also unmapped and freed:
>
> - frontend sends release command for an active socket
> - backend releases the socket
> - backend unmaps the data ring buffers
> - backend unmaps the data ring interface
> - backend unbinds the evtchn
> - backend replies to frontend
> - frontend frees ring and unbinds evtchn
>
> Request fields:
>
> - **cmd** value: 2
> - additional fields:
>   - **id**: identifies the socket
>
> Request binary layout:
>
>     8       12      16
>     +-------+-------+
>     |       id      |
>     +-------+-------+
>
> Response additional fields:
>
> - **id**: echoed back from request
>
> Response binary layout:
>
>     16      20      24
>     +-------+-------+
>     |       id      |
>     +-------+-------+
>
> Return value:
>
> - 0 on success
> - See the [POSIX shutdown function][shutdown] for error names; the
> corresponding error numbers are specified later in this document.
>
> #### Bind
>
> The **bind** operation corresponds to the POSIX [bind][bind] function. It
> assigns the address passed as parameter to a previously created socket,
> identified by **id**. **Bind**, **listen** and **accept** are the three
> operations required to have fully working passive sockets and should be
> issued in this order.
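The required ordering of **bind**, **listen** and **accept** can be tracked per socket on the backend. A sketch under stated assumptions: the state names and helper are hypothetical, rejecting out-of-order commands with EINVAL is an assumed policy (POSIX implementations are sometimes more lenient, e.g. listening on an unbound socket), and **poll** is treated as valid only on a listening socket, as the Poll section requires.

```c
#include <assert.h>
#include <stdint.h>

#define PVCALLS_BIND   3
#define PVCALLS_LISTEN 4
#define PVCALLS_ACCEPT 5
#define PVCALLS_POLL   6
#define PVCALLS_EINVAL (-22)

/* Hypothetical backend bookkeeping for a socket created by socket. */
enum pvcalls_sock_state {
    PVCALLS_SOCK_CREATED,
    PVCALLS_SOCK_BOUND,
    PVCALLS_SOCK_LISTENING,
};

/* Validate a passive-socket command against the current state and, on
 * success, advance the state. Returns 0 or a negative error number. */
static int32_t pvcalls_passive_step(enum pvcalls_sock_state *st, uint32_t cmd)
{
    switch (cmd) {
    case PVCALLS_BIND:
        if (*st != PVCALLS_SOCK_CREATED)
            return PVCALLS_EINVAL;
        *st = PVCALLS_SOCK_BOUND;
        return 0;
    case PVCALLS_LISTEN:
        if (*st != PVCALLS_SOCK_BOUND)
            return PVCALLS_EINVAL;
        *st = PVCALLS_SOCK_LISTENING;
        return 0;
    case PVCALLS_ACCEPT:
    case PVCALLS_POLL:
        /* both require a passive (listening) socket */
        return *st == PVCALLS_SOCK_LISTENING ? 0 : PVCALLS_EINVAL;
    default:
        return PVCALLS_EINVAL;
    }
}
```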
>
> Request fields:
>
> - **cmd** value: 3
> - additional fields:
>   - **id**: identifies the socket
>   - **addr**: address to bind to, see the address format section for more
>     information
>   - **len**: address length
>
> Request binary layout:
>
>     8       12      16      20      24      28      32      36      40      44
>     +-------+-------+-------+-------+-------+-------+-------+-------+-------+
>     |       id      |                         addr                          |
>     +-------+-------+-------+-------+-------+-------+-------+-------+-------+
>     |  len  |
>     +-------+
>
> Response additional fields:
>
> - **id**: echoed back from request
>
> Response binary layout:
>
>     16      20      24
>     +-------+-------+
>     |       id      |
>     +-------+-------+
>
> Return value:
>
> - 0 on success
> - See the [POSIX bind function][bind] for error names; the corresponding
>   error numbers are specified later in this document.
>
>
> #### Listen
>
> The **listen** operation marks the socket as a passive socket. It corresponds
> to the [POSIX listen function][listen].
>
> Request fields:
>
> - **cmd** value: 4
> - additional fields:
>   - **id**: identifies the socket
>   - **backlog**: the maximum length to which the queue of pending
>     connections may grow
>
> Request binary layout:
>
>     8       12      16      20
>     +-------+-------+-------+
>     |       id      |backlog|
>     +-------+-------+-------+
>
> Response additional fields:
>
> - **id**: echoed back from request
>
> Response binary layout:
>
>     16      20      24
>     +-------+-------+
>     |       id      |
>     +-------+-------+
>
> Return value:
>
> - 0 on success
> - See the [POSIX listen function][listen] for error names; the corresponding
>   error numbers are specified later in this document.
>
>
> #### Accept
>
> The **accept** operation extracts the first connection request on the
> queue of pending connections for the listening socket identified by
> **id** and creates a new connected socket. The id of the new socket is
> also chosen by the frontend and passed as an additional field of the
> accept request struct (**id_new**). See the [POSIX accept function][accept]
> as reference.
>
> Similarly to the **connect** operation, **accept** creates a new data ring.
> Information necessary to set up the new ring, such as the grant reference of
> the page containing the data ring interface (`struct pvcalls_data_intf`) and
> the event channel port, is passed from the frontend to the backend as part of
> the request.
>
> The backend will reply to the request only when a new connection is
> successfully accepted, i.e. the backend does not return EAGAIN or EWOULDBLOCK.
>
> Example workflow:
>
> - frontend issues an **accept** request
> - backend waits for a connection to be available on the socket
> - a new connection becomes available
> - backend accepts the new connection
> - backend creates an internal mapping from **id_new** to the new socket
> - backend maps the grant reference **ref**; the shared page contains the
>   data ring interface (`struct pvcalls_data_intf`)
> - backend maps all the grant references listed in `struct pvcalls_data_intf`
>   and uses them as shared memory for the new data ring
> - backend binds the **evtchn**
> - backend replies to the frontend
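The rule that **accept** is answered only once a connection has actually been accepted can be expressed as a small backend helper. This is a sketch with hypothetical names: `struct pvcalls_accept_result` is a simplified stand-in for the response (not its on-ring layout), and `accept_ret` is assumed to be the result of a non-blocking accept on the host socket, already converted to this document's error numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PVCALLS_ACCEPT 5
#define PVCALLS_EAGAIN (-11) /* EWOULDBLOCK has the same number */

/* Simplified response fields; not the on-ring layout. */
struct pvcalls_accept_result {
    uint32_t req_id;
    uint32_t cmd;
    int32_t ret;
    uint64_t id;
};

/* Returns true and fills *rsp when a reply is due. On -EAGAIN the request
 * stays pending and is retried when a new connection is queued. */
static bool pvcalls_accept_complete(uint32_t req_id, uint64_t listen_id,
                                    int32_t accept_ret,
                                    struct pvcalls_accept_result *rsp)
{
    if (accept_ret == PVCALLS_EAGAIN)
        return false; /* EAGAIN/EWOULDBLOCK is never sent to the frontend */

    rsp->req_id = req_id;
    rsp->cmd = PVCALLS_ACCEPT;
    rsp->ret = accept_ret; /* 0, or another error number */
    rsp->id = listen_id;   /* id of the listening socket, not id_new */
    return true;
}
```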
>
> Request fields:
>
> - **cmd** value: 5
> - additional fields:
>   - **id**: id of listening socket
>   - **id_new**: id of the new socket
>   - **ref**: grant reference of the data ring interface (`struct pvcalls_data_intf`)
>   - **evtchn**: port number of the evtchn to signal activity on the data ring
>
> Request binary layout:
>
>     8       12      16      20      24      28      32
>     +-------+-------+-------+-------+-------+-------+
>     |       id      |     id_new    |  ref  |evtchn |
>     +-------+-------+-------+-------+-------+-------+
>
> Response additional fields:
>
> - **id**: id of the listening socket, echoed back from request
>
> Response binary layout:
>
>     16      20      24
>     +-------+-------+
>     |       id      |
>     +-------+-------+
>
> Return value:
>
> - 0 on success
> - See the [POSIX accept function][accept] for error names; the corresponding
> error numbers are specified later in this document.
>
>
> #### Poll
>
> In this version of the protocol, the **poll** operation is only valid
> for passive sockets. For active sockets, the frontend should look at the
> state of the data ring. When a new connection is available in the queue
> of the passive socket, the backend generates a response and notifies the
> frontend.
>
> Request fields:
>
> - **cmd** value: 6
> - additional fields:
>   - **id**: identifies the listening socket
>
> Request binary layout:
>
>     8       12      16
>     +-------+-------+
>     |       id      |
>     +-------+-------+
>
>
> Response additional fields:
>
> - **id**: echoed back from request
>
> Response binary layout:
>
>     16      20      24
>     +-------+-------+
>     |       id      |
>     +-------+-------+
>
> Return value:
>
> - 0 on success
> - See the [POSIX poll function][poll] for error names; the corresponding
>   error numbers are specified later in this document.
>
> #### Error numbers
>
> The numbers corresponding to the error names specified by POSIX are:
>
>     [EPERM]         -1
>     [ENOENT]        -2
>     [ESRCH]         -3
>     [EINTR]         -4
>     [EIO]           -5
>     [ENXIO]         -6
>     [E2BIG]         -7
>     [ENOEXEC]       -8
>     [EBADF]         -9
>     [ECHILD]        -10
>     [EAGAIN]        -11
>     [EWOULDBLOCK]   -11
>     [ENOMEM]        -12
>     [EACCES]        -13
>     [EFAULT]        -14
>     [EBUSY]         -16
>     [EEXIST]        -17
>     [EXDEV]         -18
>     [ENODEV]        -19
>     [EISDIR]        -21
>     [EINVAL]        -22
>     [ENFILE]        -23
>     [EMFILE]        -24
>     [ENOSPC]        -28
>     [EROFS]         -30
>     [EMLINK]        -31
>     [EDOM]          -33
>     [ERANGE]        -34
>     [EDEADLK]       -35
>     [EDEADLOCK]     -35
>     [ENAMETOOLONG]  -36
>     [ENOLCK]        -37
>     [ENOSYS]        -38
>     [ENOTEMPTY]     -39
>     [ENODATA]       -61
>     [ETIME]         -62
>     [EBADMSG]       -74
>     [EOVERFLOW]     -75
>     [EILSEQ]        -84
>     [ERESTART]      -85
>     [ENOTSOCK]      -88
>     [EOPNOTSUPP]    -95
>     [EAFNOSUPPORT]  -97
>     [EADDRINUSE]    -98
>     [EADDRNOTAVAIL] -99
>     [ENOBUFS]       -105
>     [EISCONN]       -106
>     [ENOTCONN]      -107
>     [ETIMEDOUT]     -110
>     [ENOTSUPP]      -524
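These wire values coincide with the errno numbers used by Linux on x86, so a frontend running on an OS with a different errno numbering has to translate them rather than pass **ret** through. A table-driven sketch that decodes a subset of the list above; the helper name is hypothetical, and a complete implementation would carry every entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A few entries from the error number table; extend with the rest. */
static const struct {
    int32_t num;
    const char *name;
} pvcalls_errors[] = {
    { -11,  "EAGAIN" },
    { -22,  "EINVAL" },
    { -97,  "EAFNOSUPPORT" },
    { -98,  "EADDRINUSE" },
    { -107, "ENOTCONN" },
    { -110, "ETIMEDOUT" },
    { -524, "ENOTSUPP" },
};

/* Hypothetical helper: name of a protocol error number, or NULL. */
static const char *pvcalls_error_name(int32_t ret)
{
    for (size_t i = 0; i < sizeof(pvcalls_errors) / sizeof(pvcalls_errors[0]); i++)
        if (pvcalls_errors[i].num == ret)
            return pvcalls_errors[i].name;
    return NULL;
}
```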
>
> #### Socket families and address format
>
> The following definitions and explicit sizes, together with POSIX
> [sys/socket.h][address] and [netinet/in.h][in] define socket families and
> address format. Please be aware that only the **domain** `AF_INET`, **type**
> `SOCK_STREAM` and **protocol** `0` are supported by this version of the spec.
>
>     #define AF_UNSPEC   0
>     #define AF_UNIX     1   /* Unix domain sockets      */
>     #define AF_LOCAL    1   /* POSIX name for AF_UNIX   */
>     #define AF_INET     2   /* Internet IP Protocol     */
>     #define AF_INET6    10  /* IP version 6             */
>
>     #define SOCK_STREAM 1
>     #define SOCK_DGRAM  2
>     #define SOCK_RAW    3
>
>     /* generic address format */
>     struct sockaddr {
>         uint16_t sa_family;
>         char sa_data[26];
>     };
>
>     struct in_addr {
>         uint32_t s_addr;
>     };
>
>     /* AF_INET address format */
>     struct sockaddr_in {
>         uint16_t sin_family;
>         uint16_t sin_port;
>         struct in_addr sin_addr;
>         char sin_zero[20];
>     };
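The following sketch shows how a frontend might fill the 28-byte **addr** field of a **connect** or **bind** request with an `AF_INET` address. It relies on the POSIX `htons` and `htonl` functions from `<arpa/inet.h>` because `sin_port` and `sin_addr` are in network byte order in POSIX. Two assumptions are not settled by this document and are labeled in the code: the family field is stored in host byte order, and **len** is set to the full 28-byte structure size. The helper name is hypothetical.

```c
#include <arpa/inet.h> /* htonl, htons */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PVCALLS_AF_INET 2

/* Local copies of this document's fixed-size definitions. */
struct pvcalls_in_addr {
    uint32_t s_addr;
};

struct pvcalls_sockaddr_in {
    uint16_t sin_family;             /* assumption: host byte order */
    uint16_t sin_port;               /* network byte order */
    struct pvcalls_in_addr sin_addr; /* network byte order */
    char sin_zero[20];
};

_Static_assert(sizeof(struct pvcalls_sockaddr_in) == 28, "fits addr[28]");

/* Hypothetical helper: encode an IPv4 address and port (both given in host
 * byte order) into addr and return the value to use for len (assumed to be
 * the full structure size). */
static uint32_t pvcalls_encode_inet(uint8_t addr[28], uint32_t ip, uint16_t port)
{
    struct pvcalls_sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = PVCALLS_AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(ip);
    memcpy(addr, &sin, sizeof(sin));
    return sizeof(sin);
}
```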
>
>
> ### Data ring
>
> Data rings are used for sending and receiving data over a connected socket.
> They are created upon a successful **accept** or **connect** command.
>
> A data ring is composed of two pieces: the interface and the **in** and
> **out** buffers. The interface, represented by `struct pvcalls_data_intf`,
> is shared first and resides on the page whose grant reference is passed by
> **accept** and **connect** as parameter. `struct pvcalls_data_intf` contains
> the list of grant references which constitute the **in** and **out** data
> buffers.
>
> #### Data ring interface
>
>     struct pvcalls_data_intf {
>         PVCALLS_RING_IDX in_cons, in_prod;
>         PVCALLS_RING_IDX out_cons, out_prod;
>         int32_t in_error, out_error;
>
>         uint32_t ring_order;
>         grant_ref_t ref[];
>     };
>
>     /* not actually C compliant (ring_order changes from socket to socket) */
>     struct pvcalls_data {
>         char in[((1<<ring_order)<<PAGE_SHIFT)/2];
>         char out[((1<<ring_order)<<PAGE_SHIFT)/2];
>     };
>
> - **ring_order**
>   It represents the order of the data ring. The list of grant references
>   that follows has `(1 << ring_order)` elements. It cannot be greater than
>   **max-dataring-page-order**, as specified by the backend on XenBus.
> - **ref[]**
>   The list of grant references which will contain the actual data. They are
>   mapped contiguously in virtual memory. The first half of the pages is the
>   **in** array, the second half is the **out** array.
> - **in** is an array used as circular buffer
>   It contains data read from the socket. The producer is the backend, the
>   consumer is the frontend.
> - **out** is an array used as circular buffer
>   It contains data to be written to the socket. The producer is the frontend,
>   the consumer is the backend.
> - **in_cons** and **in_prod**
>   Consumer and producer pointers for data read from the socket. They keep
>   track of how much data has been written by the backend to **in** and how
>   much data has already been consumed by the frontend. **in_prod** is
>   increased by the backend, after writing data to **in**. **in_cons** is
>   increased by the frontend, after reading data from **in**.
> - **out_cons**, **out_prod**
>   Consumer and producer pointers for the data to be written to the socket.
>   They keep track of how much data has been written by the frontend to
>   **out** and how much data has already been consumed by the backend.
>   **out_prod** is increased by the frontend, after writing data to **out**.
>   **out_cons** is increased by the backend, after reading data from **out**.
> - **in_error** and **out_error**
>   They signal errors when reading from the socket (**in_error**) or when
>   writing to the socket (**out_error**). 0 means no errors. When an error
>   occurs, no further read or write operations are performed on the socket.
>   In the case of an orderly socket shutdown (i.e. read returns 0)
>   **in_error** is set to ENOTCONN. **in_error** and **out_error** are never
>   set to EAGAIN or EWOULDBLOCK.
>
> The binary layout of `struct pvcalls_data_intf` follows:
>
>     0         4         8         12        16        20        24        28
>     +---------+---------+---------+---------+---------+---------+----------+
>     | in_cons | in_prod |out_cons |out_prod |in_error |out_error|ring_order|
>     +---------+---------+---------+---------+---------+---------+----------+
>
>     28        32        36        40        4092      4096
>     +---------+---------+---------+----//---+---------+
>     | ref[0]  | ref[1]  | ref[2]  |         | ref[N]  |
>     +---------+---------+---------+----//---+---------+
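Because the interface occupies a single page, it also bounds how large a data ring can be: with a 28-byte header and 4-byte grant references, a 4096-byte page holds (4096 - 28) / 4 = 1017 references, so the largest power-of-two count that fits is 512 and **ring_order** can be at most 9, whatever the backend advertises. The sketch below checks that arithmetic and computes the size of each of the **in** and **out** arrays. It assumes 4 KiB pages and 32-bit `PVCALLS_RING_IDX` and `grant_ref_t`, consistent with the binary layout above; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PVCALLS_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

typedef uint32_t PVCALLS_RING_IDX; /* assumption: 4-byte indexes */
typedef uint32_t grant_ref_t;

struct pvcalls_data_intf {
    PVCALLS_RING_IDX in_cons, in_prod;
    PVCALLS_RING_IDX out_cons, out_prod;
    int32_t in_error, out_error;

    uint32_t ring_order;
    grant_ref_t ref[];
};

_Static_assert(offsetof(struct pvcalls_data_intf, ring_order) == 24, "ring_order");
_Static_assert(offsetof(struct pvcalls_data_intf, ref) == 28, "ref[]");

/* Largest ring_order whose (1 << ring_order) grant references still fit in
 * the single interface page. */
static uint32_t pvcalls_max_ring_order_for_page(void)
{
    size_t slots = (PVCALLS_PAGE_SIZE - offsetof(struct pvcalls_data_intf, ref)) /
                   sizeof(grant_ref_t);
    uint32_t order = 0;

    while (((size_t)2 << order) <= slots)
        order++;
    return order;
}

/* Size in bytes of each of the in and out arrays for a given ring_order. */
static size_t pvcalls_array_size(uint32_t ring_order)
{
    return ((size_t)PVCALLS_PAGE_SIZE << ring_order) / 2;
}
```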
>
> The binary layout of the ring buffers follows:
>
>     0         ((1<<ring_order)<<PAGE_SHIFT)/2       ((1<<ring_order)<<PAGE_SHIFT)
>     +------------//-------------+------------//-------------+
>     |            in             |           out             |
>     +------------//-------------+------------//-------------+
>
> #### Workflow
>
> The **in** and **out** arrays are used as circular buffers:
>
>     0                               sizeof(array) == ((1<<ring_order)<<PAGE_SHIFT)/2
>     +-----------------------------------+
>     |to consume|    free    |to consume |
>     +-----------------------------------+
>                ^            ^
>                prod         cons
>
>     0                               sizeof(array)
>     +-----------------------------------+
>     |  free    | to consume |   free    |
>     +-----------------------------------+
>                ^            ^
>                cons         prod
>
> The following function is provided to calculate how many bytes are currently
> left unconsumed in an array:
>
>     /* ring_size must be a power of two */
>     #define _MASK_PVCALLS_IDX(idx, ring_size) ((idx) & ((ring_size) - 1))
>
>     static inline PVCALLS_RING_IDX pvcalls_ring_queued(PVCALLS_RING_IDX prod,
>                                                        PVCALLS_RING_IDX cons,
>                                                        PVCALLS_RING_IDX ring_size)
>     {
>         PVCALLS_RING_IDX size;
>
>         if (prod == cons)
>             return 0;
>
>         prod = _MASK_PVCALLS_IDX(prod, ring_size);
>         cons = _MASK_PVCALLS_IDX(cons, ring_size);
>
>         if (prod == cons)
>             return ring_size;
>
>         if (prod > cons)
>             size = prod - cons;
>         else {
>             size = ring_size - cons;
>             size += prod;
>         }
>         return size;
>     }
>
> The producer (the backend for **in**, the frontend for **out**) writes to the
> array in the following way:
>
> - read *cons*, *prod*, *error* from shared memory
> - memory barrier
> - return on *error*
> - write to array at position *prod* up to *cons*, wrapping around the circular
>   buffer when necessary
> - memory barrier
> - increase *prod*
> - notify the other end via evtchn
>
> The consumer (the backend for **out**, the frontend for **in**) reads from the
> array in the following way:
>
> - read *prod*, *cons*, *error* from shared memory
> - memory barrier
> - return on *error*
> - read from array at position *cons* up to *prod*, wrapping around the circular
>   buffer when necessary
> - memory barrier
> - increase *cons*
> - notify the other end via evtchn
>
> The producer takes care of writing only as many bytes as are available in the
> buffer up to *cons*. The consumer takes care of reading only as many bytes as
> are available in the buffer up to *prod*. *error* is set by the backend when an
> error occurs writing or reading from the socket.
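The producer and consumer steps above can be sketched in C for one direction of a data ring. This is an illustration under stated assumptions, not the prototype drivers' code: `struct ring_half`, `ring_write` and `ring_read` are hypothetical, indexes are assumed to be free-running 32-bit values masked on access, and C11 `atomic_thread_fence` calls stand in for the memory barriers. A real driver would use its kernel's barrier primitives and volatile or atomic accesses to the shared indexes, and would notify the peer via the event channel where the comments say so.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t PVCALLS_RING_IDX; /* assumption: free-running 32-bit indexes */

#define PVCALLS_MASK_IDX(idx, size) ((idx) & ((size) - 1))

/* Same logic as pvcalls_ring_queued in this document. */
static PVCALLS_RING_IDX pvcalls_ring_queued(PVCALLS_RING_IDX prod,
                                            PVCALLS_RING_IDX cons,
                                            PVCALLS_RING_IDX ring_size)
{
    if (prod == cons)
        return 0;
    prod = PVCALLS_MASK_IDX(prod, ring_size);
    cons = PVCALLS_MASK_IDX(cons, ring_size);
    if (prod == cons)
        return ring_size;
    return prod > cons ? prod - cons : ring_size - cons + prod;
}

/* Hypothetical view of one direction (in or out) of a data ring. */
struct ring_half {
    PVCALLS_RING_IDX cons, prod;
    int32_t error;
    char *array;
    PVCALLS_RING_IDX size; /* must be a power of two */
};

/* Producer: write up to len bytes; returns bytes written or the error. */
static int64_t ring_write(struct ring_half *r, const char *buf, uint32_t len)
{
    PVCALLS_RING_IDX cons = r->cons, prod = r->prod;
    int32_t error = r->error;

    atomic_thread_fence(memory_order_acquire); /* indexes before data */
    if (error)
        return error;

    PVCALLS_RING_IDX space = r->size - pvcalls_ring_queued(prod, cons, r->size);
    if (len > space)
        len = space;
    for (uint32_t i = 0; i < len; i++) /* the mask handles wrap-around */
        r->array[PVCALLS_MASK_IDX(prod + i, r->size)] = buf[i];

    atomic_thread_fence(memory_order_release); /* data before new prod */
    r->prod = prod + len;
    /* a real producer notifies the other end via evtchn here */
    return len;
}

/* Consumer: read up to len bytes; returns bytes read or the error. */
static int64_t ring_read(struct ring_half *r, char *buf, uint32_t len)
{
    PVCALLS_RING_IDX prod = r->prod, cons = r->cons;
    int32_t error = r->error;

    atomic_thread_fence(memory_order_acquire); /* indexes before data */
    if (error)
        return error;

    PVCALLS_RING_IDX avail = pvcalls_ring_queued(prod, cons, r->size);
    if (len > avail)
        len = avail;
    for (uint32_t i = 0; i < len; i++)
        buf[i] = r->array[PVCALLS_MASK_IDX(cons + i, r->size)];

    atomic_thread_fence(memory_order_release); /* data read before new cons */
    r->cons = cons + len;
    /* a real consumer notifies the other end via evtchn here */
    return len;
}
```

With an 8-byte array, writing six bytes, reading four and then writing six more wraps the producer index past the end of the array and leaves the ring exactly full.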
>
>
> [address]: http://pubs.opengroup.org/onlinepubs/7908799/xns/syssocket.h.html
> [in]: http://pubs.opengroup.org/onlinepubs/000095399/basedefs/netinet/in.h.html
> [socket]: http://pubs.opengroup.org/onlinepubs/009695399/functions/socket.html
> [connect]: http://pubs.opengroup.org/onlinepubs/7908799/xns/connect.html
> [shutdown]: http://pubs.opengroup.org/onlinepubs/7908799/xns/shutdown.html
> [bind]: http://pubs.opengroup.org/onlinepubs/7908799/xns/bind.html
> [listen]: http://pubs.opengroup.org/onlinepubs/7908799/xns/listen.html
> [accept]: http://pubs.opengroup.org/onlinepubs/7908799/xns/accept.html
> [poll]: http://pubs.opengroup.org/onlinepubs/7908799/xsh/poll.html
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel