Xen project Mailing List

Re: [Xen-devel] [DRAFT 1] XenSock protocol design document

On 07/08/2016 12:23 PM, Stefano Stabellini wrote:
> Hi all,
>
Hey!

[...]

> ## Design
>
> ### Xenstore
>
> The frontend and the backend connect to each other exchanging information via
> xenstore. The toolstack creates front and back nodes with state
> XenbusStateInitialising. There can only be one XenSock frontend per domain.
>
> #### Frontend XenBus Nodes
>
> port
>      Values:         <uint32_t>
>
>      The identifier of the Xen event channel used to signal activity
>      in the ring buffer.
>
> ring-ref
>      Values:         <uint32_t>
>
>      The Xen grant reference granting permission for the backend to map
>      the sole page in a single page sized ring buffer.

Would it make sense to export the minimum, default and maximum size of the
socket buffers over xenstore entries? These normally follow a convention that
depends on the type of socket (and OS) you have, or are set through socket
options.

> ### Commands Ring
>
> The shared ring is used by the frontend to forward socket API calls to the
> backend. I'll refer to this ring as **commands ring** to distinguish it from
> other rings which will be created later in the lifecycle of the protocol (data
> rings). The ring format is defined using the familiar `DEFINE_RING_TYPES`
> macro (`xen/include/public/io/ring.h`). Frontend requests are allocated on
> the ring using the `RING_GET_REQUEST` macro.
>
> The format is defined as follows:
>
>     #define XENSOCK_DATARING_ORDER 6
>     #define XENSOCK_DATARING_PAGES (1 << XENSOCK_DATARING_ORDER)
>     #define XENSOCK_DATARING_SIZE (XENSOCK_DATARING_PAGES << PAGE_SHIFT)
>
>     #define XENSOCK_CONNECT        0
>     #define XENSOCK_RELEASE        3
>     #define XENSOCK_BIND           4
>     #define XENSOCK_LISTEN         5
>     #define XENSOCK_ACCEPT         6
>     #define XENSOCK_POLL           7
>
>     struct xen_xensock_request {
>         uint32_t id;     /* private to guest, echoed in response */
>         uint32_t cmd;    /* command to execute */
>         uint64_t sockid; /* id of the socket */
>         union {
>             struct xen_xensock_connect {
>                 uint8_t addr[28];
>                 uint32_t len;
>                 uint32_t flags;
>                 grant_ref_t ref[XENSOCK_DATARING_PAGES];
>                 uint32_t evtchn;
>             } connect;
>             struct xen_xensock_bind {
>                 uint8_t addr[28]; /* ipv6 ready */
>                 uint32_t len;
>             } bind;
>             struct xen_xensock_accept {
>                 uint64_t sockid;
>                 grant_ref_t ref[XENSOCK_DATARING_PAGES];
>                 uint32_t evtchn;
>             } accept;
>         } u;
>     };
>
> The first three fields are common for every command. Their binary layout
> is:
>
>     0       4       8       12      16
>     +-------+-------+-------+-------+
>     |  id   |  cmd  |     sockid    |
>     +-------+-------+-------+-------+
>
> - **id** is generated by the frontend and identifies one specific request
> - **cmd** is the command requested by the frontend:
>     - `XENSOCK_CONNECT`: 0
>     - `XENSOCK_RELEASE`: 3
>     - `XENSOCK_BIND`: 4
>     - `XENSOCK_LISTEN`: 5
>     - `XENSOCK_ACCEPT`: 6
>     - `XENSOCK_POLL`: 7
> - **sockid** is generated by the frontend and identifies the socket to
>   connect, bind, etc. A new sockid is required on `XENSOCK_CONNECT` and
>   `XENSOCK_BIND` commands. A new sockid is also required on
>   `XENSOCK_ACCEPT`, for the new socket.
>

Interesting - have you considered making setsockopt and getsockopt part of
this? There are some common options (as in POSIX-defined) and then some more
exotic flavors that are Linux or FreeBSD specific. For example, SO_REUSEPORT,
which nginx uses to load-balance across a set of workers, or the Linux-specific
SO_BUSY_POLL for low-latency sockets.
Though I'm not sure how sensible it is to start exposing all of these socket
options; perhaps limit it to a specific subset? Or maybe it doesn't make sense
for your case - see my further suggestion regarding the data ring part.

> All three fields are echoed back by the backend.
>
> As for the other Xen ring based protocols, after writing a request to the
> ring, the frontend calls `RING_PUSH_REQUESTS_AND_CHECK_NOTIFY` and issues an
> event channel notification when a notification is required.
>
> Backend responses are allocated on the ring using the `RING_GET_RESPONSE`
> macro. The format is the following:
>
>     struct xen_xensock_response {
>         uint32_t id;
>         uint32_t cmd;
>         uint64_t sockid;
>         int32_t ret;
>     };
>
>     0       4       8       12      16      20
>     +-------+-------+-------+-------+-------+
>     |  id   |  cmd  |     sockid    |  ret  |
>     +-------+-------+-------+-------+-------+
>
> - **id**: echoed back from request
> - **cmd**: echoed back from request
> - **sockid**: echoed back from request
> - **ret**: return value, identifies success or failure
>

Are these fields taken from a specific OS (I assumed Linux)? The id, cmd and
ret fields could probably be smaller overall, or maybe not, in which case it
would be useful to state in the spec whether it follows a specific OS.

[...]

> The design is flexible and can support different ring sizes (at compile time).
> The following description is based on order 6 rings, chosen because they
> provide excellent performance.
>
> - **in** is an array of 65536 bytes, used as circular buffer
>   It contains data read from the socket. The producer is the backend, the
>   consumer is the frontend.
> - **out** is an array of 131072 bytes, used as circular buffer
>   It contains data to be written to the socket. The producer is the frontend,
>   the consumer is the backend.

Could this size be a tunable, intercepting SO_RCVBUF and SO_SNDBUF sockopt
adjustments (these two are POSIX-defined), of course under the assumption that
in this proposal you want to replicate local and remote socket?
IOW, to dynamically allocate how much the socket will use for
sending/receiving, which would translate into the number of grants in use? Even
doing it with xenstore entries in the backend would be better, although the
user may still want to adjust the send/receive buffers to whatever the
application needs. Ideally this would be dynamic per socket instead of
compile-time defined, which would allow more sockets on the same VM without
overshooting the grant table limits.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
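
To make the per-socket sizing suggestion above concrete, here is a rough sketch of how a requested SO_RCVBUF/SO_SNDBUF total could be turned into a ring order, and hence a grant count. It is not part of the draft: the `sketch_*` names, the 4 KiB page size and the order bounds are all assumptions made up for illustration, and it ignores whatever space the real data ring layout reserves for indexes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages and hypothetical bounds.
 * The upper bound of 6 mirrors XENSOCK_DATARING_ORDER in the draft. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MIN_ORDER  0
#define SKETCH_MAX_ORDER  6

/* Smallest order in [MIN, MAX] whose ring (2^order pages) holds `bytes`.
 * Requests larger than the maximum ring are clamped to SKETCH_MAX_ORDER. */
static unsigned int sketch_order_for_bytes(size_t bytes)
{
    unsigned int order = SKETCH_MIN_ORDER;

    while (order < SKETCH_MAX_ORDER &&
           ((size_t)1 << (order + SKETCH_PAGE_SHIFT)) < bytes)
        order++;
    return order;
}

/* Number of grant references consumed by a data ring of this order. */
static uint32_t sketch_grants_for_order(unsigned int order)
{
    assert(order <= SKETCH_MAX_ORDER);
    return (uint32_t)1 << order;
}

/* Grants needed by one socket asking for rcvbuf + sndbuf bytes in total. */
static uint32_t sketch_grants_for_socket(size_t rcvbuf, size_t sndbuf)
{
    return sketch_grants_for_order(sketch_order_for_bytes(rcvbuf + sndbuf));
}
```

Under these assumptions, a socket asking for 8 KiB in each direction would take 4 grants instead of the 64 that a fixed order 6 ring uses, which is where the headroom for more sockets per VM would come from.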
