[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
> -----Original Message----- > From: Ian Campbell [mailto:ian.campbell@xxxxxxxxxx] > Sent: 08 January 2016 15:54 > To: Paul Durrant; xen-devel@xxxxxxxxxxxxxxxxxxxx > Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org) > Subject: Re: [PATCH v4 2/3] public/io/netif.h: document control ring and > toeplitz hashing > > On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote: > > > > diff --git a/xen/include/public/io/netif.h > > b/xen/include/public/io/netif.h > > index 1790ea0..06e0b61 100644 > > --- a/xen/include/public/io/netif.h > > +++ b/xen/include/public/io/netif.h > > @@ -151,6 +151,270 @@ > > Â */ > > > > Â/* > > + * Control ring > > + * ============ > > + * > > + * Some features, such as toeplitz hashing (detailed below), require a > > + * significant amount of out-of-band data to be passed from frontend to > > + * backend. Use of xenstore is not suitable for large quantities of data > > + * because of quota limitations and so a dedicated 'control ring' is used. > > + * The ability of the backend to use a control ring is advertised by > > + * setting: > > + * > > + * /local/domain/X/backend///feature-control-ring = "1" > > + * > > + * The frontend provides a control ring to the backend by setting: > > + * > > + * /local/domain//device/vif//ctrl-ring-ref = > > + * /local/domain//device/vif//event-channel-ctrl = > > + * > > + * where is the grant reference of the shared page used to > > + * implement the control ring and is an event channel to be used > > + * as a mailbox interrupt, before the frontend moves into the connected > > + * state. > > I read this as saying that the mailbox interrupt can be used until the > frontend moves into the connected state, which I realise isn't what you > meant but having so much stuff between "by setting" and "before" makes it > easy to read wrongly. > > How about "... as a mailbox interrupt. These keys must be written before > moving into the connected state." ? Yes, that's clearer. > > > + * The control ring uses a fixed request/response message size and is > > + * balanced (i.e. one request to one response), so operationally it is much > > + * the same as a tramsmit or receive ring. > > "transmit". > > Is it intended to support out of order responses? I didn't want to preclude it in case someone wants to add a potentially long running control operation in future. > > > + */ > > + > > +/* > > + * Toeplitz hash types > > + * =================== > > + * > > + * For the purposes of the definitions below, 'Packet[]' is an array of > > + * octets containing an IP packet without options, 'Array[X..Y]' means a > > + * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is > > + * used to indicate concatenation of arrays. > > + */ > > + > > +/* > > + * A hash calculated over an IP version 4 header as follows: > > + * > > + * Buffer[0..8] = Packet[12..15] + Packet[16..19] > > Is there a reason to write it like this rather than: > + * Buffer[0..8] = Packet[12..19] > ? > > Maybe something which is obvious if you know about Toeplitz in more than > just a passing fashion? The reason is actually so it more resembles the Microsoft documentation at https://msdn.microsoft.com/en-us/library/windows/hardware/ff570725%28v=vs.85%29.aspx. Perhaps it would make more sense if I pointed out that Packet[12..15] is the source address and Packet[16..19] is the dest address (and do the same in the subsequent comments)? > > > > +/* > > + * Control requests (netif_ctrl_request_t) > > + * ======================================= > > + * > > + * All requests have the following format: > > + * > > + *ÂÂÂÂ0ÂÂÂÂÂ1ÂÂÂÂÂ2ÂÂÂÂÂ3ÂÂÂÂÂ4ÂÂÂÂÂ5ÂÂÂÂÂ6ÂÂÂÂÂ7ÂÂoctet > > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > > + * |ÂÂÂÂidÂÂÂÂÂ|ÂÂÂtypeÂÂÂÂ|ÂÂÂÂÂÂÂÂÂdata[0]ÂÂÂÂÂÂÂ| > > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > > + * |ÂÂÂÂÂÂÂÂÂdata[1]ÂÂÂÂÂÂÂ| > > + * +-----+-----+-----+-----+ > > These are packed in the ring, i.e. there are no implicit padding bytes, > right? DEFINE_RING_TYPES unions the req and rsp structures and then defines an array but since req and rsp are the same size in this case the reqs should end up packed. > > I know the Toeplitz stuff doesn't need it, but we should consider designing > in a scheme to handle control commands which need >8 octets of data, or > else we risk ending up with something as confusing as the data path... > I'm ok with that, but you can always use a gref (as is done for the key and mapping table). If 8 seems to small, what do you feel would be the right number to choose? > > + * > > + * id: the request identifier, echoed in response. > > + * type: the type of request (see below) > > + * data[]: any data associated with the request (determined by type) > > + */ > > + > > +struct netif_ctrl_request { > > +ÂÂÂÂuint16_t id; > > +ÂÂÂÂuint16_t type; > > + > > +#define NETIF_CTRL_TYPE_INVALIDÂÂÂÂÂÂÂÂÂÂÂÂÂÂ0 > > +#define NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGSÂÂÂ1 > > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGSÂÂÂ2 > > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_KEYÂÂÂÂÂ3 > > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING 4 > > + > > +ÂÂÂÂuint32_t data[2]; > > +}; > > +typedef struct netif_ctrl_request netif_ctrl_request_t; > > + > > +/* > > + * type = NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS: > > + * > > + * This is sent by the frontend to query the types of toeplitz > > + * hash supported by the backend. No data is required and to the > > "and to the" has an extra "to" in it I think? > > > > + * data[] field is set to 0. > > "fields are" ? > > > + * > > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS: > > + * > > + * This is sent by the frontend to set the types of toeplitz hash that > > + * the backend should calculate. Note that the 'maximal' type of hash > > + * should always be chosen. For example, if the frontend sets both IPV4 > > + * and IPV4_TCP hash types then the latter hash type should be calculated > > + * for any TCP packet and the former only calculated for non-TCP packets. > > + * The data[0] field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* > values > > + * defined above. The data[1] field is set to 0. > > + * > > + * NOTE: Setting data[0] to 0 disables toeplitz hashing and the backend > > + *ÂÂÂÂÂÂÂis free to choose how it steers packets to queues (which is the > > + *ÂÂÂÂÂÂÂdefault state). > > + * > > + * type = NETIF_CTRL_TYPE_SET_`TOEPLITZ_KEY: > > Can this be sent before FLAGS? > > Is it permissible to send this for a hash scheme not included in flags? > Yes, it can but the idea is that nothing has any effect until a valid set of flags is specified. > > + * > > + * This is sent by the frontend to set the key of toeplitz hash that > > + * the backend should calculate. The toeplitz algorithm is illustrated > > + * by the following pseudo-code: > > + * > > + * (Buffer[] and Key[] are treated as shift-registers where the MSB of > > + * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N- > > 1] > > + * is the 'right-most'). > > + * > > + * Value = 0 > > + * For number of bits in Buffer[] > > + *ÂÂÂÂIf (left-most bit of Buffer[] is 1) > > + *ÂÂÂÂÂÂÂÂValue ^= left-most 32 bits of Key[] > > + *ÂÂÂÂKey[] << 1 > > + *ÂÂÂÂBuffer[] << 1 > > + * > > + * The data[0] field is set to the size of key in octets. The data[1] > > + * field is set to a grant reference of a page containing the key. > > I suppose it is implicit that the key starts at offset 0 in the page? > Yes. I'll call that out. > > The > > + * reference must remain valid until the corresponding > > + * netif_ctrl_response_t has been processed. > > May the ref be r/o? If so, should that be encouraged? > Yes, it can and should be r/o. > > + * > > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING: > > + * > > + * This is sent by the frontend to set the mapping of toeplitz hash to > > + * queue number to be applied by the backend. > > + * > > + * The data[0] field is set to the order of the mapping. The data[1] field > > + * is set to a grant reference of a page containing the mapping. The > > + * reference must remain valid until the corresponding > > + * netif_ctrl_response_t has been processed. > > + * > > + * The format of the mapping is: > > + * > > + *ÂÂÂÂ0ÂÂÂÂÂ1ÂÂÂÂÂ2ÂÂÂÂÂ3ÂÂÂÂÂ4ÂÂÂÂÂ5ÂÂÂÂÂ6ÂÂÂÂÂ7ÂÂoctet > > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[0]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ| > > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[1]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ| > > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[2]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ| > > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[3]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ| > > + *ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ. > > + *ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ. > > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[N-1]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ| > > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > > + * > > + * where each queue value is less than "multi-queue-num-queues" (see > above) > > + * and N is 1 << data[0]. > > + * > > + * NOTE: Before a specific mapping is set using this request, the backend > > + *ÂÂÂÂÂÂÂshould map all toeplitz hash values to queue 0 (which is the only > > + *ÂÂÂÂÂÂÂqueue guaranteed to exist in all cases). > > i.e. between receiving a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS with a non- > zero > set of flags andÂNETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING it should do > this? > Yes. The default map should be zeroed. > If it has not seen a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS, or it has seen > one > with flags = 0 then as before the b/e is free to steer as it likes? > Yes. No change to default behaviour. > Is it valid to send this mapping command first before the flags? > As explained above, yes. > > > + */ > > + > > +/* > > + * Control responses (netif_ctrl_response_t) > > + * ========================================= > > + * > > + * All responses have the following format: > > + * > > + *ÂÂÂÂ0ÂÂÂÂÂ1ÂÂÂÂÂ2ÂÂÂÂÂ3ÂÂÂÂÂ4ÂÂÂÂÂ5ÂÂÂÂÂ6ÂÂÂÂÂ7ÂÂoctet > > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > > + * |ÂÂÂÂidÂÂÂÂÂ|ÂÂÂpadÂÂÂÂÂ|ÂÂÂÂÂÂÂÂÂstatusÂÂÂÂÂÂÂÂ| > > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > > + * |ÂÂÂÂÂÂÂÂÂdataÂÂÂÂÂÂÂÂÂÂ| > > + * +-----+-----+-----+-----+ > > + * > > + * id: the corresponding request identifier > > + * pad: set to 0 > > + * status: the status of request processing > > + * data: any data associated with the response (determined by type and > > + *ÂÂÂÂÂÂÂstatus) > > type needs to be remembered via the id? Why not echo it in the pad field > for convenience? > I originally had that but thought it would be superfluous given that the id is echoed. I can add it back if you think that it would be a good idea. All my implementation did with it though was ASSERT that it matched the req. > > > + */ > > + > > +struct netif_ctrl_response { > > +ÂÂÂÂuint16_t id; > > +ÂÂÂÂuint16_t pad; > > +ÂÂÂÂuint32_t status; > > + > > +#define NETIF_CTRL_STATUS_SUCCESSÂÂÂÂÂÂÂÂÂÂÂ0 > > +#define NETIF_CTRL_STATUS_NOT_SUPPORTEDÂÂÂÂÂ1 > > +#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2 > > +#define NETIF_CTRL_STATUS_BUFFER_OVERFLOWÂÂÂ3 > > Is the intention that these will be the union of all possible failure > modes, rather than making provision for type-specific failures? > Yes, they are just supposed to be errno-like things. > These are all pretty generic though, and I suppose we could always figure > that out when some command has a very obscure failure case. > Yes. > > + > > +ÂÂÂÂuint32_t data; > > +}; > > +typedef struct netif_ctrl_response netif_ctrl_response_t; > > + > > +/* > > + * type = > > + * > > + * The default response for any unrecognised request has the status field > > + * set to NETIF_CTRL_STATUS_NOT_SUPPORTED and the data field set to > 0. > > + * > > + * type = NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS: > > + * > > + * Since the request carries no data there is no reason for processing to > > + * fail, hence the status field is set to NETIF_CTRL_STATUS_SUCCESS and > the > > + * data field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values > (defined > > + * above) indicating which hash types are supported by the backend. > > + * If no hashing is supported then the data field should be set to 0. > > I wonder if it would be clearer to define the generic request and response > structures up front followed by the specific commands and their > request+response formats at the same time, rather than splitting the > specific command requests and responses into different places? > > Like: > > Command:ÂNETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS > > Description: Blah blah > > Request data format: Words words > Valid response status codes: > Â - One > Â - Another > Responds data format: Words > > (i.e. more like API than protocol documentation in the form of independent > looking things going back and forth) > Yes, that may be easier to read. I'll do that. > > + * > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS: > > + * > > + * If the data[0] field in the request is invalid (i.e. contains > > unsupported > > + * hash types) then the status field is set to > > + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset > should > > "request" > > > succeed > > + * and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS. > > + * The data field should be set to 0. > > + * > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY: > > + * > > + * If the data[0] field in the request is an invalid key length (too big) > > Can it not be too small as well? What is the behaviour if it is? > I guess the way the algorithm is defined, a key of less than 32 bits doesn't make much sense. > > + * then the status field is set to > NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the > > + * data[1] field is an invalid grant reference then the status field is set > > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the request > should > > + * succeed and hence the status field is set to > NETIF_CTRL_STATUS_SUCCESS. > > + * The data field should be set to 0. > > + * > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING: > > + * > > + * If the data[0] field in the request is an invalid mapping order (too > > big) > > or too small? An order 0 mapping would still contain a single entry and you can't get any smaller than that :-) > > > + * then the status field is set to > NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the > > + * data[1] field is an invalid grant reference then the status field is set > > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset > should > > "request" > Thanks for all the typo/spelling corrections. I'll pick those up too. Paul > Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |