[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing



> -----Original Message-----
> From: Ian Campbell [mailto:ian.campbell@xxxxxxxxxx]
> Sent: 08 January 2016 15:54
> To: Paul Durrant; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org)
> Subject: Re: [PATCH v4 2/3] public/io/netif.h: document control ring and
> toeplitz hashing
> 
> On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote:
> >
> > diff --git a/xen/include/public/io/netif.h
> > b/xen/include/public/io/netif.h
> > index 1790ea0..06e0b61 100644
> > --- a/xen/include/public/io/netif.h
> > +++ b/xen/include/public/io/netif.h
> > @@ -151,6 +151,270 @@
> > Â */
> >
> > Â/*
> > + * Control ring
> > + * ============
> > + *
> > + * Some features, such as toeplitz hashing (detailed below), require a
> > + * significant amount of out-of-band data to be passed from frontend to
> > + * backend. Use of xenstore is not suitable for large quantities of data
> > + * because of quota limitations and so a dedicated 'control ring' is used.
> > + * The ability of the backend to use a control ring is advertised by
> > + * setting:
> > + *
> > + * /local/domain/X/backend///feature-control-ring = "1"
> > + *
> > + * The frontend provides a control ring to the backend by setting:
> > + *
> > + * /local/domain//device/vif//ctrl-ring-ref =
> > + * /local/domain//device/vif//event-channel-ctrl =
> > + *
> > + * where  is the grant reference of the shared page used to
> > + * implement the control ring and  is an event channel to be used
> > + * as a mailbox interrupt, before the frontend moves into the connected
> > + * state.
> 
> I read this as saying that the mailbox interrupt can be used until the
> frontend moves into the connected state, which I realise isn't what you
> meant but having so much stuff between "by setting" and "before" makes it
> easy to read wrongly.
> 
> How about "... as a mailbox interrupt. These keys must be written before
> moving into the connected state." ?

Yes, that's clearer.

> 
> > + * The control ring uses a fixed request/response message size and is
> > + * balanced (i.e. one request to one response), so operationally it is much
> > + * the same as a tramsmit or receive ring.
> 
> "transmit".
> 
> Is it intended to support out of order responses?

I didn't want to preclude it in case someone wants to add a potentially long 
running control operation in future.

> 
> > + */
> > +
> > +/*
> > + * Toeplitz hash types
> > + * ===================
> > + *
> > + * For the purposes of the definitions below, 'Packet[]' is an array of
> > + * octets containing an IP packet without options, 'Array[X..Y]' means a
> > + * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is
> > + * used to indicate concatenation of arrays.
> > + */
> > +
> > +/*
> > + * A hash calculated over an IP version 4 header as follows:
> > + *
> > + * Buffer[0..8] = Packet[12..15] + Packet[16..19]
> 
> Is there a reason to write it like this rather than:
> + * Buffer[0..8] = Packet[12..19]
> ?
> 
> Maybe something which is obvious if you know about Toeplitz in more than
> just a passing fashion?

The reason is actually so it more resembles the Microsoft documentation at 
https://msdn.microsoft.com/en-us/library/windows/hardware/ff570725%28v=vs.85%29.aspx.
 Perhaps it would make more sense if I pointed out that Packet[12..15] is the 
source address and Packet[16..19] is the dest address (and do the same in the 
subsequent comments)?

> 
> 
> > +/*
> > + * Control requests (netif_ctrl_request_t)
> > + * =======================================
> > + *
> > + * All requests have the following format:
> > + *
> > + *ÂÂÂÂ0ÂÂÂÂÂ1ÂÂÂÂÂ2ÂÂÂÂÂ3ÂÂÂÂÂ4ÂÂÂÂÂ5ÂÂÂÂÂ6ÂÂÂÂÂ7ÂÂoctet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |ÂÂÂÂidÂÂÂÂÂ|ÂÂÂtypeÂÂÂÂ|ÂÂÂÂÂÂÂÂÂdata[0]ÂÂÂÂÂÂÂ|
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |ÂÂÂÂÂÂÂÂÂdata[1]ÂÂÂÂÂÂÂ|
> > + * +-----+-----+-----+-----+
> 
> These are packed in the ring, i.e. there are no implicit padding bytes,
> right?

DEFINE_RING_TYPES unions the req and rsp structures and then defines an array 
but since req and rsp are the same size in this case the reqs should end up 
packed.

> 
> I know the Toeplitz stuff doesn't need it, but we should consider designing
> in a scheme to handle control commands which need >8 octets of data, or
> else we risk ending up with something as confusing as the data path...
> 

I'm ok with that, but you can always use a gref (as is done for the key and 
mapping table). If 8 seems to small, what do you feel would be the right number 
to choose?

> > + *
> > + * id: the request identifier, echoed in response.
> > + * type: the type of request (see below)
> > + * data[]: any data associated with the request (determined by type)
> > + */
> > +
> > +struct netif_ctrl_request {
> > +ÂÂÂÂuint16_t id;
> > +ÂÂÂÂuint16_t type;
> > +
> > +#define NETIF_CTRL_TYPE_INVALIDÂÂÂÂÂÂÂÂÂÂÂÂÂÂ0
> > +#define NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGSÂÂÂ1
> > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGSÂÂÂ2
> > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_KEYÂÂÂÂÂ3
> > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING 4
> > +
> > +ÂÂÂÂuint32_t data[2];
> > +};
> > +typedef struct netif_ctrl_request netif_ctrl_request_t;
> > +
> > +/*
> > + * type = NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS:
> > + *
> > + * This is sent by the frontend to query the types of toeplitz
> > + * hash supported by the backend. No data is required and to the
> 
> "and to the" has an extra "to" in it I think?
> 
> 
> > + * data[] field is set to 0.
> 
> "fields are" ?
> 
> > + *
> > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS:
> > + *
> > + * This is sent by the frontend to set the types of toeplitz hash that
> > + * the backend should calculate. Note that the 'maximal' type of hash
> > + * should always be chosen. For example, if the frontend sets both IPV4
> > + * and IPV4_TCP hash types then the latter hash type should be calculated
> > + * for any TCP packet and the former only calculated for non-TCP packets.
> > + * The data[0] field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_*
> values
> > + * defined above. The data[1] field is set to 0.
> > + *
> > + * NOTE: Setting data[0] to 0 disables toeplitz hashing and the backend
> > + *ÂÂÂÂÂÂÂis free to choose how it steers packets to queues (which is the
> > + *ÂÂÂÂÂÂÂdefault state).
> > + *
> > + * type = NETIF_CTRL_TYPE_SET_`TOEPLITZ_KEY:
> 
> Can this be sent before FLAGS?
> 
> Is it permissible to send this for a hash scheme not included in flags?
> 

Yes, it can but the idea is that nothing has any effect until a valid set of 
flags is specified.

> > + *
> > + * This is sent by the frontend to set the key of toeplitz hash that
> > + * the backend should calculate. The toeplitz algorithm is illustrated
> > + * by the following pseudo-code:
> > + *
> > + * (Buffer[] and Key[] are treated as shift-registers where the MSB of
> > + * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N-
> > 1]
> > + * is the 'right-most').
> > + *
> > + * Value = 0
> > + * For number of bits in Buffer[]
> > + *ÂÂÂÂIf (left-most bit of Buffer[] is 1)
> > + *ÂÂÂÂÂÂÂÂValue ^= left-most 32 bits of Key[]
> > + *ÂÂÂÂKey[] << 1
> > + *ÂÂÂÂBuffer[] << 1
> > + *
> > + * The data[0] field is set to the size of key in octets. The data[1]
> > + * field is set to a grant reference of a page containing the key.
> 
> I suppose it is implicit that the key starts at offset 0 in the page?
> 

Yes. I'll call that out.

> >  The
> > + * reference must remain valid until the corresponding
> > + * netif_ctrl_response_t has been processed.
> 
> May the ref be r/o? If so, should that be encouraged?
> 

Yes, it can and should be r/o.

> > + *
> > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING:
> > + *
> > + * This is sent by the frontend to set the mapping of toeplitz hash to
> > + * queue number to be applied by the backend.
> > + *
> > + * The data[0] field is set to the order of the mapping. The data[1] field
> > + * is set to a grant reference of a page containing the mapping. The
> > + * reference must remain valid until the corresponding
> > + * netif_ctrl_response_t has been processed.
> > + *
> > + * The format of the mapping is:
> > + *
> > + *ÂÂÂÂ0ÂÂÂÂÂ1ÂÂÂÂÂ2ÂÂÂÂÂ3ÂÂÂÂÂ4ÂÂÂÂÂ5ÂÂÂÂÂ6ÂÂÂÂÂ7ÂÂoctet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[0]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ|
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[1]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ|
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[2]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ|
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[3]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ|
> > + *ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ.
> > + *ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ.
> > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[N-1]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ|
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + *
> > + * where each queue value is less than "multi-queue-num-queues" (see
> above)
> > + * and N is 1 << data[0].
> > + *
> > + * NOTE: Before a specific mapping is set using this request, the backend
> > + *ÂÂÂÂÂÂÂshould map all toeplitz hash values to queue 0 (which is the only
> > + *ÂÂÂÂÂÂÂqueue guaranteed to exist in all cases).
> 
> i.e. between receiving a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS with a non-
> zero
> set of flags andÂNETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING it should do
> this?
> 

Yes. The default map should be zeroed.

> If it has not seen a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS, or it has seen
> one
> with flags = 0 then as before the b/e is free to steer as it likes?
> 

Yes. No change to default behaviour.

> Is it valid to send this mapping command first before the flags?
> 

As explained above, yes.

> 
> > + */
> > +
> > +/*
> > + * Control responses (netif_ctrl_response_t)
> > + * =========================================
> > + *
> > + * All responses have the following format:
> > + *
> > + *ÂÂÂÂ0ÂÂÂÂÂ1ÂÂÂÂÂ2ÂÂÂÂÂ3ÂÂÂÂÂ4ÂÂÂÂÂ5ÂÂÂÂÂ6ÂÂÂÂÂ7ÂÂoctet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |ÂÂÂÂidÂÂÂÂÂ|ÂÂÂpadÂÂÂÂÂ|ÂÂÂÂÂÂÂÂÂstatusÂÂÂÂÂÂÂÂ|
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |ÂÂÂÂÂÂÂÂÂdataÂÂÂÂÂÂÂÂÂÂ|
> > + * +-----+-----+-----+-----+
> > + *
> > + * id: the corresponding request identifier
> > + * pad: set to 0
> > + * status: the status of request processing
> > + * data: any data associated with the response (determined by type and
> > + *ÂÂÂÂÂÂÂstatus)
> 
> type needs to be remembered via the id? Why not echo it in the pad field
> for convenience?
> 

I originally had that but thought it would be superfluous given that the id is 
echoed. I can add it back if you think that it would be a good idea. All my 
implementation did with it though was ASSERT that it matched the req.

> 
> > + */
> > +
> > +struct netif_ctrl_response {
> > +ÂÂÂÂuint16_t id;
> > +ÂÂÂÂuint16_t pad;
> > +ÂÂÂÂuint32_t status;
> > +
> > +#define NETIF_CTRL_STATUS_SUCCESSÂÂÂÂÂÂÂÂÂÂÂ0
> > +#define NETIF_CTRL_STATUS_NOT_SUPPORTEDÂÂÂÂÂ1
> > +#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2
> > +#define NETIF_CTRL_STATUS_BUFFER_OVERFLOWÂÂÂ3
> 
> Is the intention that these will be the union of all possible failure
> modes, rather than making provision for type-specific failures?
> 

Yes, they are just supposed to be errno-like things.

> These are all pretty generic though, and I suppose we could always figure
> that out when some command has a very obscure failure case.
> 

Yes.

> > +
> > +ÂÂÂÂuint32_t data;
> > +};
> > +typedef struct netif_ctrl_response netif_ctrl_response_t;
> > +
> > +/*
> > + * type =
> > + *
> > + * The default response for any unrecognised request has the status field
> > + * set to NETIF_CTRL_STATUS_NOT_SUPPORTED and the data field set to
> 0.
> > + *
> > + * type = NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS:
> > + *
> > + * Since the request carries no data there is no reason for processing to
> > + * fail, hence the status field is set to NETIF_CTRL_STATUS_SUCCESS and
> the
> > + * data field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
> (defined
> > + * above) indicating which hash types are supported by the backend.
> > + * If no hashing is supported then the data field should be set to 0.
> 
> I wonder if it would be clearer to define the generic request and response
> structures up front followed by the specific commands and their
> request+response formats at the same time, rather than splitting the
> specific command requests and responses into different places?
> 
> Like:
> 
> Command:ÂNETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS
> 
> Description: Blah blah
> 
> Request data format: Words words
> Valid response status codes:
> Â - One
> Â - Another
> Responds data format: Words
> 
> (i.e. more like API than protocol documentation in the form of independent
> looking things going back and forth)
> 

Yes, that may be easier to read. I'll do that.

> > + *
> > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
> > + *
> > + * If the data[0] field in the request is invalid (i.e. contains 
> > unsupported
> > + * hash types) then the status field is set to
> > + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset
> should
> 
> "request"
> 
> > succeed
> > + * and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
> > + * The data field should be set to 0.
> > + *
> > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
> > + *
> > + * If the data[0] field in the request is an invalid key length (too big)
> 
> Can it not be too small as well? What is the behaviour if it is?
> 

I guess the way the algorithm is defined, a key of less than 32 bits doesn't 
make much sense.

> > + * then the status field is set to
> NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> > + * data[1] field is an invalid grant reference then the status field is set
> > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the request
> should
> > + * succeed and hence the status field is set to
> NETIF_CTRL_STATUS_SUCCESS.
> > + * The data field should be set to 0.
> > + *
> > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
> > + *
> > + * If the data[0] field in the request is an invalid mapping order (too 
> > big)
> 
> or too small?

An order 0 mapping would still contain a single entry and you can't get any 
smaller than that :-)

> 
> > + * then the status field is set to
> NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> > + * data[1] field is an invalid grant reference then the status field is set
> > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset
> should
> 
> "request"
> 

Thanks for all the typo/spelling corrections. I'll pick those up too.

  Paul

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.