
Re: [Xen-devel] [PATCH v3 0/1] netif: staging grants for I/O requests


  • To: 'Joao Martins' <joao.m.martins@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>
  • From: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
  • Date: Mon, 18 Sep 2017 09:45:06 +0000
  • Accept-language: en-GB, en-US
  • Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
  • Delivery-date: Mon, 18 Sep 2017 09:45:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: AQHTLLujuhA3ia/Gi02v6bzdzfWG4aK6a3TA
  • Thread-topic: [PATCH v3 0/1] netif: staging grants for I/O requests

> -----Original Message-----
> From: Joao Martins [mailto:joao.m.martins@xxxxxxxxxx]
> Sent: 13 September 2017 19:11
> To: Xen-devel <xen-devel@xxxxxxxxxxxxx>
> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>;
> Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>; Joao Martins
> <joao.m.martins@xxxxxxxxxx>
> Subject: [PATCH v3 0/1] netif: staging grants for I/O requests
> 
> Hey,
> 
> This is v3 taking into consideration all comments received from v2 (changelog
> in the first patch). The specification is right after the diffstat.
> 
> Reference implementation also here (on top of net-next):
> 
> https://github.com/jpemartins/linux.git xen-net-stg-gnts-v3
> 
> Although I am satisfied with how things are being done above, I wanted
> to request some advice/input on whether there could be a simpler way of
> achieving the same. Specifically because these control messages add
> significant code on the frontend to pre-grant, and in other cases the
> control message might be limiting if the frontend tries to keep a
> dynamically changing buffer pool in different queues. *Maybe* it could be
> simpler to adjust the TX/RX ring ABI in a compatible manner (Disclaimer:
> I haven't implemented this just yet):

But the whole point of pre-granting is to separate the grant/ungrant operations 
from the rx/tx operations, right? So, why would the extra control messages 
really be an overhead?

  Paul

> 
>  1) Add a flag `NETTXF_persist` to `netif_tx_request`
> 
>  2) Replace RX `netif_rx_request` padding with `flags` and add a
>  `NETRXF_persist` with the same purpose as 1).
> 
>  3) This remains backwards compatible as backends not supporting this
>  wouldn't act on this new flag, and since we replace padding with flags,
>  unsupported backends simply won't be aware of the RX *request* `flags`.
>  This is under the assumption that there's no requirement that padding
>  must be zero throughout the netif.h specification.
> 
>  4) Keeping the `GET_GREF_MAPPING_SIZE` ctrl msg for the frontend to make
>  better decisions?
> 
>  5) Semantics are simple: slots with flags marked as NET{RX,TX}F_persist
>  represent a permanently mapped ref, and are therefore mapped if no mapping
>  exists yet. A *future* omission of the flag signals that the mapping should
>  be removed. (A sketch of the header change is below.)
> 
> This would allow guests which reuse buffers (apparently Windows :)) to scale
> better, as unmaps would be done in the individual queue context, while
> allowing the frontend to remain simpler in the management of "permanent"
> buffers. The drawback seems to be the added complexity (and somewhat racy
> behaviour) on the datapath, to map or unmap accordingly, because now we
> would have to differentiate between long- vs short-lived map/unmap ops in
> addition to looking up our mappings table. Thoughts, or perhaps people may
> prefer the one already described in the series?
> 
> Cheers,
> 
> Joao Martins (1):
>   public/io/netif.h: add gref mapping control messages
> 
>  xen/include/public/io/netif.h | 115 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 115 insertions(+)
> ---
> % Staging grants for network I/O requests
> % Joao Martins <<joao.m.martins@xxxxxxxxxx>>
> % Revision 3
> 
> \clearpage
> 
> --------------------------------------------------------------------
> Architecture(s): Any
> --------------------------------------------------------------------
> 
> # Background and Motivation
> 
> At the Xen hackathon '16 networking session, we spoke about having a
> permanently mapped region to describe the header/linear region of packet
> buffers. This document outlines the proposal, covering the motivation for
> this and its applicability to other use cases, alongside the necessary
> changes. This proposal is an RFC and also includes alternative solutions.
> 
> The motivation of this work is to eliminate grant ops for packet I/O
> intensive workloads such as those observed with smaller request sizes
> (i.e. <= 256 bytes or <= MTU). Currently on Xen, bulk transfers (e.g.
> 32K..64K packets) are the only workloads performing really well (up to
> 80 Gbit/s on a few CPUs), usually backing end-hosts and server appliances.
> Anything that involves higher packet rates (<= 1500 MTU) or lacks sg
> performs badly, with throughput close to 1 Gbit/s.
> 
> # Proposal
> 
> The proposal is to leverage the already implicit copy from and to packet
> linear data on netfront and netback, to be done instead from a permanently
> mapped region. In some (physical) NICs this is known as header/data split.
> 
> Specifically, for some workloads (e.g. NFV) it would provide a big increase
> in throughput when we switch to (zero)copying in the backend/frontend,
> instead of the grant hypercalls. Thus this extension aims at futureproofing
> the netif protocol by adding the possibility of guests setting up a list of
> grants that are set up at device creation and revoked at device freeing -
> without consuming too many grant entries for the general case (i.e. to cover
> only the header region <= 256 bytes, 16 grants per ring), while remaining
> configurable by the kernel when one wants to resort to a copy-based approach
> as opposed to grant copy/map.
> 
> \clearpage
> 
> # General Operation
> 
> Here we describe how netback and netfront generally operate, and where the
> proposed solution will fit. The security mechanism currently involves grant
> references which in essence are round-robin recycled 'tickets' stamped with
> the GPFNs, permission attributes, and the authorized domain:
> 
> (This is an in-memory view of struct grant_entry_v1):
> 
>      0     1     2     3     4     5     6     7 octet
>     +------------+-----------+------------------------+
>     | flags      | domain id | frame                  |
>     +------------+-----------+------------------------+
> 
> Where there are N grant entries in a grant table, for example:
> 
>     @0:
>     +------------+-----------+------------------------+
>     | rw         | 0         | 0xABCDEF               |
>     +------------+-----------+------------------------+
>     | rw         | 0         | 0xFA124                |
>     +------------+-----------+------------------------+
>     | ro         | 1         | 0xBEEF                 |
>     +------------+-----------+------------------------+
> 
>       .....
>     @N:
>     +------------+-----------+------------------------+
>     | rw         | 0         | 0x9923A                |
>     +------------+-----------+------------------------+
> 
> Each entry consumes 8 bytes, therefore 512 entries can fit on one page.
> `gnttab_max_frames` defaults to 32 pages, hence 16,384 grants. The
> ParaVirtualized (PV) drivers will use the grant reference (index in the
> grant table - 0 .. N) in their command ring.
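> 
> For reference, the C declaration behind the layout above (as found in
> Xen's public `grant_table.h`):
> 
>     struct grant_entry_v1 {
>         uint16_t flags;     /* GTF_* permission/state bits        */
>         domid_t  domid;     /* domain authorized to use the grant */
>         uint32_t frame;     /* GPFN being shared                  */
>     };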
> 
> \clearpage
> 
> ## Guest Transmit
> 
> The view of the shared transmit ring is the following:
> 
>      0     1     2     3     4     5     6     7 octet
>     +------------------------+------------------------+
>     | req_prod               | req_event              |
>     +------------------------+------------------------+
>     | rsp_prod               | rsp_event              |
>     +------------------------+------------------------+
>     | pvt                    | pad[44]                |
>     +------------------------+                        |
>     | ....                                            | [64bytes]
>     +------------------------+------------------------+-\
>     | gref                   | offset    | flags      | |
>     +------------+-----------+------------------------+ +-'struct
>     | id         | size      | id        | status     | | netif_tx_sring_entry'
>     +-------------------------------------------------+-/
>     |/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/| .. N
>     +-------------------------------------------------+
> 
> Each entry consumes 16 octets, therefore 256 entries can fit on one page.
> `struct netif_tx_sring_entry` includes both `struct netif_tx_request`
> (first 12 octets) and `struct netif_tx_response` (last 4 octets).
> Additionally a `struct netif_extra_info` may overlay the request, in which
> case the format is:
> 
>     +------------------------+------------------------+-\
>     | type |flags| type specific data (gso, hash, etc)| |
>     +------------+-----------+------------------------+ +-'struct
>     | padding for tx         | unused                 | | netif_extra_info'
>     +-------------------------------------------------+-/
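> 
> For reference, the corresponding request/response declarations from
> netif.h:
> 
>     struct netif_tx_request {
>         grant_ref_t gref;       /* reference to buffer page          */
>         uint16_t    offset;     /* offset within buffer page         */
>         uint16_t    flags;      /* NETTXF_* flags                    */
>         uint16_t    id;         /* echoed in the response            */
>         uint16_t    size;       /* packet size in the first request, */
>                                 /* otherwise size of this buffer     */
>     };
> 
>     struct netif_tx_response {
>         uint16_t id;            /* copied from the request           */
>         int16_t  status;        /* NETIF_RSP_*                       */
>     };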
> 
> In essence, the transmission of a packet from the frontend to the backend
> network stack goes as follows:
> 
> **Frontend**
> 
> 1) Calculate how many slots are needed for transmitting the packet.
>    Fail if there aren't enough slots.
> 
> [ Calculation needs to estimate slots taking into account 4k page boundary ]
> 
> 2) Make first request for the packet.
>    The first request contains the whole packet size, checksum info,
>    flag whether it contains extra metadata, and if following slots contain
>    more data.
> 
> 3) Put the grant in the `gref` field of the tx slot (see the sketch after
> this list).
> 
> 4) Set extra info if packet requires special metadata (e.g. GSO size)
> 
> 5) If there's still data to be granted, set the flag `NETTXF_more_data` in
> the request `flags`.
> 
> 6) Grant remaining packet pages one per slot. (grant boundary is 4k)
> 
> 7) Fill the resultant grefs in the slots, setting `NETTXF_more_data` for
> the first N-1.
> 
> 8) Fill the total packet size in the first request.
> 
> 9) Set checksum info of the packet (if checksum offload is supported)
> 
> 10) Update the request producer index (`req_prod`)
> 
> 11) Check whether backend needs a notification
> 
> 11.1) Perform hypercall `EVTCHNOP_send` which might mean a __VMEXIT__
>       depending on the guest type.
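> 
> A minimal sketch of steps 2) through 11.1) for a packet that fits in a
> single slot; the helper `get_gref()` and the `queue`/`page`/`data`/`len`
> names are illustrative, not taken from any particular frontend:
> 
>     struct netif_tx_request *req;
>     RING_IDX i = queue->tx.req_prod_pvt;
>     int notify;
> 
>     req = RING_GET_REQUEST(&queue->tx, i);
>     req->gref   = get_gref(page);               /* step 3: grant the buffer  */
>     req->offset = offset_in_page(data);
>     req->size   = len;                          /* step 8: total packet size */
>     req->flags  = NETTXF_csum_blank | NETTXF_data_validated;     /* step 9   */
>     /* No NETTXF_more_data: the whole packet fits in this slot (step 5). */
> 
>     queue->tx.req_prod_pvt = i + 1;
>     RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->tx, notify);  /* steps 10-11 */
>     if (notify)
>         notify_remote_via_irq(queue->tx_irq);                 /* step 11.1   */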
> 
> **Backend**
> 
> 12) Backend gets an interrupt and runs its interrupt service routine.
> 
> 13) Backend checks if there are unconsumed requests
> 
> 14) Backend consumes a request from the ring
> 
> 15) Process extra info (e.g. if GSO info was set)
> 
> 16) Counts all requests for this packet to be processed (while
> `NETTXF_more_data` is set) and performs a few validation tests:
> 
> 16.1) Fail transmission if total packet size is smaller than Ethernet
> minimum allowed;
> 
>   Failing transmission means filling the `id` of the request and a
>   `status` of `NETIF_RSP_ERROR` in `struct netif_tx_response`;
>   update rsp_prod and finally notify the frontend (through `EVTCHNOP_send`).
> 
> 16.2) Fail transmission if one of the slots (size + offset) crosses the page
> boundary
> 
> 16.3) Fail transmission if the number of slots is bigger than the spec-defined
> maximum (18 slots max in netif.h)
> 
> 17) Allocate packet metadata
> 
> [ *Linux specific*: This structure encompasses a linear data region which
> generally accommodates the protocol header and such. Netback allocates up
> to 128 bytes for that. ]
> 
> 18) *Linux specific*: Set up a `GNTTABOP_copy` to copy up to 128 bytes to
> this small region (linear part of the skb) *only* from the first slot.
> 
> 19) Setup GNTTABOP operations to copy/map the packet
> 
> 20) Perform the `GNTTABOP_copy` (grant copy) and/or
> `GNTTABOP_map_grant_ref`
>     hypercalls.
> 
> [ *Linux-specific*: does a copy for the linear region (<=128 bytes) and maps
> the
>          remaining slots as frags for the rest of the data ]
> 
> 21) Check if the grant operations were successful and fail transmission if
> any of the resultant operation `status` were different from `GNTST_okay`.
> 
> 21.1) If it's a grant-copying backend, produce responses for all the
> copied grants like in 16.1). The only difference is that the status is
> `NETIF_RSP_OKAY`.
> 
> 21.2) Update the response producer index (`rsp_prod`)
> 
> 22) Set up gso info requested by frontend [optional]
> 
> 23) Set frontend provided checksum info
> 
> 24) *Linux-specific*: Register destructor callback when packet pages are
> freed.
> 
> 25) Call into the network stack.
> 
> 26) Update `req_event` to `request consumer index + 1` to receive a
> notification
>     on the first produced request from frontend.
>     [optional, if backend is polling the ring and never sleeps]
> 
> 27) *Linux-specific*: Packet destructor callback is called.
> 
> 27.1) Set up `GNTTABOP_unmap_grant_ref` ops for the designated packet
> pages.
> 
> 27.2) Once done, perform the `GNTTABOP_unmap_grant_ref` hypercall. Underlying
> this hypercall, a TLB flush of all backend vCPUs is done.
> 
> 27.3) Produce Tx response like step 21.1) and 21.2)
> 
> [*Linux-specific*: It contains a thread that is woken for this purpose, and
> it batches these unmap operations. The callback just queues another unmap.]
> 
> 27.4) Check whether frontend requested a notification
> 
> 27.4.1) If so, Perform hypercall `EVTCHNOP_send` which might mean a
> __VMEXIT__
>       depending on the guest type.
> 
> **Frontend**
> 
> 28) Transmit interrupt is raised which signals the packet transmission
> completion.
> 
> 29) Transmit completion routine checks for unconsumed responses
> 
> 30) Processes the responses and revokes the grants provided.
> 
> 31) Updates `rsp_cons` (response consumer index)
> 
> This proposal aims at removing steps 19) 20) 21) by using grefs previously
> mapped at guest request. The guest decides how to distribute or use these
> premapped grefs, for either the linear region or the full packet. This also
> allows us to skip step 27) (the unmap), preventing the TLB flush.
> 
> Note that a grant copy does the following (in pseudo code):
> 
>       rcu_lock(src_domain);
>       rcu_lock(dst_domain);
> 
>       for (op = gntcopy[0]; op < nr_ops; op++) {
>               /* Implies taking a potentially contended per-CPU lock on
>                  the remote grant table. */
>               src_frame = __acquire_grant_for_copy(src_domain, <op.src.gref>);
>               src_vaddr = map_domain_page(src_frame);
> 
>               dst_frame = __get_paged_frame(dst_domain, <op.dst.mfn>);
>               dst_vaddr = map_domain_page(dst_frame);
> 
>               memcpy(dst_vaddr + <op.dst.offset>,
>                      src_vaddr + <op.src.offset>,
>                      <op.size>);
> 
>               unmap_domain_page(src_vaddr);
>               unmap_domain_page(dst_vaddr);
>       }
> 
>       rcu_unlock(src_domain);
>       rcu_unlock(dst_domain);
> 
> The Linux netback implementation copies the first 128 bytes into its network
> buffer linear region. Hence, in the case of this first region, the grant copy
> is replaced by a memcpy on the backend.
> 
> \clearpage
> 
> ## Guest Receive
> 
> The view of the shared receive ring is the following:
> 
>      0     1     2     3     4     5     6     7 octet
>     +------------------------+------------------------+
>     | req_prod               | req_event              |
>     +------------------------+------------------------+
>     | rsp_prod               | rsp_event              |
>     +------------------------+------------------------+
>     | pvt                    | pad[44]                |
>     +------------------------+                        |
>     | ....                                            | [64bytes]
>     +------------------------+------------------------+
>     | id         | pad       | gref                   | ->'struct netif_rx_request'
>     +------------+-----------+------------------------+
>     | id         | offset    | flags     | status     | ->'struct netif_rx_response'
>     +-------------------------------------------------+
>     |/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/| .. N
>     +-------------------------------------------------+
> 
> 
> Each entry in the ring occupies 16 octets which means a page fits 256 entries.
> Additionally a `struct netif_extra_info` may overlay the rx request in which
> case the format is:
> 
>     +------------------------+------------------------+
>     | type |flags| type specific data (gso, hash, etc)| ->'struct netif_extra_info'
>     +------------+-----------+------------------------+
> 
> Notice the lack of padding, and that is because it's not used on Rx, as the
> Rx request boundary is 8 octets.
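> 
> For reference, the corresponding declarations from netif.h:
> 
>     struct netif_rx_request {
>         uint16_t    id;         /* echoed in the response              */
>         uint16_t    pad;
>         grant_ref_t gref;       /* reference to incoming granted frame */
>     };
> 
>     struct netif_rx_response {
>         uint16_t id;
>         uint16_t offset;        /* offset in page of start of data     */
>         uint16_t flags;         /* NETRXF_* flags                      */
>         int16_t  status;        /* -ve: NETIF_RSP_*; +ve: byte count   */
>     };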
> 
> In essence, the steps for receiving a packet, from the backend to the
> frontend network stack, are as follows:
> 
> **Backend**
> 
> 1) Backend transmit function starts
> 
> [*Linux-specific*: It means we take a packet and add it to an internal queue
>  (protected by a lock), whereas a separate thread takes it from that queue
>  and processes it as in the steps below. This thread has the purpose of
>  aggregating as many copies as possible.]
> 
> 2) Checks if there are enough rx ring slots that can accommodate the packet.
> 
> 3) Gets a request from the ring for the first data slot and fetches the `gref`
>    from it.
> 
> 4) Create a grant copy op from the packet page to `gref` (see the sketch
>    after this list).
> 
> [ It's up to the backend to choose how it fills this data. E.g. the backend
>   may choose to merge as much data as possible from different pages into
>   this single gref, similar to mergeable rx buffers in vhost. ]
> 
> 5) Sets up flags/checksum info on first request.
> 
> 6) Gets a response from the ring for this data slot.
> 
> 7) Prefill expected response ring with the request `id` and slot size.
> 
> 8) Update the request consumer index (`req_cons`)
> 
> 9) Gets a request from the ring for the first extra info [optional]
> 
> 10) Sets up extra info (e.g. GSO descriptor) [optional] repeat step 8).
> 
> 11) Repeat steps 3 through 8 for all packet pages and set `NETRXF_more_data`
>    in the first N-1 slots.
> 
> 12) Perform the `GNTTABOP_copy` hypercall.
> 
> 13) Check if any grant operation status was incorrect and, if so, set the
>     `status` field of the `struct netif_rx_response` to NETIF_RSP_ERROR.
> 
> 14) Update the response producer index (`rsp_prod`)
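> 
> A minimal sketch of steps 4) and 12) for one slot, copying `len` bytes from
> a local backend page into the frontend-provided `gref`; the variable names
> (`page`, `offset`, `len`, `frontend_domid`) are illustrative:
> 
>     struct gnttab_copy copy = {
>         .flags         = GNTCOPY_dest_gref,       /* destination is a gref */
>         .len           = len,
>         .source.domid  = DOMID_SELF,
>         .source.u.gmfn = virt_to_gfn(page_address(page)),
>         .source.offset = offset,
>         .dest.domid    = frontend_domid,
>         .dest.u.ref    = gref,                    /* from the rx request   */
>         .dest.offset   = 0,
>     };
> 
>     /* Step 12: a single hypercall may carry a batch of such ops. */
>     HYPERVISOR_grant_table_op(GNTTABOP_copy, &copy, 1);
>     if (copy.status != GNTST_okay) {
>         /* step 13: report NETIF_RSP_ERROR in the rx response */
>     }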
> 
> **Frontend**
> 
> 15) Frontend gets an interrupt and runs its interrupt service routine
> 
> 16) Checks if there are unconsumed responses
> 
> 17) Consumes a response from the ring (first response for a packet)
> 
> 18) Revoke the `gref` in the response
> 
> 19) Consumes extra info response [optional]
> 
> 20) While responses have `NETRXF_more_data` set, fetch each of the remaining
>     responses and revoke the designated `gref`.
> 
> 21) Update the response consumer index (`rsp_cons`)
> 
> 22) *Linux-specific*: Copy (from first slot gref) up to 256 bytes to the 
> linear
>     region of the packet metadata structure (skb). The rest of the pages
>     processed in the responses are then added as frags.
> 
> 23) Set checksum info based on first response flags.
> 
> 24) Call packet into the network stack.
> 
> 25) Allocate new pages and any necessary packet metadata structures for new
>     requests. These requests will then be used in step 1) and so forth.
> 
> 26) Update the request producer index (`req_prod`)
> 
> 27) Check whether backend needs notification:
> 
> 27.1) If so, Perform hypercall `EVTCHNOP_send` which might mean a
> __VMEXIT__
>       depending on the guest type.
> 
> 28) Update `rsp_event` to `response consumer index + 1` such that the frontend
>     receives a notification on the first newly produced response.
>     [optional, if frontend is polling the ring and never sleeps]
> 
> This proposal aims at replacing steps 4), 12) and 22) with memcpy if the
> grefs on the Rx ring were requested to be mapped by the guest. The frontend
> may use strategies to allow fast recycling of grants for replenishing the
> ring, hence letting Domain-0 replace the grant copies with memcpy instead,
> which is faster.
> 
> Depending on the implementation, it would mean that we would no longer need
> to aggregate as many grant ops as possible (step 1) and could transmit the
> packet in the transmit function (e.g. Linux ```ndo_start_xmit```)
> as previously proposed
> here\[[0](http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01504.html)\].
> This would heavily improve efficiency, specifically for smaller packets,
> which in turn would decrease RTT, with data being acknowledged much more
> quickly.
> 
> \clearpage
> 
> # Proposed Extension
> 
> The idea is to allow the guest more control over whether and how its grants
> are mapped. Currently there's no control over it for frontends or backends,
> and the latter cannot make assumptions about the mapping of transmit or
> receive grants, hence we need the frontend to take the initiative in managing
> its own grant mappings. Guests may then opportunistically recycle these
> grants (e.g. Linux) and avoid resorting to the copies that come with using a
> fixed amount of buffers. Other frameworks (e.g. XDP, netmap, DPDK) use a
> fixed set of buffers, which also makes the case for this extension.
> 
> ## Terminology
> 
> `staging grants` is a term used in this document to refer to the whole concept
> of having a set of grants permanently mapped with the backend, containing data
> staged there until completion. The term should therefore not be confused with
> a new kind of grant in the hypervisor.
> 
> ## Control Ring Messages
> 
> ### `XEN_NETIF_CTRL_TYPE_GET_GREF_MAPPING_SIZE`
> 
> This message is sent by the frontend to fetch the number of grefs that can
> be kept mapped in the backend. It only takes the queue index as an argument,
> and returns data representing the number of free entries in the mapping table.
> 
> ### `XEN_NETIF_CTRL_TYPE_ADD_GREF_MAPPING`
> 
> This is sent by the frontend to map a list of grant references in the backend.
> It receives the queue index, the grant containing the list (offset is
> implicitly zero) and the number of entries in the list. Each entry in this
> list has the following format:
> 
>           0     1     2     3     4     5     6     7  octet
>        +-----+-----+-----+-----+-----+-----+-----+-----+
>        | grant ref             |  flags    |  status   |
>        +-----+-----+-----+-----+-----+-----+-----+-----+
> 
>        grant ref: grant reference
>        flags: flags describing the control operation
>        status: XEN_NETIF_CTRL_STATUS_*
> 
> The list can have a maximum of 512 entries to be mapped at once.
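> 
> In C, each entry in the list corresponds to something like the following
> (field names follow the layout above and are shown for illustration only):
> 
>     struct xen_netif_gref {
>         grant_ref_t ref;        /* grant reference to be (un)mapped  */
>         uint16_t    flags;      /* control flags for this entry      */
>         uint16_t    status;     /* out: XEN_NETIF_CTRL_STATUS_*      */
>     };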
> 
> ### `XEN_NETIF_CTRL_TYPE_DEL_GREF_MAPPING`
> 
> This is sent by the frontend for the backend to unmap a list of grant
> references. The arguments are the same as
> `XEN_NETIF_CTRL_TYPE_ADD_GREF_MAPPING`, including the format of the list.
> However, entries specified in the list may only refer to grants previously
> added with `XEN_NETIF_CTRL_TYPE_ADD_GREF_MAPPING`, and additionally these
> must not be in-flight grant references in the ring at the time the unmap
> is requested.
> 
> ## Datapath Changes
> 
> The control ring is only available after the backend state is
> `XenbusConnected`, therefore only on this state change can the frontend query
> the total amount of maps it can keep. It then grants N entries per queue on
> both the TX and RX ring, which will create the underlying backend gref ->
> page association (e.g. stored in a hash table). The frontend may wish to
> recycle these pregranted buffers or choose a copy approach to replace
> granting.
> 
> In step 19) of Guest Transmit and step 3) of Guest Receive, the data gref is
> first looked up in this table and the underlying page is used if a mapping
> already exists. In the successful case, steps 20), 21) and 27) of Guest
> Transmit are skipped, with 19) being replaced by a memcpy of up to 128 bytes.
> On Guest Receive, steps 4), 12) and 22) are replaced with a memcpy instead of
> a grant copy.
> 
> Failing to obtain the total number of mappings
> (`XEN_NETIF_CTRL_TYPE_GET_GREF_MAPPING_SIZE`) means the guest falls back to
> the normal usage without pre-granting buffers.
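> 
> A minimal sketch of the backend-side decision on the datapath, assuming a
> per-queue lookup helper `gref_mapping_find()` over the hash table mentioned
> above (all names are illustrative):
> 
>     uint8_t *backing = gref_mapping_find(queue, gref);
> 
>     if (backing) {
>         /* Staging grant: the page is already mapped, a plain memcpy suffices. */
>         memcpy(dst, backing + offset, len);
>     } else {
>         /* Fall back to the existing GNTTABOP_copy / map path. */
>         queue_grant_copy_op(queue, gref, offset, len, dst);
>     }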
> 
> \clearpage
> 
> # Wire Performance
> 
> This section is a reference for numbers to keep in mind about what goes on
> the wire.
> 
> The minimum size of a single packet is calculated as:
> 
>   Packet = Ethernet Header (14) + Protocol Data Unit (46 - 1500) = 60 bytes
> 
> On the wire it's a bit more:
> 
>   Preamble (7) + Start Frame Delimiter (1) + Packet + CRC (4) +
>   Interframe gap (12) = 84 bytes
> 
> For a given Link-speed in bits/sec and Packet size, the real packet rate is
> calculated as:
> 
>   Rate = Link-speed / ((Preamble + SFD + Packet + CRC + Interframe gap) * 8)
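> 
>   For example, at 10 Gbit/s with a minimum-size packet plus CRC (64 bytes):
> 
>   Rate = 10 * 10^9 / ((7 + 1 + 64 + 12) * 8) ~= 14.88 Mpps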
> 
> Numbers to keep in mind (packet size excludes PHY layer, though packet
> rates
> disclosed by vendors take those into account, since it's what goes on the
> wire):
> 
> | Packet + CRC (bytes)   | 10 Gbit/s  |  40 Gbit/s |  100 Gbit/s  |
> |------------------------|:----------:|:----------:|:------------:|
> | 64                     | 14.88  Mpps|  59.52 Mpps|  148.80 Mpps |
> | 128                    |  8.44  Mpps|  33.78 Mpps|   84.46 Mpps |
> | 256                    |  4.52  Mpps|  18.11 Mpps|   45.29 Mpps |
> | 1500                   |   822  Kpps|   3.28 Mpps|    8.22 Mpps |
> | 65535                  |   ~19  Kpps|  76.27 Kpps|  190.68 Kpps |
> 
> Caption:  Mpps (Million packets per second) ; Kpps (Kilo packets per second)
> 
> \clearpage
> 
> # Performance
> 
> Numbers measured between a Linux v4.11 guest and another host connected by a
> 100 Gbit/s NIC, on an E5-2630 v4 2.2 GHz host, to give an idea of the
> performance benefits of this extension. Please refer to this presentation[7]
> for a better overview of the results.
> 
> ( Numbers include protocol overhead )
> 
> **bulk transfer (Guest TX/RX)**
> 
>  Queues  Before (Gbit/s) After (Gbit/s)
>  ------  -------------   ------------
>  1queue  17244/6000      38189/28108
>  2queue  24023/9416      54783/40624
>  3queue  29148/17196     85777/54118
>  4queue  39782/18502     99530/46859
> 
> ( Guest -> Dom0 )
> 
> **Packet I/O (Guest TX/RX) in UDP 64b**
> 
>  Queues  Before (Mpps)  After (Mpps)
>  ------  -------------  ------------
>  1queue  0.684/0.439    2.49/2.96
>  2queue  0.953/0.755    4.74/5.07
>  4queue  1.890/1.390    8.80/9.92
> 
> \clearpage
> 
> # References
> 
> [0] http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01504.html
> 
> [1] https://github.com/freebsd/freebsd/blob/master/sys/dev/netmap/netmap_mem2.c#L362
> 
> [2] https://www.freebsd.org/cgi/man.cgi?query=vale&sektion=4&n=1
> 
> [3] https://github.com/iovisor/bpf-docs/blob/master/Express_Data_Path.pdf
> 
> [4] http://prototype-kernel.readthedocs.io/en/latest/networking/XDP/design/requirements.html#write-access-to-packet-data
> 
> [5] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c#L2073
> 
> [6] http://lxr.free-electrons.com/source/drivers/net/ethernet/mellanox/mlx4/en_rx.c#L52
> 
> [7] https://schd.ws/hosted_files/xendeveloperanddesignsummit2017/e6/ToGrantOrNotToGrant-XDDS2017_v3.pdf
> 
> # History
> 
> A table of changes to the document, in chronological order.
> 
> ------------------------------------------------------------------------
> Date       Revision Version  Notes
> ---------- -------- -------- -------------------------------------------
> 2016-12-14 1        Xen 4.9  Initial version for RFC
> 
> 2017-09-01 2        Xen 4.10 Rework to use control ring
> 
>                              Trim down the specification
> 
>                              Added some performance numbers from the
>                              presentation
> 
> 2017-09-13 3        Xen 4.10 Addressed changes from Paul Durrant
> 
> ------------------------------------------------------------------------


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

