Xen project Mailing List

Re: [UNIKRAFT PATCH v4 10/12] plat/xen/drivers/net: Add transmit operation

To: Simon Kuenzer <simon.kuenzer@xxxxxxxxx>, minios-devel@xxxxxxxxxxxxx

From: Costin Lupu <costin.lupu@xxxxxxxxx>

Date: Fri, 21 Aug 2020 14:24:51 +0300

Cc: sharan.santhanam@xxxxxxxxx, Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>

Delivery-date: Fri, 21 Aug 2020 11:25:02 +0000

List-id: Mini-os development list <minios-devel.lists.xenproject.org>

On 8/21/20 1:49 PM, Simon Kuenzer wrote: > On 21.08.20 11:32, Costin Lupu wrote: >> On 8/20/20 6:49 PM, Simon Kuenzer wrote: >>> On 13.08.20 10:53, Costin Lupu wrote: >>>> Whenever a packet is transmitted, the request describing it is written >>>> in the >>>> shared ring. The packet itself is written in a page referenced by the >>>> request >>>> and shared, as well, with the backend. At the end of each transmit >>>> operation, a >>>> cleanup operation is performed free'ing the resources allocated for the >>>> previously transmitted packets. >>>> >>>> Signed-off-by: Costin Lupu <costin.lupu@xxxxxxxxx> >>>> Signed-off-by: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx> >>>> --- >>>> plat/xen/drivers/net/netfront.c | 102 >>>> ++++++++++++++++++++++++++++++++ >>>> 1 file changed, 102 insertions(+) >>>> >>>> diff --git a/plat/xen/drivers/net/netfront.c >>>> b/plat/xen/drivers/net/netfront.c >>>> index 858d124a..f2b81329 100644 >>>> --- a/plat/xen/drivers/net/netfront.c >>>> +++ b/plat/xen/drivers/net/netfront.c >>>> @@ -39,6 +39,7 @@ >>>> #include <uk/alloc.h> >>>> #include <uk/netdev_driver.h> >>>> #include <xen-x86/mm.h> >>>> +#include <xen-x86/irq.h> >>>> #include <xenbus/xenbus.h> >>>> #include "netfront.h" >>>> #include "netfront_xb.h" >>>> @@ -78,6 +79,106 @@ static uint16_t get_id_from_freelist(uint16_t >>>> *freelist) >>>> return id; >>>> } >>>> +static int network_tx_buf_gc(struct uk_netdev_tx_queue *txq) >>>> +{ >>>> + RING_IDX prod, cons; >>>> + netif_tx_response_t *tx_rsp; >>>> + uint16_t id; >>>> + bool more_to_do; >>>> + int count = 0; >>>> + >>>> + do { >>>> + prod = txq->ring.sring->rsp_prod; >>>> + rmb(); /* Ensure we see responses up to 'rp'. */ >>>> + >>>> + for (cons = txq->ring.rsp_cons; cons != prod; cons++) { >>>> + tx_rsp = RING_GET_RESPONSE(&txq->ring, cons); >>>> + >>>> + if (tx_rsp->status == NETIF_RSP_NULL) >>>> + continue; >>>> + >>>> + if (tx_rsp->status == NETIF_RSP_ERROR) >>>> + uk_pr_err("packet error\n"); >>> >>> "netdev%u: Transmission error on txq %u\n" ? >>> >> >> Ack. >> >>>> + >>>> + id = tx_rsp->id; >>>> + UK_ASSERT(id < NET_TX_RING_SIZE); >>>> + >>>> + gnttab_end_access(txq->gref[id]); >>>> + txq->gref[id] = GRANT_INVALID_REF; >>>> + >>>> + add_id_to_freelist(id, txq->freelist); >>>> + uk_semaphore_up(&txq->sem); >>>> + >>>> + count++; >>>> + } >>>> + >>>> + txq->ring.rsp_cons = prod; >>>> + >>>> + RING_FINAL_CHECK_FOR_RESPONSES(&txq->ring, more_to_do); >>>> + } while (more_to_do); >>>> + >>>> + return count; >>>> +} >>>> + >>>> +static int netfront_xmit(struct uk_netdev *n, >>>> + struct uk_netdev_tx_queue *txq, >>>> + struct uk_netbuf *pkt) >>>> +{ >>>> + struct netfront_dev *nfdev; >>>> + unsigned long flags; >>>> + uint16_t id; >>>> + RING_IDX req_prod; >>>> + netif_tx_request_t *tx_req; >>>> + int notify; >>>> + int status = 0, count; >>>> + >>>> + UK_ASSERT(n != NULL); >>>> + UK_ASSERT(txq != NULL); >>>> + UK_ASSERT(pkt != NULL); >>>> + UK_ASSERT(pkt->len < PAGE_SIZE); >>>> + >>>> + nfdev = to_netfront_dev(n); >>>> + >>>> + /* get request id */ >>>> + uk_semaphore_down(&txq->sem); >>>> + local_irq_save(flags); >>>> + id = get_id_from_freelist(txq->freelist); >>> >>> What happens when we are out of space and the ring is full? >>> The xmit function should be non-blocking so that we can use it in >>> polling mode. We should return in such a case that the transmission >>> queue is currently full... >>> >> >> Ack. >> >>>> + local_irq_restore(flags); >>>> + >>>> + /* get request */ >>>> + req_prod = txq->ring.req_prod_pvt; >>>> + tx_req = RING_GET_REQUEST(&txq->ring, req_prod); > + >>>> + /* setup grant for buffer data */ >>>> + txq->gref[id] = tx_req->gref = >>>> + gnttab_grant_access(nfdev->xendev->otherend_id, >>>> + virt_to_mfn(pkt->data), 1); >>>> + UK_ASSERT(tx_req->gref != GRANT_INVALID_REF); >>>> + >>>> + tx_req->offset = 0; >>>> + tx_req->size = (uint16_t) pkt->len; >>>> + tx_req->flags = 0; >>>> + tx_req->id = id; >>>> + >>>> + txq->ring.req_prod_pvt = req_prod + 1; >>>> + wmb(); /* Ensure backend sees requests */ >>>> + >>>> + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&txq->ring, notify); >>>> + if (notify) >>>> + notify_remote_via_evtchn(txq->evtchn); > + >>>> + status |= UK_NETDEV_STATUS_SUCCESS; >>>> + >>>> + /* some cleanup */ >>>> + local_irq_save(flags); >>>> + count = network_tx_buf_gc(txq); >>> >>> The clean-up should happen before enqueuing the given packet to the >>> transmit queue, it is the first thing of xmit(). Since we anyway need to >>> do it and we do not have an advantage by delaying this operation. >> >> Of course there is an advantage. You push the packets in order for them >> to be processed ASAP by the backend and deal with cleaning up >> afterwards. It's a kind of parallelization. Otherwise you would delay >> the transmissions. >> >>> We can also deal much better with full rings. Otherwise we >>> unnecessarily drop a packets for transmission although there would >>> have been space when clean-up finished. After calling cleaning you >>> can check if there is space on the ring and return if we run out of >>> it (use an `unlikely` to the condition so that we speed up the >>> successful sending case). >>> >> >> The only scenario when this would solve anything would be when you have >> the ring full and just before you decide to send a new packet the >> backend starts processing your previous transmitted packets, leaving you >> at least one free slot. There are 2 unlikely events here: first, the >> ring is full, and second, the backend starts to process again when the >> ring gets full (for the latter event, it's unlikely that the backend >> will wake up given that it couldn't process 256 transmitted packets, >> where 256 is the current number of slots). And for this very unlikely >> context you would introduce a delay for all the transmissions if we >> followed your suggestion. >> >> Anyway, both Linux and FreeBSD do the cleanup after pushing the >> requests, so I don't think we have much more to say about this. >> > > Hum... I double-checked with Intel DPDK's ixgbe (10 Gig) driver: > > http://git.dpdk.org/dpdk-stable/tree/drivers/net/ixgbe/ixgbe_rxtx.c?h=20.02#n230 > > > They clean the slots before sending. I derived my conclusion from there. > However I noticed now that there is one minor difference: They do both - > cleanup and enqueuing - before they notify the card about changes on the > ring. In that case it does not add any delay if you swap the order of > cleanup and enqueuing. In your case this is probably different because > you can cleanup without notifying. Here, I agree with you that we add > unnecessary xmit delays by swapping the order. > ixgbe is a native driver, so there is no backend driver in that case. It's a different scenario. > However and in general, having a full ring is not an uncommon and > unlikely case. To my experience, this can already happen with a single > TCP connection if your network stack is feeding next packets fast enough > and if the current TCP window allows to send them. It also depends on > when the driver domain is scheduled; on which CPU core your netfront > domain and the driver domain sit; and even on the underlying physical > device that might be busy with other traffic. > What I tried to say with the events probabilities explanation is that you add delay for the common case in order to remove delay for the much less probable corner case. > Another thing we are having in mind for uknetdev - and this is probably > different to Linux/BSD's implementation - we plan to change the xmit > function to do batching (as DPDK does). Sharan could push virtio-net > performance further with this. You are able to reduce the number of > notifies to the backend because you enqueue multiple packets at once and > due to locality of batching you can utilize CPU caches ways better. > > What about this idea for netfront: We could do first a check for > available space and - if needed - clean-up just as many extra slots that > you need to complete the transmission (for now this is anyways just one > slot). Then we proceed with enqueuing and do the cleanup of the rest of > the slots after we notified the backend. What do you think? > This will stay just as it is for the current patches. Feel free to change anything after upstreaming these patches. >> >> Btw, I'm waiting for reviews on the other patches as well before I send >> the v5. > > Yes, sure. > >> >> >>>> + local_irq_restore(flags); >>>> + >>>> + status |= likely(count > 0) ? UK_NETDEV_STATUS_MORE : 0x0; >>>> + >>>> + return status; >>>> +} >>>> + >>>> static int netfront_rxq_enqueue(struct uk_netdev_rx_queue *rxq, >>>> struct uk_netbuf *netbuf) >>>> { >>>> @@ -598,6 +699,7 @@ static int netfront_add_dev(struct xenbus_device >>>> *xendev) >>>> } >>>> /* register netdev */ >>>> + nfdev->netdev.tx_one = netfront_xmit; >>>> nfdev->netdev.ops = &netfront_ops; >>>> rc = uk_netdev_drv_register(&nfdev->netdev, drv_allocator, >>>> DRIVER_NAME); >>>> if (rc < 0) { >>>> >>> > > Thanks, > > Simon >

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.