[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [UNIKRAFT PATCH v4 10/12] plat/xen/drivers/net: Add transmit operation

To: Simon Kuenzer <simon.kuenzer@xxxxxxxxx>, Costin Lupu <costin.lupu@xxxxxxxxx>, <minios-devel@xxxxxxxxxxxxx>
From: Sharan Santhanam <sharan.santhanam@xxxxxxxxx>
Date: Thu, 22 Oct 2020 15:38:23 +0200
Cc: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
Delivery-date: Thu, 22 Oct 2020 13:39:00 +0000
List-id: Mini-os development list <minios-devel.lists.xenproject.org>

Hello Costin,

Please find the review comments inline:

Thanks & Regards

Sharan

On 8/21/20 3:04 PM, Simon Kuenzer wrote:

On 21.08.20 13:24, Costin Lupu wrote:

On 8/21/20 1:49 PM, Simon Kuenzer wrote:

On 21.08.20 11:32, Costin Lupu wrote:

On 8/20/20 6:49 PM, Simon Kuenzer wrote:

On 13.08.20 10:53, Costin Lupu wrote:

Whenever a packet is transmitted, the request describing it iswritten

in the

shared ring. The packet itself is written in a page referenced bythe

request
and shared, as well, with the backend. At the end of each transmit
operation, a

cleanup operation is performed free'ing the resources allocatedfor the

previously transmitted packets.

Signed-off-by: Costin Lupu <costin.lupu@xxxxxxxxx>
Signed-off-by: Razvan Cojocaru <razvan.cojocaru93@xxxxxxxxx>
---
    plat/xen/drivers/net/netfront.c | 102
++++++++++++++++++++++++++++++++
    1 file changed, 102 insertions(+)

diff --git a/plat/xen/drivers/net/netfront.c
b/plat/xen/drivers/net/netfront.c
index 858d124a..f2b81329 100644
--- a/plat/xen/drivers/net/netfront.c
+++ b/plat/xen/drivers/net/netfront.c
@@ -39,6 +39,7 @@
    #include <uk/alloc.h>
    #include <uk/netdev_driver.h>
    #include <xen-x86/mm.h>
+#include <xen-x86/irq.h>
    #include <xenbus/xenbus.h>
    #include "netfront.h"
    #include "netfront_xb.h"
@@ -78,6 +79,106 @@ static uint16_t get_id_from_freelist(uint16_t
*freelist)
        return id;
    }
    +static int network_tx_buf_gc(struct uk_netdev_tx_queue *txq)
+{
+    RING_IDX prod, cons;
+    netif_tx_response_t *tx_rsp;
+    uint16_t id;
+    bool more_to_do;
+    int count = 0;
+
+    do {
+        prod = txq->ring.sring->rsp_prod;
+        rmb(); /* Ensure we see responses up to 'rp'. */
+
+        for (cons = txq->ring.rsp_cons; cons != prod; cons++) {
+            tx_rsp = RING_GET_RESPONSE(&txq->ring, cons);
+
+            if (tx_rsp->status == NETIF_RSP_NULL)
+                continue;
+
+            if (tx_rsp->status == NETIF_RSP_ERROR)
+                uk_pr_err("packet error\n");


"netdev%u: Transmission error on txq %u\n" ?


Ack.

+
+            id  = tx_rsp->id;
+            UK_ASSERT(id < NET_TX_RING_SIZE);
+
+            gnttab_end_access(txq->gref[id]);
+            txq->gref[id] = GRANT_INVALID_REF;
+
+            add_id_to_freelist(id, txq->freelist);
+            uk_semaphore_up(&txq->sem);

Shouldn't we free up the netbuf we receive from the ring?

+
+            count++;
+        }
+
+        txq->ring.rsp_cons = prod;
+
+ RING_FINAL_CHECK_FOR_RESPONSES(&txq->ring, more_to_do);
+    } while (more_to_do);
+
+    return count;
+}
+
+static int netfront_xmit(struct uk_netdev *n,
+        struct uk_netdev_tx_queue *txq,
+        struct uk_netbuf *pkt)
+{
+    struct netfront_dev *nfdev;
+    unsigned long flags;
+    uint16_t id;
+    RING_IDX req_prod;
+    netif_tx_request_t *tx_req;
+    int notify;
+    int status = 0, count;
+
+    UK_ASSERT(n != NULL);
+    UK_ASSERT(txq != NULL);
+    UK_ASSERT(pkt != NULL);
+    UK_ASSERT(pkt->len < PAGE_SIZE);
+
+    nfdev = to_netfront_dev(n);
+
+    /* get request id */
+    uk_semaphore_down(&txq->sem);
+    local_irq_save(flags);
+    id = get_id_from_freelist(txq->freelist);


What happens when we are out of space and the ring is full?
The xmit function should be non-blocking so that we can use it in
polling mode. We should return in such a case that the transmission
queue is currently full...


Ack.

+    local_irq_restore(flags);
+
+    /* get request */
+    req_prod = txq->ring.req_prod_pvt;
+    tx_req = RING_GET_REQUEST(&txq->ring, req_prod); > +
+    /* setup grant for buffer data */
+    txq->gref[id] = tx_req->gref =
+ gnttab_grant_access(nfdev->xendev->otherend_id,
+            virt_to_mfn(pkt->data), 1);
+    UK_ASSERT(tx_req->gref != GRANT_INVALID_REF);
+
+    tx_req->offset = 0;
+    tx_req->size = (uint16_t) pkt->len;
+    tx_req->flags = 0;
+    tx_req->id = id;
+
+    txq->ring.req_prod_pvt = req_prod + 1;
+    wmb(); /* Ensure backend sees requests */
+
+ RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&txq->ring, notify);
+    if (notify)
+        notify_remote_via_evtchn(txq->evtchn); > +
+    status |= UK_NETDEV_STATUS_SUCCESS;
+
+    /* some cleanup */
+    local_irq_save(flags);
+    count = network_tx_buf_gc(txq);


The clean-up should happen before enqueuing the given packet to the

transmit queue, it is the first thing of xmit(). Since we anywayneed to

do it and we do not have an advantage by delaying this operation.

Of course there is an advantage. You push the packets in order forthem

to be processed ASAP by the backend and deal with cleaning up
afterwards. It's a kind of parallelization. Otherwise you would delay
the transmissions.

We can also deal much better with full rings. Otherwise we
unnecessarily drop a packets for transmission although there would
have been space when clean-up finished. After calling cleaning you
can check if there is space on the ring and return if we run out of
it (use an `unlikely` to the condition so that we speed up the
successful sending case).

The only scenario when this would solve anything would be when youhave

the ring full and just before you decide to send a new packet the

backend starts processing your previous transmitted packets,leaving you

at least one free slot. There are 2 unlikely events here: first, the
ring is full, and second, the backend starts to process again when the
ring gets full (for the latter event, it's unlikely that the backend
will wake up given that it couldn't process 256 transmitted packets,
where 256 is the current number of slots). And for this very unlikely
context you would introduce a delay for all the transmissions if we
followed your suggestion.

Anyway, both Linux and FreeBSD do the cleanup after pushing the
requests, so I don't think we have much more to say about this.


Hum... I double-checked with Intel DPDK's ixgbe (10 Gig) driver:

http://git.dpdk.org/dpdk-stable/tree/drivers/net/ixgbe/ixgbe_rxtx.c?h=20.02#n230

They clean the slots before sending. I derived my conclusion fromthere.However I noticed now that there is one minor difference: They doboth -cleanup and enqueuing - before they notify the card about changes onthe

ring. In that case it does not add any delay if you swap the order of
cleanup and enqueuing. In your case this is probably different because
you can cleanup without notifying. Here, I agree with you that we add
unnecessary xmit delays by swapping the order.


ixgbe is a native driver, so there is no backend driver in that case.
It's a different scenario.

You could consider the card itself as a backend driver, but fine.There is a ABI difference between ixgbe and netback.

However and in general, having a full ring is not an uncommon and
unlikely case. To my experience, this can already happen with a single

TCP connection if your network stack is feeding next packets fastenough

and if the current TCP window allows to send them. It also depends on
when the driver domain is scheduled; on which CPU core your netfront
domain and the driver domain sit; and even on the underlying physical
device that might be busy with other traffic.


What I tried to say with the events probabilities explanation is that
you add delay for the common case in order to remove delay for the much
less probable corner case.


I got that and agreed.

Another thing we are having in mind for uknetdev - and this is probably
different to Linux/BSD's implementation - we plan to change the xmit
function to do batching (as DPDK does). Sharan could push virtio-net
performance further with this. You are able to reduce the number of

notifies to the backend because you enqueue multiple packets at onceand

due to locality of batching you can utilize CPU caches ways better.

What about this idea for netfront: We could do first a check for

available space and - if needed - clean-up just as many extra slotsthat

you need to complete the transmission (for now this is anyways just one
slot). Then we proceed with enqueuing and do the cleanup of the rest of
the slots after we notified the backend. What do you think?


This will stay just as it is for the current patches. Feel free to
change anything after upstreaming these patches.

No, it won't stay like this. At least, you have to call clean-up whenthe ring is full before you leave the non-blocking function. Otherwiseyou end up in never getting space back on a full ring, but I thinkthis is clear.

Btw, I'm waiting for reviews on the other patches as well before Isend
the v5.


Yes, sure.

+    local_irq_restore(flags);
+
+    status |= likely(count > 0) ? UK_NETDEV_STATUS_MORE : 0x0;
+
+    return status;
+}
+
    static int netfront_rxq_enqueue(struct uk_netdev_rx_queue *rxq,
            struct uk_netbuf *netbuf)
    {
@@ -598,6 +699,7 @@ static int netfront_add_dev(struct xenbus_device
*xendev)
        }
          /* register netdev */
+    nfdev->netdev.tx_one = netfront_xmit;
        nfdev->netdev.ops = &netfront_ops;
        rc = uk_netdev_drv_register(&nfdev->netdev, drv_allocator,
DRIVER_NAME);
        if (rc < 0) {


Thanks,

Simon

Follow-Ups:
- Re: [UNIKRAFT PATCH v4 10/12] plat/xen/drivers/net: Add transmit operation
  - From: Costin Lupu

Prev by Date: Re: [UNIKRAFT PATCH v4 09/12] plat/xen/drivers/net: Start netfront device
Next by Date: Re: [UNIKRAFT PATCH v4 11/12] plat/xen/drivers/net: Add receive operation
Previous by thread: Re: [UNIKRAFT PATCH v4 09/12] plat/xen/drivers/net: Start netfront device
Next by thread: Re: [UNIKRAFT PATCH v4 10/12] plat/xen/drivers/net: Add transmit operation
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.