[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH net-next v5 1/9] xen-netback: Introduce TX grant map definitions

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
From: Zoltan Kiss <zoltan.kiss@xxxxxxxxxx>
Date: Tue, 18 Feb 2014 20:36:29 +0000
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, jonathan.davies@xxxxxxxxxx, wei.liu2@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, netdev@xxxxxxxxxxxxxxx
Delivery-date: Tue, 18 Feb 2014 20:36:59 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 18/02/14 17:06, Ian Campbell wrote:

On Mon, 2014-01-20 at 21:24 +0000, Zoltan Kiss wrote:

This patch contains the new definitions necessary for grant mapping.


Is this just adding a bunch of (currently) unused functions? That's a
slightly odd way to structure a series. They don't seem to be "generic
helpers" or anything so it would be more normal to introduce these as
they get used -- it's a bit hard to review them out of context.

I've created two patches because they are quite huge even now,separately. Together they would be a ~500 line change. That was the bestI could figure out keeping in mind that bisect should work. But as Iwrote in the first email, I welcome other suggestions. If you and Weiprefer this two patch in one big one, I merge them in the next version.

v2:


This sort of intraversion changelog should go after the S-o-b and a
"---" marker. This way they are not included in the final commit
message.

Ok, I'll do that.

@@ -226,6 +248,12 @@ bool xenvif_rx_ring_slots_available(struct xenvif *vif, 
int needed);

  void xenvif_stop_queue(struct xenvif *vif);

+/* Callback from stack when TX packet can be released */
+void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
+
+/* Unmap a pending page, usually has to be called before xenvif_idx_release */


"usually" or always? How does one determine when it is or isn't
appropriate to call it later?

If you haven't unmapped it before, then you have to call it. I'llclarify the comment

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index 7669d49..f0f0c3d 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -38,6 +38,7 @@

  #include <xen/events.h>
  #include <asm/xen/hypercall.h>
+#include <xen/balloon.h>


What is this for?

For alloc/free_xenballooned_pages

diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index bb241d0..195602f 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -773,6 +773,20 @@ static struct page *xenvif_alloc_page(struct xenvif *vif,
        return page;
  }

+static inline void xenvif_tx_create_gop(struct xenvif *vif,
+                                       u16 pending_idx,
+                                       struct xen_netif_tx_request *txp,
+                                       struct gnttab_map_grant_ref *gop)
+{
+       vif->pages_to_map[gop-vif->tx_map_ops] = vif->mmap_pages[pending_idx];
+       gnttab_set_map_op(gop, idx_to_kaddr(vif, pending_idx),
+                         GNTMAP_host_map | GNTMAP_readonly,
+                         txp->gref, vif->domid);
+
+       memcpy(&vif->pending_tx_info[pending_idx].req, txp,
+              sizeof(*txp));


Can this not go in xenvif_tx_build_gops? Or conversely should the
non-mapping code there be factored out?

Given the presence of both kinds of gop the name of this function needs
to be more specific I think.

It is called from tx_build_gop and get_requests, and the non-mappingcode will go away. I have a patch on top of this series which does grantcopy for the header part, but it doesn't create a separate function forthe single copy operation, and you'll still call this function frombuild_gops to handle the rest of the first slot (if any)

So TX will have only one kind of gop.

+}
+
  static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
                                               struct sk_buff *skb,
                                               struct xen_netif_tx_request *txp,
@@ -1612,6 +1626,107 @@ static int xenvif_tx_submit(struct xenvif *vif)
        return work_done;
  }

+void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
+{
+       unsigned long flags;
+       pending_ring_idx_t index;
+       u16 pending_idx = ubuf->desc;
+       struct pending_tx_info *temp =
+               container_of(ubuf, struct pending_tx_info, callback_struct);
+       struct xenvif *vif = container_of(temp - pending_idx,


This is subtracting a u16 from a pointer?

Yes. I moved this to an ubuf_to_vif helper for the next version of thepatch series

+                                         struct xenvif,
+                                         pending_tx_info[0]);
+
+       spin_lock_irqsave(&vif->dealloc_lock, flags);
+       do {
+               pending_idx = ubuf->desc;
+               ubuf = (struct ubuf_info *) ubuf->ctx;
+               index = pending_index(vif->dealloc_prod);
+               vif->dealloc_ring[index] = pending_idx;
+               /* Sync with xenvif_tx_dealloc_action:
+                * insert idx then incr producer.
+                */
+               smp_wmb();


Is this really needed given that there is a lock held?

Yes, as the comment right above explains. This actually comes fromclassic kernel's netif_idx_release


Or what is dealloc_lock protecting against?

The callbacks from each other. So it is checjed only in this function.

+               vif->dealloc_prod++;


What happens if the dealloc ring becomes full, will this wrap and cause
havoc?

Nope, if the dealloc ring is full, the value of the last increment won'tbe used to index the dealloc ring again until some space made available.Of course if something broke and we have more pending slots than tx ringor dealloc slots then it can happen. Do you suggest aBUG_ON(vif->dealloc_prod - vif->dealloc_cons >= MAX_PENDING_REQS)?

+       } while (ubuf);
+       wake_up(&vif->dealloc_wq);
+       spin_unlock_irqrestore(&vif->dealloc_lock, flags);
+}
+
+static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
+{
+       struct gnttab_unmap_grant_ref *gop;
+       pending_ring_idx_t dc, dp;
+       u16 pending_idx, pending_idx_release[MAX_PENDING_REQS];
+       unsigned int i = 0;
+
+       dc = vif->dealloc_cons;
+       gop = vif->tx_unmap_ops;
+
+       /* Free up any grants we have finished using */
+       do {
+               dp = vif->dealloc_prod;
+
+               /* Ensure we see all indices enqueued by all
+                * xenvif_zerocopy_callback().
+                */
+               smp_rmb();
+
+               while (dc != dp) {
+                       pending_idx =
+                               vif->dealloc_ring[pending_index(dc++)];
+
+                       /* Already unmapped? */
+                       if (vif->grant_tx_handle[pending_idx] ==
+                               NETBACK_INVALID_HANDLE) {
+                               netdev_err(vif->dev,
+                                          "Trying to unmap invalid handle! "
+                                          "pending_idx: %x\n", pending_idx);
+                               BUG();
+                       }
+
+                       pending_idx_release[gop-vif->tx_unmap_ops] =
+                               pending_idx;
+                       vif->pages_to_unmap[gop-vif->tx_unmap_ops] =
+                               vif->mmap_pages[pending_idx];
+                       gnttab_set_unmap_op(gop,
+                                           idx_to_kaddr(vif, pending_idx),
+                                           GNTMAP_host_map,
+                                           vif->grant_tx_handle[pending_idx]);
+                       vif->grant_tx_handle[pending_idx] =
+                               NETBACK_INVALID_HANDLE;
+                       ++gop;


Can we run out of space in the gop array?

No, unless the same thing happen as at my previous answer. BUG_ON() hereas well?

+               }
+
+       } while (dp != vif->dealloc_prod);
+
+       vif->dealloc_cons = dc;


No barrier here?

dealloc_cons only used in the dealloc_thread. dealloc_prod is used bythe callback and the thread as well, that's why we need mb() inprevious. Btw. this function comes from classic's net_tx_action_dealloc

+       if (gop - vif->tx_unmap_ops > 0) {
+               int ret;
+               ret = gnttab_unmap_refs(vif->tx_unmap_ops,
+                                       vif->pages_to_unmap,
+                                       gop - vif->tx_unmap_ops);
+               if (ret) {
+                       netdev_err(vif->dev, "Unmap fail: nr_ops %x ret %d\n",
+                                  gop - vif->tx_unmap_ops, ret);
+                       for (i = 0; i < gop - vif->tx_unmap_ops; ++i) {


This seems liable to be a lot of spew on failure. Perhaps only log the
ones where gop[i].status != success.

Ok, I'll change that.


Have you considered whether or not the frontend can force this error to
occur?

Not yet, good point. I guess if we successfully mapped the page, thenthere is no way a frontend to prevent unmapping. But worth further checking.

+                               netdev_err(vif->dev,
+                                          " host_addr: %llx handle: %x status: 
%d\n",
+                                          gop[i].host_addr,
+                                          gop[i].handle,
+                                          gop[i].status);
+                       }
+                       BUG();
+               }
+       }
+
+       for (i = 0; i < gop - vif->tx_unmap_ops; ++i)
+               xenvif_idx_release(vif, pending_idx_release[i],
+                                  XEN_NETIF_RSP_OKAY);
+}
+
+
  /* Called after netfront has transmitted */
  int xenvif_tx_action(struct xenvif *vif, int budget)
  {
@@ -1678,6 +1793,25 @@ static void xenvif_idx_release(struct xenvif *vif, u16 
pending_idx,
        vif->mmap_pages[pending_idx] = NULL;
  }

+void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)


This is a single shot version of the batched xenvif_tx_dealloc_action
version? Why not just enqueue the idx to be unmapped later?

This is called only from the NAPI instance. Using the dealloc ringrequire synchronization with the callback which can increase lockcontention. On the other hand, if the guest sends small packets(<PAGE_SIZE), the TLB flushing can cause performance penalty. The abovementioned upcoming patch which gntcopy the header can prevent that(together with Malcolm's Xen side patch, which prevents TLB flush if thepage were not touched in Dom0)

@@ -1826,6 +1965,28 @@ int xenvif_kthread(void *data)
        return 0;
  }

+int xenvif_dealloc_kthread(void *data)


Is this going to be a thread per vif?

Yes. In the first versions I've put the dealloc in the NAPI instance(similarly as in classic, where it happened in tx_action), but that hadan unexpected performance penalty: the callback has to notify whoeverdoes the dealloc, that there is something to do. If it is the NAPIinstance, it has to call napi_schedule. But if the packet were deliveredto an another guest, the callback is called from thread context, andaccording to Eric Dumazet, napi_schedule from thread context cansignificantly delay softirq handling. So NAPI instance were delayed withmiliseconds, and it caused terrible performance.Moving this to the RX thread haven't seemed like a wise decision, so Imade a new thread.Actually in the next version of the patches I'll reintroduce__napi_schedule in the callback again, because if the NAPI instancestill have unconsumed requests but not enough pending slots, itdeschedule itself, and the callback has to schedule it again, if:

- unconsumed requests in the ring < XEN_NETBK_LEGACY_SLOTS_MAX
- there are enough free pending slots to handle them
- and the NAPI instance is not scheduled yet

This should really happen if netback is faster than target devices, butthen it doesn't mean a bottleneck.


Zoli


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

Follow-Ups:
- Re: [Xen-devel] [PATCH net-next v5 1/9] xen-netback: Introduce TX grant map definitions
  - From: Ian Campbell

References:
- Re: [Xen-devel] [PATCH net-next v5 1/9] xen-netback: Introduce TX grant map definitions
  - From: Ian Campbell

Prev by Date: Re: [Xen-devel] [RFC v2 4/4] xen-netback: skip IPv4 and IPv6 interfaces
Next by Date: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge
Previous by thread: Re: [Xen-devel] [PATCH net-next v5 1/9] xen-netback: Introduce TX grant map definitions
Next by thread: Re: [Xen-devel] [PATCH net-next v5 1/9] xen-netback: Introduce TX grant map definitions
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.