
Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used


  • To: Matt Wilson <msw@xxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>
  • From: "Palagummi, Siva" <Siva.Palagummi@xxxxxx>
  • Date: Wed, 5 Dec 2012 11:56:32 +0000
  • Accept-language: en-US
  • Cc: "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Wed, 05 Dec 2012 11:56:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: Ac2F4HBfCFIrW+OHQRWCwjPvwOEhAgAd+rmAEvv3JQAAGcEEQA==
  • Thread-topic: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used

Matt,

> -----Original Message-----
> From: Matt Wilson [mailto:msw@xxxxxxxxxx]
> Sent: Wednesday, December 05, 2012 4:53 AM
> To: Ian Campbell; Palagummi, Siva
> Cc: xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots
> properly when larger MTU sizes are used
> 
> On Thu, Aug 30, 2012 at 09:07:11AM +0100, Ian Campbell wrote:
> > On Wed, 2012-08-29 at 13:21 +0100, Palagummi, Siva wrote:
> > > This patch contains the modifications that are discussed in thread
> > > http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.html
> 
> [...]
> 
> > > Instead of using max_required_rx_slots, I used the count that we
> > > already have in hand to verify whether we have enough room in the
> > > batch queue for the next skb. Please let me know if that is not
> > > appropriate. Things worked fine in my environment. Under heavy load
> > > we now seem to be consuming most of the slots in the queue and no
> > > BUG_ON :-)
> >
> > > From: Siva Palagummi <Siva.Palagummi@xxxxxx>
> > >
> > > The count variable in xen_netbk_rx_action needs to be incremented
> > > correctly to take into account the extra slots required when
> > > skb_headlen is greater than PAGE_SIZE and larger MTU values are
> > > used. Without this change, BUG_ON(npo.meta_prod >
> > > ARRAY_SIZE(netbk->meta)) causes the netback thread to exit.
> > >
> > > The fix is to stash the count already computed in
> > > xen_netbk_count_skb_slots in skb_cb_overlay and use it directly in
> > > xen_netbk_rx_action.
> 
> You don't need to stash the estimated value to use it in
> xen_netbk_rx_action() - you have the actual number of slots consumed
> in hand from the return value of netbk_gop_skb(). See below.
> 
> > > Also improved the checking for filling the batch queue.
> > >
> > > Also merged a change from a patch created for the
> > > xen_netbk_count_skb_slots function, as per the thread
> > > http://lists.xen.org/archives/html/xen-devel/2012-05/msg01864.html
> > >
> > > The problem is seen with the Linux 3.2.2 kernel on an Intel 10Gbps
> > > network.
> > >
> > >
> > > Signed-off-by: Siva Palagummi <Siva.Palagummi@xxxxxx>
> > > ---
> > > diff -uprN a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > > --- a/drivers/net/xen-netback/netback.c   2012-01-25 19:39:32.000000000 -0500
> > > +++ b/drivers/net/xen-netback/netback.c   2012-08-28 17:31:22.000000000 -0400
> > > @@ -80,6 +80,11 @@ union page_ext {
> > >   void *mapping;
> > >  };
> > >
> > > +struct skb_cb_overlay {
> > > + int meta_slots_used;
> > > + int count;
> > > +};
> 
> We don't actually need a separate variable for the estimate. We could
> rename meta_slots_used to meta_slots. It could hold the estimate
> before netbk_gop_skb() is called and the actual number of slots
> consumed after. That might be confusing, though, so maybe it's better
> off left as two variables.
> 
> > >  struct xen_netbk {
> > >   wait_queue_head_t wq;
> > >   struct task_struct *task;
> > > @@ -324,9 +329,9 @@ unsigned int xen_netbk_count_skb_slots(s
> > >  {
> > >   unsigned int count;
> > >   int i, copy_off;
> > > + struct skb_cb_overlay *sco;
> > >
> > > - count = DIV_ROUND_UP(
> > > -                 offset_in_page(skb->data)+skb_headlen(skb), PAGE_SIZE);
> > > + count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> >
> > This hunk appears to be upstream already (see
> > e26b203ede31fffd52571a5ba607a26c79dc5c0d). Which tree are you working
> > against? You should either base patches on Linus' branch or on the
> > net-next branch.
> >
> > Other than this the patch looks good, thanks.
> 
> I don't think that this patch completely addresses problems
> calculating the number of slots required when large MTUs are used.
> 
> For example: suppose netback is handling a skb with a large linear
> data portion, say 8157 bytes, that begins 64 bytes from the start of
> the page. Assume that GSO is disabled and there are no frags.
> xen_netbk_count_skb_slots() will calculate that two slots are needed:
> 
>     count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> 
> but netbk_gop_frag_copy() will actually use three. Let's walk through
> the loop:
You are right. The chunk above, which is already upstream, is
unfortunately incorrect for some cases. We also ran into issues in our
environment around a week back and found this problem. The count will
differ based on the head length because of the optimization that
start_new_rx_buffer is trying to do for large buffers: a hole of size
"offset_in_page" will be left in the first page during the copy if the
remaining buffer size is >= PAGE_SIZE. This subsequently affects the
copy_off as well.
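
For reference, the decision logic in question looks roughly like this
in the 3.x netback driver (quoted from memory, so treat it as a sketch
rather than the exact upstream code):

        static bool start_new_rx_buffer(int offset, unsigned long size, int head)
        {
                /* The current buffer is full: always start a new one. */
                if (offset == MAX_BUFFER_OFFSET)
                        return true;

                /*
                 * Start a fresh buffer when this chunk would overflow
                 * the current one, but only if it fits completely in
                 * the next buffer, the current buffer already holds
                 * some data, and this is not the head (which carries
                 * the packet headers).
                 */
                if ((offset + size > MAX_BUFFER_OFFSET) &&
                    (size <= MAX_BUFFER_OFFSET) && offset && !head)
                        return true;

                return false;
        }

The !head test is what makes the slot count for the linear area depend
on the page offset rather than just on skb_headlen.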


So xen_netbk_count_skb_slots actually needs a fix to calculate the
count correctly based on the head length, and also a fix to calculate
properly the copy_off at which the data from the fragments gets copied.
A conservative sketch of the counting part follows.
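
One conservative option (hypothetical and untested, not a tested patch)
would be to go back to including the page offset in the head
calculation, so the estimate can never undercount what the copy loop
consumes:

        /*
         * Count every page the linear area touches, including the
         * initial offset_in_page(skb->data). This can overestimate
         * when the copy packs data more tightly, but it can never
         * underestimate, so the BUG_ON cannot fire.
         */
        count = DIV_ROUND_UP(offset_in_page(skb->data) + skb_headlen(skb),
                             PAGE_SIZE);

For your 8157-byte example this gives DIV_ROUND_UP(64 + 8157, 4096) = 3,
matching the three slots actually consumed.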

max_required_rx_slots may also require a fix to account for the
additional slot that may be required when mtu >= PAGE_SIZE; for the
worst case, at least another +1. One thing that is still puzzling here
is that max_required_rx_slots seems to assume that the linear length in
the head will never be greater than the MTU size. But that doesn't seem
to be the case all the time. I wonder if it requires some kind of fix
there, or special handling for when count_skb_slots exceeds
max_required_rx_slots. A sketch of the +1 follows.
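
Assuming the 3.x definition of max_required_rx_slots (and treating this
purely as an illustration, not a tested patch), the worst-case
adjustment might look like:

        static int max_required_rx_slots(struct xenvif *vif)
        {
                int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);

                /*
                 * Hypothetical extra slot: with mtu >= PAGE_SIZE the
                 * linear area can start at a non-zero page offset and
                 * therefore span one more page than the division above
                 * suggests.
                 */
                if (vif->dev->mtu >= PAGE_SIZE)
                        max++;

                if (vif->can_sg || vif->gso || vif->gso_prefix)
                        max += MAX_SKB_FRAGS + 1; /* extra_info + frags */

                return max;
        }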

> 
>         data = skb->data;
>         while (data < skb_tail_pointer(skb)) {
>                 unsigned int offset = offset_in_page(data);
>                 unsigned int len = PAGE_SIZE - offset;
> 
>                 if (data + len > skb_tail_pointer(skb))
>                         len = skb_tail_pointer(skb) - data;
> 
>                 netbk_gop_frag_copy(vif, skb, npo,
>                                     virt_to_page(data), len, offset,
> &head);
>                 data += len;
>         }
> 
> The first pass will call netbk_gop_frag_copy() with len=4032,
> offset=64, and head=1. After the call, head will be set to 0. Inside
> netbk_gop_frag_copy(), start_new_rx_buffer() will be called with
> offset=0, size=4032, head=1. We'll return false due to the checks for
> "offset" and "!head".
> 
> The second pass will call netbk_gop_frag_copy() with len=4096,
> offset=0, head=0. Inside, start_new_rx_buffer() will be called with
> offset=4032, size=4096, head=0. We'll return true here, which we
> shouldn't, since it's just going to lead to buffer waste for the last
> bit.
> 
> The third pass will call netbk_gop_frag_copy() with len=29 and
> offset=0. start_new_rx_buffer() will be called with offset=4096,
> size=29, head=0. We'll start a new buffer for the last bit.
> 
> So you can see that we underestimate the buffers / meta slots required
> to handle a skb with a large linear buffer, which is common with large
> MTU sizes. This can lead to problems later on.
> 
> I'm not as familiar with the new compound page frag handling code, but
> I can imagine that the same problem could exist there. But since the
> calculation logic in xen_netbk_count_skb_slots() directly models the
> code setting up the copy operation, at least it will be estimated
> properly.
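
To make your walkthrough concrete (PAGE_SIZE = MAX_BUFFER_OFFSET =
4096, headlen = 8157, initial page offset = 64):

        pass   len    src offset   copy_off on entry   new slot?   slots so far
        1      4032   64           0                   no (head)   1
        2      4096   0            4032                yes         2
        3      29     0            4096                yes         3

The estimate of two slots undercounts the three actually consumed.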
> 
> > >
> > >   copy_off = skb_headlen(skb) % PAGE_SIZE;
> > >
> > > @@ -352,6 +357,8 @@ unsigned int xen_netbk_count_skb_slots(s
> > >                   size -= bytes;
> > >           }
> > >   }
> > > + sco = (struct skb_cb_overlay *)skb->cb;
> > > + sco->count = count;
> > >   return count;
> > >  }
> > >
> > > @@ -586,9 +593,6 @@ static void netbk_add_frag_responses(str
> > >   }
> > >  }
> > >
> > > -struct skb_cb_overlay {
> > > - int meta_slots_used;
> > > -};
> > >
> > >  static void xen_netbk_rx_action(struct xen_netbk *netbk)
> > >  {
> > > @@ -621,12 +625,16 @@ static void xen_netbk_rx_action(struct x
> > >           sco = (struct skb_cb_overlay *)skb->cb;
> > >           sco->meta_slots_used = netbk_gop_skb(skb, &npo);
> > >
> > > -         count += nr_frags + 1;
> > > +         count += sco->count;
> 
> Why increment count by the /estimated/ count instead of the actual
> number of slots used? We have the number of slots in the line just
> above, in sco->meta_slots_used.
> 
Count actually refers to the ring slots consumed rather than the meta
slots used, and the two can differ; a sketch of one place where they
diverge is below.
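
For example (paraphrasing the 3.x response path from memory, so treat
this as a sketch rather than the exact upstream code), a GSO skb's
extra_info response takes a ring slot straight off rsp_prod_pvt with no
meta slot behind it:

        /* One ring slot is consumed per meta slot via
         * make_rx_response(); on top of that: */
        if (meta[npo.meta_cons].gso_size && !vif->gso_prefix) {
                /* extra_info: a ring slot with no meta slot */
                struct xen_netif_extra_info *gso =
                        (struct xen_netif_extra_info *)
                        RING_GET_RESPONSE(&vif->rx,
                                          vif->rx.rsp_prod_pvt++);

                resp->flags |= XEN_NETRXF_extra_info;

                gso->u.gso.size = meta[npo.meta_cons].gso_size;
                gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
                gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
                gso->flags = 0;
        }

So the count stashed by xen_netbk_count_skb_slots, which includes the
extra_info slot, is the right quantity to add to the ring-slot total.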

> > >           __skb_queue_tail(&rxq, skb);
> > >
> > > +         skb = skb_peek(&netbk->rx_queue);
> > > +         if (skb == NULL)
> > > +                 break;
> > > +         sco = (struct skb_cb_overlay *)skb->cb;
> > >           /* Filled the batch queue? */
> > > -         if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> > > +         if (count + sco->count >= XEN_NETIF_RX_RING_SIZE)
> > >                   break;
> > >   }
> > >
> 
> This change I like.
> 
> We're working on a patch to improve the buffer efficiency and the
> miscalculation problem. Siva, I'd be happy to re-base and re-submit
> this patch (with minor adjustments) as part of that work, unless you
> want to handle that.
> 
> Matt

Thanks!!  Please feel free to re-base and re-submit :-)

Siva


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
