Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
Matt,

> -----Original Message-----
> From: Matt Wilson [mailto:msw@xxxxxxxxxx]
> Sent: Wednesday, December 05, 2012 4:53 AM
> To: Ian Campbell; Palagummi, Siva
> Cc: xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots
> properly when larger MTU sizes are used
>
> On Thu, Aug 30, 2012 at 09:07:11AM +0100, Ian Campbell wrote:
> > On Wed, 2012-08-29 at 13:21 +0100, Palagummi, Siva wrote:
> > > This patch contains the modifications that are discussed in thread
> > > http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.html
> > [...]
> >
> > > Instead of using max_required_rx_slots, I used the count that we
> > > already have in hand to verify if we have enough room in the batch
> > > queue for the next skb. Please let me know if that is not
> > > appropriate. Things worked fine in my environment. Under heavy load
> > > we now seem to be consuming most of the slots in the queue, and no
> > > BUG_ON :-)
> > >
> > > From: Siva Palagummi <Siva.Palagummi@xxxxxx>
> > >
> > > The count variable in xen_netbk_rx_action() needs to be incremented
> > > correctly to take into account the extra slots required when
> > > skb_headlen is greater than PAGE_SIZE, as happens with larger MTU
> > > values. Without this change, BUG_ON(npo.meta_prod >
> > > ARRAY_SIZE(netbk->meta)) causes the netback thread to exit.
> > >
> > > The fix is to stash the count already computed in
> > > xen_netbk_count_skb_slots() in skb_cb_overlay and use it directly
> > > in xen_netbk_rx_action().
>
> You don't need to stash the estimated value to use it in
> xen_netbk_rx_action() - you have the actual number of slots consumed
> in hand from the return value of netbk_gop_skb(). See below.
>
> > > Also improved the check for filling the batch queue.
> > >
> > > Also merged a change from a patch created for the
> > > xen_netbk_count_skb_slots function as per thread
> > > http://lists.xen.org/archives/html/xen-devel/2012-05/msg01864.html
> > >
> > > The problem is seen with the Linux 3.2.2 kernel on an Intel 10Gbps
> > > network.
> > >
> > > Signed-off-by: Siva Palagummi <Siva.Palagummi@xxxxxx>
> > > ---
> > > diff -uprN a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > > --- a/drivers/net/xen-netback/netback.c	2012-01-25 19:39:32.000000000 -0500
> > > +++ b/drivers/net/xen-netback/netback.c	2012-08-28 17:31:22.000000000 -0400
> > > @@ -80,6 +80,11 @@ union page_ext {
> > >  	void *mapping;
> > >  };
> > >
> > > +struct skb_cb_overlay {
> > > +	int meta_slots_used;
> > > +	int count;
> > > +};
>
> We don't actually need a separate variable for the estimate. We could
> rename meta_slots_used to meta_slots. It could hold the estimate
> before netbk_gop_skb() is called and the actual number of slots
> consumed after. That might be confusing, though, so maybe it's better
> off left as two variables.
>
> > >  struct xen_netbk {
> > >  	wait_queue_head_t wq;
> > >  	struct task_struct *task;
> > > @@ -324,9 +329,9 @@ unsigned int xen_netbk_count_skb_slots(s
> > >  {
> > >  	unsigned int count;
> > >  	int i, copy_off;
> > > +	struct skb_cb_overlay *sco;
> > >
> > > -	count = DIV_ROUND_UP(
> > > -		offset_in_page(skb->data)+skb_headlen(skb), PAGE_SIZE);
> > > +	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> >
> > This hunk appears to be upstream already (see
> > e26b203ede31fffd52571a5ba607a26c79dc5c0d). Which tree are you working
> > against? You should either base patches on Linus' branch or on the
> > net-next branch.
> >
> > Other than this the patch looks good, thanks.
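For readers following along: skb_cb_overlay is the usual pattern of
overlaying a private struct on skb->cb, the 48-byte control buffer that
is scratch space for whichever layer currently owns the skb. A minimal
userspace sketch of the pattern (illustration only, not the kernel
code; the values stored are made up):

    #include <assert.h>
    #include <stdio.h>

    /* Stand-in for the real struct sk_buff; only the 48-byte control
     * buffer matters for this illustration. */
    struct sk_buff {
        char cb[48];
    };

    struct skb_cb_overlay {
        int meta_slots_used;  /* actual slots, from netbk_gop_skb() */
        int count;            /* estimate, from xen_netbk_count_skb_slots() */
    };

    int main(void)
    {
        struct sk_buff skb;
        struct skb_cb_overlay *sco = (struct skb_cb_overlay *)skb.cb;

        /* The kernel would enforce this fit with BUILD_BUG_ON(). */
        assert(sizeof(*sco) <= sizeof(skb.cb));

        sco->count = 3;           /* stashed by the counting pass */
        sco->meta_slots_used = 2; /* filled in later by the copy pass */
        printf("estimated %d, used %d\n",
               sco->count, sco->meta_slots_used);
        return 0;
    }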
>
> I don't think that this patch completely addresses the problems in
> calculating the number of slots required when large MTUs are used.
>
> For example: say netback is handling a skb with a large linear data
> portion, 8157 bytes, that begins at 64 bytes from the start of the
> page. Assume that GSO is disabled and there are no frags.
> xen_netbk_count_skb_slots() will calculate that two slots are needed:
>
>	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
>
> but netbk_gop_frag_copy() will actually use three. Let's walk through
> the loop:

You are right. The chunk above, which is already upstream, is
unfortunately incorrect for some cases. We also ran into issues in our
environment about a week back and found this problem. The count differs
depending on the head length because of the optimization that
start_new_rx_buffer() tries to do for large buffers: a hole of size
offset_in_page(skb->data) is left in the first page during the copy
whenever the remaining buffer size is >= PAGE_SIZE. This subsequently
affects copy_off as well. So xen_netbk_count_skb_slots() actually needs
a fix to calculate the count correctly based on the head length, and
also a fix to calculate the copy_off to which the data from fragments
gets copied. max_required_rx_slots() may also need a fix to account for
the additional slot that can be required when MTU >= PAGE_SIZE; for the
worst case, at least another +1.

One thing that is still puzzling here: max_required_rx_slots() seems to
assume that the linear length in the head will never be greater than
the MTU size, but that doesn't seem to be the case all the time. I
wonder whether it needs a fix there too, or special handling for when
xen_netbk_count_skb_slots() exceeds max_required_rx_slots().

>
>	data = skb->data;
>	while (data < skb_tail_pointer(skb)) {
>		unsigned int offset = offset_in_page(data);
>		unsigned int len = PAGE_SIZE - offset;
>
>		if (data + len > skb_tail_pointer(skb))
>			len = skb_tail_pointer(skb) - data;
>
>		netbk_gop_frag_copy(vif, skb, npo,
>				    virt_to_page(data), len, offset, &head);
>		data += len;
>	}
>
> The first pass will call netbk_gop_frag_copy() with len=4032,
> offset=64, and head=1. After the call, head will be set to 0. Inside
> netbk_gop_frag_copy(), start_new_rx_buffer() will be called with
> offset=0, size=4032, head=1. We'll return false due to the checks for
> "offset" and "!head".
>
> The second pass will call netbk_gop_frag_copy() with len=4096,
> offset=0, head=0. start_new_rx_buffer() will get called with
> offset=4032, bytes=4096, head=0. We'll return true here, which we
> shouldn't, since it's just going to lead to buffer waste for the last
> bit.
>
> The third pass will call with len=29 and offset=0.
> start_new_rx_buffer() will be called with offset=4096, bytes=29,
> head=0. We'll start a new buffer for the last bit.
>
> So you can see that we underestimate the buffers / meta slots required
> to handle a skb with a large linear buffer, as we commonly have to do
> with large MTU sizes. This can lead to problems later on.
>
> I'm not as familiar with the new compound page frag handling code, but
> I can imagine that the same problem could exist there. But since the
> calculation logic in xen_netbk_count_skb_slots() directly models the
> code setting up the copy operation, at least it will be estimated
> properly.
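To put numbers on this, here is a small userspace sketch of the
mismatch (an illustration only, not the netback code: it assumes
4096-byte pages, hard-codes your 8157-byte/offset-64 example, and
inlines a simplified start_new_rx_buffer()):

    #include <stdio.h>
    #include <stdbool.h>

    #define PAGE_SIZE         4096UL
    #define MAX_BUFFER_OFFSET PAGE_SIZE

    /* Simplified model of start_new_rx_buffer() from 3.2-era netback. */
    static bool start_new_rx_buffer(unsigned long copy_off,
                                    unsigned long size, int head)
    {
        if (copy_off == MAX_BUFFER_OFFSET)
            return true;
        if (copy_off + size > MAX_BUFFER_OFFSET &&
            size <= MAX_BUFFER_OFFSET && copy_off && !head)
            return true;
        return false;
    }

    int main(void)
    {
        /* The example skb: 8157-byte linear area starting 64 bytes
         * into its first page, no frags, no GSO. */
        unsigned long headlen = 8157, data_off = 64;

        /* What xen_netbk_count_skb_slots() estimates today. */
        unsigned long estimate = (headlen + PAGE_SIZE - 1) / PAGE_SIZE;

        /* Model of the copy loop in netbk_gop_skb(), with the inner
         * loop of netbk_gop_frag_copy() flattened in. */
        unsigned long copy_off = 0, slots = 1;  /* the head slot */
        unsigned long done = 0;
        int head = 1;

        while (done < headlen) {
            /* One outer pass: up to the end of the current page. */
            unsigned long off = (data_off + done) % PAGE_SIZE;
            unsigned long len = PAGE_SIZE - off;

            if (len > headlen - done)
                len = headlen - done;

            while (len > 0) {
                unsigned long bytes = len;

                if (start_new_rx_buffer(copy_off, len, head)) {
                    slots++;      /* a fresh meta slot / ring slot */
                    copy_off = 0;
                }
                if (copy_off + bytes > MAX_BUFFER_OFFSET)
                    bytes = MAX_BUFFER_OFFSET - copy_off;

                copy_off += bytes;
                len -= bytes;
                done += bytes;
                head = 0;
            }
        }

        /* Prints: estimated 2 slot(s), actually consumed 3 */
        printf("estimated %lu slot(s), actually consumed %lu\n",
               estimate, slots);
        return 0;
    }

Running it reproduces your walkthrough exactly: the second pass starts
a new buffer at copy_off=4032 and the third at copy_off=4096, so three
slots are consumed against an estimate of two.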
> > >
> > >  	copy_off = skb_headlen(skb) % PAGE_SIZE;
> > >
> > > @@ -352,6 +357,8 @@ unsigned int xen_netbk_count_skb_slots(s
> > >  			size -= bytes;
> > >  		}
> > >  	}
> > > +	sco = (struct skb_cb_overlay *)skb->cb;
> > > +	sco->count = count;
> > >  	return count;
> > >  }
> > >
> > > @@ -586,9 +593,6 @@ static void netbk_add_frag_responses(str
> > >  	}
> > >  }
> > >
> > > -struct skb_cb_overlay {
> > > -	int meta_slots_used;
> > > -};
> > >
> > >  static void xen_netbk_rx_action(struct xen_netbk *netbk)
> > >  {
> > > @@ -621,12 +625,16 @@ static void xen_netbk_rx_action(struct x
> > >  		sco = (struct skb_cb_overlay *)skb->cb;
> > >  		sco->meta_slots_used = netbk_gop_skb(skb, &npo);
> > >
> > > -		count += nr_frags + 1;
> > > +		count += sco->count;
>
> Why increment count by the /estimated/ count instead of the actual
> number of slots used? We have the number of slots in the line just
> above, in sco->meta_slots_used.
>

count actually refers to the ring slots consumed rather than the meta
slots used, and the two can differ.

> > >  		__skb_queue_tail(&rxq, skb);
> > >
> > > +		skb = skb_peek(&netbk->rx_queue);
> > > +		if (skb == NULL)
> > > +			break;
> > > +		sco = (struct skb_cb_overlay *)skb->cb;
> > >  		/* Filled the batch queue? */
> > > -		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> > > +		if (count + sco->count >= XEN_NETIF_RX_RING_SIZE)
> > >  			break;
> > >  	}
> >
> This change I like.
>
> We're working on a patch to improve the buffer efficiency and the
> miscalculation problem. Siva, I'd be happy to re-base and re-submit
> this patch (with minor adjustments) as part of that work, unless you
> want to handle that.
>
> Matt

Thanks!! Please feel free to re-base and re-submit :-)

Siva
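P.S. To make the effect of the new "filled the batch queue?" check
concrete, here is a toy model (illustration only; the constants are
stand-ins for a 4 KiB-page build, and every skb is assumed to need
exactly 3 ring slots) comparing the old worst-case test with the
peek-ahead test:

    #include <stdio.h>
    #include <stdbool.h>

    #define RING_SIZE     256  /* stands in for XEN_NETIF_RX_RING_SIZE */
    #define MAX_SKB_FRAGS 18   /* worst case assumed by the old check */

    /* Old check: stop once a worst-case successor might not fit. */
    static bool old_queue_full(int count)
    {
        return count + MAX_SKB_FRAGS >= RING_SIZE;
    }

    /* New check: peek at the next skb and use its precomputed slot
     * count (sco->count in the patch) instead of the worst case. */
    static bool new_queue_full(int count, int next_slots)
    {
        return count + next_slots >= RING_SIZE;
    }

    int main(void)
    {
        int per_skb = 3, count, skbs;

        for (count = 0, skbs = 0; !old_queue_full(count); skbs++)
            count += per_skb;
        printf("old check: %d skbs, %d/%d slots used\n",
               skbs, count, RING_SIZE);   /* 80 skbs, 240/256 */

        for (count = 0, skbs = 0; !new_queue_full(count, per_skb); skbs++)
            count += per_skb;
        printf("new check: %d skbs, %d/%d slots used\n",
               skbs, count, RING_SIZE);   /* 85 skbs, 255/256 */
        return 0;
    }

With the worst-case test the loop stops while 16 slots are still free;
peeking at the precomputed count lets each batch run the ring nearly
full, which matches the "consuming most of the slots in the queue"
observation earlier in the thread.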