
Re: [Xen-devel] Kernel 3.7.0-pre-rc1 kernel BUG at drivers/net/xen-netback/netback.c:405 RIP: e030:[<ffffffff814714f9>] [<ffffffff814714f9>] netbk_gop_frag_copy+0x379/0x380



On Mon, 2012-10-08 at 00:34 +0100, Konrad Rzeszutek Wilk wrote:
> On Sat, Oct 06, 2012 at 12:20:54AM +0200, Sander Eikelenboom wrote:
> > 
> > Friday, October 5, 2012, 9:26:31 PM, you wrote:
> > 
> > > Sorry for top posting - on mobile.
> > 
> > > I saw it too yesterday but only on a specific hardware - AMD FX8. What 
> > > type of CPU do you have?  Does xsave=off on Xen line help?
> > 
> > Nope, xsave=off doesn't help.
> > 
> > > Sander Eikelenboom <linux@xxxxxxxxxxxxxx> wrote:
> > 
> > >>Hi Konrad,
> > >>
> > >>Just tested kernel 3.7.0-pre-rc1 but ran into an oops in netback on boot 
> > >>after starting some guests:
> > >>
> > >>[  402.723915] ------------[ cut here ]------------
> > >>[  402.734629] kernel BUG at drivers/net/xen-netback/netback.c:405!
> 
> Looking at the code, this is what we get:
> 
>         /* Data must not cross a page boundary. */
>         BUG_ON(size + offset > PAGE_SIZE);
> 
> Looking at the commits, the one recently added was:
> commit c571898ffc24a1768e1b2dabeac0fc7dd4c14601
> Author: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
> Date:   Fri Sep 14 14:26:59 2012 +0000
> 
>     xen/gndev: Xen backend support for paged out grant targets V4.
>     
> 
> But after reverting it and trying the kernel I still got the crash.
> 
> So... the weirdness is that this only seems to happen on
> certain AMD machines; for example, on my AMD A8 box I did not see this.

I took a look at this last week and can't repro.

The code which calls this function is supposed to ensure that the buffer
doesn't cross a page boundary.

There are two places which call it. One loops over the skb's frags,
which just can't cross page boundaries; if they did, things would be
breaking left and right for everyone AFAICT (although I'm very behind on
my LKML and netdev reading, so maybe they are ;-)).

The other case is processing the SKB's linear data area, which can cross
a page boundary but the code loops over it and processes it in chunks
which fit in single pages. I was suspicious of this code so I pulled it
out into a little userspace test harness and fed it some corner cases
but it looked like it was doing the right thing.
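A harness of that kind can be sketched as follows. This is a hypothetical reconstruction, not the harness referred to above: it copies the chunking loop from netbk_gop_skb() as it appears in the patch in this message, assumes 4 KiB pages, and models pointers as plain integers. SKETCH_PAGE_SIZE, sketch_offset_in_page() and sketch_walk_linear() are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096UL

/* Offset of an address within its page (mirrors offset_in_page()). */
static unsigned long sketch_offset_in_page(uintptr_t addr)
{
	return addr & (SKETCH_PAGE_SIZE - 1);
}

/*
 * Walk the linear area [start, tail) the way netbk_gop_skb() does: each
 * chunk runs from the current address to the end of its page, or to the
 * tail, whichever comes first. Returns the number of chunks produced.
 */
static int sketch_walk_linear(uintptr_t start, uintptr_t tail)
{
	uintptr_t data = start;
	uintptr_t covered = 0;
	int chunks = 0;

	while (data < tail) {
		unsigned long offset = sketch_offset_in_page(data);
		unsigned long len = SKETCH_PAGE_SIZE - offset;

		if (data + len > tail)
			len = tail - data;

		/* Same condition as the BUG_ON() in netbk_gop_frag_copy(). */
		assert(offset + len <= SKETCH_PAGE_SIZE);

		covered += len;
		data += len;
		chunks++;
	}
	/* The chunks must tile the linear area exactly. */
	assert(covered == tail - start);
	return chunks;
}
```

Feeding it corner cases such as a buffer ending exactly on a page boundary (0x1000 to 0x2000, one chunk), one straddling a boundary by two bytes each side (0x1ffe to 0x2002, two chunks), or one spanning several pages from an unaligned start exercises the same paths; by construction no chunk can trip the page-boundary assertion.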

I speculated that this might be NIC rather than processor related
(perhaps there's some weak correlation between certain NICs and certain
processor manufacturers).

Konrad seems to have an r8169 NIC, but the module list wasn't in
Sander's output. Sander, do you know which NIC you have?

> I fear that the next step is to do a bit of git bisection to
> get an idea of which merge it might be. I am going to be AFK
> on Monday so I won't get to this until Tuesday/Wednesday :-(
> 
> To help speed this process up, this looks like a
> candidate:
> 
> commit 229993001282e128a49a59ec43d255614775703a
> Merge: 7687b80 fd0f586
> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date:   Mon Oct 1 11:13:33 2012 -0700
> 
>     Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>     
>     Pull x86/mm changes from Ingo Molnar:
>      "The biggest change is new TLB partial flushing code for AMD CPUs.
>       (The v3.6 kernel had the Intel CPU side code, see commits
>       e0ba94f14f74..effee4b9b3b.)

It would be interesting to try, although I don't think anything in this
area is actively messing with page table mappings (that happens later,
and doesn't affect the non-data parts of the skb such as the sizes and
offsets).

Perhaps this debug patch will shed some light. Compound pages
(PG_compound) or transparent huge pages (THP) might be an interesting case.
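To illustrate why a compound page might matter (a hypothesis, not something established in this thread): if a frag points into a higher-order page, page_offset + size can exceed PAGE_SIZE while the buffer is still valid within the compound allocation, and that alone would trip the single-page BUG_ON. The sketch below assumes 4 KiB pages; fits_single_page(), fits_compound_page() and pieces_needed() are hypothetical helpers, not netback functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB base pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* The check netback applies: the data must fit in one base page. */
static bool fits_single_page(unsigned long offset, unsigned long size)
{
	return size + offset <= SKETCH_PAGE_SIZE;
}

/* Hypothetical: the data only has to fit in the whole compound page. */
static bool fits_compound_page(unsigned long offset, unsigned long size,
			       unsigned int order)
{
	return size + offset <= (SKETCH_PAGE_SIZE << order);
}

/*
 * Count how many per-base-page pieces such a frag would have to be split
 * into so that every piece satisfies fits_single_page().
 */
static unsigned int pieces_needed(unsigned long offset, unsigned long size)
{
	unsigned int pieces = 0;

	while (size > 0) {
		unsigned long in_page = offset % SKETCH_PAGE_SIZE;
		unsigned long chunk = SKETCH_PAGE_SIZE - in_page;

		if (chunk > size)
			chunk = size;
		assert(fits_single_page(in_page, chunk));
		offset += chunk;
		size -= chunk;
		pieces++;
	}
	return pieces;
}
```

For example, a 2000-byte frag at offset 3000 in an order-1 page fails the single-page check but fits the compound allocation, and would have to be split into two per-page pieces.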

Ian.

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 05593d8..ca4c47d 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -386,7 +386,7 @@ static struct netbk_rx_meta *get_next_rx_buffer(struct xenvif *vif,
  * Set up the grant operations for this fragment. If it's a flipping
  * interface, we also set up the unmap request from here.
  */
-static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
+static int netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
                                struct netrx_pending_operations *npo,
                                struct page *page, unsigned long size,
                                unsigned long offset, int *head)
@@ -402,7 +402,8 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
        unsigned long bytes;
 
        /* Data must not cross a page boundary. */
-       BUG_ON(size + offset > PAGE_SIZE);
+       if (size + offset > PAGE_SIZE)
+               return -1;
 
        meta = npo->meta + npo->meta_prod - 1;
 
@@ -459,6 +460,7 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
                *head = 0; /* There must be something in this buffer now. */
 
        }
+       return 0;
 }
 
 /*
@@ -517,17 +519,31 @@ static int netbk_gop_skb(struct sk_buff *skb,
                if (data + len > skb_tail_pointer(skb))
                        len = skb_tail_pointer(skb) - data;
 
-               netbk_gop_frag_copy(vif, skb, npo,
-                                   virt_to_page(data), len, offset, &head);
+               if (netbk_gop_frag_copy(vif, skb, npo,
+                               virt_to_page(data), len, offset, &head) < 0) {
+printk(KERN_CRIT "netbk_gop_frag_copy failed: skb head %p-%p\n",
+       skb->data, skb_tail_pointer(skb));
+printk(KERN_CRIT "copying from %p-%p, offset %x, len %x\n",
+       data, data+len, offset, len);
+dump_page(virt_to_page(data));
+BUG();
+               }
                data += len;
        }
 
        for (i = 0; i < nr_frags; i++) {
-               netbk_gop_frag_copy(vif, skb, npo,
+               if (netbk_gop_frag_copy(vif, skb, npo,
                                    skb_frag_page(&skb_shinfo(skb)->frags[i]),
                                    skb_frag_size(&skb_shinfo(skb)->frags[i]),
                                    skb_shinfo(skb)->frags[i].page_offset,
-                                   &head);
+                                   &head) < 0) {
+printk(KERN_CRIT "netbk_gop_frag_copy failed: skb frag %d page\n", i);
+printk(KERN_CRIT "copying from offset %x, len %x\n",
+       skb_shinfo(skb)->frags[i].page_offset,
+       skb_frag_size(&skb_shinfo(skb)->frags[i]));
+dump_page(skb_frag_page(&skb_shinfo(skb)->frags[i]));
+BUG();
+               }
        }
 
        return npo->meta_prod - old_meta_prod;



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

