Re: [Xen-devel] Kernel 3.7.0-pre-rc1 kernel BUG at drivers/net/xen-netback/netback.c:405 RIP: e030:[<ffffffff814714f9>] [<ffffffff814714f9>] netbk_gop_frag_copy+0x379/0x380

Monday, October 8, 2012, 10:54:21 AM, you wrote:

> On Mon, 2012-10-08 at 00:34 +0100, Konrad Rzeszutek Wilk wrote:
>> On Sat, Oct 06, 2012 at 12:20:54AM +0200, Sander Eikelenboom wrote:
>> >
>> > Friday, October 5, 2012, 9:26:31 PM, you wrote:
>> >
>> > > Sorry for top posting - on mobile.
>> >
>> > > I saw it too yesterday but only on a specific hardware - AMD FX8. What
>> > > type of CPU do you have? Does xsave=off on Xen line help?
>> >
>> > Nope the xsave=off doesn't help
>> >
>> > > Sander Eikelenboom <linux@xxxxxxxxxxxxxx> wrote:
>> >
>> > >>Hi Konrad,
>> > >>
>> > >>Just tested kernel 3.7.0-pre-rc1 but ran into an oops in netback on boot
>> > >>after starting some guests:
>> > >>
>> > >>[ 402.723915] ------------[ cut here ]------------
>> > >>[ 402.734629] kernel BUG at drivers/net/xen-netback/netback.c:405!
>>
>> Looking at the code, this is what we get:
>>
>>         /* Data must not cross a page boundary. */
>>         BUG_ON(size + offset > PAGE_SIZE);
>>
>> Looking at the commits, the one recently added was:
>> commit c571898ffc24a1768e1b2dabeac0fc7dd4c14601
>> Author: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
>> Date:   Fri Sep 14 14:26:59 2012 +0000
>>
>>     xen/gndev: Xen backend support for paged out grant targets V4.
>>
>>
>> But after reverting it and trying the kernel I still got the crash.
>>
>> So .. the weirdness is that this seems to be only happening on
>> certain AMD machines - for example on my AMD A8 box I did not see this.

> I took a look at this last week and can't repro.

> The code which calls this function is supposed to ensure that the buffer
> doesn't cross a page boundary.

> There are two places which call it, one is looping over the skb's frags,
> which just can't cross page boundaries and if they did it would be
> breaking left and right for everyone AFAICT (although I'm very behind on
> my LKML and netdev reading, so maybe it is ;-)).

> The other case is processing the SKB's linear data area, which can cross
> a page boundary but the code loops over it and processes it in chunks
> which fit in single pages. I was suspicious of this code so I pulled it
> out into a little userspace test harness and fed it some corner cases
> but it looked like it was doing the right thing.

> I speculated that this might be NIC rather than processor related
> (perhaps there's some weak correlation between certain NICs and certain
> processor manufacturers).

> Konrad seems to have an r8169 but the module list wasn't in Sander's
> output -- do you know what you have?

>> I fear that the next step is to do a bit of git bisection to
>> get an idea of which merge it might be. I am going to be AFK
>> on Monday so I won't get to this until Tuesday/Wednesday :-(
>>
>> .. Though to help speed this process, this looks like a
>> candidate:
>>

It doesn't seem to be this commit; I tested before and after it, and both seem to work.
I don't see an r8169-related commit to test, so I will look for a net-related one.

--
Sander

>> commit 229993001282e128a49a59ec43d255614775703a
>> Merge: 7687b80 fd0f586
>> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Date:   Mon Oct 1 11:13:33 2012 -0700
>>
>>     Merge branch 'x86-mm-for-linus' of
>>     git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>
>>     Pull x86/mm changes from Ingo Molnar:
>>      "The biggest change is new TLB partial flushing code for AMD CPUs.
>>       (The v3.6 kernel had the Intel CPU side code, see commits
>>       e0ba94f14f74..effee4b9b3b.)

> Would be interesting to try although I don't think anything in this area
> is actively messing with page table mappings (that happens later, and
> doesn't affect the non-data bits of the skb like the sizes and offsets).

> Perhaps this debug patch might shed some light? PG_compound or THP might
> be an interesting case?

> Ian.

> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 05593d8..ca4c47d 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -386,7 +386,7 @@ static struct netbk_rx_meta *get_next_rx_buffer(struct xenvif *vif,
>   * Set up the grant operations for this fragment. If it's a flipping
>   * interface, we also set up the unmap request from here.
>   */
> -static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
> +static int netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>                                  struct netrx_pending_operations *npo,
>                                  struct page *page, unsigned long size,
>                                  unsigned long offset, int *head)
> @@ -402,7 +402,8 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>          unsigned long bytes;
>
>          /* Data must not cross a page boundary. */
> -        BUG_ON(size + offset > PAGE_SIZE);
> +        if (size + offset > PAGE_SIZE)
> +                return -1;
>
>          meta = npo->meta + npo->meta_prod - 1;
>
> @@ -459,6 +460,7 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>                  *head = 0; /* There must be something in this buffer now. */
>
>          }
> +        return 0;
>  }
>
>  /*
> @@ -517,17 +519,31 @@ static int netbk_gop_skb(struct sk_buff *skb,
>                  if (data + len > skb_tail_pointer(skb))
>                          len = skb_tail_pointer(skb) - data;
>
> -                netbk_gop_frag_copy(vif, skb, npo,
> -                                    virt_to_page(data), len, offset, &head);
> +                if (netbk_gop_frag_copy(vif, skb, npo,
> +                                    virt_to_page(data), len, offset, &head) < 0) {
> +printk(KERN_CRIT "netbk_gop_frag_copy failed: skb head %p-%p\n",
> +       skb->data, skb_tail_pointer(skb));
> +printk(KERN_CRIT "copying from %p-%p, offset %x, len %x\n",
> +       data, data+len, offset, len);
> +dump_page(virt_to_page(data));
> +BUG();
> +                }
>                  data += len;
>          }
>
>          for (i = 0; i < nr_frags; i++) {
> -                netbk_gop_frag_copy(vif, skb, npo,
> +                if (netbk_gop_frag_copy(vif, skb, npo,
>                                      skb_frag_page(&skb_shinfo(skb)->frags[i]),
>                                      skb_frag_size(&skb_shinfo(skb)->frags[i]),
>                                      skb_shinfo(skb)->frags[i].page_offset,
> -                                    &head);
> +                                    &head) < 0) {
> +printk(KERN_CRIT "netbk_gop_frag_copy failed: skb frag %d page\n", i);
> +printk(KERN_CRIT "copying from offset %x, len %x\n",
> +       skb_shinfo(skb)->frags[i].page_offset,
> +       skb_frag_size(&skb_shinfo(skb)->frags[i]));
> +dump_page(skb_frag_page(&skb_shinfo(skb)->frags[i]));
> +BUG();
> +                }
>          }
>
>          return npo->meta_prod - old_meta_prod;

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
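
For reference, the two call sites discussed in the thread can be modelled in userspace roughly as follows. This is only a sketch, not the harness Ian mentions and not the netback code itself: the names SKETCH_PAGE_SIZE, split_linear_area and frag_crosses_page are hypothetical, a 4 KiB page size is assumed, and the "len = PAGE_SIZE - offset" step is an assumption about the part of the loop that the diff context does not show. Addresses are plain integers, so nothing here touches real skb memory.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical model of the linear-data loop: walk [start, start + total)
 * in chunks that each stay inside one page. Returns the number of chunks,
 * or -1 if any chunk would trip the "size + offset > PAGE_SIZE" check.
 */
static long split_linear_area(uintptr_t start, unsigned long total)
{
    uintptr_t data = start;
    uintptr_t tail = start + total;
    long chunks = 0;

    while (data < tail) {
        unsigned long offset = data & (SKETCH_PAGE_SIZE - 1);
        unsigned long len = SKETCH_PAGE_SIZE - offset; /* assumed step */

        /* Same clamp as the diff context: never run past the tail. */
        if (data + len > tail)
            len = tail - data;

        /* The condition the original BUG_ON() guards against. */
        if (len + offset > SKETCH_PAGE_SIZE)
            return -1;

        data += len;
        chunks++;
    }
    return chunks;
}

/*
 * Hypothetical model of the frag case: a frag is passed through unsplit,
 * so the check depends entirely on the offset and size the caller supplies.
 * Returns 1 if the check would fire, 0 otherwise.
 */
static int frag_crosses_page(unsigned long page_offset, unsigned long size)
{
    return size + page_offset > SKETCH_PAGE_SIZE;
}
```

In this model the linear-area loop cannot produce a failing chunk, because every chunk is clamped to the end of its page. The frag path has no such clamp: a frag described as, say, offset 3000 and size 2000 would fail the check. Whether real frags can look like that (for instance on a compound page, as Ian wonders) is exactly what the debug patch is meant to reveal; the sketch does not answer it.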