
Re: [Xen-devel] Kernel 3.7.0-pre-rc1 kernel BUG at drivers/net/xen-netback/netback.c:405 RIP: e030:[<ffffffff814714f9>] [<ffffffff814714f9>] netbk_gop_frag_copy+0x379/0x380



Monday, October 8, 2012, 10:54:21 AM, you wrote:

> On Mon, 2012-10-08 at 00:34 +0100, Konrad Rzeszutek Wilk wrote:
>> On Sat, Oct 06, 2012 at 12:20:54AM +0200, Sander Eikelenboom wrote:
>> > 
>> > Friday, October 5, 2012, 9:26:31 PM, you wrote:
>> > 
>> > > Sorry for top posting - on mobile.
>> > 
>> > > I saw it too yesterday but only on a specific hardware - AMD FX8. What 
>> > > type of CPU do you have?  Does xsave=off on Xen line help?
>> > 
>> > Nope the xsave=off doesn't help
>> > 
>> > > Sander Eikelenboom <linux@xxxxxxxxxxxxxx> wrote:
>> > 
>> > >>Hi Konrad,
>> > >>
>> > >>Just tested kernel 3.7.0-pre-rc1 but ran into an oops in netback on boot 
>> > >>after starting some guests:
>> > >>
>> > >>[  402.723915] ------------[ cut here ]------------
>> > >>[  402.734629] kernel BUG at drivers/net/xen-netback/netback.c:405!
>> 
>> Looking at the code, this is what we get:
>> 
>>         /* Data must not cross a page boundary. */
>>         BUG_ON(size + offset > PAGE_SIZE);
>> 
>> Looking at the commits, the one recently added was:
>> commit c571898ffc24a1768e1b2dabeac0fc7dd4c14601
>> Author: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
>> Date:   Fri Sep 14 14:26:59 2012 +0000
>> 
>>     xen/gndev: Xen backend support for paged out grant targets V4.
>>     
>> 
>> But after reverting it and trying the kernel I still got the crash.
>> 
>> So .. the weirdness is that this seems to be only happening on
>> certain AMD machines - for example on my AMD A8 box I did not see this.

> I took a look at this last week and can't repro.

> The code which calls this function is supposed to ensure that the buffer
> doesn't cross a page boundary.

> There are two places which call it, one is looping over the skb's frags,
> which just can't cross page boundaries and if they did it would be
> breaking left and right for everyone AFAICT (although I'm very behind on
> my LKML and netdev reading, so maybe it is ;-)).

> The other case is processing the SKB's linear data area, which can cross
> a page boundary but the code loops over it and processes it in chunks
> which fit in single pages. I was suspicious of this code so I pulled it
> out into a little userspace test harness and fed it some corner cases
> but it looked like it was doing the right thing.
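
The chunking described above can be sketched in userspace roughly as
follows. This is not the harness Ian used (that was not posted), only a
minimal illustration. It assumes 4 KiB pages and models addresses as
plain integers; walk_linear_area() is a hypothetical name, and
offset_in_page() here is a local stand-in for the kernel macro.

```c
#include <assert.h>   /* for the checks mentioned in the note below */
#include <stdint.h>

#define PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Userspace stand-in for the kernel's offset_in_page(). */
static unsigned long offset_in_page(uintptr_t addr)
{
	return addr & (PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: walk [data, tail) the way the linear-area loop in
 * netbk_gop_skb() does, one page-bounded chunk at a time.
 * Returns the number of chunks, or -1 if any chunk would trip the
 * "size + offset > PAGE_SIZE" check in netbk_gop_frag_copy().
 */
int walk_linear_area(uintptr_t data, uintptr_t tail)
{
	int chunks = 0;

	while (data < tail) {
		unsigned long offset = offset_in_page(data);
		unsigned long len = PAGE_SIZE - offset;  /* up to end of page */

		/* Clamp the last chunk to the end of the data. */
		if (data + len > tail)
			len = tail - data;

		/* Same condition as the BUG_ON at netback.c:405. */
		if (len + offset > PAGE_SIZE)
			return -1;

		chunks++;
		data += len;
	}
	return chunks;
}
```

With these definitions, a buffer from 0x10ff0 to 0x11010 is split into
two chunks (16 bytes up to the page boundary, then 16 bytes after it),
and no start/end pair makes the function return -1, since len is never
larger than PAGE_SIZE - offset. That is consistent with the linear-area
path looking correct.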

> I speculated that this might be NIC rather than processor related
> (perhaps there's some weak correlation between certain NICs and certain
> processor manufacturers).

> Konrad seems to have an r8169 but the module list wasn't in Sander's
> output -- do you know what you have?

Surprise surprise .. an r8169 as well ..

>> I fear that the next step is to do a bit of git bisection to
>> get an idea of which merge it might be. I am going to be AFK
>> on Monday so I won't get to this until Tuesday/Wednesday :-(
>> 
>> .. Though to help speed this process, this looks like a
>> candidate:
>> 
>> commit 229993001282e128a49a59ec43d255614775703a
>> Merge: 7687b80 fd0f586
>> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Date:   Mon Oct 1 11:13:33 2012 -0700
>> 
>>     Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>     
>>     Pull x86/mm changes from Ingo Molnar:
>>      "The biggest change is new TLB partial flushing code for AMD CPUs.
>>       (The v3.6 kernel had the Intel CPU side code, see commits
>>       e0ba94f14f74..effee4b9b3b.)

> Would be interesting to try although I don't think anything in this area
> is actively messing with page table mappings (that happens later, and
> doesn't affect the non-data bits of the skb like the sizes and offsets).

> Perhaps this debug patch might shed some light? PG_compound or THP might
> be an interesting case?

> Ian.

> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 05593d8..ca4c47d 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -386,7 +386,7 @@ static struct netbk_rx_meta *get_next_rx_buffer(struct xenvif *vif,
>   * Set up the grant operations for this fragment. If it's a flipping
>   * interface, we also set up the unmap request from here.
>   */
> -static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
> +static int netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>                                 struct netrx_pending_operations *npo,
>                                 struct page *page, unsigned long size,
>                                 unsigned long offset, int *head)
> @@ -402,7 +402,8 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>         unsigned long bytes;
>  
>         /* Data must not cross a page boundary. */
> -       BUG_ON(size + offset > PAGE_SIZE);
> +       if (size + offset > PAGE_SIZE)
> +               return -1;
>  
>         meta = npo->meta + npo->meta_prod - 1;
>  
> @@ -459,6 +460,7 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>                 *head = 0; /* There must be something in this buffer now. */
>  
>         }
> +       return 0;
>  }
>  
>  /*
> @@ -517,17 +519,31 @@ static int netbk_gop_skb(struct sk_buff *skb,
>                 if (data + len > skb_tail_pointer(skb))
>                         len = skb_tail_pointer(skb) - data;
>  
> -               netbk_gop_frag_copy(vif, skb, npo,
> -                                   virt_to_page(data), len, offset, &head);
> +               if (netbk_gop_frag_copy(vif, skb, npo,
> +                               virt_to_page(data), len, offset, &head) < 0) {
> +printk(KERN_CRIT "netbk_gop_frag_copy failed: skb head %p-%p\n",
> +       skb->data, skb_tail_pointer);
> +printk(KERN_CRIT "copying from %p-%p, offset %x, len %x\n",
> +       data, data+len, offset, len);
> +dump_page(virt_to_page(data));
> +BUG();
> +               }
>                 data += len;
>         }
>  
>         for (i = 0; i < nr_frags; i++) {
> -               netbk_gop_frag_copy(vif, skb, npo,
> +               if (netbk_gop_frag_copy(vif, skb, npo,
>                                     skb_frag_page(&skb_shinfo(skb)->frags[i]),
>                                     skb_frag_size(&skb_shinfo(skb)->frags[i]),
>                                     skb_shinfo(skb)->frags[i].page_offset,
> -                                   &head);
> +                                   &head) < 0) {
> +printk(KERN_CRIT "netbk_gop_frag_copy failed: skb frag %d page\n", i);
> +printk(KERN_CRIT "copying from offset %x, len %x\n",
> +       skb_shinfo(skb)->frags[i].page_offset,
> +       skb_frag_size(&skb_shinfo(skb)->frags[i]));
> +dump_page(skb_frag_page(&skb_shinfo(skb)->frags[i]));
> +BUG();
> +               }
>         }
>  
>         return npo->meta_prod - old_meta_prod;





_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

