
Re: [Xen-devel] Regression in xen-netfront on v3.6 (git commit c48a11c7ad2623b99bbd6859b0b4234e7f11176f, netvm: propagate page->pfmemalloc to skb)



On Sat, Aug 04, 2012 at 07:03:55AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 03, 2012 at 08:04:14AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Aug 01, 2012 at 03:02:27PM -0400, Konrad Rzeszutek Wilk wrote:
> > > So I hadn't done a git bisection yet. But if I choose git commit:
> > > 4b24ff71108164e047cf2c95990b77651163e315
> > >     Merge tag 'for-v3.6' of git://git.infradead.org/battery-2.6
> > > 
> > >     Pull battery updates from Anton Vorontsov:
> > > 
> > > 
> > > everything works nicely. Anything past that, so these merges:
> > > 
> > > konrad@phenom:~/ssd/linux$ git log --oneline --merges 4b24ff71108164e047cf2c95990b77651163e315..linus/master
> > > 2d53492 Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6
> > ===> ac694db Merge branch 'akpm' (Andrew's patch-bomb)
> > 
> > Somewhere in there is the culprit. Hadn't done yet the full bisection
> > (was just checking out in each merge to see when it stopped working)
> 
> Mel, your:
> commit c48a11c7ad2623b99bbd6859b0b4234e7f11176f
> Author: Mel Gorman <mgorman@xxxxxxx>
> Date:   Tue Jul 31 16:44:23 2012 -0700
> 
>     netvm: propagate page->pfmemalloc to skb
> 
> is the culprit per git bisect. Any ideas - do the drivers need to do
> some extra processing? Here is the git bisect log
> 

The problem appears to be at drivers/net/xen-netfront.c#973, where the
driver calls __skb_fill_page_desc(skb, 0, NULL, 0, 0). The driver does not
have to do any extra processing as such, but I did not expect NULL to be
passed in like this. Can you check if this fixes the bug please?
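
For illustration, here is a minimal user-space sketch of the guard. The
struct definitions and the names fill_page_desc() and example_usage() are
hypothetical, simplified stand-ins and not the kernel's real definitions;
only the condition itself is taken from the patch below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct page {
	unsigned long flags;
	void *mapping;
	bool pfmemalloc;
};

struct frag {
	struct page *page;
	unsigned int page_offset;
	unsigned int size;
};

struct sk_buff {
	bool pfmemalloc;
	struct frag frags[4];
};

/* Same shape as __skb_fill_page_desc(), with the NULL check added. */
static void fill_page_desc(struct sk_buff *skb, int i, struct page *page,
			   unsigned int off, unsigned int size)
{
	struct frag *frag = &skb->frags[i];

	/* Short-circuit evaluation: page is only dereferenced when non-NULL. */
	if (page && page->pfmemalloc && !page->mapping)
		skb->pfmemalloc = true;

	frag->page = page;
	frag->page_offset = off;
	frag->size = size;
}

/* Exercises the NULL case and the reserve-page case. */
static void example_usage(void)
{
	struct sk_buff skb = { 0 };
	struct page reserve = { .flags = 0, .mapping = NULL, .pfmemalloc = true };

	/* A NULL page must be stored without being dereferenced. */
	fill_page_desc(&skb, 0, NULL, 0, 0);
	assert(!skb.pfmemalloc);

	/* An unmapped pfmemalloc page still propagates the flag. */
	fill_page_desc(&skb, 1, &reserve, 0, 64);
	assert(skb.pfmemalloc);
}
```

Calling example_usage() from a small main() should run without tripping
either assertion. Without the leading "page &&" test, the first call would
dereference NULL, which is the failure mode reported in this thread.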

---8<---
netvm: check for page == NULL when propagating the skb->pfmemalloc flag

Commit [c48a11c7: netvm: propagate page->pfmemalloc to skb] is responsible
for the following bug, triggered by the Xen network frontend driver:

[    1.908592] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[    1.908643] IP: [<ffffffffa0037750>] xennet_poll+0x980/0xec0 [xen_netfront]
[    1.908703] PGD ea1df067 PUD e8ada067 PMD 0
[    1.908774] Oops: 0000 [#1] SMP
[    1.908797] Modules linked in: fbcon tileblit font radeon bitblit softcursor ttm drm_kms_helper crc32c_intel xen_blkfront xen_netfront xen_fbfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront xenfs xen_privcmd
[    1.908938] CPU 0
[    1.908950] Pid: 2165, comm: ip Not tainted 3.5.0upstream-08854-g444fa66 #1
[    1.908983] RIP: e030:[<ffffffffa0037750>]  [<ffffffffa0037750>] xennet_poll+0x980/0xec0 [xen_netfront]
[    1.909029] RSP: e02b:ffff8800ffc03db8  EFLAGS: 00010282
[    1.909055] RAX: ffff8800ea010140 RBX: ffff8800f00e86c0 RCX: 000000000000009a
[    1.909055] RDX: 0000000000000040 RSI: 000000000000005a RDI: ffff8800fa7dee80
[    1.909055] RBP: ffff8800ffc03ee8 R08: ffff8800f00e86d8 R09: ffff8800ea010000
[    1.909055] R10: dead000000200200 R11: dead000000100100 R12: ffff8800fa7dee80
[    1.909055] R13: 000000000000005a R14: ffff8800fa7dee80 R15: 0000000000000200
[    1.909055] FS:  00007fbafc188700(0000) GS:ffff8800ffc00000(0000) knlGS:0000000000000000
[    1.909055] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[    1.909055] CR2: 0000000000000010 CR3: 00000000ea108000 CR4: 0000000000002660
[    1.909055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.909055] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    1.909055] Process ip (pid: 2165, threadinfo ffff8800ea0f2000, task ffff8800fa783040)
[    1.909055] Stack:
[    1.909055]  ffff8800e27e5040 ffff8800ffc03e88 ffff8800ffc03e68 ffff8800ffc03e48
[    1.909055]  7fffffffffffffff ffff8800ffc03e00 ffff8800e27e5040 ffff8800f00e86d8
[    1.909055]  ffff8800ffc03eb0 00000040ffffffff ffff8800f00e8000 00000000ffc03e30
[    1.909055] Call Trace:
[    1.909055]  <IRQ>
[    1.909055]  [<ffffffff81066028>] ? pvclock_clocksource_read+0x58/0xd0
[    1.909055]  [<ffffffff81486352>] net_rx_action+0x112/0x240
[    1.909055]  [<ffffffff8107f319>] __do_softirq+0xb9/0x190
[    1.909055]  [<ffffffff815d8d7c>] call_softirq+0x1c/0x30

The problem is that the xen-netfront driver passes a NULL page to
__skb_fill_page_desc(), which was unexpected. This patch checks that
there is a page before dereferencing it.

Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
---
 include/linux/skbuff.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7632c87..8857669 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1256,7 +1256,7 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
         * do not lose pfmemalloc information as the pages would not be
         * allocated using __GFP_MEMALLOC.
         */
-       if (page->pfmemalloc && !page->mapping)
+       if (page && page->pfmemalloc && !page->mapping)
                skb->pfmemalloc = true;
        frag->page.p              = page;
        frag->page_offset         = off;
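
As a side note on reading the oops above: the faulting address
(CR2: 0000000000000010) is consistent with a load from a member at offset
0x10 through a NULL base pointer. The sketch below assumes a layout in which
two 64-bit members precede the flag; struct page_sketch, fault_address() and
check_fault_address() are hypothetical names for illustration and do not
reproduce the kernel's struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout only: fixed 64-bit fields model an x86-64 guest, with
 * two 8-byte members ahead of the flag.
 */
struct page_sketch {
	uint64_t flags;
	uint64_t mapping;     /* stands in for a 64-bit pointer */
	uint8_t  pfmemalloc;
};

/* Address a member load touches for a given base pointer value. */
static uintptr_t fault_address(uintptr_t base, size_t member_offset)
{
	return base + member_offset;
}

/* With a NULL base, the fault address equals the member offset. */
static void check_fault_address(void)
{
	size_t off = offsetof(struct page_sketch, pfmemalloc);

	assert(off == 0x10);
	assert(fault_address(0, off) == 0x10);
}
```

Under that assumption, a small non-zero fault address such as 0x10 points at
a member access through NULL rather than at a read of the pointer itself,
which matches the page->pfmemalloc test that the patch guards.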



 

