
Re: [Xen-devel] BUG: unable to handle kernel NULL pointer in __netdev_pick_tx()



On 07/06/2015 06:41 PM, Eric Dumazet wrote:
> On Mon, 2015-07-06 at 16:26 +0800, Bob Liu wrote:
>> Hi,
>>
>> I tried to run the latest kernel, v4.2-rc1, but often got the panic below during
>> system boot.
>>
>> [   42.118983] BUG: unable to handle kernel paging request at 
>> 0000003fffffffff
>> [   42.119008] IP: [<ffffffff8161cfd0>] __netdev_pick_tx+0x70/0x120
>> [   42.119023] PGD 0 
>> [   42.119026] Oops: 0000 [#1] PREEMPT SMP 
>> [   42.119031] Modules linked in: bridge stp llc iTCO_wdt 
>> iTCO_vendor_support x86_pkg_temp_thermal coretemp pcspkr crc32_pclmul 
>> crc32c_intel ghash_clmulni_intel ixgbe ptp pps_core cdc_ether usbnet mii 
>> mdio sb_edac dca edac_core wmi i2c_i801 tpm_tis tpm lpc_ich mfd_core ipmi_si 
>> ipmi_msghandler shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput 
>> usb_storage mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core nvme 
>> mpt2sas raid_class scsi_transport_sas
>> [   42.119073] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 4.2.0-rc1 #80
>> [   42.119077] Hardware name: Oracle Corporation SUN SERVER X4-4/ASSY,MB 
>> WITH TRAY, BIOS 24030400 08/22/2014
>> [   42.119081] task: ffff880300b84000 ti: ffff880300b90000 task.ti: 
>> ffff880300b90000
>> [   42.119085] RIP: e030:[<ffffffff8161cfd0>]  [<ffffffff8161cfd0>] 
>> __netdev_pick_tx+0x70/0x120
>> [   42.119091] RSP: e02b:ffff880306d03868  EFLAGS: 00010206
>> [   42.119093] RAX: ffff8802f676b6b0 RBX: 0000003fffffffff RCX: 
>> ffffffff8161cf60
>> [   42.119097] RDX: 000000000000001c RSI: ffff8802fe24c900 RDI: 
>> ffff8802f96c0000
>> [   42.119100] RBP: ffff880306d038a8 R08: 0000000000023240 R09: 
>> ffffffff8160fb1c
>> [   42.119104] R10: 0000000000000000 R11: 0000000000000000 R12: 
>> ffff8802fe24c900
>> [   42.119107] R13: 0000000000000000 R14: 00000000ffffffff R15: 
>> ffff8802f96c0000
>> [   42.119121] FS:  0000000000000000(0000) GS:ffff880306d00000(0000) 
>> knlGS:0000000000000000
>> [   42.119124] CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
>> [   42.119127] CR2: 0000003fffffffff CR3: 0000000001c1c000 CR4: 
>> 0000000000042660
>> [   42.119130] Stack:
>> [   42.119132]  ffffffff81d63850 ffff8802f63040a0 ffff880306d03888 
>> ffff8802fe24c900
>> [   42.119137]  000000000000000e 0000000000000000 ffff8802f96c0000 
>> ffff8802fe24c400
>> [   42.119141]  ffff880306d038e8 ffffffffa028bea4 ffffffff8189cfe0 
>> ffffffff81d1b900
>> [   42.119146] Call Trace:
>> [   42.119149]  <IRQ> 
>> [   42.119160]  [<ffffffffa028bea4>] ixgbe_select_queue+0xc4/0x150 [ixgbe]
>> [   42.119167]  [<ffffffff816240ee>] netdev_pick_tx+0x5e/0xf0
>> [   42.119170]  [<ffffffff81624210>] __dev_queue_xmit+0x90/0x560
>> [   42.119174]  [<ffffffff816246f3>] dev_queue_xmit_sk+0x13/0x20
>> [   42.119181]  [<ffffffffa02d2b3a>] br_dev_queue_push_xmit+0x4a/0x80 
>> [bridge]
>> [   42.119186]  [<ffffffffa02d2cca>] br_forward_finish+0x2a/0x80 [bridge]
>> [   42.119191]  [<ffffffffa02d2da8>] __br_forward+0x88/0x110 [bridge]
>> [   42.119198]  [<ffffffff8160e18e>] ? __skb_clone+0x2e/0x140
>> [   42.119202]  [<ffffffff8160fb33>] ? skb_clone+0x63/0xa0
>> [   42.119206]  [<ffffffffa02d2d20>] ? br_forward_finish+0x80/0x80 [bridge]
>> [   42.119211]  [<ffffffffa02d2ac7>] deliver_clone+0x37/0x60 [bridge]
>> [   42.119215]  [<ffffffffa02d2c38>] br_flood+0xc8/0x130 [bridge]
>> [   42.119220]  [<ffffffffa02d2d20>] ? br_forward_finish+0x80/0x80 [bridge]
>> [   42.119255]  [<ffffffffa02d3229>] br_flood_forward+0x19/0x20 [bridge]
>> [   42.119260]  [<ffffffffa02d4188>] br_handle_frame_finish+0x258/0x590 
>> [bridge]
>> [   42.119266]  [<ffffffff8172b5d0>] ? get_partial_node.isra.63+0x1b7/0x1d4
>> [   42.119272]  [<ffffffffa02d4606>] br_handle_frame+0x146/0x270 [bridge]
>> [   42.119277]  [<ffffffff8168ed39>] ? udp_gro_receive+0x129/0x150
>> [   42.119281]  [<ffffffff81621836>] __netif_receive_skb_core+0x1d6/0xa20
>> [   42.119286]  [<ffffffff81697a1d>] ? inet_gro_receive+0x9d/0x230
>> [   42.119290]  [<ffffffff81622098>] __netif_receive_skb+0x18/0x60
>> [   42.119294]  [<ffffffff81622113>] netif_receive_skb_internal+0x33/0xb0
>> [   42.119297]  [<ffffffff81622d3f>] napi_gro_receive+0xbf/0x110
>> [   42.119303]  [<ffffffffa028def0>] ixgbe_clean_rx_irq+0x490/0x9e0 [ixgbe]
>> [   42.119308]  [<ffffffffa028f0c0>] ixgbe_poll+0x420/0x790 [ixgbe]
>> [   42.119312]  [<ffffffff8162255d>] net_rx_action+0x15d/0x340
>> [   42.119321]  [<ffffffff81095426>] __do_softirq+0xe6/0x2f0
>> [   42.119324]  [<ffffffff81095904>] irq_exit+0xf4/0x100
>> [   42.119333]  [<ffffffff814275c9>] xen_evtchn_do_upcall+0x39/0x50
>> [   42.119340]  [<ffffffff817367de>] xen_do_hypervisor_callback+0x1e/0x30
>> [   42.119343]  <EOI> 
>> [   42.119348]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [   42.119351]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [   42.119356]  [<ffffffff8100bbf0>] ? xen_safe_halt+0x10/0x20
>> [   42.119362]  [<ffffffff8101feab>] ? default_idle+0x1b/0xf0
>> [   42.119365]  [<ffffffff8102062f>] ? arch_cpu_idle+0xf/0x20
>> [   42.119370]  [<ffffffff810d273b>] ? default_idle_call+0x3b/0x50
>> [   42.119374]  [<ffffffff810d2a7f>] ? cpu_startup_entry+0x2bf/0x350
>> [   42.119379]  [<ffffffff8101290a>] ? cpu_bringup_and_idle+0x2a/0x40
>> [   42.119382] Code: 8b 87 e8 03 00 00 48 85 c0 0f 84 af 00 00 00 41 8b 94 
>> 24 ac 00 00 00 83 ea 01 48 8d 44 d0 10 48 8b 18 48 85 db 0f 84 93 00 00 00 
>> <8b> 03 83 f8 01 74 6b 41 f6 84 24 91 00 00 00 30 74 66 41 8b 94 
>> [   42.119414] RIP  [<ffffffff8161cfd0>] __netdev_pick_tx+0x70/0x120
>> [   42.119418]  RSP <ffff880306d03868>
>> [   42.119420] CR2: 0000003fffffffff
>> [   42.119425] ---[ end trace cbc4abc4d5c3f8b2 ]---
>> [   43.391014] BUG: unable to handle kernel paging request at 
>> 0000003fffffffff
>> [   43.391023] IP: [<ffffffff8161cfd0>] __netdev_pick_tx+0x70/0x120
>> [   43.391030] PGD 0 
>> [   43.391032] Oops: 0000 [#2] PREEMPT SMP 
>> [   43.391036] Modules linked in: bridge stp llc iTCO_wdt 
>> iTCO_vendor_support x86_pkg_temp_thermal coretemp pcspkr crc32_pclmul 
>> crc32c_intel ghash_clmulni_intel ixgbe ptp pps_core cdc_ether usbnet mii 
>> mdio sb_edac dca edac_core wmi i2c_i801 tpm_tis tpm lpc_ich mfd_core ipmi_si 
>> ipmi_msghandler shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput 
>> usb_storage mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core nvme 
>> mpt2sas raid_class scsi_transport_sas
>> [   43.391070] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G      D         
>> 4.2.0-rc1 #80
>> [   43.391074] Hardware name: Oracle Corporation SUN SERVER X4-4/ASSY,MB 
>> WITH TRAY, BIOS 24030400 08/22/2014
>> [   43.391078] task: ffff880300b98000 ti: ffff880300ba0000 task.ti: 
>> ffff880300ba0000
>> [   43.391081] RIP: e030:[<ffffffff8161cfd0>]  [<ffffffff8161cfd0>] 
>> __netdev_pick_tx+0x70/0x120
>> [   43.391086] RSP: e02b:ffff880306d83868  EFLAGS: 00010206
>> [   43.391089] RAX: ffff8802f676b6c0 RBX: 0000003fffffffff RCX: 
>> ffffffff8161cf60
>> [   43.391092] RDX: 000000000000001e RSI: ffff8802ff0aa400 RDI: 
>> ffff8802f96c0000
>> [   43.391095] RBP: ffff880306d838a8 R08: 0000000000023240 R09: 
>> ffffffff8160fb1c
>> [   43.391099] R10: 0000000000000000 R11: ffffea000bd88580 R12: 
>> ffff8802ff0aa400
>> [   43.391102] R13: 0000000000000000 R14: 00000000ffffffff R15: 
>> ffff8802f96c0000
>> [   43.391108] FS:  0000000000000000(0000) GS:ffff880306d80000(0000) 
>> knlGS:0000000000000000
>> [   43.391111] CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
>> [   43.391114] CR2: 0000003fffffffff CR3: 0000000001c1c000 CR4: 
>> 0000000000042660
>> [   43.391118] Stack:
>> [   43.391119]  0000000000000000 0000000000000000 0000000000000000 
>> ffff8802ff0aa400
>> [   43.391124]  000000000000000e 0000000000000000 ffff8802f96c0000 
>> ffff8802ff0aad00
>> [   43.391128]  ffff880306d838e8 ffffffffa028bea4 0000000000000000 
>> 0000000000000000
>> [   43.391133] Call Trace:
>> [   43.391135]  <IRQ> 
>> [   43.391141]  [<ffffffffa028bea4>] ixgbe_select_queue+0xc4/0x150 [ixgbe]
>> [   43.391146]  [<ffffffff816240ee>] netdev_pick_tx+0x5e/0xf0
>> [   43.391150]  [<ffffffff81624210>] __dev_queue_xmit+0x90/0x560
>> [   43.391154]  [<ffffffff816246f3>] dev_queue_xmit_sk+0x13/0x20
>> [   43.391160]  [<ffffffffa02d2b3a>] br_dev_queue_push_xmit+0x4a/0x80 
>> [bridge]
>> [   43.391165]  [<ffffffffa02d2cca>] br_forward_finish+0x2a/0x80 [bridge]
>> [   43.391170]  [<ffffffffa02d2da8>] __br_forward+0x88/0x110 [bridge]
>> [   43.391177]  [<ffffffff81388f01>] ? list_del+0x11/0x40
>> [   43.391181]  [<ffffffff8160e18e>] ? __skb_clone+0x2e/0x140
>> [   43.391184]  [<ffffffff8160fb33>] ? skb_clone+0x63/0xa0
>> [   43.391188]  [<ffffffffa02d2d20>] ? br_forward_finish+0x80/0x80 [bridge]
>> [   43.391193]  [<ffffffffa02d2ac7>] deliver_clone+0x37/0x60 [bridge]
>> [   43.391198]  [<ffffffffa02d2c38>] br_flood+0xc8/0x130 [bridge]
>> [   43.391202]  [<ffffffffa02d2d20>] ? br_forward_finish+0x80/0x80 [bridge]
>> [   43.391207]  [<ffffffffa02d3229>] br_flood_forward+0x19/0x20 [bridge]
>> [   43.391212]  [<ffffffffa02d4188>] br_handle_frame_finish+0x258/0x590 
>> [bridge]
>> [   43.391216]  [<ffffffff8172b5d0>] ? get_partial_node.isra.63+0x1b7/0x1d4
>> [   43.391221]  [<ffffffffa02d4606>] br_handle_frame+0x146/0x270 [bridge]
>> [   43.391224]  [<ffffffff8172b95f>] ? __slab_alloc+0x193/0x4a3
>> [   43.391228]  [<ffffffff81621836>] __netif_receive_skb_core+0x1d6/0xa20
>> [   43.391233]  [<ffffffff81622098>] __netif_receive_skb+0x18/0x60
>> [   43.391236]  [<ffffffff81622113>] netif_receive_skb_internal+0x33/0xb0
>> [   43.391240]  [<ffffffff81622d3f>] napi_gro_receive+0xbf/0x110
>> [   43.391246]  [<ffffffffa028def0>] ixgbe_clean_rx_irq+0x490/0x9e0 [ixgbe]
>> [   43.391251]  [<ffffffffa028f0c0>] ixgbe_poll+0x420/0x790 [ixgbe]
>> [   43.391255]  [<ffffffff8162255d>] net_rx_action+0x15d/0x340
>> [   43.391259]  [<ffffffff81095426>] __do_softirq+0xe6/0x2f0
>> [   43.391263]  [<ffffffff81095904>] irq_exit+0xf4/0x100
>> [   43.391267]  [<ffffffff814275c9>] xen_evtchn_do_upcall+0x39/0x50
>> [   43.391271]  [<ffffffff817367de>] xen_do_hypervisor_callback+0x1e/0x30
>> [   43.391274]  <EOI> 
>> [   43.391277]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [   43.391280]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [   43.391285]  [<ffffffff8100bbf0>] ? xen_safe_halt+0x10/0x20
>> [   43.391289]  [<ffffffff8101feab>] ? default_idle+0x1b/0xf0
>> [   43.391296]  [<ffffffff8102062f>] ? arch_cpu_idle+0xf/0x20
>> [   43.391301]  [<ffffffff810d273b>] ? default_idle_call+0x3b/0x50
>> [   43.391307]  [<ffffffff810d2a7f>] ? cpu_startup_entry+0x2bf/0x350
>> [   43.391318]  [<ffffffff8101290a>] ? cpu_bringup_and_idle+0x2a/0x40
>> [   43.391324] Code: 8b 87 e8 03 00 00 48 85 c0 0f 84 af 00 00 00 41 8b 94 
>> 24 ac 00 00 00 83 ea 01 48 8d 44 d0 10 48 8b 18 48 85 db 0f 84 93 00 00 00 
>> <8b> 03 83 f8 01 74 6b 41 f6 84 24 91 00 00 00 30 74 66 41 8b 94 
>> [   43.391358] RIP  [<ffffffff8161cfd0>] __netdev_pick_tx+0x70/0x120
>> [   43.391362]  RSP <ffff880306d83868>
>> [   43.391364] CR2: 0000003fffffffff
>> [   43.391368] ---[ end trace cbc4abc4d5c3f8b3 ]---
>> [   43.393487] Kernel panic - not syncing: Fatal exception in interrupt
>>
> 
> Hi Bob
> 
> I suspect this is similar to what commit
> c29390c6dfeee0944ac6b5610ebbe403944378fc ("xps: must clear sender_cpu
> before forwarding") attempted to fix.
> 
> Trying to keep sk_buff small is hard.
> 
> Could you try something like :
> 
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index e97572b5d2cc..0ff6e1bbca91 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -42,6 +42,7 @@ int br_dev_queue_push_xmit(struct sock *sk, struct sk_buff *skb)
>       } else {
>               skb_push(skb, ETH_HLEN);
>               br_drop_fake_rtable(skb);
> +             skb_sender_cpu_clear(skb);
>               dev_queue_xmit(skb);
>       }
>  

Thank you for the quick fix!
I tested by rebooting several times and didn't hit this panic anymore.

Regards,
-Bob
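
For context on the aliasing Eric refers to above ("trying to keep sk_buff
small is hard"): in v4.2-era kernels, skb->napi_id (written on the receive
path) and skb->sender_cpu (consulted by XPS queue selection in
__netdev_pick_tx()) share storage through a union, so an skb forwarded by
the bridge can still carry a napi_id where a CPU number is expected. The
sketch below only illustrates that layout and the skb_sender_cpu_clear()
helper the patch calls; it is a simplified, stand-alone approximation and
not the exact kernel source (the struct name is invented for the example).

/* Simplified picture of the shared field in v4.2-era struct sk_buff.
 * napi_id is set on the RX path (e.g. by the GRO/busy-poll code);
 * sender_cpu is what XPS reads when picking a TX queue.  Because they
 * alias, a stale napi_id can be misread as a CPU index, which matches
 * the bogus address seen in the oops above.
 */
struct skb_sketch {
	union {
		unsigned int napi_id;    /* receive-side identifier */
		unsigned int sender_cpu; /* transmit-side CPU hint for XPS */
	};
};

/* What the helper added to br_dev_queue_push_xmit() does, sketched
 * against the toy struct: when XPS is enabled it zeroes the shared
 * field, so the transmit path recomputes a valid value instead of
 * reusing whatever the receive path left behind.
 */
static inline void skb_sender_cpu_clear(struct skb_sketch *skb)
{
#ifdef CONFIG_XPS
	skb->sender_cpu = 0;
#endif
}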


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel