
Re: [Xen-devel] netback Oops then xenwatch stuck in D state



On 01/02/13 21:58, Christopher S. Aker wrote:
> We've been hitting the following issue on a variety of hosts and recent 
> Xen/dom0 version combinations.  Here's an excerpt from our latest:
>
> Xen: 4.1.4 (xenbits @ 23432)
> Dom0: 3.7.1-x86_64
>
> BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
> IP: [<ffffffff8141a301>] evtchn_from_irq+0x11/0x40
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in: ebt_comment ebt_arp ebt_set ebt_limit ebt_ip6 ebt_ip 
> ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding ebtable_filter igb
> CPU 0
> Pid: 1636, comm: netback/0 Not tainted 3.7.1-x86_64 #1 Supermicro 
> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+
> RIP: e030:[<ffffffff8141a301>]  [<ffffffff8141a301>] 
> evtchn_from_irq+0x11/0x40
> RSP: e02b:ffff88004334fc98  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff880004964700 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 00000000000001dc RDI: 000000000000001c
> RBP: ffff88004334fc98 R08: ffffea00010bf818 R09: 0000000000000000
> R10: 0000000000000001 R11: ffff880000000000 R12: ffff880004964720
> R13: ffff88002d34d700 R14: 00000000ffffffff R15: ffff88004334fd84
> FS:  00007f8939347700(0000) GS:ffff880101e00000(0000) 
> knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000000001c CR3: 0000000001c0b000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process netback/0 (pid: 1636, threadinfo ffff88004334e000, task 
> ffff880043fd5fe0)
> Stack:
>   ffff88004334fcb8 ffffffff8141b06d ffff880000000218 ffff880042fe1200
>   ffff88004334fdb8 ffffffff81543b9b ffff88004334fd84 ffff880042c59040
>   ffff88004334fd68 ffff88004334fd48 ffff880000000cc0 ffffc900106c7ac0
> Call Trace:
>   [<ffffffff8141b06d>] notify_remote_via_irq+0xd/0x40
>   [<ffffffff81543b9b>] xen_netbk_rx_action+0x73b/0x800
>   [<ffffffff81544c25>] xen_netbk_kthread+0xb5/0xa60
>   [<ffffffff81080050>] ? finish_task_switch+0x60/0xd0
>   [<ffffffff81071fe0>] ? wake_up_bit+0x40/0x40
>   [<ffffffff81544b70>] ? xen_netbk_tx_build_gops+0xa10/0xa10
>   [<ffffffff81071926>] kthread+0xc6/0xd0
>   [<ffffffff810037b9>] ? xen_end_context_switch+0x19/0x20
>   [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
>   [<ffffffff81767c7c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
> Code: be f5 01 00 00 48 c7 c7 12 e2 99 81 e8 d9 4c c3 ff eb cd 0f 1f 80 
> 00 00 00 00 55 48 89 e5 39 3d c6 fd 80 00 76 0b e8 df fa ff ff <0f> b7 
> 40 1c c9 c3 89 f9 31 c0 48 c7 c2 27 e2 99 81 be db 00 00
> RIP  [<ffffffff8141a301>] evtchn_from_irq+0x11/0x40
>   RSP <ffff88004334fc98>
> CR2: 000000000000001c
> ---[ end trace 1b5f6b359343fcfe ]---
>
>
> Which leads to xenwatch being stuck in D state, which then requires us 
> to reboot the host.
>
> SysRq : Show Blocked State
>    task                        PC stack   pid father
> xenwatch        D ffff880101f938c0  5056    49      2 0x00000000
>   ffff880101305cb8 0000000000000246 ffff8801012a0760 00000000000138c0
>   ffff880101305fd8 ffff880101304010 00000000000138c0 00000000000138c0
>   ffff880101305fd8 00000000000138c0 ffff8800349224e0 ffff8801012a0760
> Call Trace:
>   [<ffffffff8175f444>] schedule+0x24/0x70
>   [<ffffffff8154698d>] xenvif_disconnect+0x7d/0x130
>   [<ffffffff81071fe0>] ? wake_up_bit+0x40/0x40
>   [<ffffffff81545ac4>] frontend_changed+0x214/0x660
>   [<ffffffff81080050>] ? finish_task_switch+0x60/0xd0
>   [<ffffffff8141fb22>] xenbus_otherend_changed+0xb2/0xc0
>   [<ffffffff8175fe39>] ? _raw_spin_unlock_irqrestore+0x19/0x20
>   [<ffffffff8141fd3b>] frontend_changed+0xb/0x10
>   [<ffffffff8141da3a>] xenwatch_thread+0xba/0x180
>   [<ffffffff81071fe0>] ? wake_up_bit+0x40/0x40
>   [<ffffffff8141d980>] ? xs_watch+0x60/0x60
>   [<ffffffff81071926>] kthread+0xc6/0xd0
>   [<ffffffff810037b9>] ? xen_end_context_switch+0x19/0x20
>   [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
>   [<ffffffff81767c7c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
>
> I'll give building an updated dom0 kernel a shot, but was hoping this 
> rang a bell or two.
>
> Thanks,
> -Chris

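It looks like info_for_irq(irq) returns a NULL pointer, and
evtchn_from_irq() blindly dereferences it while looking up the event
channel. That matches the oops: RAX is 0 and the faulting address (CR2)
is 0x1c, i.e. a read at a small offset from a NULL pointer.

As for why the info is NULL, I can't help you, but perhaps
evtchn_from_irq() should have a NULL check, returning 0 in the case of
an error?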
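
To make the suggestion concrete, here is a rough userspace sketch of what
such a check could look like. It is not the kernel source and not a tested
patch: struct irq_info is reduced to a single field, and NR_IRQS, irq_table
and set_info_for_irq() are hypothetical stand-ins invented for this model.
Only the names info_for_irq() and evtchn_from_irq() come from the trace
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NR_IRQS 16u  /* hypothetical table size for this model */

/* Simplified stand-in for the per-IRQ bookkeeping (assumption). */
struct irq_info {
    unsigned short evtchn;  /* event channel bound to this IRQ; 0 = none */
};

/* Slots stay NULL until an IRQ is bound, and again after teardown. */
static struct irq_info *irq_table[NR_IRQS];

/* Hypothetical helper: attach (or, with NULL, detach) info for an IRQ. */
static void set_info_for_irq(unsigned int irq, struct irq_info *info)
{
    assert(irq < NR_IRQS);
    irq_table[irq] = info;
}

/* Returns the info for irq, or NULL if out of range or not bound. */
static struct irq_info *info_for_irq(unsigned int irq)
{
    return irq < NR_IRQS ? irq_table[irq] : NULL;
}

/* Returns the event channel for irq, or 0 if no info is attached. */
static unsigned int evtchn_from_irq(unsigned int irq)
{
    struct irq_info *info = info_for_irq(irq);

    if (info == NULL) {
        fprintf(stderr, "irq %u has no info; treating as unbound\n", irq);
        return 0;
    }
    return info->evtchn;
}
```

With this shape, callers would have to treat a return value of 0 as "no
event channel" and skip the notification. It would only paper over the
crash; why the info disappears while netback is still using the irq
remains the open question.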

~Andrew

>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel




 

