
Re: [Xen-devel] netback Oops then xenwatch stuck in D state



On Sun, 2013-02-10 at 22:03 +0000, Christopher S. Aker wrote:
> And another this afternoon on a different machine:
> 
> BUG: unable to handle kernel NULL pointer dereference at 00000000000008b8

OK, so the guest is faulting at a different offset now. It is very
likely that there is an OOM or a race condition somewhere else. Judging
from your two emails, I presume this bug can be reproduced reliably.

> IP: [<ffffffff81011dda>] xen_spin_lock_flags+0x3a/0x80
> PGD 0
> Oops: 0002 [#1] SMP
> Modules linked in: ebt_comment ebt_arp ebt_set ebt_limit ebt_ip6 ebt_ip 
> ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding ebtable_filter e1000e
> CPU 5
> Pid: 1550, comm: netback/5 Not tainted 3.7.6-1-x86_64 #1 Supermicro 
> X8DT6/X8DT6
> RIP: e030:[<ffffffff81011dda>]  [<ffffffff81011dda>] 
> xen_spin_lock_flags+0x3a/0x80
> RSP: e02b:ffff8800836e7b58  EFLAGS: 00010006
> RAX: 0000000000000400 RBX: 00000000000008b8 RCX: 000000000045de5d
> RDX: 0000000000000001 RSI: 0000000000000211 RDI: 00000000000008b8
> RBP: ffff8800836e7b78 R08: 0000000000000068 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
> R13: 0000000000000200 R14: 0000000000000400 R15: 000000000045de5d
> FS:  00007f474a465700(0000) GS:ffff880100740000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000000008b8 CR3: 0000000001c0b000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process netback/5 (pid: 1550, threadinfo ffff8800836e6000, task 
> ffff880084510000)
> Stack:
>   0000000000000211 00000000000008b8 ffff8800771e5700 ffff8800771e57d8
>   ffff8800836e7b98 ffffffff817605da 0000000000000000 00000000000008b8
>   ffff8800836e7bd8 ffffffff8154446f ffff8800771e5000 0000000000000000
> Call Trace:
>   [<ffffffff817605da>] _raw_spin_lock_irqsave+0x2a/0x40
>   [<ffffffff8154446f>] xen_netbk_schedule_xenvif+0x8f/0x100
>   [<ffffffff81544505>] xen_netbk_check_rx_xenvif+0x25/0x60
>   [<ffffffff815445eb>] netbk_tx_err+0x5b/0x70
>   [<ffffffff8154518c>] xen_netbk_tx_build_gops+0xb8c/0xbc0
>   [<ffffffff81012880>] ? __switch_to+0x160/0x4f0
>   [<ffffffff810891b8>] ? idle_balance+0xf8/0x150
>   [<ffffffff81080150>] ? finish_task_switch+0x60/0xd0
>   [<ffffffff8175f7b4>] ? __schedule+0x394/0x750
>   [<ffffffff815452af>] xen_netbk_kthread+0xef/0x9d0
>   [<ffffffff81080150>] ? finish_task_switch+0x60/0xd0
>   [<ffffffff810720c0>] ? wake_up_bit+0x40/0x40
>   [<ffffffff815451c0>] ? xen_netbk_tx_build_gops+0xbc0/0xbc0
>   [<ffffffff81071a06>] kthread+0xc6/0xd0
>   [<ffffffff810037b9>] ? xen_end_context_switch+0x19/0x20
>   [<ffffffff81071940>] ? kthread_freezable_should_stop+0x70/0x70
>   [<ffffffff8176847c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81071940>] ? kthread_freezable_should_stop+0x70/0x70
[snip]
> 
> We're not so good at this, but it looks like xl->lock deref is what we 
> hit?  The lock was gone?
> 

A quick check of the xen_spinlock struct shows that none of its fields
should be at offset 0x8b8. Reading the backtrace, it looks like it is
the netbk struct that is gone.
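
To illustrate the reasoning: in the trace, CR2 and RDI (the first
argument register on x86-64) are both 0x8b8, so the lock pointer handed
to the spinlock code was itself 0x8b8. That is what you get when you
take the address of a member at offset 0x8b8 through a NULL base
pointer. Below is a minimal userspace sketch of that arithmetic. The
struct names, the padding, and the helper functions are all
hypothetical; I have not reproduced the real struct xen_netbk layout
here, and the padding is chosen only so that the member lands at 0x8b8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a spinlock; not the real xen_spinlock. */
struct demo_lock {
    uint32_t slock;
};

/*
 * Hypothetical stand-in for the containing struct. The padding is an
 * assumption made purely so that the lock sits at offset 0x8b8.
 */
struct demo_netbk {
    unsigned char other_members[0x8b8];
    struct demo_lock lock;
};

/* Fail at compile time if the assumed layout does not hold. */
static_assert(offsetof(struct demo_netbk, lock) == 0x8b8,
              "demo layout must place the lock at 0x8b8");

/*
 * Address that "&base->lock" evaluates to for a given base address.
 * Done with integer arithmetic so that no NULL pointer is ever formed
 * or dereferenced in this sketch.
 */
static uintptr_t lock_address(uintptr_t base)
{
    return base + offsetof(struct demo_netbk, lock);
}

/*
 * Returns 1 if the faulting address is exactly what a NULL base
 * pointer would produce for the lock member, 0 otherwise.
 */
static int looks_like_null_base(uintptr_t fault_addr)
{
    return fault_addr == lock_address(0);
}
```

With this layout, lock_address(0) evaluates to 0x8b8, the same value
the oops reports in CR2. A valid base pointer would have produced a
kernel address plus 0x8b8 instead.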

Do you manipulate the number of vcpus Dom0 has after it's up?


Wei.

> -Chris
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel


