
Re: [Xen-devel] DomU: kernel BUG at arch/x86/xen/enlighten.c:425



>>> On 08.03.13 at 03:23, James Sinclair <james.sinclair@xxxxxxxxxx> wrote:
> Xen: (XEN) Xen version 4.1.5-pre (maker@(none)) (gcc version 4.4.5 (Debian 
> 4.4.5-8) ) Mon Feb  4 12:59:38 EST 2013
> Dom0: 3.7.7-1-x86_64 #1 SMP Thu Feb 14 15:58:35 EST 2013 x86_64 GNU/Linux
> DomU: 3.7.10 (32 bit)
> 
> We're seeing this bug pop up fairly regularly. The BUG below is from the 
> first time it's triggered after a reboot - subsequent triggers produce 
> similar 
> tracebacks with the kernel flagged as "Tainted: G      D" - I can provide 
> more 
> examples if desired.

Two fundamental things are missing: First, this is almost
certainly accompanied by some hypervisor message(s), so you
will want to provide the hypervisor log (making sure the log
level is high enough). Second, especially since you are using a
not-yet-released hypervisor version, you should clarify whether
this is a regression of some sort (i.e. whether a certain hypervisor
and/or kernel version is known not to exhibit it).
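For reference, one way to capture that log with full verbosity (the option names come from Xen's command-line documentation; the output filename is just illustrative):

```shell
# Append to the Xen (hypervisor) line in your grub config, then reboot:
#   loglvl=all guest_loglvl=all
# After reproducing the crash, dump the hypervisor console ring:
xl dmesg > xen-dmesg.txt    # 'xm dmesg' on older toolstacks
```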

>     ------------[ cut here ]------------
>     kernel BUG at arch/x86/xen/enlighten.c:425!
>     invalid opcode: 0000 [#1] SMP 
>     Modules linked in:
>     Pid: 3158, comm: ntpd Not tainted 3.7.10-linode49 #2  
>     EIP: 0061:[<c0102e8a>] EFLAGS: 00210282 CPU: 0
>     EIP is at set_aliased_prot+0x9a/0x110
>     EAX: ffffffea EBX: ee2e6000 ECX: 2c74a063 EDX: 80000019
>     ESI: 00000000 EDI: 00002000 EBP: 2c74a063 ESP: ead9da10
>      DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
>     CR0: 8005003b CR2: 4a258bc4 CR3: 2aa7a000 CR4: 00002660
>     DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>     DR6: ffff0ff0 DR7: 00000400
>     Process ntpd (pid: 3158, ti=ead9c000 task=eb6e3300 task.ti=ead9c000)
>     Stack:
>      0007fb6b ee2e6000 80000019 ec1ed000 00000001 ee2e6000 00000200 00002000
>      eb6e3300 c0102f27 eb19cc80 eb19cc80 00000000 c010ba5f eb19cc80 eb19cc80
>      c013083c c010863b c078c405 00000000 00000000 c091e1c0 c015ade1 00000000
>     Call Trace:
>      [<c0102f27>] ? xen_free_ldt+0x27/0x40

Quite unusual to have a process with an LDT these days, and hence
quite likely for this to be the culprit. The primary guess therefore
would be that the LDT didn't get cleared yet (via MMUEXT_SET_LDT).

Jan

>      [<c010ba5f>] ? destroy_context+0x2f/0xa0
>      [<c013083c>] ? __mmdrop+0x1c/0xb0
>      [<c010863b>] ? __switch_to+0x13b/0x3a0
>      [<c078c405>] ? _raw_spin_lock+0x5/0x10
>      [<c015ade1>] ? finish_task_switch+0xb1/0xc0
>      [<c078b77d>] ? __schedule+0x1fd/0x590
>      [<c0639aa7>] ? get_unique_tuple+0x167/0x200
>      [<c06258c2>] ? nf_ct_invert_tuple+0x52/0x70
>      [<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
>      [<c078ad9d>] ? schedule_hrtimeout_range_clock+0x10d/0x120
>      [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
>      [<c014ecca>] ? add_wait_queue+0x1a/0x50
>      [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>      [<c078adbf>] ? schedule_hrtimeout_range+0xf/0x20
>      [<c01e4a47>] ? poll_schedule_timeout+0x37/0x50
>      [<c01e57f5>] ? do_select+0x445/0x510
>      [<c04f97f5>] ? info_for_irq+0x5/0x20
>      [<c04f9eb0>] ? evtchn_from_irq+0x10/0x40
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c069c55c>] ? udp_sendmsg+0x2ec/0x820
>      [<c0678250>] ? ip_append_page+0x4e0/0x4e0
>      [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
>      [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>      [<c05d9cc2>] ? __skb_recv_datagram+0xf2/0x260
>      [<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
>      [<c05d1586>] ? lock_sock_fast+0x16/0x50
>      [<c069bf76>] ? udp_recvmsg+0x76/0x2e0
>      [<c06a293c>] ? inet_recvmsg+0x5c/0xb0
>      [<c06a3817>] ? inet_sendmsg+0x47/0xb0
>      [<c05ce847>] ? sock_recvmsg+0xe7/0x100
>      [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
>      [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>      [<c0112017>] ? save_xstate_sig+0x297/0x310
>      [<c01e5a6d>] ? core_sys_select+0x1ad/0x2a0
>      [<c014165f>] ? __set_task_blocked+0x2f/0x80
>      [<c01419a9>] ? set_current_blocked+0x49/0x60
>      [<c0142fd4>] ? signal_delivered+0x44/0x60
>      [<c0111002>] ? fpu_finit+0x52/0x60
>      [<c018adb6>] ? __audit_syscall_exit+0x3b6/0x3f0
>      [<c0112749>] ? syscall_trace_leave+0xd9/0x110
>      [<c01e5dcf>] ? sys_select+0x2f/0xb0
>      [<c078c844>] ? syscall_call+0x7/0xb
>     Code: d2 0f a4 c2 0c c1 e0 0c 09 da 09 c6 89 f0 ff 15 d8 34 92 c0 31 f6 
> 89 c5 8b 5c 24 04 89 54 24 08 89 c1 e8 3a e3 ff ff 85 c0 74 04 <0f> 0b eb fe 
> 8b 
> 04 24 8b 54 24 0c c1 e0 05 8b 04 10 c1 e8 1e 69
>     EIP: [<c0102e8a>] set_aliased_prot+0x9a/0x110 SS:ESP 0069:ead9da10
>     ---[ end trace a8da5f670b3c74bb ]---
> 
> Several processes have been seen to trigger it - audispd gam java klogd ntpd 
> perl syslogd
> Eventually init will trigger it and cause a kernel panic:
> 
>     Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> 
> I've done some Googling to see if this is a known issue. All I could find 
> was a closed Debian bug from Dec 2010 with no resolution:
> 
>     http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607709 
> 
> Please let me know if you need further info from me.
> 
> Thanks,
> James Sinclair



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

