
Re: [Xen-devel] DomU: kernel BUG at arch/x86/xen/enlighten.c:425



>>> On 08.03.13 at 03:23, James Sinclair <james.sinclair@xxxxxxxxxx> wrote:
> Xen: (XEN) Xen version 4.1.5-pre (maker@(none)) (gcc version 4.4.5 (Debian 
> 4.4.5-8) ) Mon Feb  4 12:59:38 EST 2013
> Dom0: 3.7.7-1-x86_64 #1 SMP Thu Feb 14 15:58:35 EST 2013 x86_64 GNU/Linux
> DomU: 3.7.10 (32 bit)
> 
> We're seeing this bug pop up fairly regularly. The BUG below is from the 
> first time it's triggered after a reboot - subsequent triggers produce 
> similar 
> tracebacks with the kernel flagged as "Tainted: G      D" - I can provide 
> more 
> examples if desired.

Two fundamental things are missing: First, this is almost
certainly accompanied by some hypervisor message(s), so you
will want to provide the hypervisor log (making sure the log
level is high enough). Second, especially since you are using a
not-yet-released hypervisor version, you should clarify whether
this is a regression of some sort (i.e. whether a certain hypervisor
and/or kernel version is known not to exhibit it).
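For reference, one way to capture that log with full verbosity (the option names come from Xen's command-line documentation; the output filename is just illustrative):

```shell
# Append to the Xen (hypervisor) line in your grub config, then reboot:
#   loglvl=all guest_loglvl=all
# After reproducing the crash, dump the hypervisor console ring:
xl dmesg > xen-dmesg.txt    # 'xm dmesg' on older toolstacks
```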

>     ------------[ cut here ]------------
>     kernel BUG at arch/x86/xen/enlighten.c:425!
>     invalid opcode: 0000 [#1] SMP 
>     Modules linked in:
>     Pid: 3158, comm: ntpd Not tainted 3.7.10-linode49 #2  
>     EIP: 0061:[<c0102e8a>] EFLAGS: 00210282 CPU: 0
>     EIP is at set_aliased_prot+0x9a/0x110
>     EAX: ffffffea EBX: ee2e6000 ECX: 2c74a063 EDX: 80000019
>     ESI: 00000000 EDI: 00002000 EBP: 2c74a063 ESP: ead9da10
>      DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
>     CR0: 8005003b CR2: 4a258bc4 CR3: 2aa7a000 CR4: 00002660
>     DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>     DR6: ffff0ff0 DR7: 00000400
>     Process ntpd (pid: 3158, ti=ead9c000 task=eb6e3300 task.ti=ead9c000)
>     Stack:
>      0007fb6b ee2e6000 80000019 ec1ed000 00000001 ee2e6000 00000200 00002000
>      eb6e3300 c0102f27 eb19cc80 eb19cc80 00000000 c010ba5f eb19cc80 eb19cc80
>      c013083c c010863b c078c405 00000000 00000000 c091e1c0 c015ade1 00000000
>     Call Trace:
>      [<c0102f27>] ? xen_free_ldt+0x27/0x40

Quite unusual to have a process with an LDT these days, and hence
quite likely for this to be the culprit. The primary guess therefore
would be that the LDT didn't get cleared yet (via MMUEXT_SET_LDT).

Jan

>      [<c010ba5f>] ? destroy_context+0x2f/0xa0
>      [<c013083c>] ? __mmdrop+0x1c/0xb0
>      [<c010863b>] ? __switch_to+0x13b/0x3a0
>      [<c078c405>] ? _raw_spin_lock+0x5/0x10
>      [<c015ade1>] ? finish_task_switch+0xb1/0xc0
>      [<c078b77d>] ? __schedule+0x1fd/0x590
>      [<c0639aa7>] ? get_unique_tuple+0x167/0x200
>      [<c06258c2>] ? nf_ct_invert_tuple+0x52/0x70
>      [<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
>      [<c078ad9d>] ? schedule_hrtimeout_range_clock+0x10d/0x120
>      [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
>      [<c014ecca>] ? add_wait_queue+0x1a/0x50
>      [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>      [<c078adbf>] ? schedule_hrtimeout_range+0xf/0x20
>      [<c01e4a47>] ? poll_schedule_timeout+0x37/0x50
>      [<c01e57f5>] ? do_select+0x445/0x510
>      [<c04f97f5>] ? info_for_irq+0x5/0x20
>      [<c04f9eb0>] ? evtchn_from_irq+0x10/0x40
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c069c55c>] ? udp_sendmsg+0x2ec/0x820
>      [<c0678250>] ? ip_append_page+0x4e0/0x4e0
>      [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
>      [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>      [<c05d9cc2>] ? __skb_recv_datagram+0xf2/0x260
>      [<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
>      [<c05d1586>] ? lock_sock_fast+0x16/0x50
>      [<c069bf76>] ? udp_recvmsg+0x76/0x2e0
>      [<c06a293c>] ? inet_recvmsg+0x5c/0xb0
>      [<c06a3817>] ? inet_sendmsg+0x47/0xb0
>      [<c05ce847>] ? sock_recvmsg+0xe7/0x100
>      [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
>      [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>      [<c0112017>] ? save_xstate_sig+0x297/0x310
>      [<c01e5a6d>] ? core_sys_select+0x1ad/0x2a0
>      [<c014165f>] ? __set_task_blocked+0x2f/0x80
>      [<c01419a9>] ? set_current_blocked+0x49/0x60
>      [<c0142fd4>] ? signal_delivered+0x44/0x60
>      [<c0111002>] ? fpu_finit+0x52/0x60
>      [<c018adb6>] ? __audit_syscall_exit+0x3b6/0x3f0
>      [<c0112749>] ? syscall_trace_leave+0xd9/0x110
>      [<c01e5dcf>] ? sys_select+0x2f/0xb0
>      [<c078c844>] ? syscall_call+0x7/0xb
>     Code: d2 0f a4 c2 0c c1 e0 0c 09 da 09 c6 89 f0 ff 15 d8 34 92 c0 31 f6 
> 89 c5 8b 5c 24 04 89 54 24 08 89 c1 e8 3a e3 ff ff 85 c0 74 04 <0f> 0b eb fe 
> 8b 
> 04 24 8b 54 24 0c c1 e0 05 8b 04 10 c1 e8 1e 69
>     EIP: [<c0102e8a>] set_aliased_prot+0x9a/0x110 SS:ESP 0069:ead9da10
>     ---[ end trace a8da5f670b3c74bb ]---
> 
> Several processes have been seen to trigger it - audispd gam java klogd ntpd 
> perl syslogd
> Eventually init will trigger it and cause a kernel panic:
> 
>     Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> 
> I've done some Googling to see if this is a known issue. All I could find 
> was a closed Debian bug from Dec 2010 with no resolution:
> 
>     http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607709 
> 
> Please let me know if you need further info from me.
> 
> Thanks,
> James Sinclair



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

