[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Xen-devel] Help with: Xen BUG at timer.c:189
Hi list, I'am struggling with a hypervisor panic. The hypervisor version is 4.4.3, yes I know - very old ;-), but the affected code hasn't much changed. The panic was never seen before and occured while booting the system, only dom0 was running. (XEN) Xen BUG at timer.c:189 (XEN) ----[ Xen-4.4.3_02-26.2_x3 x86_64 debug=n Tainted: C ]---- (XEN) CPU: 14 (XEN) RIP: e008:[<ffff82d08012c3e6>] remove_entry+0x26/0x80 (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor (XEN) rax: 00000000000000bc rbx: ffff830839bcd5c8 rcx: ffff82d080343a80 (XEN) rdx: 000000000000d5f0 rsi: 0000002887fa0000 rdi: ffff830839bcd5c8 (XEN) rbp: 000000000000d5f0 rsp: ffff830839bbfe20 r8: 0000002887211398 (XEN) r9: 000000288721132f r10: 00000000bf8b2994 r11: ffff82d08011cfa0 (XEN) r12: ffff830839bcd5c8 r13: ffff82d080343a80 r14: 000000000000d5f0 (XEN) r15: ffff82d080343a80 cr0: 000000008005003b cr4: 00000000001526f0 (XEN) cr3: 0000000075679000 cr2: 000000000069b8ec (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff830839bbfe20: (XEN) ffff830839bcb020 ffff82d08012d3a8 0000002887fa0000 0000000000000286 (XEN) ffff830839bcd5f0 ffff830839bcd9c0 0000002887c1915d 000000000000000e (XEN) ffff82d080343a70 0000000000000002 ffff830839bcda30 ffff82d0801affd0 (XEN) 000000010000000e 00000028872113f4 ffffffffffffffff ffff82d08030fc00 (XEN) ffff82d0802ffc00 ffff82d080189a1c 0000000000000000 0000000000000000 (XEN) ffffffffffffffff 0000107800076e92 ffff82d0801292aa ffff830839bb8000 (XEN) ffff8300754fd000 ffff82d08031abe0 0000000000000000 ffff830839bcb048 (XEN) 0000000000000000 ffff82d080161ad6 0000000000000001 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000009 (XEN) ffff8803dda20010 0000000000000246 0000000000000000 00000000fffffffe (XEN) 0000000000000000 0000000000000000 ffffffff800033aa 0000000000000000 (XEN) 0000000000000000 0000000000000001 0000010000000000 ffffffff800033aa (XEN) 000000000000e033 0000000000000246 ffff8803dda21f08 000000000000e02b (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 000000000000000e ffff8300754fd000 00000037b9887700 0000000000000000 (XEN) Xen call trace: (XEN) [<ffff82d08012c3e6>] remove_entry+0x26/0x80 (XEN) [<ffff82d08012d3a8>] set_timer+0x178/0x230 (XEN) [<ffff82d0801affd0>] mwait_idle+0x270/0x370 (XEN) [<ffff82d080189a1c>] local_time_calibration+0x19c/0x1a0 (XEN) [<ffff82d0801292aa>] __do_softirq+0x6a/0xa0 (XEN) [<ffff82d080161ad6>] idle_loop+0x16/0x50 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 14: (XEN) Xen BUG at timer.c:189 (XEN) **************************************** (XEN) (XEN) Reboot in five seconds... (XEN) Executing kexec image on cpu14 (XEN) Shot down all CPUs This is the call of BUG() in remove_entry() in case timer->status is not TIMER_STATUS_in_heap or TIMER_STATUS_in_list. With the address of the timer from the vmcore: crash> struct timer ffff830839bcd5c8 struct timer { expires = 0, { heap_offset = 968676784, list_next = 0xffff830839bcd5b0, inactive = { next = 0xffff830839bcd5b0, prev = 0x200200200000050 } }, function = 0xffff830839bcd5e0, data = 0xffff830839bcd5e0, cpu = 54768, status = 188 '\274' } This looks like the structure was overwritten. The call path seems to be: mwait_idle() sched_tick_resume() csched_tick_resume() set_timer() remove_timer() In sched_tick_resume(): unsigned int cpu = smp_processor_id(); I extracted struct cpu_info from the stack and found: crash> cpu_info ffff830839bbff18 struct cpu_info { ... processor_id = 14, current_vcpu = 0xffff8300754fd000, per_cpu_offset = 0x37b9887700, ... And from the code in sched_credit.c: spc = CSCHED_PCPU(cpu); I calculated spc: crash> csched_pcpu 0xffff830839be2b40 struct csched_pcpu { runq = { next = 0xffff830839be2b40, prev = 0xffff830839be2b40 }, runq_sort_last = 0, ticker = { expires = 174080000000, { heap_offset = 1, list_next = 0x100100100000001, inactive = { next = 0x100100100000001, prev = 0x200200200200200 } }, function = 0xffff82d08011f200 <csched_tick>, data = 0xe, cpu = 14, status = 3 '\003' }, tick = 0, idle_bias = 47, balance_mask = 0xffff830839be2bb0 } And this ticker looks ok! So the ticker address extracted from the vmcore in remove_entry() is not the same as the expected one read from the code. Did I made wrong assumptions when looking at the code/vmcore? Has anyone seen such a case or has anybody a hint for further investigation? Any help would be very welcome. Thanks in advance. Dietmar. -- Company details: http://ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |