[Xen-devel] Xen-4.3 - curious crash
Hello,
Last night, XenRT discovered an interesting host crash. The crash
itself is somewhat concerning, but the lack of information highlights an
area which could do with easier debuggability.
Here are the results from the serial console. The server in question is
a Supermicro Xeon X5376 system which has not exhibited stability issues
in the past, and has seemed fine in tests today.
I have linearised the stack and added notes alongside each word.
----[ Xen-4.3.1-xs82408-d x86_64 debug=y Not tainted ]----
CPU: 4
RIP: e008:[<ffff82c4c0235a92>] compat_create_bounce_frame+0x8/0xec
RFLAGS: 0000000000010046 CONTEXT: hypervisor
rax: 0000000000000061 rbx: ffff8300cfafa000 rcx: ffff82c4c02ffd80
rdx: ffff8300cfafa570 rsi: ffff83022eacfd00 rdi: ffff8300cfafa000
rbp: ffff83022eacfd60 rsp: ffff83022eacff08 r8: 0000000000000000
r9: 0000000000000000 r10: ffff83022ead32e8 r11: 00001ac42042804f
r12: ffff8300cfafa000 r13: 0000000000000004 r14: ffff8300cfd3f000
r15: 0000000000000001 cr0: 000000008005003b cr4: 00000000000026f0
cr3: 0000000228dde000 cr2: 00000000b74e4f10
ds: 007b es: 007b fs: 00d8 gs: 00e0 ss: 0000 cs: e008
Xen stack trace from rsp=ffff83022eacff08:
0000000000000093 | rflags from pushfq in ASSERT_INTERRUPTS_ENABLED
ffff82c4c02358d8 | RA? compat/entry.S:123 in compat_test_all_events()
0000000000000001 | r15
ffff8300cfd3f000 | r14
0000000000000004 | r13
ffff8300cfafa000 | r12
00000000c1695ec0 | ebp
00000000deadbeef | ebx
0000000000000000 | r11
00000000deadbeef | r10
ffff8300cfafa060 | r9
0000000000000000 | r8
0000000000000000 | eax
00000000deadbeef | ecx
00000000ee8507a0 | edx
00000000c23a7000 | esi
0000000000000000 | edi
0002010000000000 | TRAP_syscall | TRAP_regs_dirty
00000000c10013a7 + (hypercall page) __HYPERCALL_sched_op
0000000000000061 |
0000000000000246 | Exception frame from ring1 kernel
00000000c1695eb0 |
0000000000000069 +
0000000000000000 | es
0000000000000000 | ds
0000000000000000 | fs
0000000000000000 | gs
0000000000000004 | cpu_info.processor_id
ffff8300cfafa000 | cpu_info.current_vcpu
0000003d6e797180 | cpu_info.per_cpu_offset
0000000000000000 +
Xen call trace:
[<ffff82c4c0235a92>] compat_create_bounce_frame+0x8/0xec
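As a cross-check on two of the annotations above, here is a minimal C
sketch (not Xen code) which splits the word 0002010000000000 into its two
32-bit halves and extracts the privilege level from the saved selectors.
The TRAP_syscall and TRAP_regs_dirty values (256 and bit 17) are assumed
here rather than quoted from the tree, as is the layout of error_code in
the low half and entry_vector in the high half. The helper names are made
up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed values; check them against the headers of the tree in question. */
#define TRAP_syscall     256u
#define TRAP_regs_dirty  (1u << 17)

/* Hypothetical helpers: the stack word is assumed to hold error_code in
 * the low 32 bits and entry_vector in the high 32 bits. */
uint32_t word_error_code(uint64_t w)   { return (uint32_t)w; }
uint32_t word_entry_vector(uint64_t w) { return (uint32_t)(w >> 32); }

/* Architectural x86 selector layout: RPL in bits 0-1, TI in bit 2,
 * descriptor table index in the remaining bits. */
unsigned sel_rpl(uint16_t sel)   { return sel & 3u; }
unsigned sel_index(uint16_t sel) { return (unsigned)sel >> 3; }

/* Print the decoded fields for one saved frame. */
void print_frame_notes(uint64_t vec_word, uint16_t cs, uint16_t ss)
{
    uint32_t ev = word_entry_vector(vec_word);

    printf("entry_vector=%#x syscall=%d regs_dirty=%d error_code=%#x\n",
           (unsigned)ev,
           (ev & TRAP_syscall) != 0,
           (ev & TRAP_regs_dirty) != 0,
           (unsigned)word_error_code(vec_word));
    printf("cs=%#x (rpl %u)  ss=%#x (rpl %u)\n",
           (unsigned)cs, sel_rpl(cs), (unsigned)ss, sel_rpl(ss));
}
```

Calling print_frame_notes(0x0002010000000000, 0x61, 0x69) reports both
selectors with RPL 1, which matches the "ring1 kernel" annotation on the
exception frame.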
Xen has failed the ASSERT_INTERRUPTS_ENABLED check at the very top of
compat_create_bounce_frame. The check itself lacks a bugframe, which is
why the crash is not automatically recognised as an assertion failure.
Following the code back using what I presume to be a return address in
the second word on the stack, the codeflow looks like:
compat_test_all_events:
...
sti
leaq ...
5x mov ...
call compat_create_bounce_frame
jmp compat_test_all_events
compat_create_bounce_frame:
pushfq
testb
jnz
ud2
What I presume has happened is that after 'sti', Xen has taken an
interrupt, which has caused some form of corruption. Judging from the
top word on the stack, rflags looks quite corrupt. Unfortunately, this
is all the available information. (The crash kernel failed to boot,
which is another issue I am looking into.)
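For reference, a minimal C sketch that decodes an RFLAGS value using the
architectural bit positions. It is not Xen code, and the function names
are made up for illustration. Applied to the top stack word (0x93) it
reports IF clear, which is at least consistent with the assertion firing;
applied to the 0x246 in the guest exception frame it reports IF set.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural RFLAGS bits (bit 1 is reserved and always reads as 1). */
static const struct {
    unsigned bit;
    const char *name;
} rflags_names[] = {
    { 0, "CF" }, { 2, "PF" }, { 4, "AF" }, { 6, "ZF" }, { 7, "SF" },
    { 8, "TF" }, { 9, "IF" }, { 10, "DF" }, { 11, "OF" }, { 16, "RF" },
};

/* Non-zero if the interrupt flag (bit 9) is set. */
int rflags_interrupts_enabled(uint64_t rflags)
{
    return (rflags & (1ull << 9)) != 0;
}

/* Write a space-separated list of the set flag names into buf and
 * return buf. Output stops early if buf is too small. */
char *rflags_describe(uint64_t rflags, char *buf, size_t len)
{
    size_t used = 0, i;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (i = 0; i < sizeof(rflags_names) / sizeof(rflags_names[0]); i++) {
        int n;

        if (!(rflags & (1ull << rflags_names[i].bit)))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", rflags_names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                      /* out of space */
        used += (size_t)n;
    }
    return buf;
}
```

With a 64-byte buffer, rflags_describe(0x93, buf, sizeof(buf)) yields the
names of bits 0, 4 and 7, and rflags_interrupts_enabled(0x93) returns 0.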
For crashes like this, particularly when attempting to leave Xen context
and return to a guest, the information provided by the stack trace
is quite lacking. The interesting information is what has just
been popped off the stack (which I am hoping would have been another
exception frame).
Would it be sensible to have some indication that we are on the way out
of Xen, so that errors in situations like this get a chance to print
some of the recently popped stack values? I know it won't be a terribly
heavily used debugging aid, but I think it is probably worth the effort
for situations like this, where there is simply not enough information
to diagnose the issue.
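To make the suggestion concrete, here is a rough user-space sketch in C
of the core of such a dump: collecting the words immediately below rsp,
which are the ones most recently popped. Everything here is hypothetical;
collect_popped_words is not an existing Xen function, and a real
implementation would also need the "on the way out" indication to decide
when calling it is worthwhile. The values are only meaningful if nothing
has reused that part of the stack since they were popped.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not an existing Xen function.
 *
 * The stack grows downwards, so the most recently popped words sit just
 * below rsp. Copy up to max of them into out[], most recently popped
 * first, never reading below stack_bottom (the lowest valid address of
 * the stack). Returns the number of words copied.
 */
size_t collect_popped_words(const uint64_t *stack_bottom,
                            const uint64_t *rsp,
                            size_t max, uint64_t *out)
{
    size_t avail, n, i;

    if (rsp <= stack_bottom)
        return 0;

    avail = (size_t)(rsp - stack_bottom);
    n = avail < max ? avail : max;

    for (i = 0; i < n; i++)
        out[i] = rsp[-1 - (ptrdiff_t)i];

    return n;
}
```

A crash handler could then print the collected words after the normal
stack trace, clearly marked as stale data from below the stack pointer.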
~Andrew