
[Xen-devel] Re: Oops in xen 3.0.2 dequeue_signal [was: Re: DomU Oopsing on xen-3.0-testing changeset 8259]



Keir Fraser wrote:
> On 19 Jun 2006, at 14:12, Charles Duffy wrote:
>
>> I'm seeing the same behavior I previously reported against xen-3.0-testing changeset 8259, albeit much more sporadically, on Xen 3.0.2 (with a 2.6.16.16 kernel built via the Gentoo Xen packages). I'd use stock XenSource binaries, but last I checked they don't have support for some of my hardware (i.e. the 3w9xxx driver).
>>
>> Hints on anything I can do to provide more detailed information (in the hopes of actually getting this fixed) would be welcome.
>
> Does it always crash in __dequeue_signal()? You might have to add some tracing in there to find out exactly which part of the function it is crashing in.

Okay. I've rebuilt against a debug-enabled kernel, and (on getting another panic) disassembled vmlinux to try to match the failing instruction to an individual source line.

The crash appears to be occurring in the second instruction generated for kernel/signal.c:1976 (from Linux-2.6.16.16+Xen 3.0.2):

kernel/signal.c:1976
                        /* Run the handler.  */
                        *return_ka = *ka;

ffffffff8013d152:  48 8b 75 d0    mov 0xffffffffffffffd0(%rbp),%rsi
ffffffff8013d156:  48 89 06       mov %rax,(%rsi) <<<=== HERE
ffffffff8013d159:  48 8b 42 f0    mov 0xfffffffffffffff0(%rdx),%rax
ffffffff8013d15d:  48 89 46 08    mov %rax,0x8(%rsi)
ffffffff8013d161:  48 8b 42 f8    mov 0xfffffffffffffff8(%rdx),%rax
ffffffff8013d165:  48 89 46 10    mov %rax,0x10(%rsi)
ffffffff8013d169:  48 8b 41 18    mov 0x18(%rcx),%rax
ffffffff8013d16d:  48 89 46 18    mov %rax,0x18(%rsi)
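
The first instruction above loads %rsi from a slot addressed relative to %rbp, with the displacement printed as an unsigned 64-bit value. As a rough sketch (not part of the original analysis), the snippet below converts that displacement to a signed offset and looks the slot up in the "Stack:" words from the ksymoops output further down. It assumes the stack dump lists consecutive 64-bit words starting at RSP, and that %rbp still holds the frame pointer at the faulting instruction; the helper name to_signed64 is made up for illustration.

```python
MASK64 = (1 << 64) - 1

def to_signed64(value):
    """Interpret a 64-bit unsigned value as two's-complement signed."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

# Register values copied from the oops report in this message.
RBP = 0xffff88013e87fe18
RSP = 0xffff88013e87fdc8
RSI = 0x7c51186269a192da

# Assumption: the "Stack:" lines list 64-bit words starting at RSP.
STACK = [
    0xffff88013e87fe68, 0x7acefa865eaca248, 0x26ab946c27ba950b,
    0x46b67a71dd1c67e3, 0x7c51186269a192da, 0x1287d8a161cad8d5,
    0x60de4c306b46ae9f, 0xa035a0ac294ee773, 0x6cd46345a1e152ae,
    0x228b761ceaf9a045,
]

disp = to_signed64(0xffffffffffffffd0)  # displacement as printed in the disassembly
slot = (RBP + disp) & MASK64            # address the first mov reads from
index = (slot - RSP) // 8               # position of that slot in the stack dump

print(hex(disp), hex(slot), index, hex(STACK[index]))
```

Under those assumptions the slot is rbp-0x30, which is the fifth word of the dump, and that word equals the value in RSI. That would suggest the pointer was already garbage in memory before it was loaded, rather than the register being clobbered afterwards.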

My x86 assembler is tremendously rusty, but it looks to me like return_ka (which is passed in as a parameter to get_signal_to_deliver) points somewhere it shouldn't.

This parameter is passed in from arch/x86_64/kernel/signal.c's do_signal(), where it's declared as a function-local variable with its home on the stack. The code all looks fine at a glance -- but since the top of the stack is at ffff88013e87fe18, it doesn't make much sense for a variable living on the stack defined just a few calls ago to be at 7c51186269a192da. I'm guessing there's some kind of funky race condition going on -- but beyond that vague assertion, I'm pretty much lost. Ideas, anyone?
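
One quick sanity check on the two addresses mentioned above: on x86-64, a virtual address is only usable if its upper bits are all copies of the top implemented address bit (the "canonical" form). The sketch below assumes 48-bit virtual addressing; is_canonical is a hypothetical helper, not a kernel function.

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) of addr are all equal (x86-64 canonical form)."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones

RSI = 0x7c51186269a192da  # faulting pointer from the oops
RBP = 0xffff88013e87fe18  # frame pointer from the oops

print(is_canonical(RSI), is_canonical(RBP))
```

With these values the frame pointer passes the check and the RSI value does not, so the store could not have succeeded no matter what was mapped. That fits a slot overwritten with arbitrary data better than a pointer that is merely off by a few bytes.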


ksymoops output follows:

CPU 0
Pid: 16571, comm: java Not tainted 2.6.16.18-xen #4
RIP: e030:[<ffffffff8013d156>] <ffffffff8013d156>{get_signal_to_deliver+662}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
RSP: e02b:ffff88013e87fdc8  EFLAGS: 00010406
RAX: 00002ab89d447a1b RBX: 000000000000000a RCX: ffff88000061eb68
RDX: ffff88000061eb80 RSI: 7c51186269a192da RDI: ffff880144962750
RBP: ffff88013e87fe18 R08: 0000000000000000 R09: 0000000000003a66
R10: 0000000000000000 R11: ffffffff8010b27e R12: 000000000000000a
R13: ffff88013e87fe48 R14: 0000000000000008 R15: ffff88013e87fe48
FS:  00002b47b9c0f900(0063) GS:ffffffff80535000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Stack: ffff88013e87fe68 7acefa865eaca248 26ab946c27ba950b 46b67a71dd1c67e3
       7c51186269a192da 1287d8a161cad8d5 60de4c306b46ae9f a035a0ac294ee773
       6cd46345a1e152ae 228b761ceaf9a045
Call Trace: <ffffffff8010b27e>{system_call+134} <ffffffff8010ad69>{sys_rt_sigsuspend+249}
       <ffffffff8010b681>{ptregscall_common+61}
Code: 48 89 06 48 8b 42 f0 48 89 46 08 48 8b 42 f8 48 89 46 10 48


>>RIP; ffffffff8013d156 <get_signal_to_deliver+296/6e0>   <=====

>>RAX; 00002ab89d447a1b <__crc_ioctl_by_bdev+2ab79d5e3940/fffffffe8029bf25>
>>RCX; ffff88000061eb68 <__crc_ioctl_by_bdev+ffff87ff007baa8d/fffffffe8029bf25>
>>RDX; ffff88000061eb80 <__crc_ioctl_by_bdev+ffff87ff007baaa5/fffffffe8029bf25>
>>RSI; 7c51186269a192da <__crc_ioctl_by_bdev+7c51186169bb51ff/fffffffe8029bf25>
>>RDI; ffff880144962750 <__crc_ioctl_by_bdev+ffff880044afe675/fffffffe8029bf25>
>>RBP; ffff88013e87fe18 <__crc_ioctl_by_bdev+ffff88003ea1bd3d/fffffffe8029bf25>
>>R11; ffffffff8010b27e <system_call+86/8b>
>>R13; ffff88013e87fe48 <__crc_ioctl_by_bdev+ffff88003ea1bd6d/fffffffe8029bf25>
>>R15; ffff88013e87fe48 <__crc_ioctl_by_bdev+ffff88003ea1bd6d/fffffffe8029bf25>

Trace; ffffffff8010b27e <system_call+86/8b>
Trace; ffffffff8010b681 <ptregscall_common+3d/64>

Code;  ffffffff8013d156 <get_signal_to_deliver+296/6e0>
0000000000000000 <_RIP>:
Code;  ffffffff8013d156 <get_signal_to_deliver+296/6e0>   <=====
   0:   48 89 06                  mov    %rax,(%rsi)   <=====
Code;  ffffffff8013d159 <get_signal_to_deliver+299/6e0>
   3:   48 8b 42 f0               mov    0xfffffffffffffff0(%rdx),%rax
Code;  ffffffff8013d15d <get_signal_to_deliver+29d/6e0>
   7:   48 89 46 08               mov    %rax,0x8(%rsi)
Code;  ffffffff8013d161 <get_signal_to_deliver+2a1/6e0>
   b:   48 8b 42 f8               mov    0xfffffffffffffff8(%rdx),%rax
Code;  ffffffff8013d165 <get_signal_to_deliver+2a5/6e0>
   f:   48 89 46 10               mov    %rax,0x10(%rsi)
Code;  ffffffff8013d169 <get_signal_to_deliver+2a9/6e0>
  13:   48 00 00                  rex64 add    %al,(%rax)




 

