[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.



>>> On 04.10.17 at 20:26, <hahn@xxxxxxxxxxxxx> wrote:
> Hello,
> 
> with linux-4.9.52 running on Debian-Wheezy with Xen-4.1 I observed
> several stuck processes: Here is one (truncated) dump of the Linux
> kernel messages:
> 
>>  [<ffffffff8160db6d>] ? __schedule+0x23d/0x6d0
>>  [<ffffffff8160e8a0>] ? bit_wait_timeout+0x90/0x90
>>  [<ffffffff8160e032>] ? schedule+0x32/0x80
>>  [<ffffffff8161150c>] ? schedule_timeout+0x1ec/0x360
>>  [<ffffffff8130f277>] ? __blk_mq_run_hw_queue+0x327/0x3e0* see below
>>  [<ffffffff8101b7f1>] ? xen_clocksource_get_cycles+0x11/0x20
>>  [<ffffffff8160e8a0>] ? bit_wait_timeout+0x90/0x90
>>  [<ffffffff8160d8b4>] ? io_schedule_timeout+0xb4/0x130
>>  [<ffffffff810bb6f7>] ? prepare_to_wait+0x57/0x80
>>  [<ffffffff8160e8b7>] ? bit_wait_io+0x17/0x60
>>  [<ffffffff8160e3ac>] ? __wait_on_bit+0x5c/0x90
>>  [<ffffffff8160e8a0>] ? bit_wait_timeout+0x90/0x90
>>  [<ffffffff8160e50e>] ? out_of_line_wait_on_bit+0x7e/0xa0
>>  [<ffffffff810bba20>] ? autoremove_wake_function+0x40/0x40
>>  [<ffffffffc00abd18>] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2]
>>  [<ffffffff810247d9>] ? __switch_to+0x2c9/0x720
>>  [<ffffffff810e67fd>] ? try_to_del_timer_sync+0x4d/0x80
>>  [<ffffffffc00b099d>] ? kjournald2+0xdd/0x280 [jbd2]
>>  [<ffffffff810bb9e0>] ? wake_up_atomic_t+0x30/0x30
>>  [<ffffffffc00b08c0>] ? commit_timeout+0x10/0x10 [jbd2]
>>  [<ffffffff81097660>] ? kthread+0xf0/0x110
>>  [<ffffffff810247d9>] ? __switch_to+0x2c9/0x720
>>  [<ffffffff81097570>] ? kthread_park+0x60/0x60
>>  [<ffffffff81612bb5>] ? ret_from_fork+0x25/0x30
>> NMI backtrace for cpu 2
>> CPU: 2 PID: 35 Comm: khungtaskd Not tainted 4.9.0-ucs105-amd64 #1 Debian 
>> 4.9.30-2A~4.2.0.201709271649
>>  0000000000000000 ffffffff81331935 0000000000000000 0000000000000002
>>  ffffffff81335e60 0000000000000002 ffffffff8104cb70 ffff8801f0c90e80
>>  ffffffff81335f6a ffff8801f0c90e80 00000000003fffbc ffffffff81128048
>> Call Trace:
>>  [<ffffffff81331935>] ? dump_stack+0x5c/0x77
>>  [<ffffffff81335e60>] ? nmi_cpu_backtrace+0x90/0xa0
>>  [<ffffffff8104cb70>] ? irq_force_complete_move+0x140/0x140
>>  [<ffffffff81335f6a>] ? nmi_trigger_cpumask_backtrace+0xfa/0x130
>>  [<ffffffff81128048>] ? watchdog+0x2b8/0x330
>>  [<ffffffff81127d90>] ? reset_hung_task_detector+0x10/0x10
>>  [<ffffffff81097660>] ? kthread+0xf0/0x110
>>  [<ffffffff810247d9>] ? __switch_to+0x2c9/0x720
>>  [<ffffffff81097570>] ? kthread_park+0x60/0x60
>>  [<ffffffff81612bb5>] ? ret_from_fork+0x25/0x30
>> Sending NMI from CPU 2 to CPUs 0-1,3:
>> NMI backtrace for cpu 1
>> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
> 4.9.30-2A~4.2.0.201709271649
>> task: ffff8801f4a02ec0 task.stack: ffffc90040ca4000
>> RIP: e030:[<ffffffff810013aa>]  [<ffffffff810013aa>] 
>> xen_hypercall_sched_op+0xa/0x20
>> RSP: e02b:ffffc90040ca7ed0  EFLAGS: 00000246
>> RAX: 0000000000000000 RBX: ffff8801f4a02ec0 RCX: ffffffff810013aa
>> RDX: ffffffff81c4ba70 RSI: 0000000000000000 RDI: 0000000000000001
>> RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000
>> R13: 0000000000000000 R14: ffff8801f4a02ec0 R15: ffff8801f4a02ec0
>> FS:  00007f23ac595700(0000) GS:ffff8801f5a80000(0000) knlGS:0000000000000000
>> CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
>> CR2: 00007f52537d6d46 CR3: 00000001bba23000 CR4: 0000000000002660
>> Stack:
>>  ffff8801bb832201 0000000000000001 ffffffff8101b55c ffffffff81611ec8
>>  ffff8801f4a02ec0 0000000000000001 ffffffff810bc280 ffff8801f4a02ec0
>>  ffff8801f4a02ec0 c0a995961d41240f addce6dcadd009c9 0000000000000000
>> Call Trace:
>>  [<ffffffff8101b55c>] ? xen_safe_halt+0xc/0x20
>>  [<ffffffff81611ec8>] ? default_idle+0x18/0xd0
>>  [<ffffffff810bc280>] ? cpu_startup_entry+0x1f0/0x260
>> Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc 
>> cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc 
>> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
>> NMI backtrace for cpu 3
>> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
>> 4.9.30-2A~4.2.0.201709271649
>> task: ffff8801f4a24f00 task.stack: ffffc90040cb4000
>> RIP: e030:[<ffffffff810013aa>]  [<ffffffff810013aa>] 
>> xen_hypercall_sched_op+0xa/0x20
>> RSP: e02b:ffffc90040cb7ed0  EFLAGS: 00000246
>> RAX: 0000000000000000 RBX: ffff8801f4a24f00 RCX: ffffffff810013aa
>> RDX: ffffffff81c4ba70 RSI: 0000000000000000 RDI: 0000000000000001
>> RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000
>> R13: 0000000000000000 R14: ffff8801f4a24f00 R15: ffff8801f4a24f00
>> FS:  00007f1a2af19700(0000) GS:ffff8801f5b80000(0000) knlGS:0000000000000000
>> CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
>> CR2: 00007f4a5416b000 CR3: 00000001d83ec000 CR4: 0000000000002660
>> Stack:
>>  0000000000000001 0000000000000001 ffffffff8101b55c ffffffff81611ec8
>>  ffff8801f4a24f00 0000000000000003 ffffffff810bc280 ffff8801f4a24f00
>>  ffff8801f4a24f00 77816deb133b9979 addce6dcadd009c9 0000000000000000
>> Call Trace:
>>  [<ffffffff8101b55c>] ? xen_safe_halt+0xc/0x20
>>  [<ffffffff81611ec8>] ? default_idle+0x18/0xd0
>>  [<ffffffff810bc280>] ? cpu_startup_entry+0x1f0/0x260
>> Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc 
>> cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc 
>> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
>> NMI backtrace for cpu 0
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
>> 4.9.30-2A~4.2.0.201709271649
>> task: ffffffff81c0e540 task.stack: ffffffff81c00000
>> RIP: e030:[<ffffffff810013aa>]  [<ffffffff810013aa>] 
>> xen_hypercall_sched_op+0xa/0x20
>> RSP: e02b:ffffffff81c03e90  EFLAGS: 00000246
>> RAX: 0000000000000000 RBX: ffffffff81c0e540 RCX: ffffffff810013aa
>> RDX: ffffffff81c4ba70 RSI: 0000000000000000 RDI: 0000000000000001
>> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000
>> R13: 0000000000000000 R14: ffffffff81c0e540 R15: ffffffff81c0e540
>> FS:  00007f977d3c2700(0000) GS:ffff8801f5a00000(0000) knlGS:0000000000000000
>> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000055ae9becabc8 CR3: 00000001d6a40000 CR4: 0000000000002660
>> Stack:
>>  0000000000000001 0000000000000001 ffffffff8101b55c ffffffff81611ec8
>>  ffffffff81c0e540 0000000000000000 ffffffff810bc280 ffffffff81c0e540
>>  ffffffff81c0e540 ebeac214e330bd3f addce6dcadd009c9 ffffffffffffffff
>> Call Trace:
>>  [<ffffffff8101b55c>] ? xen_safe_halt+0xc/0x20
>>  [<ffffffff81611ec8>] ? default_idle+0x18/0xd0
>>  [<ffffffff810bc280>] ? cpu_startup_entry+0x1f0/0x260
>>  [<ffffffff81d4bf84>] ? start_kernel+0x46d/0x48d
>>  [<ffffffff81d510e6>] ? xen_start_kernel+0x52e/0x538
>> Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc 
>> cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc 
>> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
> 
> 
> Looking at the dis-assembly of xen_clocksource_get_cycles() in
> arch/x86/xen/time.c I see no path how that should call
> __blk_mq_run_hw_queue():

Hence the question marks ahead of the stack entries: What you see
there are likely leftovers from prior call trees. It just so happens
that the old return address slots haven't got overwritten yet. You
need to first sanitize the stack trace e.g. by having the kernel
dump more of the stack in raw hex form, and then looking at the
disassembly to figure out how large each stack frame is, starting
at the top-most address (i.e. the one in RIP).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.