
[Xen-devel] [BUG] XEN crash and double fault when doing cpu online/offline



Hi,

I am using xen-hptool cpu-offline/cpu-online to take the CPUs in a socket offline and then bring them back online, with the following script:

for((j=48;j<=95;j++));
do
  xen-hptool cpu-offline $j
done

for((j=48;j<=95;j++));
do
  xen-hptool cpu-online $j
done

Xen crashes when the CPUs are brought back online. I am using upstream Xen (0dd92688) and have tried for many days; it still crashes. However, if I only offline/online CPUs 48~59, Xen does not crash. The bug can be reproduced when we offline/online most of the CPUs in a socket. An interesting thing is that when we use the following script:

for((j=48;j<=95;j++));
do
  xen-hptool cpu-offline $j
  xen-hptool cpu-online $j
done

Xen does not crash either. Is there a bug in sched_credit2?
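
For anyone trying to reproduce this, the three variants above could be driven from one small helper. This is only a sketch: hotplug_cmd, batch_cycle, interleaved_cycle and the DRY_RUN variable are hypothetical names made up for illustration; only the "xen-hptool cpu-offline <cpu>" and "xen-hptool cpu-online <cpu>" invocations come from the scripts above.

```shell
#!/usr/bin/env bash
# Sketch only: all function names and DRY_RUN are hypothetical helpers.
# The only real commands are "xen-hptool cpu-offline/cpu-online <cpu>".

# Run one hotplug command, or just print it when DRY_RUN=1.
hotplug_cmd() {
  local action=$1 cpu=$2
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "xen-hptool cpu-$action $cpu"
  else
    xen-hptool "cpu-$action" "$cpu"
  fi
}

# Offline every CPU in [first, last], then online all of them again.
# Per the report: 48..95 crashes on re-online, 48..59 does not.
batch_cycle() {
  local first=$1 last=$2 j
  j=$first
  while [ "$j" -le "$last" ]; do hotplug_cmd offline "$j"; j=$((j + 1)); done
  j=$first
  while [ "$j" -le "$last" ]; do hotplug_cmd online "$j"; j=$((j + 1)); done
}

# Offline and immediately re-online each CPU in turn.
# Per the report: this variant does not crash.
interleaved_cycle() {
  local first=$1 last=$2 j
  j=$first
  while [ "$j" -le "$last" ]; do
    hotplug_cmd offline "$j"
    hotplug_cmd online "$j"
    j=$((j + 1))
  done
}
```

After sourcing the file, setting DRY_RUN=1 and calling "batch_cycle 48 95" only prints the command sequence, which is handy for checking the order before running it on a real host; "interleaved_cycle 48 95" gives the non-crashing order.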

The crash message is as follows:

(XEN) Adding cpu 77 to runqueue 1
(XEN) Adding cpu 78 to runqueue 1
(XEN) Adding cpu 79 to runqueue 1
(XEN) Adding cpu 80 to runqueue 1
(X(ENXE) N) *** DOUBLE FAULT ***
(XEN) Assertion 'debug->cpu == smp_processor_id()' failed at spinlock.c:88
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) Debugging connection not set up.
(XEN) CPU:    48
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080240bfc>] _spin_unlock+0x40/0x42
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: ffff830059027fff   rbx: 0000000000000046   rcx: 0000000000000000
(XEN) rdx: 0000000000000030   rsi: 0000000000000046   rdi: ffff82d080819860
(XEN) rbp: ffff830059027a78   rsp: ffff830059027a78   r8:  0000000000000000
(XEN) r9:  0000000000000004   r10: 0000000000000001   r11: 0000000000000002
(XEN) r12: ffff82d08044d270   r13: 0000000000000010   r14: ffff82d08044d270
(XEN) r15: ffff82d0808197e0   cr0: 000000008005003b   cr4: 00000000003526e0
(XEN) cr3: 0000000059014000   cr2: 00007f9d0fbc1cd9
(XEN) fsb: 00007feb9960a740   gsb: ffff88fcdafc0000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d080240bfc> (_spin_unlock+0x40/0x42):
(XEN) ff 0f 66 83 07 01 5d c3 <0f> 0b 55 48 89 e5 e8 b5 ff ff ff fb 5d c3 55 48
(XEN) Xen stack trace from rsp=ffff830059027a78:
(XEN)    ffff830059027a90 ffff82d080240c17 0000000000000020 ffff830059027ae8
(XEN)    ffff82d080252ea9 0000000d8081a6a0 0000000000000046 ffff82d080819860
(XEN)    0000001000000000 0000000000000006 ffff82d08044d26a ffff82d08093e700
(XEN)    0000000000000086 ffff830059027b98 ffff830059027af8 ffff82d08024fe41
(XEN)    ffff830059027b18 ffff82d08024fe7d 0000000000000000 ffff82d08092f3a0
(XEN)    ffff830059027b80 ffff82d08024fee2 ffff830059027b50 ffff82d0802fa68e
(XEN)    0000000000000001 ffff830059027b60 ffff82d080240b77 ffff82d080819718
(XEN)    ffff82d08045b4d0 ffff82d08092f3a0 ffff830059027bd8 0000000000000086
(XEN)    ffff82d08093e71e ffff830059027bc8 ffff82d0802503ea ffff82d08044d26a
(XEN)    ffff82d08093e703 0000000000000051 ffff83203ffe20b0 ffff8320104e00d8
(XEN)    0000000000000001 ffff8323996aad00 ffff830059027c20 ffff82d080250502
(XEN)    ffff82d000000018 ffff830059027c30 ffff830059027bf0 ffff830059027c38
(XEN)    0000000000000051 0000000000000001 0000000000000001 ffff83239969f580
(XEN)    0000000000000003 ffff830059027c80 ffff82d0802303e8 0000000000000051
(XEN)    0000005159027c78 ffff82d080952b80 00000000000000e0 ffff8323996aad00
(XEN)    ffff83203ffe20b0 ffff83239969f580 ffff82d080930008 ffff82d08094c840
(XEN)    0000000000000051 ffff830059027cc0 ffff82d0802307e1 ffff8323996aad00
(XEN)    0000000000000051 ffff82d080930008 ffff82d080803660 0000000000000051
(XEN)    ffff8323996aad00 ffff830059027d58 ffff82d08023f1fd ffff830059027d10
(XEN)    0000000000000206 ffff82d080819680 ffff83239969f580 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d080240bfc>] R _spin_unlock+0x40/0x42
(XEN)    [<ffff82d080240c17>] F _spin_unlock_irqrestore+0xd/0x24
(XEN)    [<ffff82d080252ea9>] F serial_puts+0x131/0x141
(XEN)    [<ffff82d08024fe41>] F console_serial_puts+0x28/0x2a
(XEN)    [<ffff82d08024fe7d>] F drivers/char/console.c#__putstr+0x3a/0x8b
(XEN)    [<ffff82d08024fee2>] F drivers/char/console.c#printk_start_of_line+0x14/0x17b
(XEN)    [<ffff82d0802503ea>] F drivers/char/console.c#vprintk_common+0x8d/0x158
(XEN)    [<ffff82d080250502>] F printk+0x4d/0x4f
(XEN)    [<ffff82d0802303e8>] F common/sched_credit2.c#init_pdata+0xdd/0x441
(XEN)    [<ffff82d0802307e1>] F common/sched_credit2.c#csched2_switch_sched+0x95/0xe2
(XEN)    [<ffff82d08023f1fd>] F schedule_cpu_add+0x18a/0x3fd
(XEN)    [<ffff82d080201a9f>] F common/cpupool.c#cpupool_assign_cpu_locked+0x58/0x189
(XEN)    [<ffff82d080201ee2>] F common/cpupool.c#cpu_callback+0x186/0x3c1
(XEN)    [<ffff82d0802242c0>] F notifier_call_chain+0x6b/0x96
(XEN)    [<ffff82d080200f95>] F common/cpu.c#cpu_notifier_call_chain+0x1b/0x33
(XEN)    [<ffff82d080201215>] F cpu_up+0xa8/0xe5
(XEN)    [<ffff82d0802a7e28>] F cpu_up_helper+0xf/0xa5
(XEN)    [<ffff82d080205d5d>] F common/domain.c#continue_hypercall_tasklet_handler+0x4c/0xb9
(XEN)    [<ffff82d080242ddb>] F common/tasklet.c#do_tasklet_work+0x76/0xa9
(XEN)    [<ffff82d0802430bc>] F do_tasklet+0x58/0x8a
(XEN)    [<ffff82d0802751e8>] F arch/x86/domain.c#idle_loop+0x40/0x9b
(XEN)
(XEN) RIP:    e008:[<ffff82d0bffcf800>](XEN)
(XEN) ****************************************
 ffff82d0bffcf800(XEN) Panic on CPU 0:

(XEN) RFLAGS: 0000000000010006 (XEN) Assertion 'debug->cpu == smp_processor_id()' failed at spinlock.c:88
CONTEXT: hypervisor(XEN) ****************************************
(XEN)

(XEN) rax: 0000000000000018   rbx: 00008c7d886a544d   rcx: ffffffff8100130a
(XEN) Reboot in five seconds...
(XEN) rdx: ffffc90040798e40   rsi: 0000000000000004   rdi: 0000000000000008
(XEN) Debugging connection not set up.
(XEN) rbp: 0000000000000004   rsp: ffffc90040798e28   r8:  00008c7cde95e94d
(XEN) r9:  0000006185e58599   r10: 000000000000010a   r11: 0000000000000206
(XEN) r12: ffff88fcdaf17140   r13: ffff88fcdaf1e438   r14: ffff88fcdaf1e478
(XEN) r15: ffff88fcdaf1e4b8   cr0: 0000000080050033   cr4: 00000000003426e0
(XEN) cr3: 000000238bc52000   cr2: ffffc90040798e18
(XEN) fsb: 0000000000000000   gsb: ffff88fcdaf00000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d0bffcf800> (ffff82d0bffcf800):
(XEN) 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
(XEN) Current stack base ffffc90040798000 differs from expected ffff837e77008000
(XEN) Valid stack range: ffffc9004079e000-ffffc900407a0000, sp=ffffc90040798e28, tss.rsp0=ffff837e7700ffa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 48:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Debugging connection not set up.
(XEN) Debugging connection not set up.
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<0000000067b4cb2d>] 0000000067b4cb2d
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830059027780   rcx: 0000000067c50000
(XEN) rdx: 0000000000000000   rsi: 00000000003526e0   rdi: ffff830059027760
(XEN) rbp: ffff8300590278a0   rsp: ffff8300590276c0   r8:  ffff830059027780
(XEN) r9:  ffff830059027760   r10: 0000000067b4e1b8   r11: 0101010101010101
(XEN) r12: 00000000fffffffe   r13: 0000000000000000   r14: 0000000000000065
(XEN) r15: 0000000000000003   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) cr3: 000000203fe4e000   cr2: 0000000067c50010
(XEN) fsb: 00007feb9960a740   gsb: ffff88fcdafc0000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <0000000067b4cb2d> (0000000067b4cb2d):
(XEN) 6b c0 10 48 8b 4c 24 20 <48> 8b 44 01 10 48 89 44 24 28 48 8b 44 24 28 48
(XEN) Xen stack trace from rsp=ffff8300590276c0:
(XEN)    ffff832010000400 00000000000000f1 ffff8300590276e0 ffff8300590276f8
(XEN)    0000000067c50000 ffff82d0802510dc ffff82d0808197e0 0000000067b4bf3c
(XEN)    ffff830059027780 0000000000000060 ffff82d08093ebe0 ffff82d0808197e0
(XEN)    000000203fe4e000 0000000067b4b590 ffff830059027800 ffff82d080240b77
(XEN)    ffff832010000400 00000000000000f1 ffff830059027760 0000000067aeb54b
(XEN)    ffff8300590277c8 ffff82d0802858ca ffff82d080389845 ffff832010000424
(XEN)    00000000000fa000 67c5000000000200 ffff82d080389845 0000000067aeb8d7
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff830059027fff
(XEN)    0000000000000000 0000000067aeb6ae ffff82d0803898ba 0000000000000003
(XEN)    00000000003526e0 ffff830059027840 0000000000000000 0000000067aeb476
(XEN)    ffff830000000000 ffff830059027860 0000000059014000 0000000000000000
(XEN)    ffff830059027850 ffff82d080386464 0000000000000000 ffff82d080386768
(XEN)    0000000000000000 00000000fffffffe ffff8300590278a0 ffff82d080386739
(XEN)    0000000059014000 0000000000000283 ffff830059027888 000000000000e010
(XEN)    ffff82d0802a4d55 0000000000000046 ffff82d08046b5d0 ffffffffffffffff
(XEN)    ffff8300590278f0 ffff82d0802a4607 0000138800000000 0000000000000000
(XEN)    0000000000000000 0000000000000046 ffff82d08046b5d0 ffff82d080240bfe
(XEN)    ffff82d08044c815 0000000000000003 ffff830059027958 ffff82d080250ea9
(XEN)    ffff830000000028 ffff830059027968 ffff830059027918 ffff82d080240e1c
(XEN)    ffff82d08045c610 ffff82d080451a57 ffff82d08044c815 0000000000000058
(XEN) Xen call trace:
(XEN)    [<0000000067b4cb2d>] R 0000000067b4cb2d
(XEN)    [<ffff832010000400>] S ffff832010000400
(XEN)    [<ffff82d0802a4607>] F machine_restart+0x168/0x28a
(XEN)    [<ffff82d080250ea9>] F console_suspend+0/0x28
(XEN)    [<ffff82d0802abf51>] F do_invalid_op+0x387/0x3b5
(XEN)    [<ffff82d080389a3d>] F x86_64/entry.S#handle_exception_saved+0x68/0x94
(XEN)    [<ffff82d080240bfc>] F _spin_unlock+0x40/0x42
(XEN)    [<ffff82d080240c17>] F _spin_unlock_irqrestore+0xd/0x24
(XEN)    [<ffff82d080252ea9>] F serial_puts+0x131/0x141
(XEN)    [<ffff82d08024fe41>] F console_serial_puts+0x28/0x2a
(XEN)    [<ffff82d08024fe7d>] F drivers/char/console.c#__putstr+0x3a/0x8b
(XEN)    [<ffff82d08024fee2>] F drivers/char/console.c#printk_start_of_line+0x14/0x17b
(XEN)    [<ffff82d0802503ea>] F drivers/char/console.c#vprintk_common+0x8d/0x158
(XEN)    [<ffff82d080250502>] F printk+0x4d/0x4f
(XEN)    [<ffff82d0802303e8>] F common/sched_credit2.c#init_pdata+0xdd/0x441
(XEN)    [<ffff82d0802307e1>] F common/sched_credit2.c#csched2_switch_sched+0x95/0xe2
(XEN)    [<ffff82d08023f1fd>] F schedule_cpu_add+0x18a/0x3fd
(XEN)    [<ffff82d080201a9f>] F common/cpupool.c#cpupool_assign_cpu_locked+0x58/0x189
(XEN)    [<ffff82d080201ee2>] F common/cpupool.c#cpu_callback+0x186/0x3c1
(XEN)    [<ffff82d0802242c0>] F notifier_call_chain+0x6b/0x96
(XEN)    [<ffff82d080200f95>] F common/cpu.c#cpu_notifier_call_chain+0x1b/0x33
(XEN)    [<ffff82d080201215>] F cpu_up+0xa8/0xe5
(XEN)    [<ffff82d0802a7e28>] F cpu_up_helper+0xf/0xa5
(XEN)    [<ffff82d080205d5d>] F common/domain.c#continue_hypercall_tasklet_handler+0x4c/0xb9
(XEN)    [<ffff82d080242ddb>] F common/tasklet.c#do_tasklet_work+0x76/0xa9
(XEN)    [<ffff82d0802430bc>] F do_tasklet+0x58/0x8a
(XEN)    [<ffff82d0802751e8>] F arch/x86/domain.c#idle_loop+0x40/0x9b
(XEN)
(XEN) Pagetable walk from 0000000067c50010:
(XEN)  L4[0x000] = 000000203fe4d063 ffffffffffffffff
(XEN)  L3[0x001] = 000000005900d063 ffffffffffffffff
(XEN)  L2[0x13e] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000067c50010
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Debugging connection not set up.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

Attachment: crash_upstream.log
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel