
Re: [Xen-devel] [PATCH] Fix scheduler crash after s3 resume

On 23/01/13 17:11, Jan Beulich wrote:
> So can you confirm that this is not a problem on a non-debug
> hypervisor? I'm particularly asking because, leaving the ASSERT()
> aside, such a fundamental flaw would have made it impossible
> for S3 to work for anyone, and that's reportedly not the case.

I tested with a non-debug hypervisor without the patch, and I get one of two outcomes. Either resume works, but the cpupool is emptied of all CPUs except CPU 0 (as shown by the "dump run queues" debug key), so presumably nothing will be scheduled on the other CPUs after resume; or I get the following crash:

(XEN) ----[ Xen-4.3-unstable  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48012113a>] vcpu_wake+0x4a/0x3b0
(XEN) RFLAGS: 0000000000010092   CONTEXT: hypervisor
(XEN) rax: 00007d3b7fd10180   rbx: ffff8300ba6f3000   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 0000000000000040   rdi: ffff8300ba6f3000
(XEN) rbp: ffff82c4802da1c0   rsp: ffff82c4802af998   r8:  0000000000000016
(XEN) r9:  ffff8301300bc0c8   r10: 0000000000000002   r11: ffff83013797d010
(XEN) r12: ffff82c4802efee0   r13: 0000000000000092   r14: ffff82c4802efee0
(XEN) r15: ffff82c4802da1c0   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 00000000ba65d000   cr2: 0000000000000060
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802af998:
(XEN)    ffff82c4802afbc0 ffffffffffffff49 0000000000000000 ffff82c400000010
(XEN)    0000000800000008 ffff8300ba6f3000 0000000000000000 000000000000001b
(XEN)    ffff830137984000 ffff83013c5e00f0 0000000000000001 ffff82c48015b6fd
(XEN)    000000000013c4be ffff830137984000 0000000000000027 ffff82c480106375
(XEN)    ffff82c4802a8000 7400000000000000 ffff8301379c1b80 ffff82c4802c7800
(XEN)    ffff82c4802f02e8 ffff82c480165a2b ffff8800ba60e3c8 ffff8301300bc800
(XEN)    000000000000001b ffff82c4802afbf8 ffff8301379c1ba4 0000000000000000
(XEN)    ffffffffffff8000 2000000000000001 ffff82c4802afab8 ffff82c4802c7800
(XEN)    ffff82c4802f02e8 0000000000000000 ffff8301379c0080 ffff82c4802afbf8
(XEN)    00000000000000f0 ffff82c48015d507 00000000000000f0 ffff82c4802afbf8
(XEN)    ffff8301379c0080 0000000000000000 ffff82c4802f02e8 ffff82c4802c7800
(XEN)    0000000000000400 0000000000000002 0000000000000000 0000000000000002
(XEN)    ffff82c4802efea0 ffff82c48024dfc0 ffff82c4802a8000 ffff8301379c00a4
(XEN)    ffff8301379c00a4 0000006900000000 ffff82c480165cf7 000000000000e008
(XEN)    0000000000000202 ffff82c4802afb78 000000000000e010 ffff82c480165cf7
(XEN)    0000000000000096 0000000000000000 ffff82c48024dfc0 0000000000000000
(XEN)    ffff8301379c00a4 0000000000000092 0000000000000096 ffff82c48013bb56
(XEN)    ffff82c480263700 0000000000000000 ffff82c4802efea0 0000000000000003
(XEN)    ffff82c4802f0290 ffff82c4802a8000 00000000000026f0 ffff82c48015d507
(XEN)    00000000000026f0 ffff82c4802a8000 ffff82c4802f0290 0000000000000003
(XEN) Xen call trace:
(XEN)    [<ffff82c48012113a>] vcpu_wake+0x4a/0x3b0
(XEN)    [<ffff82c48015b6fd>] vcpu_kick+0x1d/0x80
(XEN)    [<ffff82c480106375>] evtchn_set_pending+0x105/0x180
(XEN)    [<ffff82c480165a2b>] do_IRQ+0x1db/0x5e0
(XEN)    [<ffff82c48015d507>] common_interrupt+0x57/0x60
(XEN)    [<ffff82c480165cf7>] do_IRQ+0x4a7/0x5e0
(XEN)    [<ffff82c480165cf7>] do_IRQ+0x4a7/0x5e0
(XEN)    [<ffff82c48013bb56>] __serial_putc+0x76/0x160
(XEN)    [<ffff82c48015d507>] common_interrupt+0x57/0x60
(XEN)    [<ffff82c48019a0ef>] enter_state+0x29f/0x390
(XEN)    [<ffff82c480139ce0>] serial_rx+0/0xa0
(XEN)    [<ffff82c480100000>] _stext+0/0x14
(XEN)    [<ffff82c480110b57>] handle_keypress+0x67/0xd0
(XEN)    [<ffff82c48013bca6>] serial_rx_interrupt+0x66/0xe0
(XEN)    [<ffff82c48013a1db>] __ns16550_poll+0x4b/0xb0
(XEN)    [<ffff82c48013a190>] __ns16550_poll+0/0xb0
(XEN)    [<ffff82c480139fd6>] ns16550_poll+0x26/0x30
(XEN)    [<ffff82c480182499>] do_invalid_op+0x3d9/0x3f0
(XEN)    [<ffff82c480139fb0>] ns16550_poll+0/0x30
(XEN)    [<ffff82c480216355>] handle_exception_saved+0x2e/0x6c
(XEN)    [<ffff82c480139fb0>] ns16550_poll+0/0x30
(XEN)    [<ffff82c480139fcc>] ns16550_poll+0x1c/0x30
(XEN)    [<ffff82c480125a3b>] timer_softirq_action+0xbb/0x2b0
(XEN)    [<ffff82c48012311a>] __do_softirq+0x5a/0x90
(XEN)    [<ffff82c480156e95>] idle_loop+0x25/0x50
(XEN) Pagetable walk from 0000000000000060:
(XEN)  L4[0x000] = 000000013a80d063 ffffffffffffffff
(XEN)  L3[0x000] = 000000013a80c063 ffffffffffffffff
(XEN)  L2[0x000] = 000000013a80b063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000060
(XEN) ****************************************

With the patch, there's no crash, and the cpupool correctly contains 4 CPUs after resume (the same as before suspend).
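The two outcomes can be pictured with a toy model. This sketch is hypothetical and is not Xen's cpupool code: it assumes a pool is just a bitmask of member CPUs, that taking a non-boot CPU offline for S3 removes it from its pool, and that the patch's effect amounts to restoring the saved membership on resume. The names `struct pool`, `count_cpus()` and `s3_cycle()` are invented for illustration.

```c
#include <stdint.h>

#define NR_CPUS  4                          /* the test machine had 4 CPUs */
#define ALL_CPUS ((1u << NR_CPUS) - 1)

/* Hypothetical model of a cpupool: a bitmask of member CPUs.
 * NOT Xen's implementation; it only mirrors the symptom reported here. */
struct pool {
    uint32_t cpu_valid;                     /* bit n set => CPU n is in the pool */
};

/* Count the CPUs in a membership mask. */
static int count_cpus(uint32_t mask)
{
    int n = 0;
    while (mask) {
        mask &= mask - 1;                   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Simulate one S3 cycle and return how many CPUs the pool holds afterwards.
 * 'fixed' selects whether membership saved before suspend is restored on
 * resume (the assumed effect of the patch). */
static int s3_cycle(int fixed)
{
    struct pool p = { .cpu_valid = ALL_CPUS };
    uint32_t saved = p.cpu_valid;           /* membership before suspend */

    /* Suspend: every non-boot CPU goes offline and, in this model,
     * drops out of its pool. CPU 0 stays. */
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        p.cpu_valid &= ~(1u << cpu);

    /* Resume: the CPUs come back online, but only the fixed path
     * puts them back into the pool. */
    if (fixed)
        p.cpu_valid = saved;

    return count_cpus(p.cpu_valid);
}
```

Under these assumptions, `s3_cycle(0)` leaves only CPU 0 in the pool, matching the unpatched "dump run queues" output, while `s3_cycle(1)` returns 4, matching the patched behaviour.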

Side note: I was also getting the vcpu_wake crashes earlier on debug builds, when I commented out the asserts for testing purposes, so I assumed the crash was a later consequence of the violated assertion.

Xen-devel mailing list


