
Re: [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume

As long as that doesn't lead to other pools still coming back
corrupted after resume.

You may need to work this out with Juergen.

Nods. My intention is to just offset the cpupool0 valid mask clear done on __cpu_disable().

The change you're about to partly revert was correcting one
fundamental mistake: bringing down a CPU at runtime is a
different thing than doing the same during suspend. In the
former case you indeed want all associations to it to be cut off,
whereas in the S3 case you want everything to come back at
resume the way it was before suspend. So I think you're just
trying to revert too much of that original change.

But again, Keir and Juergen (who collectively did that change
iirc) would be good for you to consult with.
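
To make the runtime-offline versus suspend distinction above concrete, here is a minimal toy sketch. It is not Xen code: toy_pool, toy_cpu_offline() and toy_cpu_online() are hypothetical names invented for illustration, and the 32-bit mask is a simplifying assumption. It only shows the intended behaviour: a CPU taken down for suspend is remembered and rejoins its pool on resume, while a CPU taken down at runtime has its association cut.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy CPU mask: bit N set means CPU N is a member. */
typedef uint32_t toy_cpumask_t;

struct toy_pool {
    toy_cpumask_t valid;  /* CPUs currently usable by this pool */
    toy_cpumask_t saved;  /* CPUs to put back when they return from suspend */
};

/* Take a CPU down. Only the suspend case remembers the membership. */
static void toy_cpu_offline(struct toy_pool *p, unsigned int cpu, bool suspending)
{
    assert(cpu < 32);
    toy_cpumask_t bit = (toy_cpumask_t)1 << cpu;

    if (suspending && (p->valid & bit))
        p->saved |= bit;          /* S3: come back the way it was */
    p->valid &= ~bit;             /* both cases: CPU is unusable for now */
}

/* Bring a CPU up. It rejoins the pool only if suspend remembered it. */
static void toy_cpu_online(struct toy_pool *p, unsigned int cpu)
{
    assert(cpu < 32);
    toy_cpumask_t bit = (toy_cpumask_t)1 << cpu;

    if (p->saved & bit) {
        p->valid |= bit;
        p->saved &= ~bit;
    }
}
```

With a pool that starts with CPUs 0-3 valid, taking CPU 3 down at runtime and bringing it back leaves it outside the pool, whereas taking CPU 2 down with suspending set and bringing it back restores it.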

Understood. It would certainly be preferable to fix this while leaving the original intent in place (I tried, but have so far failed). From my limited reading of the scheduler code, it is actually vcpu_migrate(), called from cpu_disable_scheduler(), that writes a new v->processor into the vcpu struct. That in turn stops vcpu_wake() from being called on these partially downed CPUs, and hence avoids the crash.
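
A rough toy model of that mechanism, for illustration only. None of this is the real scheduler code: toy_vcpu, toy_wake(), toy_migrate() and the per-CPU pointer array are hypothetical stand-ins, and the assumption that a downed CPU's scheduler data is simply a NULL pointer is mine. The point is just that a wake follows v->processor, so re-homing the vcpu before the wake keeps it away from torn-down per-CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical per-CPU scheduler state; a NULL slot means "torn down". */
struct toy_sched_data { int runq_len; };

struct toy_vcpu { unsigned int processor; };

static struct toy_sched_data toy_cpu_data[TOY_NR_CPUS];
static struct toy_sched_data *toy_per_cpu_sched[TOY_NR_CPUS];

static void toy_cpu_up(unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);
    toy_cpu_data[cpu].runq_len = 0;
    toy_per_cpu_sched[cpu] = &toy_cpu_data[cpu];
}

static void toy_cpu_down(unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);
    toy_per_cpu_sched[cpu] = NULL;
}

/* Wake follows v->processor. Returns false where unchecked code would
 * dereference the missing per-CPU data and fault. */
static bool toy_wake(const struct toy_vcpu *v)
{
    struct toy_sched_data *sd = toy_per_cpu_sched[v->processor];

    if (sd == NULL)
        return false;
    sd->runq_len++;
    return true;
}

/* Re-home the vcpu onto the first CPU that still has scheduler data. */
static bool toy_migrate(struct toy_vcpu *v)
{
    for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        if (toy_per_cpu_sched[cpu] != NULL) {
            v->processor = cpu;
            return true;
        }
    }
    return false;
}
```

In this model, a vcpu left on a downed CPU fails to wake until toy_migrate() has pointed v->processor at a CPU that is still up.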

It certainly looks wrong for vcpu_wake() to be called on an offline
CPU in the first place (i.e. I'm getting the impression you're trying
to cure symptoms rather than the root cause). Did you note down
the call tree that got you there?

Here's an example, taken from the unpatched suspend path. It was pretty hard to get because, for some reason, a lot of serial output gets eaten even with sync_console. I have also seen it from a few other places; I believe the window of opportunity for it to happen is before enable_nonboot_cpus() is called in enter_state(). If it survives past this point it won't crash.

(XEN) Finishing wakeup from ACPI S3 state.
(XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c480122c1a>] vcpu_wake+0x3a/0x40a
(XEN) RFLAGS: 0000000000010082   CONTEXT: hypervisor
(XEN) rax: 00007d3b7fcf7580   rbx: ffff8300ba70f000   rcx: ffff8301020180c8
(XEN) rdx: 0000000000000000   rsi: 0000000000000040   rdi: ffff8300ba70f000
(XEN) rbp: ffff82c4802c7bf0   rsp: ffff82c4802c7bb0   r8:  0000000000000002
(XEN) r9:  ffff83013a805380   r10: 00000009713faecf   r11: 000000000000000a
(XEN) r12: ffff82c480308ae0   r13: ffff82c4802f2dc0   r14: 0000000000000246
(XEN) r15: 0000000000000082   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 00000000ba674000   cr2: 0000000000000060
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802c7bb0:
(XEN)    ffff82c4802c7c00 0000000000000286 ffff83013a8053a8 ffff8300ba70f000
(XEN)    0000000000000000 ffff8300ba70f15c 0000000000000246 ffff82c4802c7e3c
(XEN)    ffff82c4802c7c00 ffff82c48012337b ffff82c4802c7c20 ffff82c48015eaf6
(XEN)    ffff830137b75000 0000000000000013 ffff82c4802c7c30 ffff82c48015eb75
(XEN)    ffff82c4802c7c60 ffff82c4801067e5 ffff82c4802c7c60 ffff8300ba70f000
(XEN)    0000000000000000 ffff8300ba70f15c ffff82c4802c7c90 ffff82c480106a88
(XEN)    ffff82c480308c80 ffff8300ba70f000 ffff82c480121590 00000009a5a50985
(XEN)    ffff82c4802c7ca0 ffff82c480181af7 ffff82c4802c7cb0 ffff82c480121599
(XEN)    ffff82c4802c7ce0 ffff82c4801274cf 0000000000000002 ffff8300ba70f060
(XEN)    ffff82c480308c80 ffff830137beb240 ffff82c4802c7d30 ffff82c4801275cb
(XEN)    ffff82c480308e28 0000000000000286 0000000000000000 ffff82c4802e0000
(XEN)    ffff82c4802e0000 ffff82c4802c0000 fffffffffffffffd ffff82c4802c7e3c
(XEN)    ffff82c4802c7d60 ffff82c480124b8e 0000000000000000 ffff82c4802655e0
(XEN)    0000000000000003 0000000000000100 ffff82c4802c7d70 ffff82c480124bdf
(XEN)    ffff82c4802c7db0 ffff82c48012cd1f ffff82c4802c7da0 0000000000000000
(XEN)    ffff82c48012c79c ffff82c4802c7e3c 0000000000000008 0000000000000000
(XEN)    ffff82c4802c7e20 ffff82c480125a77 ffff82c4802c7e40 ffff82c48012cccb
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff82c4802c7e00 ffff82c4802c0000 0000000000000000 0000000000000003
(XEN)    ffff82c480308e90 00000000000026f0 ffff82c4802c7e40 ffff82c48012cbf8
(XEN) Xen call trace:
(XEN)    [<ffff82c480122c1a>] vcpu_wake+0x3a/0x40a
(XEN)    [<ffff82c48012337b>] vcpu_unblock+0x4b/0x4d
(XEN)    [<ffff82c48015eaf6>] vcpu_kick+0x20/0x71
(XEN)    [<ffff82c48015eb75>] vcpu_mark_events_pending+0x2e/0x39
(XEN)    [<ffff82c4801067e5>] evtchn_set_pending+0xba/0x185
(XEN)    [<ffff82c480106a88>] send_guest_vcpu_virq+0x62/0x7f
(XEN)    [<ffff82c480181af7>] send_timer_event+0xe/0x10
(XEN)    [<ffff82c480121599>] vcpu_singleshot_timer_fn+0x9/0xb
(XEN)    [<ffff82c4801274cf>] execute_timer+0x4e/0x6c
(XEN)    [<ffff82c4801275cb>] timer_softirq_action+0xde/0x206
(XEN)    [<ffff82c480124b8e>] __do_softirq+0x8e/0x99
(XEN)    [<ffff82c480124bdf>] process_pending_softirqs+0x46/0x48
(XEN)    [<ffff82c48012cd1f>] rcu_barrier_action+0x54/0x7d
(XEN)    [<ffff82c480125a77>] stop_machine_run+0x1b5/0x1fe
(XEN)    [<ffff82c48012cbf8>] rcu_barrier+0x24/0x26
(XEN)    [<ffff82c48019dae4>] enter_state+0x2cb/0x36b
(XEN)    [<ffff82c48019db9c>] enter_state_action+0x18/0x24
(XEN)    [<ffff82c480126ac6>] do_tasklet_work+0x8d/0xc7
(XEN)    [<ffff82c480126e14>] do_tasklet+0x65/0x95
(XEN)    [<ffff82c480159945>] idle_loop+0x63/0x6a
(XEN) Pagetable walk from 0000000000000060:
(XEN)  L4[0x000] = 000000013a808063 ffffffffffffffff
(XEN)  L3[0x000] = 000000013a807063 ffffffffffffffff
(XEN)  L2[0x000] = 000000013a806063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000060
(XEN) ****************************************
