Re: [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume
>>> On 25.01.13 at 10:07, Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxxx> wrote:
>> It certainly looks wrong for vcpu_wake() to be called on an offline
>> CPU in the first place (i.e. I'm getting the impression you're trying
>> to cure symptoms rather than the root cause). Did you note down
>> the call tree that got you there?
>>
> Here's an example one, taken from the unpatched suspend path - it was
> pretty hard to get, as for some reason a lot of serial output gets
> eaten even with sync_console. I have also seen it from a few other
> places; I believe the window of opportunity for it to happen is before
> enable_nonboot_cpus() is called in enter_state(). If it survives past
> this point it won't crash.
>
>
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c480122c1a>] vcpu_wake+0x3a/0x40a
> (XEN) RFLAGS: 0000000000010082   CONTEXT: hypervisor
> (XEN) rax: 00007d3b7fcf7580   rbx: ffff8300ba70f000   rcx: ffff8301020180c8
> (XEN) rdx: 0000000000000000   rsi: 0000000000000040   rdi: ffff8300ba70f000
> (XEN) rbp: ffff82c4802c7bf0   rsp: ffff82c4802c7bb0   r8:  0000000000000002
> (XEN) r9:  ffff83013a805380   r10: 00000009713faecf   r11: 000000000000000a
> (XEN) r12: ffff82c480308ae0   r13: ffff82c4802f2dc0   r14: 0000000000000246
> (XEN) r15: 0000000000000082   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 00000000ba674000   cr2: 0000000000000060
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802c7bb0:
> (XEN)    ffff82c4802c7c00 0000000000000286 ffff83013a8053a8 ffff8300ba70f000
> (XEN)    0000000000000000 ffff8300ba70f15c 0000000000000246 ffff82c4802c7e3c
> (XEN)    ffff82c4802c7c00 ffff82c48012337b ffff82c4802c7c20 ffff82c48015eaf6
> (XEN)    ffff830137b75000 0000000000000013 ffff82c4802c7c30 ffff82c48015eb75
> (XEN)    ffff82c4802c7c60 ffff82c4801067e5 ffff82c4802c7c60 ffff8300ba70f000
> (XEN)    0000000000000000 ffff8300ba70f15c ffff82c4802c7c90 ffff82c480106a88
> (XEN)    ffff82c480308c80 ffff8300ba70f000 ffff82c480121590 00000009a5a50985
> (XEN)    ffff82c4802c7ca0 ffff82c480181af7 ffff82c4802c7cb0 ffff82c480121599
> (XEN)    ffff82c4802c7ce0 ffff82c4801274cf 0000000000000002 ffff8300ba70f060
> (XEN)    ffff82c480308c80 ffff830137beb240 ffff82c4802c7d30 ffff82c4801275cb
> (XEN)    ffff82c480308e28 0000000000000286 0000000000000000 ffff82c4802e0000
> (XEN)    ffff82c4802e0000 ffff82c4802c0000 fffffffffffffffd ffff82c4802c7e3c
> (XEN)    ffff82c4802c7d60 ffff82c480124b8e 0000000000000000 ffff82c4802655e0
> (XEN)    0000000000000003 0000000000000100 ffff82c4802c7d70 ffff82c480124bdf
> (XEN)    ffff82c4802c7db0 ffff82c48012cd1f ffff82c4802c7da0 0000000000000000
> (XEN)    ffff82c48012c79c ffff82c4802c7e3c 0000000000000008 0000000000000000
> (XEN)    ffff82c4802c7e20 ffff82c480125a77 ffff82c4802c7e40 ffff82c48012cccb
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    ffff82c4802c7e00 ffff82c4802c0000 0000000000000000 0000000000000003
> (XEN)    ffff82c480308e90 00000000000026f0 ffff82c4802c7e40 ffff82c48012cbf8
> (XEN) Xen call trace:
> (XEN)    [<ffff82c480122c1a>] vcpu_wake+0x3a/0x40a
> (XEN)    [<ffff82c48012337b>] vcpu_unblock+0x4b/0x4d
> (XEN)    [<ffff82c48015eaf6>] vcpu_kick+0x20/0x71
> (XEN)    [<ffff82c48015eb75>] vcpu_mark_events_pending+0x2e/0x39
> (XEN)    [<ffff82c4801067e5>] evtchn_set_pending+0xba/0x185
> (XEN)    [<ffff82c480106a88>] send_guest_vcpu_virq+0x62/0x7f
> (XEN)    [<ffff82c480181af7>] send_timer_event+0xe/0x10
> (XEN)    [<ffff82c480121599>] vcpu_singleshot_timer_fn+0x9/0xb
> (XEN)    [<ffff82c4801274cf>] execute_timer+0x4e/0x6c
> (XEN)    [<ffff82c4801275cb>] timer_softirq_action+0xde/0x206
> (XEN)    [<ffff82c480124b8e>] __do_softirq+0x8e/0x99
> (XEN)    [<ffff82c480124bdf>] process_pending_softirqs+0x46/0x48
> (XEN)    [<ffff82c48012cd1f>] rcu_barrier_action+0x54/0x7d
> (XEN)    [<ffff82c480125a77>] stop_machine_run+0x1b5/0x1fe
> (XEN)    [<ffff82c48012cbf8>] rcu_barrier+0x24/0x26

I think I had already raised the question of the placement of this
rcu_barrier() here, and the lack of a counterpart in the suspend
portion of the path. Keir?

Or should rcu_barrier_action() avoid calling process_pending_softirqs()
while still resuming, and instead call __do_softirq() with all but
RCU_SOFTIRQ masked (perhaps through a suitable wrapper, or
alternatively by open-coding its effect)?

Jan

> (XEN)    [<ffff82c48019dae4>] enter_state+0x2cb/0x36b
> (XEN)    [<ffff82c48019db9c>] enter_state_action+0x18/0x24
> (XEN)    [<ffff82c480126ac6>] do_tasklet_work+0x8d/0xc7
> (XEN)    [<ffff82c480126e14>] do_tasklet+0x65/0x95
> (XEN)    [<ffff82c480159945>] idle_loop+0x63/0x6a
> (XEN)
> (XEN) Pagetable walk from 0000000000000060:
> (XEN)  L4[0x000] = 000000013a808063 ffffffffffffffff
> (XEN)  L3[0x000] = 000000013a807063 ffffffffffffffff
> (XEN)  L2[0x000] = 000000013a806063 ffffffffffffffff
> (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000060
> (XEN) ****************************************
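For illustration, a minimal sketch of the masking idea mentioned above,
assuming __do_softirq() in xen/common/softirq.c takes an ignore_mask of
softirqs to skip (as process_pending_softirqs() uses to mask
SCHEDULE_SOFTIRQ); the wrapper name process_pending_rcu_only() is
hypothetical, and the exact signatures may differ between Xen versions:

/*
 * Hypothetical wrapper next to process_pending_softirqs() in
 * xen/common/softirq.c: handle only RCU_SOFTIRQ, leaving every other
 * softirq (notably TIMER_SOFTIRQ, whose handler can end up in
 * vcpu_wake() via send_timer_event()) pending until the non-boot
 * CPUs are back online.
 */
void process_pending_rcu_only(void)
{
    ASSERT(!in_irq() && local_irq_is_enabled());
    __do_softirq(~(1UL << RCU_SOFTIRQ));
}

rcu_barrier_action() could then call such a helper (or open-code the
same __do_softirq() invocation) instead of process_pending_softirqs()
while the resume path is still bringing the other CPUs up.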