
Re: [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume



>>> On 25.01.13 at 10:07, Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxxx> 
>>> wrote:
>> It certainly looks wrong for vcpu_wake() to be called on an offline
>> CPU in the first place (i.e. I'm getting the impression you're trying
>> to cure symptoms rather than the root cause). Did you note down
>> the call tree that got you there?
>>
> Here's an example, taken from the unpatched suspend path - it was
> pretty hard to get, as for some reason a lot of my serial output gets
> eaten even with sync_console. I have also seen it from a few other
> places; I believe the window of opportunity for it to happen is before
> enable_nonboot_cpus() is called in enter_state(). If the system
> survives past this point, it won't crash.
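
For reference, the window being described is the resume tail of
enter_state() (xen/arch/x86/acpi/power.c). Roughly, as a simplified
sketch reconstructed from the trace below (ordering approximate, most
calls elided):

    /* ... low-level wakeup, device power-up, ... */
    printk(XENLOG_INFO "Finishing wakeup from ACPI S%d state.\n", state);
    /* ... */
    rcu_barrier();          /* processes pending softirqs, so e.g. a
                             * timer handler can call vcpu_wake() here
                             * while all CPUs but CPU0 are offline */
    enable_nonboot_cpus();  /* past this point the crash no longer
                             * reproduces */
    thaw_domains();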
> 
> 
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c480122c1a>] vcpu_wake+0x3a/0x40a
> (XEN) RFLAGS: 0000000000010082   CONTEXT: hypervisor
> (XEN) rax: 00007d3b7fcf7580   rbx: ffff8300ba70f000   rcx: ffff8301020180c8
> (XEN) rdx: 0000000000000000   rsi: 0000000000000040   rdi: ffff8300ba70f000
> (XEN) rbp: ffff82c4802c7bf0   rsp: ffff82c4802c7bb0   r8:  0000000000000002
> (XEN) r9:  ffff83013a805380   r10: 00000009713faecf   r11: 000000000000000a
> (XEN) r12: ffff82c480308ae0   r13: ffff82c4802f2dc0   r14: 0000000000000246
> (XEN) r15: 0000000000000082   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 00000000ba674000   cr2: 0000000000000060
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802c7bb0:
> (XEN)    ffff82c4802c7c00 0000000000000286 ffff83013a8053a8 ffff8300ba70f000
> (XEN)    0000000000000000 ffff8300ba70f15c 0000000000000246 ffff82c4802c7e3c
> (XEN)    ffff82c4802c7c00 ffff82c48012337b ffff82c4802c7c20 ffff82c48015eaf6
> (XEN)    ffff830137b75000 0000000000000013 ffff82c4802c7c30 ffff82c48015eb75
> (XEN)    ffff82c4802c7c60 ffff82c4801067e5 ffff82c4802c7c60 ffff8300ba70f000
> (XEN)    0000000000000000 ffff8300ba70f15c ffff82c4802c7c90 ffff82c480106a88
> (XEN)    ffff82c480308c80 ffff8300ba70f000 ffff82c480121590 00000009a5a50985
> (XEN)    ffff82c4802c7ca0 ffff82c480181af7 ffff82c4802c7cb0 ffff82c480121599
> (XEN)    ffff82c4802c7ce0 ffff82c4801274cf 0000000000000002 ffff8300ba70f060
> (XEN)    ffff82c480308c80 ffff830137beb240 ffff82c4802c7d30 ffff82c4801275cb
> (XEN)    ffff82c480308e28 0000000000000286 0000000000000000 ffff82c4802e0000
> (XEN)    ffff82c4802e0000 ffff82c4802c0000 fffffffffffffffd ffff82c4802c7e3c
> (XEN)    ffff82c4802c7d60 ffff82c480124b8e 0000000000000000 ffff82c4802655e0
> (XEN)    0000000000000003 0000000000000100 ffff82c4802c7d70 ffff82c480124bdf
> (XEN)    ffff82c4802c7db0 ffff82c48012cd1f ffff82c4802c7da0 0000000000000000
> (XEN)    ffff82c48012c79c ffff82c4802c7e3c 0000000000000008 0000000000000000
> (XEN)    ffff82c4802c7e20 ffff82c480125a77 ffff82c4802c7e40 ffff82c48012cccb
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    ffff82c4802c7e00 ffff82c4802c0000 0000000000000000 0000000000000003
> (XEN)    ffff82c480308e90 00000000000026f0 ffff82c4802c7e40 ffff82c48012cbf8
> (XEN) Xen call trace:
> (XEN)    [<ffff82c480122c1a>] vcpu_wake+0x3a/0x40a
> (XEN)    [<ffff82c48012337b>] vcpu_unblock+0x4b/0x4d
> (XEN)    [<ffff82c48015eaf6>] vcpu_kick+0x20/0x71
> (XEN)    [<ffff82c48015eb75>] vcpu_mark_events_pending+0x2e/0x39
> (XEN)    [<ffff82c4801067e5>] evtchn_set_pending+0xba/0x185
> (XEN)    [<ffff82c480106a88>] send_guest_vcpu_virq+0x62/0x7f
> (XEN)    [<ffff82c480181af7>] send_timer_event+0xe/0x10
> (XEN)    [<ffff82c480121599>] vcpu_singleshot_timer_fn+0x9/0xb
> (XEN)    [<ffff82c4801274cf>] execute_timer+0x4e/0x6c
> (XEN)    [<ffff82c4801275cb>] timer_softirq_action+0xde/0x206
> (XEN)    [<ffff82c480124b8e>] __do_softirq+0x8e/0x99
> (XEN)    [<ffff82c480124bdf>] process_pending_softirqs+0x46/0x48
> (XEN)    [<ffff82c48012cd1f>] rcu_barrier_action+0x54/0x7d
> (XEN)    [<ffff82c480125a77>] stop_machine_run+0x1b5/0x1fe
> (XEN)    [<ffff82c48012cbf8>] rcu_barrier+0x24/0x26

I think I had already raised the question of the placement of
this rcu_barrier() here, and of the lack of a counterpart in the
suspend portion of the path. Keir? Or should
rcu_barrier_action() avoid calling process_pending_softirqs()
while still resuming, and instead call __do_softirq() with all but
RCU_SOFTIRQ masked (perhaps through a suitable wrapper,
or alternatively by open-coding its effect)?
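
Illustratively (untested, and only a sketch - the wrapper name below
is made up): __do_softirq() in xen/common/softirq.c already takes a
mask of softirqs to leave pending, but is static, so such a wrapper
might look like

    /* Process RCU work only; leave everything else (in particular
     * TIMER_SOFTIRQ) pending until resume has completed. */
    void process_pending_rcu_only(void)
    {
        ASSERT(!in_irq() && local_irq_is_enabled());
        __do_softirq(~(1UL << RCU_SOFTIRQ));
    }

with rcu_barrier_action() calling it in place of
process_pending_softirqs() for as long as we're still resuming.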

Jan

> (XEN)    [<ffff82c48019dae4>] enter_state+0x2cb/0x36b
> (XEN)    [<ffff82c48019db9c>] enter_state_action+0x18/0x24
> (XEN)    [<ffff82c480126ac6>] do_tasklet_work+0x8d/0xc7
> (XEN)    [<ffff82c480126e14>] do_tasklet+0x65/0x95
> (XEN)    [<ffff82c480159945>] idle_loop+0x63/0x6a
> (XEN)
> (XEN) Pagetable walk from 0000000000000060:
> (XEN)  L4[0x000] = 000000013a808063 ffffffffffffffff
> (XEN)  L3[0x000] = 000000013a807063 ffffffffffffffff
> (XEN)  L2[0x000] = 000000013a806063 ffffffffffffffff
> (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000060
> (XEN) ****************************************





 

