Re: [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume
> The crash happens due to an access to the scheduler percpu area which
> isn't allocated at the moment. The accessed element is the address of
> the scheduler lock for this cpu. Disabling the percpu locking scheme of
> the scheduler while the non-boot cpus are offline will avoid the crash.

Ok, so I tried this approach (by making the locking in vcpu_wake conditional on system_state), and whilst it stopped the vcpu_wake crash, I traded it for a crash in acpi_cpufreq_target:

(XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82c4801a0594>] acpi_cpufreq_target+0x165/0x33b
(XEN) RFLAGS: 0000000000010293   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830137bc7300   rcx: 0000000000000000
(XEN) rdx: 0000000000000009   rsi: ffff82c480265460   rdi: ffff830137bd7d60
(XEN) rbp: ffff830137bd7db0   rsp: ffff830137bd7d30   r8:  0000000000000004
(XEN) r9:  00000000fffffffe   r10: 0000000000000009   r11: 0000000000000000
(XEN) r12: ffff830137bc7c70   r13: ffff8301025444f8   r14: ffff830137bc7c70
(XEN) r15: 0000000001b5b14c   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 00000000ba674000   cr2: 000000000000004c
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff830137bd7d30:
(XEN)    000000008017d626 0000000000000009 00000009000000fb ffff830100000001
(XEN)    ffff830137bd7d60 0000080000000199 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffffffff37bd7da0 00000000ffffffea
(XEN)    ffff830137bc7c70 00000000002936c8 0000000006d9e30a 0000000001b5b14c
(XEN)    ffff830137bd7df0 ffff82c4801414ee ffff830137bc7c70 0000000000000003
(XEN)    ffff830137bd7df0 0000000000000008 0000000000000008 ffff830102ae1340
(XEN)    ffff830137bd7e50 ffff82c480140815 ffff8301141624a0 002936c800000286
(XEN)    ffff82c480308dc0 ffff830137bc7c70 0000000000000003 ffff830102ae1380
(XEN)    ffff830137bebb50 ffff830137bebc00 0000000000000010 0000000000000030
(XEN)    ffff830137bd7e70 ffff82c480140a2b ffff830137bd7e70 0000001548c205b8
(XEN)    ffff830137bd7ef0 ffff82c4801a31da 0000000000000002
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

(call graph sadly got eaten)

which corresponds to the following lines in cpufreq.c:

    freqs.old = perf->states[perf->state].core_frequency * 1000;
    freqs.new = data->freq_table[next_state].frequency;

ffff82c4801a058d:  8b 55 94              mov    -0x6c(%rbp),%edx
ffff82c4801a0590:  48 8b 43 08           mov    0x8(%rbx),%rax
ffff82c4801a0594:  8b 44 d0 04           mov    0x4(%rax,%rdx,8),%eax
ffff82c4801a0598:  89 45 8c              mov    %eax,-0x74(%rbp)
ffff82c4801a059b:  48 c7 c0 00 80 ff ff  mov    $0xffffffffffff8000,%rax
ffff82c4801a05a2:  48 21 e0              and    %rsp,%rax

which I guess crashes because either freq_table or data is freed at this point (indeed, it seems the cpufreq driver has some cpu up/down logic which frees it). Given that this is not even the first place in acpi_cpufreq_target where this is accessed, it looks like the cpu got torn down halfway through this function... I suspect there are likely to be more sites affected by this.

I also tried Jan's suggestion of making do_softirq skip its job if we are resuming. That causes a hang in rcu_barrier(). Adding another resume conditional in rcu_barrier() made it progress further, but it then crashed elsewhere (I don't remember where exactly; this approach looked a bit like a dead end, so I abandoned it quickly).

So I still don't have a better solution than the revert of the cpu_disable_scheduler() hunk.
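As a cross-check of the dump, the faulting address can be recomputed from the registers. This is only a sketch: it assumes that the load at ffff82c4801a0594 is the data->freq_table[next_state].frequency access, that each table entry is 8 bytes with the frequency field at offset 4 (which is what the 0x4(%rax,%rdx,8) operand suggests), and that rax and rdx still hold the values they had when the instruction faulted. The helper name faulting_address is made up for illustration.

```c
#include <stdint.h>

/*
 * Effective address of the x86 operand 0x4(%rax,%rdx,8):
 *   base (rax) + index (rdx) * scale (8) + displacement (4).
 * Assumption: rax holds data->freq_table and rdx holds next_state.
 */
static uint64_t faulting_address(uint64_t rax, uint64_t rdx)
{
    const uint64_t scale = 8; /* assumed size of one freq_table entry */
    const uint64_t disp  = 4; /* assumed offset of the frequency field */

    return rax + rdx * scale + disp;
}
```

With rax = 0 and rdx = 9 from the dump, faulting_address(0x0, 0x9) gives 0x4c, which matches cr2. Under the assumptions above that would mean freq_table was read as NULL at this point, with next_state = 9, rather than being a stale non-NULL pointer.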