Re: [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume
On 24/01/13 16:36, Jan Beulich wrote:
>>>> On 24.01.13 at 15:26, Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxxx> wrote:
>> @@ -212,6 +213,8 @@
>>          BUG_ON(error == -EBUSY);
>>          printk("Error taking CPU%d up: %d\n", cpu, error);
>>      }
>> +    if ( system_state == SYS_STATE_resume )
>> +        cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>
> This can't be right: what tells you that all CPUs were in pool 0?

You're right; in my simple tests this was the case, but generally speaking it might not be. Would an approach based on storing the cpupool0 mask in disable_nonboot_cpus() and restoring it in enable_nonboot_cpus() be more acceptable?

> Also, for the future: generating patches with -p helps quite a bit in
> reviewing them.

Ok, thanks! Is the objection about the affinity part, or also the (c == NULL) bit? The cpu_disable_scheduler() function is currently part of the regular CPU-down path, and was also part of the suspend path before the "system state variable" changeset that regressed it. So the (c == NULL) hunk mostly just returns to the previous state, where this was working a lot better (by empirical testing). But I am no expert on this, so I would be grateful for ideas on how this could be fixed in a better way!

>> --- a/xen/common/schedule.c Mon Jan 21 17:03:10 2013 +0000
>> +++ b/xen/common/schedule.c Thu Jan 24 13:40:31 2013 +0000
>> @@ -545,7 +545,7 @@
>>      int ret = 0;
>>
>>      c = per_cpu(cpupool, cpu);
>> -    if ( (c == NULL) || (system_state == SYS_STATE_suspend) )
>> +    if ( c == NULL )
>>          return ret;
>>
>>      for_each_domain_in_cpupool ( d, c )
>> @@ -556,7 +556,8 @@
>>              cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid);
>>              if ( cpumask_empty(&online_affinity) &&
>> -                 cpumask_test_cpu(cpu, v->cpu_affinity) )
>> +                 cpumask_test_cpu(cpu, v->cpu_affinity) &&
>> +                 system_state != SYS_STATE_suspend )
>>              {
>>                  printk("Breaking vcpu affinity for domain %d vcpu %d\n",
>>                         v->domain->domain_id, v->vcpu_id);
>
> I doubt this is correct, as you don't restore any of the settings during
> resume that you tear down here.
Just to recap: the current problem boils down, I believe, to the fact that vcpu_wake() (in schedule.c) occasionally keeps getting called during the S3 path for CPUs whose per-CPU data has already been freed, causing a crash. The safest fix seemed to be to put the suspend-time cpu_disable_scheduler() call back on the regular path again; it probably isn't the best.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel