[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume

To: Jan Beulich <JBeulich@xxxxxxxx>
From: Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxxx>
Date: Thu, 24 Jan 2013 17:25:09 +0100
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>, "Keir \(Xen.org\)" <keir@xxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>
Delivery-date: Thu, 24 Jan 2013 16:27:21 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 24/01/13 16:36, Jan Beulich wrote:

On 24.01.13 at 15:26, Tomasz Wroblewski<tomasz.wroblewski@xxxxxxxxxx>  wrote:

@@ -212,6 +213,8 @@
             BUG_ON(error == -EBUSY);
             printk("Error taking CPU%d up: %d\n", cpu, error);
         }
+        if (system_state == SYS_STATE_resume)
+            cpumask_set_cpu(cpu, cpupool0->cpu_valid);

This can't be right: What tells you that all CPUs were in pool 0?

You're right, in my simple tests this was the case, but generallyspeaking it might not be.. Would an approach based on storing cpupool0mask in disable_nonboot_cpus() and restoring it in enable_nonboot_cpus()be more acceptable?

Also, for the future - generating patches with -p helps quite
a bit in reviewing them.

Ok, thanks!

--- a/xen/common/schedule.c     Mon Jan 21 17:03:10 2013 +0000
+++ b/xen/common/schedule.c     Thu Jan 24 13:40:31 2013 +0000
@@ -545,7 +545,7 @@
     int    ret = 0;

     c = per_cpu(cpupool, cpu);
-    if ( (c == NULL) || (system_state == SYS_STATE_suspend) )
+    if ( c == NULL )
         return ret;

     for_each_domain_in_cpupool ( d, c )
@@ -556,7 +556,8 @@

             cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity)&&
-                 cpumask_test_cpu(cpu, v->cpu_affinity) )
+                 cpumask_test_cpu(cpu, v->cpu_affinity)&&
+                 system_state != SYS_STATE_suspend )
             {
                 printk("Breaking vcpu affinity for domain %d vcpu %d\n",
                         v->domain->domain_id, v->vcpu_id);

I doubt this is correct, as you don't restore any of the settings
during resume that you tear down here.

Is the objection about the affinity part or also the (c == NULL) bit?The cpu_disable_scheduler() function is currently part of a regular cpudown process, and was also part of suspend process before the "systemstate variable" changeset which regressed it. So the (c==NULL) hunkmostly just returns to previous state where this was working alot better(by empirical testing). But I am no expert on this, so would be gratefulfor ideas how this could be fixed in a better way!

Just to recap, the current problem boils down, I believe, to the factthat vcpu_wake (schedule.c) function keeps getting called occasionallyduring the S3 path for cpus which have the per_cpu data freed, causing acrash. Safest way of fixing it seemed to be just put the suspendcpu_disable_scheduler under regular path again - it probably isn't thebest..










_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

Follow-Ups:
- Re: [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume
  - From: Jan Beulich

References:
- [Xen-devel] [PATCH] Fix scheduler crash after s3 resume
  - From: Tomasz Wroblewski
- Re: [Xen-devel] [PATCH] Fix scheduler crash after s3 resume
  - From: Juergen Gross
- [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume
  - From: Tomasz Wroblewski
- Re: [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume
  - From: Jan Beulich

Prev by Date: Re: [Xen-devel] [RFC PATCH 1/15]: PVH xen: turn gdb_frames/gdt_ents into union
Next by Date: Re: [Xen-devel] [RFC PATCH 10/16]: PVH xen: introduce vmx_pvh.c
Previous by thread: Re: [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume
Next by thread: Re: [Xen-devel] [PATCH v2] Fix scheduler crash after s3 resume
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.