[Xen-devel] [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
In fact, with 2 cpupools, one (the default) Credit and
one Credit2 (with at least 1 pCPU in the latter), trying
a (e.g., ACPI S3) suspend/resume crashes like this:

(XEN) [ 150.587779] ----[ Xen-4.7-unstable x86_64 debug=y Not tainted ]----
(XEN) [ 150.587783] CPU: 6
(XEN) [ 150.587786] RIP: e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
(XEN) [ 150.587796] RFLAGS: 0000000000010086 CONTEXT: hypervisor
(XEN) [ 150.587801] rax: ffff83031fa3c020 rbx: ffff830322c1b4b0 rcx: 0000000000000000
(XEN) [ 150.587806] rdx: ffff83031fa78000 rsi: 000000000000000a rdi: ffff82d0802a9788
(XEN) [ 150.587811] rbp: ffff83031fa7fe20 rsp: ffff83031fa7fd30 r8: ffff83031fa80000
(XEN) [ 150.587815] r9: 0000000000000006 r10: 000000000008f7f2 r11: 0000000000000006
(XEN) [ 150.587819] r12: ffff8300dbdf3000 r13: ffff830322c1b4b0 r14: 0000000000000006
(XEN) [ 150.587823] r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026e0
(XEN) [ 150.587827] cr3: 00000000dbaa8000 cr2: 0000000000000000
(XEN) [ 150.587830] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) [ 150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
... ... ...
(XEN) [ 150.587962] Xen call trace:
(XEN) [ 150.587966] [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
(XEN) [ 150.587974] [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
(XEN) [ 150.587979] [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
(XEN) [ 150.587983] [<ffff82d08012dc6e>] do_softirq+0x13/0x15
(XEN) [ 150.587988] [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
(XEN) [ 151.272182]
(XEN) [ 151.274174] ****************************************
(XEN) [ 151.279624] Panic on CPU 6:
(XEN) [ 151.282915] Xen BUG at sched_credit.c:655
(XEN) [ 151.287415] ****************************************

During suspend, the pCPUs are not removed from their
pools with the standard procedure (which would involve
schedule_cpu_switch()).
During resume, they:
 1) are assigned to the default cpupool (CPU_UP_PREPARE
    phase);
 2) are moved to the pool they were in before suspend,
    via schedule_cpu_switch() (CPU_ONLINE phase).

During resume, scheduling (even if just the idle loop)
can happen right after the CPU_STARTING phase (before
CPU_ONLINE), i.e., before the pCPU is put back in its
pool. In this case, it is the default pool's scheduler
that is invoked (Credit1, in the example above). But the
Credit2 specific vCPU data is not freed during suspend,
and the Credit1 specific vCPU data is not allocated
during resume.

Therefore, Credit1 schedules on pCPUs whose idle vCPU's
sched_priv points to Credit2 vCPU data, and we crash.

Fix things by properly deallocating scheduler specific
data of the pCPU's pool scheduler during pCPU teardown,
and re-allocating them --always for &ops-- during pCPU
bringup.

This also fixes another (latent) bug. In
schedule_cpu_switch(), it prevents Credit1's free_vdata()
from being used to deallocate data allocated with
Credit2's alloc_vdata(). This is not easy to trigger, but
only because the other bug shown above manifests first
and crashes the host.

The downside of this patch is that it adds one more
allocation on the resume path, which is not ideal. Still,
there is no better way of fixing the described bugs at
the moment. Removing (ideally all) allocations happening
during resume should remain a goal for the long run.
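The teardown/bringup pairing described above can be sketched with a small
stand-alone toy model. This is not Xen code: only sched_priv, alloc_vdata
and free_vdata are names taken from the patch, while every toy_* identifier
and the owner tag are assumptions made up for illustration. The tag simply
records which scheduler allocated a piece of vCPU data, so that a mismatch
(the situation behind both bugs) trips an assertion instead of silently
corrupting state.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy model, NOT Xen code. Only sched_priv, alloc_vdata and free_vdata are
 * names taken from the patch; every toy_* identifier is hypothetical.
 */
enum toy_sched_id { TOY_CREDIT = 1, TOY_CREDIT2 = 2 };

struct toy_scheduler { enum toy_sched_id id; };

/* Per-vCPU private data, tagged with the scheduler that allocated it. */
struct toy_vdata { enum toy_sched_id owner; };

struct toy_vcpu { struct toy_vdata *sched_priv; };

/* Stand-in for a scheduler's alloc_vdata hook. */
struct toy_vdata *toy_alloc_vdata(const struct toy_scheduler *s)
{
    struct toy_vdata *d = malloc(sizeof(*d));

    if ( d != NULL )
        d->owner = s->id;
    return d;
}

/* Stand-in for free_vdata: only the owning scheduler may free the data. */
void toy_free_vdata(const struct toy_scheduler *s, struct toy_vdata *d)
{
    assert(d == NULL || d->owner == s->id);
    free(d);
}

/* Teardown: free the idle vCPU's data with the pool's scheduler, clear it. */
void toy_cpu_down(const struct toy_scheduler *pool_sched,
                  struct toy_vcpu *idle)
{
    toy_free_vdata(pool_sched, idle->sched_priv);
    idle->sched_priv = NULL;
}

/* Bringup: always allocate for the default scheduler. Returns 0 or -1. */
int toy_cpu_up(const struct toy_scheduler *def_sched, struct toy_vcpu *idle)
{
    assert(idle->sched_priv == NULL);
    idle->sched_priv = toy_alloc_vdata(def_sched);
    return idle->sched_priv != NULL ? 0 : -1;
}

/* Later move to the real pool: allocate new data, free old with old hook. */
int toy_cpu_switch(const struct toy_scheduler *old_sched,
                   const struct toy_scheduler *new_sched,
                   struct toy_vcpu *idle)
{
    struct toy_vdata *d = toy_alloc_vdata(new_sched);

    if ( d == NULL )
        return -1;
    toy_free_vdata(old_sched, idle->sched_priv);
    idle->sched_priv = d;
    return 0;
}

/* What a scheduler needs before touching the idle vCPU: matching data. */
int toy_data_matches(const struct toy_scheduler *s,
                     const struct toy_vcpu *idle)
{
    return idle->sched_priv != NULL && idle->sched_priv->owner == s->id;
}
```

In this model, a suspend is toy_cpu_down() with the pool's scheduler, and a
resume is toy_cpu_up() with the default one followed by toy_cpu_switch().
Between those two resume steps the idle vCPU holds data owned by the default
scheduler, which is the window in which the crash above used to happen.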
Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
---
Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
Cc: Juergen Gross <jgross@xxxxxxxx>
Cc: Jan Beulich <jbeulich@xxxxxxxx>
---
Changes from v1:
 * reversed the deallocation order in cpu_schedule_down(), so that
   allocations and deallocations actually happen in reverse ordering;
 * moved the allocation of the private data in the else branch of the
   case where the whole idle vcpu is allocated, and added an ASSERT()
   about such data being actually not allocated in that case;
 * improved both in code comment and the changelog.
---
 xen/common/schedule.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 20f5f56..1c05184 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1378,6 +1378,27 @@ static int cpu_schedule_up(unsigned int cpu)

     if ( idle_vcpu[cpu] == NULL )
         alloc_vcpu(idle_vcpu[0]->domain, cpu, cpu);
+    else
+    {
+        struct vcpu *idle = idle_vcpu[cpu];
+
+        /*
+         * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
+         * while its scheduler specific data (what is pointed by sched_priv)
+         * is. Also, at this stage of the resume path, we attach the pCPU
+         * to the default scheduler, no matter in what cpupool it was before
+         * suspend. To avoid inconsistency, let's allocate default scheduler
+         * data for the idle vCPU here. If the pCPU was in a different pool
+         * with a different scheduler, it is schedule_cpu_switch(), invoked
+         * later, that will set things up as appropriate.
+         */
+        ASSERT(idle->sched_priv == NULL);
+
+        idle->sched_priv = SCHED_OP(&ops, alloc_vdata, idle,
+                                    idle->domain->sched_priv);
+        if ( idle->sched_priv == NULL )
+            return -ENOMEM;
+    }
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;

@@ -1395,6 +1416,10 @@ static void cpu_schedule_down(unsigned int cpu)

     if ( sd->sched_priv != NULL )
         SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
+    SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
+
+    idle_vcpu[cpu]->sched_priv = NULL;
+    sd->sched_priv = NULL;
     kill_timer(&sd->s_timer);
 }

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel