Re: [Xen-devel] [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
On 13/11/15 18:10, Dario Faggioli wrote:
> In fact, with 2 cpupools, one (the default) Credit and
> one Credit2 (with at least 1 pCPU in the latter), trying
> a (e.g., ACPI S3) suspend/resume crashes like this:
>
> (XEN) [ 150.587779] ----[ Xen-4.7-unstable x86_64 debug=y Not tainted ]----
> (XEN) [ 150.587783] CPU: 6
> (XEN) [ 150.587786] RIP: e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [ 150.587796] RFLAGS: 0000000000010086 CONTEXT: hypervisor
> (XEN) [ 150.587801] rax: ffff83031fa3c020 rbx: ffff830322c1b4b0 rcx: 0000000000000000
> (XEN) [ 150.587806] rdx: ffff83031fa78000 rsi: 000000000000000a rdi: ffff82d0802a9788
> (XEN) [ 150.587811] rbp: ffff83031fa7fe20 rsp: ffff83031fa7fd30 r8: ffff83031fa80000
> (XEN) [ 150.587815] r9: 0000000000000006 r10: 000000000008f7f2 r11: 0000000000000006
> (XEN) [ 150.587819] r12: ffff8300dbdf3000 r13: ffff830322c1b4b0 r14: 0000000000000006
> (XEN) [ 150.587823] r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026e0
> (XEN) [ 150.587827] cr3: 00000000dbaa8000 cr2: 0000000000000000
> (XEN) [ 150.587830] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) [ 150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
> ... ... ...
> (XEN) [ 150.587962] Xen call trace:
> (XEN) [ 150.587966] [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [ 150.587974] [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
> (XEN) [ 150.587979] [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
> (XEN) [ 150.587983] [<ffff82d08012dc6e>] do_softirq+0x13/0x15
> (XEN) [ 150.587988] [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
> (XEN) [ 151.272182]
> (XEN) [ 151.274174] ****************************************
> (XEN) [ 151.279624] Panic on CPU 6:
> (XEN) [ 151.282915] Xen BUG at sched_credit.c:655
> (XEN) [ 151.287415] ****************************************
>
> During suspend, the pCPUs are not removed from their
> pools with the standard procedure (which would involve
> schedule_cpu_switch()). During resume, they:
>  1) are assigned to the default cpupool (CPU_UP_PREPARE
>     phase);
>  2) are moved to the pool they were in before suspend,
>     via schedule_cpu_switch() (CPU_ONLINE phase).
>
> During resume, scheduling (even if just the idle loop)
> can happen right after the CPU_STARTING phase (before
> CPU_ONLINE), i.e., before the pCPU is put back in its
> pool. In this case, it is the default pool's scheduler
> that is invoked (Credit1, in the example above). But
> the Credit2 specific vCPU data is not freed during
> suspend, and Credit1 specific vCPU data is not
> allocated during resume.
>
> Therefore, Credit1 schedules on pCPUs whose idle vCPU's
> sched_priv points to Credit2 vCPU data, and we crash.
>
> Fix things by properly deallocating scheduler specific
> data of the pCPU's pool scheduler during pCPU teardown,
> and re-allocating them --always for &ops-- during pCPU
> bringup.
>
> This also fixes another (latent) bug. In fact, it avoids,
> still in schedule_cpu_switch(), that Credit1's free_vdata()
> is used to deallocate data allocated with Credit2's
> alloc_vdata().
> This is not easy to trigger, but only
> because the other bug shown above manifests first and
> crashes the host.
>
> The downside of this patch is that it adds one more
> allocation on the resume path, which is not ideal. Still,
> there is no better way of fixing the described bugs at
> the moment. Removing (ideally all) allocations happening
> during resume should continue being chased, in the long
> run.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Reviewed-by: Juergen Gross <jgross@xxxxxxxx>

Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
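
The mismatch described in the quoted changelog, and the teardown/bringup
fix, can be illustrated with a small standalone C model. This is only a
sketch and not Xen code: the structures, the `owner` tag, and the helpers
can_schedule(), cpu_teardown(), cpu_bringup() and resume_is_safe() are
hypothetical names made up for the illustration. Only the ideas of
per-scheduler alloc_vdata()/free_vdata() hooks and of a private pointer
hanging off the idle vCPU are taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for two schedulers; not Xen's real types. */
enum sched_id { SCHED_CREDIT1, SCHED_CREDIT2 };

struct scheduler {
    enum sched_id id;
};

/* Per-vCPU private data, tagged with the scheduler that allocated it. */
struct vdata {
    enum sched_id owner;
};

static void *alloc_vdata(const struct scheduler *ops)
{
    struct vdata *v = malloc(sizeof(*v));

    if (v != NULL)
        v->owner = ops->id;
    return v;
}

static void free_vdata(const struct scheduler *ops, void *priv)
{
    (void)ops; /* a real scheduler would use its own free routine here */
    free(priv);
}

/* A scheduler can only safely use private data it allocated itself. */
static bool can_schedule(const struct scheduler *ops, const void *priv)
{
    const struct vdata *v = priv;

    return v != NULL && v->owner == ops->id;
}

struct pcpu {
    const struct scheduler *pool_sched; /* scheduler of the pCPU's pool */
    void *idle_sched_priv;              /* idle vCPU's private data */
};

/* Teardown: free the data with the scheduler that owns the pool. */
static void cpu_teardown(struct pcpu *c)
{
    free_vdata(c->pool_sched, c->idle_sched_priv);
    c->idle_sched_priv = NULL;
}

/* Bringup: always allocate for the default scheduler. */
static bool cpu_bringup(struct pcpu *c, const struct scheduler *default_ops)
{
    c->pool_sched = default_ops;
    c->idle_sched_priv = alloc_vdata(default_ops);
    return c->idle_sched_priv != NULL;
}

/*
 * Model one suspend/resume cycle of a pCPU that lived in a Credit2 pool
 * and report whether the default (Credit1) scheduler could then run on it.
 * Returns false on allocation failure as well.
 */
static bool resume_is_safe(bool apply_fix)
{
    const struct scheduler credit1 = { SCHED_CREDIT1 }; /* default pool */
    const struct scheduler credit2 = { SCHED_CREDIT2 };
    struct pcpu cpu = { &credit2, alloc_vdata(&credit2) };
    bool ok;

    if (cpu.idle_sched_priv == NULL)
        return false;

    if (apply_fix) {
        cpu_teardown(&cpu);                /* suspend: free Credit2 data */
        if (!cpu_bringup(&cpu, &credit1))  /* resume: allocate for default */
            return false;
    } else {
        cpu.pool_sched = &credit1;         /* resume keeps stale Credit2 data */
    }

    ok = can_schedule(&credit1, cpu.idle_sched_priv);
    free(cpu.idle_sched_priv);
    return ok;
}
```

In this model, resume_is_safe(false) corresponds to the crashing case
(the default scheduler finds another scheduler's data behind the idle
vCPU), while resume_is_safe(true) corresponds to the behaviour the patch
aims for, at the cost of the extra allocation on the resume path that the
changelog mentions.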