Re: [Xen-devel] Xen crash after S3 suspend - Xen 4.13 and newer
On 20.09.22 12:22, Marek Marczykowski-Górecki wrote:

On Mon, Aug 22, 2022 at 12:00:27PM +0200, Marek Marczykowski-Górecki wrote:

On Mon, Aug 22, 2022 at 11:53:50AM +0200, Jan Beulich wrote:

On 21.08.2022 18:14, Marek Marczykowski-Górecki wrote:

On Sat, Oct 09, 2021 at 06:28:17PM +0200, Marek Marczykowski-Górecki wrote:

On Sun, Jan 31, 2021 at 03:15:30AM +0100, Marek Marczykowski-Górecki wrote:

I'm resurrecting this thread as it was recently mentioned elsewhere. I can still reproduce the issue on the recent staging branch (9dc687f155). It fails after the first resume (not always, but frequently enough to debug it). At least one guest needs to be running - with just (PV) dom0 the crash doesn't happen (at least for the ~8 times in a row I tried). If the first resume works, the second one will (almost?) always fail, but with different symptoms - dom0 kernel lockups (of at least some of its vcpus). I haven't debugged that one at all yet. Any help will be appreciated; I can apply debug patches, change the configuration, etc.

This still happens on 4.14.3. Maybe it is related to freeing percpu areas, as that caused other issues with suspend too? Just a thought...

I have reproduced this on current staging(*). And I can reproduce it reliably. I also got (I believe) a closely related crash with the credit1 scheduler.

(*) It isn't plain staging, it's one with my xhci console patches on top, including an attempt to make it survive S3. I believe the only relevant part there is sticking set_timer() into the console resume path (or just having a timer with a rather short delay registered). The actual tree is at https://github.com/marmarek/xen/tree/master-xue2-debug, including quite a lot of debug prints and debug hacks.

Specific crash with credit2:

(XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at common/sched/credit2.c:2274
(XEN) ----[ Xen-4.17-unstable  x86_64  debug=y  Tainted:   C    ]----
(XEN) CPU:    10
(XEN) RIP:    e008:[<ffff82d040247a4d>] credit2.c#csched2_unit_wake+0x152/0x154
(XEN) RFLAGS: 0000000000010083   CONTEXT: hypervisor (d0v0)
(XEN) rax: ffff830251778230   rbx: ffff830251768cb0   rcx: 00000032111d6000
(XEN) rdx: ffff8302515c1eb0   rsi: 0000000000000006   rdi: ffff830251769000
(XEN) rbp: ffff8302515cfd90   rsp: ffff8302515cfd70   r8:  ffff830251769000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: ffff830251768dd0   r13: ffff8302515c1d00   r14: 0000000000000006
(XEN) r15: ffff82d0405ddb40   cr0: 0000000080050033   cr4: 0000000000372660
(XEN) cr3: 000000022f2a1000   cr2: ffff8881012738e0
(XEN) fsb: 0000744bf6a0db80   gsb: ffff888255600000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d040247a4d> (credit2.c#csched2_unit_wake+0x152/0x154):
(XEN)  df e8 6f bf ff ff eb ad <0f> 0b f3 0f 1e fa 55 48 89 e5 41 57 41 56 41 55
(XEN) Xen stack trace from rsp=ffff8302515cfd70:
(XEN)    ffff83025174b000 ffff830251768cb0 ffff830251778270 ffff82d0405c4298
(XEN)    ffff8302515cfdd8 ffff82d04024fcb8 0000000000000202 ffff830251778270
(XEN)    ffff83025174b000 0000000000000001 ffff830251769018 0000000000000000
(XEN)    0000000000000000 ffff8302515cfe48 ffff82d04020a8c9 ffff8882556aedc0
(XEN)    0000000000000003 00001910537e623e 0000000b988f78a6 0000000059d4a716
(XEN)    00001901f30fa41e 0000000217f96af6 0000000000000000 ffff83025174b000
(XEN)    ffff830251756000 0000000000000002 0000000000000001 ffff8302515cfe70
(XEN)    ffff82d0402f7968 ffff830251756000 ffff8302515cfef8 0000000000000018
(XEN)    ffff8302515cfee8 ffff82d0402ec6de 0000000000000000 ffffffff82f157e0
(XEN)    0000000000000000 0000000000000000 ffff8302515cfef8 0000000000000000
(XEN)    0000000000000000 ffff8302515cffff ffff830251756000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 00007cfdaea300e7
(XEN)    ffff82d0402012bd 0000000000000000 ffffffff82c51120 ffff88810036cf00
(XEN)    0000000000000002 000000000001e120 0000000000000002 0000000000000246
(XEN)    ffffffff82f157e0 0000000000000001 0000000000000000 0000000000000018
(XEN)    ffffffff81e4a30a 0000000000000000 0000000000000002 0000000000000001
(XEN)    0000010000000000 ffffffff81e4a30a 000000000000e033 0000000000000246
(XEN)    ffffc9004aef7c18 000000000000e02b fb5ee398d214b10c eb5ef398c214a10c
(XEN)    eb56f390c21ca104 ebd6f310c29ca184 0000e0100000000a ffff830251756000
(XEN)    0000003211016000 0000000000372660 0000000000000000 80000002963e1002
(XEN) Xen call trace:
(XEN)    [<ffff82d040247a4d>] R credit2.c#csched2_unit_wake+0x152/0x154
(XEN)    [<ffff82d04024fcb8>] F vcpu_wake+0xfd/0x267
(XEN)    [<ffff82d04020a8c9>] F common_vcpu_op+0x178/0x5d1
(XEN)    [<ffff82d0402f7968>] F do_vcpu_op+0x69/0x226
(XEN)    [<ffff82d0402ec6de>] F pv_hypercall+0x575/0x657
(XEN)    [<ffff82d0402012bd>] F lstar_enter+0x13d/0x150
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 10:
(XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at common/sched/credit2.c:2274
(XEN) ****************************************

Ok, I think I figured it out! I added a function that verifies the run queues of all the sched units, and called it from basically every other line on the resume path. The debug function (if anybody is interested):

void verify_sched_units(void)
{
    struct domain *d;
    const struct sched_unit *unit;

    for_each_domain ( d )
    {
        for_each_sched_unit ( d, unit )
        {
            if ( c2rqd(sched_unit_master(unit)) != csched2_unit(unit)->rqd )
            {
                printk(XENLOG_WARNING
                       "d%d sched unit %d: rq=%d, unit master %d, rq=%d\n",
                       d->domain_id, unit->unit_id,
                       csched2_unit(unit)->rqd ? csched2_unit(unit)->rqd->id : -1,
                       sched_unit_master(unit),
                       c2rqd(sched_unit_master(unit))->id);
                WARN_ON(1);
            }
        }
    }
}

It appears that restore_vcpu_affinity() is responsible, specifically this part:

1216         /*
1217          * Re-assign the initial processor as after resume we have no
1218          * guarantee the old processor has come back to life again.
1219          *
1220          * Therefore, here, before actually unpausing the domains, we should
1221          * set v->processor of each of their vCPUs to something that will
1222          * make sense for the scheduler of the cpupool in which they are in.
1223          */
...
1249         res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
1250         sched_set_res(unit, res);
1251
1252         spin_unlock_irq(lock);
1253
1254         /* v->processor might have changed, so reacquire the lock. */
1255         lock = unit_schedule_lock_irq(unit);
1256         res = sched_pick_resource(unit_scheduler(unit), unit);
1257         sched_set_res(unit, res);
1258         spin_unlock_irq(lock);
1259
1260         if ( old_cpu != sched_unit_master(unit) )
1261             sched_move_irqs(unit);

It calls sched_set_res() directly, which assigns sched resources, but does _not_ adjust runqueues (if the new pcpu happens to be assigned to a different runqueue than the previous pcpu's one). I have two (non-exclusive) ideas here:

1. If old_cpu is actually still available, do not move the unit at all.
2. Use sched_migrate() instead of sched_set_res().
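The mismatch is easiest to see in isolation. Below is a small self-contained toy model (plain C, not Xen code - the struct, the helpers and the cpu-to-runqueue mapping are all made up for illustration) of the invariant the credit2 assertion checks: the runqueue derived from the pcpu a unit is placed on must match the runqueue pointer cached in the per-unit scheduler data. Assigning only the resource leaves the cached pointer stale whenever the new pcpu belongs to a different runqueue; a migrate-style update keeps the two in sync.

/*
 * Toy model of the invariant behind
 * 'c2rqd(sched_unit_master(unit)) == svc->rqd'; not Xen code.
 * Assumes pcpus 0-3 sit on runqueue 0 and pcpus 4-7 on runqueue 1.
 */
#include <assert.h>
#include <stdio.h>

struct runqueue { int id; };

struct unit_model {
    int master_cpu;       /* pcpu the unit was placed on */
    struct runqueue *rqd; /* runqueue pointer cached in scheduler data */
};

static struct runqueue rq[2] = { { 0 }, { 1 } };
static struct runqueue *cpu_to_rq(int cpu) { return &rq[cpu / 4]; }

/* Mimics what sched_set_res() does here: new pcpu, runqueue data untouched. */
static void set_res_only(struct unit_model *u, int cpu) { u->master_cpu = cpu; }

/* Mimics a proper migration: new pcpu and matching runqueue data. */
static void migrate(struct unit_model *u, int cpu)
{
    u->master_cpu = cpu;
    u->rqd = cpu_to_rq(cpu);
}

int main(void)
{
    struct unit_model u = { .master_cpu = 2, .rqd = cpu_to_rq(2) };

    set_res_only(&u, 6);   /* resume picked a pcpu on another runqueue */
    printf("after set_res_only: rq(cpu)=%d cached rqd=%d\n",
           cpu_to_rq(u.master_cpu)->id, u.rqd->id);   /* 1 vs 0: mismatch */

    migrate(&u, 6);
    assert(cpu_to_rq(u.master_cpu) == u.rqd);         /* invariant holds again */
    printf("after migrate: rq(cpu)=%d cached rqd=%d\n",
           cpu_to_rq(u.master_cpu)->id, u.rqd->id);
    return 0;
}

Run on its own, the first printf reports runqueue 1 against cached runqueue 0 - the same kind of mismatch verify_sched_units() flags - and the assert only holds after the migrate-style update.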
Here is the patch that fixes it for me:

---8<---
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 83455fbde1c8..dcf202d8b307 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -1246,19 +1246,29 @@ void restore_vcpu_affinity(struct domain *d)
             }
         }
 
-        res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
+        /* Prefer old cpu if available. */
+        if ( cpumask_test_cpu(old_cpu, cpumask_scratch_cpu(cpu)) )
+            res = get_sched_res(old_cpu);
+        else
+            res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
         sched_set_res(unit, res);
 
         spin_unlock_irq(lock);
 
-        /* v->processor might have changed, so reacquire the lock. */
-        lock = unit_schedule_lock_irq(unit);
-        res = sched_pick_resource(unit_scheduler(unit), unit);
-        sched_set_res(unit, res);
-        spin_unlock_irq(lock);
-
+        /*
+         * If different cpu was chosen, it was random, let scheduler do proper
+         * decision.
+         */
         if ( old_cpu != sched_unit_master(unit) )
+        {
+            /* v->processor might have changed, so reacquire the lock. */
+            lock = unit_schedule_lock_irq(unit);
+            res = sched_pick_resource(unit_scheduler(unit), unit);
+            sched_migrate(unit_scheduler(unit), unit, res->master_cpu);
+            spin_unlock_irq(lock);
+
             sched_move_irqs(unit);
+        }
     }
 
     rcu_read_unlock(&sched_res_rculock);
---8<---

I have several doubts here:

1. If old_cpu is available, is sched_set_res() needed at all?
2. Should both calls be changed to sched_migrate()? Currently I changed only the second one, in case the scheduler could be confused about old_cpu not being available anymore.
3. Are there any extra locking requirements for sched_migrate() at this stage? The long comment above sched_unit_migrate_start() suggests there might be, but I'm not sure if that's really the case during resume.
4. Related to the above - should thaw_domains() be modified to call restore_vcpu_affinity() for all domains first, and unpause them only later? That could reduce the locking requirements, I guess.

Looking into this in more detail, I think the fix is much easier. In case a cpu isn't coming up again after suspend, cpu_disable_scheduler() for this cpu will be called. This will do the needed vcpu migration, so we can just remove restore_vcpu_affinity() completely without any need for a replacement. I'll write a patch.

Juergen
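For reference, the placement decision encoded in Marek's patch above boils down to a couple of lines. The sketch below is another self-contained toy (plain C, not the Xen API - cpu_in_mask(), any_cpu() and place_unit() are invented stand-ins, and a plain bitmask stands in for cpumask_scratch_cpu()): keep old_cpu if it came back online, otherwise take an arbitrary online pcpu and flag that the scheduler still has to make a proper placement, which is what the sched_migrate() call in the patch is for.

#include <stdbool.h>
#include <stdio.h>

/* A plain bitmask stands in for the cpumask of usable pcpus. */
static bool cpu_in_mask(unsigned int cpu, unsigned long mask)
{
    return mask & (1UL << cpu);
}

/* Lowest set bit, as a stand-in for cpumask_any(); mask must be non-zero. */
static unsigned int any_cpu(unsigned long mask)
{
    unsigned int cpu = 0;

    while ( !(mask & (1UL << cpu)) )
        cpu++;
    return cpu;
}

/*
 * Mirror of the patch's logic: prefer old_cpu if it is usable; otherwise the
 * pick is arbitrary and the caller must let the scheduler re-place (and thus
 * properly migrate) the unit.
 */
static unsigned int place_unit(unsigned int old_cpu, unsigned long usable,
                               bool *need_migrate)
{
    if ( cpu_in_mask(old_cpu, usable) )
    {
        *need_migrate = false;
        return old_cpu;
    }
    *need_migrate = true;
    return any_cpu(usable);
}

int main(void)
{
    bool migrate;
    unsigned int cpu;

    cpu = place_unit(2, 0xffUL, &migrate);  /* old pcpu came back: keep it */
    printf("cpu=%u migrate=%d\n", cpu, migrate);

    cpu = place_unit(2, 0xf0UL, &migrate);  /* old pcpu stayed down: pick + migrate */
    printf("cpu=%u migrate=%d\n", cpu, migrate);
    return 0;
}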