Re: [Xen-devel] Xen crash on S3 resume on 4.13 and unstable if any CPU is re-offlined
On 04/01/2020 15:30, Marek Marczykowski-Górecki wrote:
> Hi,
>
> I have a reliable crash on resume from S3. I can reproduce it on both
> real hardware and nested within KVM, although call traces are different
> between those platforms. In any case, it happens only if some CPU is to
> be re-offlined after resume (smt=off and/or maxcpus=... options).
>
> I think the crash from the real hardware gives more clues, but the one
> from qemu may also be interesting, maybe it's even another bug?
>
> The crash message (full console log attached):
>
> (XEN) mce_intel.c:772: MCA Capability: firstbank 0, extended MCE MSR 0,
> BCAST, CMCI
> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs ...
> (XEN) ----[ Xen-4.14-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82d08023beb7>]
> schedule.c#cpu_schedule_callback+0xea/0x1a1
> (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: ffff82d080453348 rcx: ffff82d080584020
> (XEN) rdx: 000000339b66e000 rsi: 0000000000008005 rdi: ffff82d080453340
> (XEN) rbp: ffff8300ca45fd68 rsp: ffff8300ca45fd68 r8: 0000000000000004
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 8000000000000000
> (XEN) r12: ffff82d080453340 r13: ffff82d080453200 r14: 0000000000008005
> (XEN) r15: 0000000000008000 cr0: 000000008005003b cr4: 00000000000426e0
> (XEN) cr3: 00000000ca44f000 cr2: 0000000000000008
> (XEN) fsb: 000079d5e4f9e740 gsb: ffff888135600000 gss: 0000000000000000
> (XEN) ds: 0018 es: 0010 fs: b800 gs: 0010 ss: 0000 cs: e008
> (XEN) Xen code around <ffff82d08023beb7>
> (schedule.c#cpu_schedule_callback+0xea/0x1a1):
> (XEN) 48 8b 14 d1 48 8b 04 02 <48> 8b 48 08 48 85 c9 74 64 48 8b 05 b9 c3 32
> 00
> (XEN) Xen stack trace from rsp=ffff8300ca45fd68:
> (XEN) ffff8300ca45fdb0 ffff82d080221289 ffff8300ca45fdd8 0000000000000001
> (XEN) 0000000000000000 00000000ffffffef ffff8300ca45fe00 0000000000000001
> (XEN) 0000000000000200 ffff8300ca45fdc8 ffff82d080203476 0000000000000001
> (XEN) ffff8300ca45fdf0 ffff82d080203550 0000000000000000 0000000000000001
> (XEN) 0000000000000000 ffff8300ca45fe20 ffff82d080203999 ffff8300ca45fef8
> (XEN) 0000000000000000 0000000000000003 00000000000426e0 ffff8300ca45fe58
> (XEN) ffff82d0802e4240 ffff83042896c5f0 ffff83041bb4d000 0000000000000000
> (XEN) 0000000000000000 ffff83041bb73000 ffff8300ca45fe78 ffff82d08020828f
> (XEN) ffff83041bb4d1b8 ffff82d080567210 ffff8300ca45fe90 ffff82d08023fd39
> (XEN) ffff82d080567200 ffff8300ca45fec0 ffff82d08024001a 0000000000000000
> (XEN) ffff82d080567210 ffff82d08056d980 ffff82d080584020 ffff8300ca45fef0
> (XEN) ffff82d08027247a ffff83041bbb2000 ffff83041bb4d000 ffff83041bbb3000
> (XEN) 0000000000000000 ffff8300ca45fd98 0000000000000003 ffffffff820ae496
> (XEN) 0000000000000003 0000000000000000 0000000000002003 ffffffff822c6868
> (XEN) 0000000000000246 0000000000003403 00000000ffff0000 0000000000000000
> (XEN) 0000000000000000 ffffffff810010ea 0000000000002003 0000000000000010
> (XEN) deadbeefdeadf00d 0000010000000000 ffffffff810010ea 000000000000e033
> (XEN) 0000000000000246 ffffc900011abbe8 000000000000e02b 003b4a890045ffe0
> (XEN) 003b4ddf00098fa8 003b4e0300000001 003b499d0045ffe0 0000e01000000000
> (XEN) ffff83041bbb2000 0000000000000000 00000000000426e0 0000000000000000
> (XEN) Xen call trace:
> (XEN) [<ffff82d08023beb7>] R schedule.c#cpu_schedule_callback+0xea/0x1a1
> (XEN) [<ffff82d080221289>] F notifier_call_chain+0x6b/0x96
> (XEN) [<ffff82d080203476>] F cpu.c#cpu_notifier_call_chain+0x1b/0x33
> (XEN) [<ffff82d080203550>] F cpu_down+0x5e/0x15c
> (XEN) [<ffff82d080203999>] F enable_nonboot_cpus+0x113/0x1fb
> (XEN) [<ffff82d0802e4240>] F power.c#enter_state_helper+0x107/0x51b
> (XEN) [<ffff82d08020828f>] F
> domain.c#continue_hypercall_tasklet_handler+0x8b/0xb7
> (XEN) [<ffff82d08023fd39>] F tasklet.c#do_tasklet_work+0x76/0xa9
> (XEN) [<ffff82d08024001a>] F do_tasklet+0x58/0x8a
> (XEN) [<ffff82d08027247a>] F domain.c#idle_loop+0x40/0x96
> (XEN)
> (XEN) Pagetable walk from 0000000000000008:
> (XEN) L4[0x000] = 000000041bbff063 ffffffffffffffff
> (XEN) L3[0x000] = 000000041bbfe063 ffffffffffffffff
> (XEN) L2[0x000] = 000000041bbfd063 ffffffffffffffff
> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000008
> (XEN) ****************************************
>
> And the one from qemu:
>
> (XEN) mce_intel.c:772: MCA Capability: firstbank 1, extended MCE MSR 0, SER
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs ...
> (XEN) Assertion 'c2rqd(ops, sched_unit_master(unit)) == svc->rqd' failed at
> sched_credit2.c:2137
> (XEN) ----[ Xen-4.14-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 1
> (XEN) RIP: e008:[<ffff82d08022fe1a>]
> sched_credit2.c#csched2_unit_wake+0x174/0x176
> (XEN) RFLAGS: 0000000000010097 CONTEXT: hypervisor (d0v0)
> (XEN) rax: ffff83013a7313e8 rbx: ffff83013a6bdf40 rcx: 0000000000000051
> (XEN) rdx: ffff83013a731160 rsi: ffff83013a7310e0 rdi: 0000000000000003
> (XEN) rbp: ffff83013a6f7d98 rsp: ffff83013a6f7d78 r8: deadbeefdeadf00d
> (XEN) r9: deadbeefdeadf00d r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: ffff83013a6bc7e0 r13: ffff82d08043e720 r14: 0000000000000003
> (XEN) r15: 00000003c5ffecac cr0: 0000000080050033 cr4: 0000000000000660
> (XEN) cr3: 000000004b005000 cr2: 0000000000000000
> (XEN) fsb: 00007751649f4740 gsb: ffff888134a00000 gss: 0000000000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen code around <ffff82d08022fe1a>
> (sched_credit2.c#csched2_unit_wake+0x174/0x176):
> (XEN) ef e8 1e c1 ff ff eb a7 <0f> 0b 55 48 89 e5 41 57 41 56 41 55 41 54 53
> 48
> (XEN) Xen stack trace from rsp=ffff83013a6f7d78:
> (XEN) ffff83013a6a3000 ffff83013a6bdf40 ffff83013a6bdf40 ffff83013a7313e8
> (XEN) ffff83013a6f7de8 ffff82d0802391f8 0000000000000202 ffff83013a7313e8
> (XEN) ffff83013a6c1018 0000000000000001 0000000000000000 0000000000000000
> (XEN) ffff83013a6c1018 ffff83013a6a3000 ffff83013a6f7e58 ffff82d08020906c
> (XEN) ffff82d08035d3d4 ffff82d08035d3c8 ffff82d08035d3d4 ffff82d08035d3c8
> (XEN) ffff82d08035d3d4 ffff82d08035d3c8 ffff82d08035d3d4 ffff83013a6f7ef8
> (XEN) 0000000000000180 ffff83013a6aa000 deadbeefdeadf00d 0000000000000003
> (XEN) ffff83013a6f7ee8 ffff82d0803570c7 0000000000000001 0000000000000001
> (XEN) 0000000000000000 deadbeefdeadf00d deadbeefdeadf00d ffff82d08035d3c8
> (XEN) ffff82d08035d3d4 ffff82d08035d3c8 ffff82d08035d3d4 ffff82d08035d3c8
> (XEN) ffff82d08035d3d4 ffff83013a6aa000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 00007cfec59080e7 ffff82d08035d432
> (XEN) 0000000000015120 0000000000000001 0000000000000000 ffff88813024a540
> (XEN) 0000000000000000 0000000000000001 0000000000000246 0000000000140000
> (XEN) ffff8880bf7db000 ffffea0004be4508 0000000000000018 ffffffff8100130a
> (XEN) 0000000000000000 0000000000000001 0000000000000001 0000010000000000
> (XEN) ffffffff8100130a 000000000000e033 0000000000000246 ffffc90000c97c98
> (XEN) 000000000000e02b 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000e01000000001 ffff83013a6aa000 00000030ba196000
> (XEN) 0000000000000660 0000000000000000 000000013a6e2000 0000040000000000
> (XEN) Xen call trace:
> (XEN) [<ffff82d08022fe1a>] R sched_credit2.c#csched2_unit_wake+0x174/0x176
> (XEN) [<ffff82d0802391f8>] F vcpu_wake+0xea/0x4d8
> (XEN) [<ffff82d08020906c>] F do_vcpu_op+0x36f/0x687
> (XEN) [<ffff82d0803570c7>] F pv_hypercall+0x28f/0x57d
> (XEN) [<ffff82d08035d432>] F lstar_enter+0x112/0x120
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'c2rqd(ops, sched_unit_master(unit)) == svc->rqd' failed at
> sched_credit2.c:2137
> (XEN) ****************************************

This looks very much like the core scheduling crash found on specific
machines in S5. From my analysis, it was a use-after-free on a
scheduling resource.

Does switching back to thread mode (as opposed to core mode) make the
crash go away?

~Andrew

Attachment:
signature.asc

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel