
Re: [Xen-devel] [PATCH] Fix scheduler crash after s3 resume

  • To: Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxxx>
  • From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
  • Date: Thu, 24 Jan 2013 07:18:17 +0100
  • Cc: george.dunlap@xxxxxxxxxxxxx, keir@xxxxxxx, Jan Beulich <JBeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Thu, 24 Jan 2013 06:18:56 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 23.01.2013 16:51, Tomasz Wroblewski wrote:
Hi all,

This was also discussed earlier, for example here

Changeset 25079:d5ccb2d1dbd1 (Introduce system_state variable) added a
global variable which, among other things, is used on suspend to avoid
disabling the cpu scheduler, breaking vcpu affinities, and removing the
cpu from its cpupool. However, it missed one place where the cpu is
removed from the cpupool's valid cpus mask: __cpu_disable() in smpboot.c,
line 840:

cpumask_clear_cpu(cpu, cpupool0->cpu_valid);

This causes the vcpu in the default pool to be considered inactive, and
the following assertion is violated in sched_credit.c soon after resume
transitions out of xen, causing a platform reboot:

(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs ...
(XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)' failed at sched_credit.c:507
(XEN) ----[ Xen-4.3-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 1
(XEN) RIP: e008:[<ffff82c480119e9e>] _csched_cpu_pick+0x155/0x5fd
(XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor
(XEN) rax: 0000000000000001 rbx: 0000000000000008 rcx: 0000000000000008
(XEN) rdx: 00000000000000ff rsi: 0000000000000008 rdi: 0000000000000000
(XEN) rbp: ffff83011415fdd8 rsp: ffff83011415fcf8 r8: 0000000000000000
(XEN) r9: 000000000000003e r10: 00000008f3de731f r11: ffffea0000063800
(XEN) r12: ffff82c480261720 r13: ffff830137b4d950 r14: ffff830137beb010
(XEN) r15: ffff82c480261720 cr0: 0000000080050033 cr4: 00000000000026f0
(XEN) cr3: 000000013c17d000 cr2: ffff8800ac6ef8f0
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff83011415fcf8:
(XEN) 00000000000af257 0000000800000001 ffff8300ba4fd000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000002 ffff8800ac6ef8f0
(XEN) 0000000800000000 00000001318e0025 0000000000000087 ffff83011415fd68
(XEN) ffff82c480124f79 ffff83011415fd98 ffff83011415fda8 00007fda88d1e790
(XEN) ffff8800ac6ef8f0 00000001318e0025 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000146 ffff830137b4d940
(XEN) 0000000000000001 ffff830137b4d950 ffff830137beb010 ffff82c480261720
(XEN) ffff83011415fe48 ffff82c48011a51b 0002000e00000007 ffffffff81009071
(XEN) 000000000000e033 ffff83013a805360 ffff880002bb3c28 000000000000e02b
(XEN) e4d87248e7ca5f52 ffff830102ae2200 0000000000000001 ffff82c48011a356
(XEN) 00000008efa1f543 00007fda88d1e790 ffff83011415fe78 ffff82c48012748f
(XEN) 0000000000000002 ffff830137beb028 ffff830102ae2200 ffff830137beb8d0
(XEN) ffff83011415fec8 ffff82c48012758b ffff830114150000 ffff8800ac6ef8f0
(XEN) 80100000ae86d065 ffff82c4802e0080 ffff82c4802e0000 ffff830114158000
(XEN) ffffffffffffffff 00007fda88d1e790 ffff83011415fef8 ffff82c480124b4e
(XEN) ffff8300ba4fd000 ffffea0000063800 00000001318e0025 ffff8800ac6ef8f0
(XEN) ffff83011415ff08 ffff82c480124bb4 00007cfeebea00c7 ffff82c480226a71
(XEN) 00007fda88d1e790 ffff8800ac6ef8f0 00000001318e0025 ffffea0000063800
(XEN) ffff880002bb3c78 00000001318e0025 ffffea0000063800 0000000000000146
(XEN) 00003ffffffff000 ffffea0002b1bbf0 0000000000000000 00000001318e0025
(XEN) Xen call trace:
(XEN) [<ffff82c480119e9e>] _csched_cpu_pick+0x155/0x5fd
(XEN) [<ffff82c48011a51b>] csched_tick+0x1c5/0x342
(XEN) [<ffff82c48012748f>] execute_timer+0x4e/0x6c
(XEN) [<ffff82c48012758b>] timer_softirq_action+0xde/0x206
(XEN) [<ffff82c480124b4e>] __do_softirq+0x8e/0x99
(XEN) [<ffff82c480124bb4>] do_softirq+0x13/0x15
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)' failed at sched_credit.c:507
(XEN) ****************************************
(XEN) Reboot in five seconds...

The reason for the above is that the "cpus" cpumask is empty: it is the
logical "and" of the cpupool's valid cpus (from which the cpu was removed)
and the cpu affinity mask.
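
To make the failure mode concrete, here is a minimal standalone sketch of the mask arithmetic. It is not the code from sched_credit.c: the toy_* names and the 64-bit mask type are assumptions made for illustration, standing in for Xen's cpumask_t and its helpers. Only the checked condition is taken from the assertion message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's cpumask_t: one bit per cpu, max 64 cpus. */
typedef uint64_t toy_cpumask_t;

/* Return a mask with only the bit for 'cpu' set. */
static inline toy_cpumask_t toy_cpu_bit(unsigned int cpu)
{
    assert(cpu < 64);
    return (toy_cpumask_t)1 << cpu;
}

/* Candidate cpus: logical "and" of the pool's valid cpus and the affinity. */
static inline toy_cpumask_t toy_candidate_cpus(toy_cpumask_t pool_valid,
                                               toy_cpumask_t affinity)
{
    return pool_valid & affinity;
}

/*
 * Same condition as the failed assertion:
 * !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)
 */
static inline bool toy_pick_precondition(toy_cpumask_t cpus, unsigned int cpu)
{
    return cpus != 0 && (cpus & toy_cpu_bit(cpu)) != 0;
}
```

With a vcpu whose affinity is cpu 1 only, the condition holds while cpu 1 is in the valid mask, and fails as soon as cpu 1 is cleared from it, because the intersection becomes empty.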

The attached patch follows the spirit of changeset 25079:d5ccb2d1dbd1
(which blocked removal of the cpu from the cpupool in cpupool.c) by also
blocking its removal from the cpupool's valid cpumask. So cpu
affinities are still preserved across suspend/resume, and the scheduler
does not need to be disabled, as per the original intent (I think). I
would welcome comments.
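
The idea can be sketched as a guard around the mask update. This is an illustration only, not the attached patch: the toy_* types, the two-value state enum, and the function name are hypothetical, and the real check in __cpu_disable() may be written differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-value model of the global system_state variable. */
enum toy_system_state { TOY_STATE_ACTIVE, TOY_STATE_SUSPEND };

/* Hypothetical stand-in for a cpupool; only the valid-cpus mask is modeled. */
struct toy_cpupool {
    uint64_t cpu_valid; /* one bit per cpu, max 64 cpus */
};

/*
 * Take 'cpu' offline. While suspending, leave the pool's valid mask
 * untouched so the cpu is still considered part of the pool on resume.
 */
static inline void toy_cpu_disable(struct toy_cpupool *pool, unsigned int cpu,
                                   enum toy_system_state state)
{
    assert(cpu < 64);
    if (state != TOY_STATE_SUSPEND)
        pool->cpu_valid &= ~((uint64_t)1 << cpu);
}
```

In this model a disable during suspend leaves cpu_valid unchanged, while an ordinary offline request still clears the bit.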

Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxxx>

Acked-by: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>

Commit message:
Fix s3 resume regression (crash in scheduler) after c-s
25079:d5ccb2d1dbd1 by also blocking removal of the cpu from the
cpupool's cpu_valid mask - in the spirit of mentioned c-s.

Xen-devel mailing list

Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                   Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
