Xen project Mailing List

RE: [Xen-devel] cpuidle causing Dom0 soft lockups

To: Jan Beulich <JBeulich@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

From: "Yu, Ke" <ke.yu@xxxxxxxxx>

Date: Thu, 21 Jan 2010 20:07:33 +0800

Accept-language: en-US

Delivery-date: Thu, 21 Jan 2010 04:08:05 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: Acqaf2RtJ+iyhaAFQqKwkCIdRH4/4gAEoMog

Thread-topic: [Xen-devel] cpuidle causing Dom0 soft lockups

Hi Jan,

Could you try the following debugging patch? It can help to narrow down the root cause:

diff -r ea02c95af387 xen/arch/x86/acpi/cpu_idle.c
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -228,7 +228,6 @@ static void acpi_processor_idle(void)
     cpufreq_dbs_timer_suspend();
-    sched_tick_suspend();
     /* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
     process_pending_softirqs();

Regards
Ke

>-----Original Message-----
>From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jan Beulich
>Sent: Thursday, January 21, 2010 5:52 PM
>To: xen-devel@xxxxxxxxxxxxxxxxxxx
>Subject: [Xen-devel] cpuidle causing Dom0 soft lockups
>
>On large systems and with Dom0 booting with (significantly) more than
>32 vCPU-s we have got multiple reports that the now by default
>enabled C-state management is causing soft lockups, usually preventing
>the boot from completing.
>
>The observations are:
>
>Reducing the number of vCPU-s (or pCPU-s) sufficiently much makes
>the systems work.
>
>max_cstate=0 makes the systems work.
>
>max_cstate=1 makes the problem less severe on one (bigger) system,
>and eliminates it completely on another (smaller) one.
>
>When appearing to hang, all vCPU-s are in Dom0's timer_interrupt(),
>and all (sometimes all but one) are attempting to acquire xtime_lock.
>However, due to our use of ticket locks we can verify that this is not
>a deadlock (repeatedly sending '0' shows forward progress, as the
>tickets [visible on the stack] continue to increase). Additionally, there
>is always one vCPU that has its polling event channel (used for
>waking the next waiting vCPU when a lock becomes available)
>signaled.
>
>In one case (but not in the other) it is always the same vCPU that
>is apparently taking very long to wake up from the polling request.
>This may be coincidence, but output after sending 'c' also indicates
>a significantly higher (about 3 times) usage value for C2 than the
>second highest one; the duration printed is roughly the same for
>all CPUs.
>
>While I don't know this code well, it would seem that we're suffering
>from extremely long wakeup times. This suggests that there likely is
>a (performance) problem even for smaller numbers of vCPU-s.
>Hence, unless it can be fixed before 4.0 releases, I would suggest
>disabling C-state management by default again.
>
>I can provide full logs in case needed.
>
>Jan
>
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@xxxxxxxxxxxxxxxxxxx
>http://lists.xensource.com/xen-devel
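
The quoted report argues that steadily increasing ticket values rule out a deadlock. As an illustration only, here is a minimal ticket lock sketch in C11. It is not the Dom0 kernel or Xen implementation; the names ticket_lock, ticket_lock_init, ticket_lock_acquire and ticket_lock_release are hypothetical, and the busy-wait loop stands in for the event-channel polling described in the report.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical minimal ticket lock, for illustration only. */
struct ticket_lock {
    atomic_uint next_ticket;  /* next ticket to hand out to a waiter */
    atomic_uint now_serving;  /* ticket that currently owns the lock */
};

static inline void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

/* Take a ticket, then wait until it is being served. Returns the ticket. */
static inline unsigned int ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned int me = atomic_fetch_add(&l->next_ticket, 1);

    while (atomic_load(&l->now_serving) != me)
        ; /* spin; the setup in the report polls an event channel instead */

    return me;
}

/* Hand the lock to the holder of the next ticket. */
static inline void ticket_lock_release(struct ticket_lock *l)
{
    /* The lock must be held: at least one ticket is outstanding. */
    assert(atomic_load(&l->now_serving) != atomic_load(&l->next_ticket));
    atomic_fetch_add(&l->now_serving, 1);
}
```

Each acquisition consumes a fresh ticket and each release advances now_serving by one, so a ticket value that keeps growing between samples means the lock is still being handed from holder to holder. In a true deadlock both counters would stop changing.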
