RE: [Xen-devel] cpuidle causing Dom0 soft lockups
Hi Jan,

Could you try the following debugging patch? It can help to narrow down the root cause:

diff -r ea02c95af387 xen/arch/x86/acpi/cpu_idle.c
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -228,7 +228,6 @@ static void acpi_processor_idle(void)
     cpufreq_dbs_timer_suspend();
-    sched_tick_suspend();
     /* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
     process_pending_softirqs();

Regards
Ke

>-----Original Message-----
>From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jan Beulich
>Sent: Thursday, January 21, 2010 5:52 PM
>To: xen-devel@xxxxxxxxxxxxxxxxxxx
>Subject: [Xen-devel] cpuidle causing Dom0 soft lockups
>
>On large systems and with Dom0 booting with (significantly) more than
>32 vCPU-s we have got multiple reports that the now by default
>enabled C-state management is causing soft lockups, usually preventing
>the boot from completing.
>
>The observations are:
>
>Reducing the number of vCPU-s (or pCPU-s) sufficiently much makes
>the systems work.
>
>max_cstate=0 makes the systems work.
>
>max_cstate=1 makes the problem less severe on one (bigger) system,
>and eliminates it completely on another (smaller) one.
>
>When appearing to hang, all vCPU-s are in Dom0's timer_interrupt(),
>and all (sometimes all but one) are attempting to acquire xtime_lock.
>However, due to our use of ticket locks we can verify that this is not
>a deadlock (repeatedly sending '0' shows forward progress, as the
>tickets [visible on the stack] continue to increase). Additionally, there
>is always one vCPU that has its polling event channel (used for
>waking the next waiting vCPU when a lock becomes available)
>signaled.
>
>In one case (but not in the other) it is always the same vCPU that
>is apparently taking very long to wake up from the polling request.
>This may be coincidence, but output after sending 'c' also indicates
>a significantly higher (about 3 times) usage value for C2 than the
>second highest one; the duration printed is roughly the same for
>all CPUs.
>
>While I don't know this code well, it would seem that we're suffering
>from extremely long wakeup times. This suggests that there likely is
>a (performance) problem even for smaller numbers of vCPU-s.
>Hence, unless it can be fixed before 4.0 releases, I would suggest
>disabling C-state management by default again.
>
>I can provide full logs in case needed.
>
>Jan
>
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@xxxxxxxxxxxxxxxxxxx
>http://lists.xensource.com/xen-devel
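
For readers unfamiliar with the ticket-lock argument in the quoted report (tickets that keep increasing mean the lock is still being handed over, so the hang is slow progress and not a deadlock), here is a minimal sketch in C11. It is only an illustration under stated assumptions: the names ticket_lock, ticket_lock_acquire, ticket_lock_release and ticket_lock_progressed are hypothetical, and this is not the Dom0 kernel's or Xen's actual lock implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ticket lock for illustration; not the Xen/Linux code. */
struct ticket_lock {
    atomic_uint next;   /* next ticket number to hand out */
    atomic_uint owner;  /* ticket number currently being served */
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Take a ticket, then wait until it is served. Returns the ticket. */
static unsigned ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned my = atomic_fetch_add(&l->next, 1);

    while (atomic_load(&l->owner) != my)
        ;   /* busy-wait; the report describes polling an event channel
               at this point instead (not modelled here) */
    return my;
}

/* Serve the next ticket in line. The lock must currently be held. */
static void ticket_lock_release(struct ticket_lock *l)
{
    assert(atomic_load(&l->owner) != atomic_load(&l->next));
    atomic_fetch_add(&l->owner, 1);
}

/*
 * Forward-progress check in the spirit of the report: if two snapshots
 * of the served ticket differ, the lock changed hands in between, so
 * the waiters are slow but not deadlocked.
 */
static bool ticket_lock_progressed(unsigned before, unsigned after)
{
    return before != after;
}
```

Because tickets are handed out in strict order, a single waiter that is slow to wake up (for example, from a deep C-state) delays every waiter queued behind it, which is consistent with the behaviour described in the report.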