[Xen-devel] cpuidle causing Dom0 soft lockups
On large systems, with Dom0 booting with (significantly) more than 32 vCPUs, we have received multiple reports that C-state management, which is now enabled by default, is causing soft lockups, usually preventing the boot from completing.

The observations are:

- Reducing the number of vCPUs (or pCPUs) sufficiently makes the systems work.
- max_cstate=0 makes the systems work.
- max_cstate=1 makes the problem less severe on one (bigger) system, and eliminates it completely on another (smaller) one.
- When appearing to hang, all vCPUs are in Dom0's timer_interrupt(), and all (sometimes all but one) are attempting to acquire xtime_lock. However, because we use ticket locks, we can verify that this is not a deadlock: repeatedly sending '0' shows forward progress, as the tickets (visible on the stack) continue to increase.
- Additionally, there is always one vCPU that has its polling event channel (used for waking the next waiting vCPU when a lock becomes available) signaled.
- In one case (but not in the other) it is always the same vCPU that apparently takes very long to wake up from the polling request. This may be coincidence, but the output after sending 'c' also indicates a significantly higher (about 3 times) usage value for C2 than the second highest one; the duration printed is roughly the same for all CPUs.

While I don't know this code well, it would seem that we're suffering from extremely long wakeup times. This suggests that there is likely a (performance) problem even for smaller numbers of vCPUs.

Hence, unless this can be fixed before 4.0 is released, I would suggest disabling C-state management by default again.

I can provide full logs if needed.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
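
For readers unfamiliar with ticket locks, the forward-progress argument in the message can be sketched roughly as follows. This is a toy, single-threaded model and not the Dom0 kernel's implementation: the names TicketLock, take_ticket, may_enter, release and shows_forward_progress are all hypothetical, and counter wrap-around is ignored. The idea it illustrates is only that a steadily increasing "now serving" value across repeated samples points to slow progress rather than a deadlock.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TicketLock:
    """Toy model of a ticket lock (hypothetical, not the kernel's code)."""
    next_ticket: int = 0  # ticket handed to the next arriving waiter
    now_serving: int = 0  # ticket currently allowed to hold the lock

    def take_ticket(self) -> int:
        # Each waiter draws a unique, increasing ticket number.
        ticket = self.next_ticket
        self.next_ticket += 1
        return ticket

    def may_enter(self, ticket: int) -> bool:
        # A waiter owns the lock only when its ticket is being served.
        return ticket == self.now_serving

    def release(self) -> None:
        # Releasing hands the lock to the holder of the next ticket.
        self.now_serving += 1


def shows_forward_progress(samples: Sequence[int]) -> bool:
    """Return True if sampled now-serving values never go backwards
    and advance at least once (wrap-around is ignored in this sketch)."""
    if len(samples) < 2:
        return False
    non_decreasing = all(a <= b for a, b in zip(samples, samples[1:]))
    return non_decreasing and samples[-1] > samples[0]
```

With three hypothetical waiters served in order, the sampled values are 0, 1, 2, which the helper classifies as forward progress; a sequence such as 5, 5, 5 would instead be consistent with a stuck lock.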