Xen project Mailing List

Re: [Xen-devel] cpuidle causing Dom0 soft lockups

To: "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxxxx>

Date: Fri, 12 Feb 2010 09:21:00 +0000

Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ke Yu <ke.yu@xxxxxxxxx>

Delivery-date: Fri, 12 Feb 2010 01:21:20 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 11.02.10 18:01 >>>
>On 11/02/2010 14:44, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>
>> Other than with the global processed_system_time,
>> the per-CPU one may not get increased even if delta_cpu was close
>> to 3*NS_PER_TICK, due to the stolen/blocked accounting. Probably
>> this was not a problem so far because no code outside of
>> timer_interrupt() read this value - Keir, since I think you wrote that
>> logic originally, any word on this?
>
>What you say is true, as clearly it is currently implemented that way almost
>by design. I'm lost in the intricacies of your current discussion though, so
>not sure exactly why it's a problem, and how we should fix it?

First of all, I don't think anything necessarily needs to be fixed in
the 2.6.18 tree, as that one will never support >32 vCPUs, and I don't
think the scalability issue we're talking about here is of concern
there.

The problem we're trying to address is the contention on xtime_lock.
It is clear that there generally is no need for all CPUs in the system
to try to update the global time variables, so some filtering on the
number of CPUs concurrently trying to acquire xtime_lock is reasonable.

With any filtering done, there is however potential for a CPU to see
its local processed time ahead of the global one. Setting a single-shot
timer in the past (or very near future) would guarantee that the CPU
executes timer_interrupt() (almost) right away, but it does not
guarantee that the CPU would now be among those that try to acquire
xtime_lock (i.e. the situation wouldn't necessarily have improved after
the interrupt was handled, and hence an interrupt storm is possible).

Consequently, along with capping the timeout to be set in
stop_hz_timer() to jiffies+1, the timeout would also reasonably be
capped to per_cpu(processed_system_time, cpu) + NS_PER_TICK. This in
turn only makes sense if the per-CPU processed time is accurate (i.e.
within NS_PER_TICK from when the last timer interrupt occurred).

That however doesn't hold: due to the stolen/blocked calculations
subtracting exact nanosecond values from delta_cpu, but only adding
tick-granular values into per-CPU processed_system_time, the error can
accumulate up to a little less than 3*NS_PER_TICK.

The proposed change would be to do only a single adjustment to per-CPU
processed_system_time (using the originally calculated delta_cpu
value). What I couldn't convince myself of so far was that this
wouldn't influence the stolen/blocked accounting (since the delta_cpu
calculated on the next timer interrupt would now necessarily be
different from the one calculated with the current logic) - in
particular, the adjustments commented with "clamp local-time progress"
are what would appear to get used more frequently with the change under
consideration.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
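
Illustrative sketch (not part of the original message, and not the actual
kernel code): the rounding drift described above can be reproduced with a
small standalone model. The function names advance_current(),
advance_single() and capped_timeout_ns() are hypothetical, and NS_PER_TICK
is assumed to be 10 ms (HZ=100). The model only tracks how far the per-CPU
processed time would advance in one timer interrupt. It ignores locking and
the stolen/blocked bookkeeping itself, so it says nothing about the open
question of how the accounting would be affected.

```c
#include <stdint.h>

/* Assumption: HZ=100, i.e. one tick is 10 ms. */
#define NS_PER_TICK 10000000LL

/* Round a non-negative nanosecond value down to whole ticks. */
static int64_t whole_ticks(int64_t ns)
{
    return (ns / NS_PER_TICK) * NS_PER_TICK;
}

/*
 * Model of the scheme described in the message: exact nanosecond values
 * are subtracted from delta_cpu, but only whole ticks are credited to the
 * per-CPU processed time. Returns how far the per-CPU time advances.
 */
static int64_t advance_current(int64_t delta_cpu, int64_t stolen,
                               int64_t blocked)
{
    int64_t advanced = 0;

    if (stolen > 0 && delta_cpu > 0) {
        delta_cpu -= stolen;
        if (delta_cpu < 0)
            stolen += delta_cpu;      /* clamp local-time progress */
        advanced += whole_ticks(stolen);
    }
    if (blocked > 0 && delta_cpu > 0) {
        delta_cpu -= blocked;
        if (delta_cpu < 0)
            blocked += delta_cpu;     /* clamp local-time progress */
        advanced += whole_ticks(blocked);
    }
    if (delta_cpu > 0)
        advanced += whole_ticks(delta_cpu);

    return advanced;
}

/*
 * Model of the single-adjustment idea: credit the per-CPU processed time
 * once, from the originally calculated delta_cpu.
 */
static int64_t advance_single(int64_t delta_cpu)
{
    return delta_cpu > 0 ? whole_ticks(delta_cpu) : 0;
}

/*
 * Hypothetical helper for the cap mentioned in the message: never program
 * a timeout later than one tick past the per-CPU processed time.
 */
static int64_t capped_timeout_ns(int64_t requested_ns,
                                 int64_t local_processed_ns)
{
    int64_t limit = local_processed_ns + NS_PER_TICK;

    return requested_ns < limit ? requested_ns : limit;
}
```

With delta_cpu = 3*(NS_PER_TICK-1) and stolen = blocked = NS_PER_TICK-1,
advance_current() returns 0, so the per-CPU value lags by just under
3*NS_PER_TICK. For the same delta_cpu, advance_single() returns
2*NS_PER_TICK, leaving a lag of less than one tick.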
