Xen project Mailing List

Re: [Xen-devel] [PATCH] remove blocked time accounting from xen "clockchip"

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>

From: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>

Date: Thu, 19 Jan 2012 14:42:39 -0500

Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Joe Jin <joe.jin@xxxxxxxxxx>, Zhenzhong Duan <zhenzhong.duan@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Laszlo Ersek <lersek@xxxxxxxxxx>

Delivery-date: Thu, 19 Jan 2012 19:42:25 +0000

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Thu, Nov 10, 2011 at 10:05:14AM -0800, Jeremy Fitzhardinge wrote: > On 11/09/2011 05:35 AM, Jan Beulich wrote: > >>>> On 18.10.11 at 22:42, Laszlo Ersek <lersek@xxxxxxxxxx> wrote: > >> ... because the "clock_event_device framework" already accounts for idle > >> time through the "event_handler" function pointer in > >> xen_timer_interrupt(). > >> > >> The patch is intended as the completion of [1]. It should fix the double > >> idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to > >> stolen time accounting (the removed code seems to be isolated). > > After some more looking around I still think it's incorrect, albeit for > > a different reason: What tick_nohz_restart_sched_tick() accounts > > as idle time is *all* time that passed while in cpu_idle(). What gets > > accounted in do_stolen_accounting() (without your patch) is > > different: > > - time the vCPU was in RUNSTATE_blocked gets accounted as idle > > - time the vCPU was in RUNSTATE_runnable and RUNSTATE_offline > > gets accounted as stolen. > > > > That is, on an overcommitted system (and without your patch) I > > would expect you to not see the (full) double idle increment for a not > > fully idle and not fully loaded vCPU. > > > > Now we can of course say that for most purposes it's fine to > > ignore the difference between idle and steal time, but there must > > be a reason these have been getting accounted separately in Linux > > without regard to Xen. > > > > So I really think that what needs to be suppressed to address this > > is tick_nohz_restart_sched_tick()'s call to account_idle_ticks(). But > > simply forcing CONFIG_VIRT_CPU_ACCOUNTING is not an immediate > > option (not even when considering a Xen-only kernel), as that has > > further implications. Instead I wonder whether under Xen we should > > e.g. update per_cpu(tick_cpu_sched, cpu).idle_jiffies prior to calling > > tick_nohz_restart_sched_tick() (possibly implicitly by doing the > > update in do_stolen_accounting(), though that wouldn't work when > > the wakeup is due to an interrupt other than the timer one). > > > > The alternative of accounting the negative of the steal time as > > idle time in do_stolen_accounting() doesn't look correct either, > > since not all stolen time was necessarily accounted as idle (some > > would have got - also incorrectly - accounted as process or system > > time). > > > > So my conclusion is that only the full implementation of what > > CONFIG_VIRT_CPU_ACCOUNTING expects would actually get > > things sorted out properly (but that would imply Xen *and* native > > are willing to pay the performance price, or the enabling of this > > can be made dynamic so that at least native could remain > > unaffected). > > > > Or the negative stolen time should be accounted to the current > > process, the system, or as idle, depending on the context which > > do_stolen_accounting() runs in (just like timer_interrupt() did in > > the XenoLinux kernels for positive accounting). > > I was always suspicious of the timer-interrupt-based stolen time > accounting code. It's really a hold-over from when ticks were the main > timekeeping mechanism, but with highres timers its massive overkill and > probably a source of performance degradation. > > Overall, the stolen time accounting isn't really very important. The > kernel doesn't use it at all internally - it doesn't affect scheduling > decisions, for example. It's only exported in /proc/stat, and some > tools like top will display it if its non-zero. It could be useful for > diagnosing performance problems on a heavily loaded host system, but > other tools like "xentop" would give you as much information. > > So really, all this code is nice to have, but I'm not sure its worth a > lot of time to fix, especially if it makes idle accounting hard (which > *is* important). > > However, that said, PeterZ recently added some code to the scheduler so > that time "stolen" by interrupt handling isn't accounted against the > task running at the time, and the design is specifically intended to > also be useful for virtualization stolen time as well. There are some > KVM patches floating around to implement this, and we should look at I finally got some time to look at them and I think they are these ones: git log --oneline e03b644fe68b1c6401465b02724d261538dba10f..3c404b578fab699c4708279938078d9404b255a4 3c404b5 KVM guest: Add a pv_ops stub for steal time c9aaa89 KVM: Steal time implementation 9ddabbe KVM: KVM Steal time guest/host interface 4b6b35f KVM: Add constant to represent KVM MSRs enabled bit in guest/host interface What is interesting is that they end up inserting a bunch of: + if (steal_account_process_tick()) + return; + in irqtime_account_process_tick and in account_process_tick. The steal_account_process_tick just makes a pv_time_ops.steal_clock based on the asm goto of paravirt_steal_enabled key. I think if we were to persue this we could rip out in the 'do_stolen_accounting' the call to 'accont_idle_tick' and 'account_idle_time' completly. In essence making that function behave as a pv_time_ops.steal_clock implementation. Also that would mean we could remove in the 'xen_timer_interrupt' function the call to 'do_stolen_accounting' I think this would mean we would account _only_ for the RUNSTATE_runnable and RUNSTATE_offline as Laszlo's earlier patch suggested? And the RUNSTATE_blocked would be figured out by the existing mechanism in the sched.c code? > doing a Xen implementation. That would be much more practically useful, > since (I think) it allows the scheduler to be aware of stolen time and > not penalize tasks for time they never got to use. Or at the very least > it could give you per-task stolen stats (maybe?) which would be useful. > > J > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxxxxxxxx > http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.