[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 6/6] xen: try to prevent idle timer from firing too often.

On 09/06/2017 05:51 PM, Dario Faggioli wrote:
> On Tue, 2017-08-29 at 17:30 +0100, George Dunlap wrote:
>> On 08/18/2017 07:04 PM, Dario Faggioli wrote:
>>> What we're trying to avoid is one of those idle CPUs to
>>> wake up, only to discover that the grace period is still
>>> running, and that it hence could have be slept longer
>>> (saving more power).
>> So I think we're only taking about one or two extra wakeups per cpu
>> maximum -- if this even happens at all.
> Yep, indeed.
>> Wouldn't it be better to first add a performance counter, and check
>> to
>> see if and how often this situation happens?
> The counter is there already. It's rcu_idle_timer ("RCU: idle_timer").

And do you (or maybe Julien or Stefano) have an idea how often this
happens with a fixed timer like patch 5?

>>> This patch implements an heuristic aimed at achieving
>>> that, at the price of having to call cpumask_weight() on
>>> the 'entering idle' path, on CPUs with queued callbacks.
>> The heuristic seems a bit strange to me too: why would each cpu
>> increase
>> the grace period in a linear fashion?  I haven't looked at the grace
>> period code, but I would have expected each one to be independent;
>> and
>> so there'd be a "diminishing returns" effect when adding more cpus.
> I like the idea, in general. Let me just double check whether I'm
> understanding what you're suggesting properly.
> First of all, what do you mean with "adding more cpus"? Adding to what?
>  The timer is useful when a CPU, which is participating in the grace
> period, goes idle, while the grace period is not finished. From that
> point on, the number of CPUs participating to _that_ grace period will
> monotonically decrease, until it reaches zero. So what does 'adding'
> means in this context?

'Adding' means 'adding more pcpus to the rcu operation'.

The formula in this patch for the wait time was K * N, where N is the
number of cpus involved in the operation.  Thus, the calculated wait
time will scale linearly with the number of cpus.  An RCU operation with
2 pcups will be 2K, while an operation with 4 cpus will be 4K.  This
implies that the *time you think it will take to complete an RCU
transaction* will scale linearly with the number of cpus -- that on
average, an operation with 4 cpus will take twice as long for all cpus
to complete their grace periods as an operation with 2 cpus, and
operation with 6 cpus will take three times as long as an operation with
2 cpus.

This seems unlikely to me: It seems to me like like statistically the
amount of time taken will converge to some limit as the number of pcpus
involved in the operation grows.

>> If we have to have something like this (which I'm not at all
>> convinced
>> we do), I would think having a single number which self-adjusted so
>> that
>> the number of 'misses' was around 1% would be a good balance.  What
>> about a heuristic like this:
>> 1. If the timer goes off and the grace period isn't over, add 10ms to
>> the timer period
>> 2. If the timer goes off and the grace period *is* over, subtract
>> 100us
>> from the timer period
> So, let's say we start with a period of 1ms. First time RCU is used,
> the timer fires twice: the first time it finds the grace period is
> still ongoning --and hence the period becomes 11ms-- while the seconds
> finds it over --so the period is now 10.9ms.
> Next time RCU is used, if the timer is necessary, we use 10.9ms.
> Am I getting your proposal right?
> If yes, do we allow the period to become smaller than the initial value
> (1ms, in the example above). I'd say we better not (or that we at least
> set a lower bound), or, given enough occurrences of cases when the
> timer fires and finds the grace period to be over in a row, the period
> can become 0!

Yes, just about -- I was thinking of just starting at 10ms rather than
at 1ms, but everything else is pretty accurate.  And yes of course we
probably want an upper and lower bound on the timeout.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.