
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen



>>> On 24.04.15 at 10:32, <wei.w.wang@xxxxxxxxx> wrote:
> On 23/04/2015 15:27, Jan Beulich wrote:
>> >>> On 24.04.15 at 07:12, <wei.w.wang@xxxxxxxxx> wrote:
>> > On 23/04/2015 22:09, Jan Beulich wrote:
>> >> >>> On 23.04.15 at 15:31, <wei.w.wang@xxxxxxxxx> wrote:
>> >> > The intel_pstate.c file under xen/arch/x86/acpi/cpufreq/ contains
>> >> > all the logic for selecting the current P-state. It follows the
>> >> > kernel's implementation. Instead of using the traditional
>> >> > cpufreq governors, intel_pstate implements its internal governor in
>> >> > "setpolicy()".
>> >>
>> >> And this internal governor behaves how? Like ondemand, powersave,
>> >> performance, or yet something else? And how would its behavior be
>> >> changed?
>> >
>> > In the kernel intel_pstate implementation, there are two internal governors:
>> > Powersave and Performance.
>> > Powersave is similar to the old (cpufreq) ondemand governor. A timer
>> > function is periodically invoked to sample the CPU busy info (e.g. the
>> > busy value increases while a CPU-intensive workload is running).
>> > However, the final calculated target value is clamped into the
>> > [min_pct, max_pct] limit interval.
>> > The Performance governor is actually a special case of Powersave, with
>> > min_pct = max_pct = 100%. This is the same as the old performance
>> > governor.
>> 
>> So a true powersave one would then be accomplished by setting min_pct =
>> max_pct = <some value smaller than 100>%. Is there a limit on the valid
>> percentages to be specified here?
> 
> 
> In the old driver, a powersave governor just sets the CPU to run with the 
> lowest possible performance state. This one does not exist in the 
> intel_pstate driver. 
> The intel_pstate driver changes the terminology by using "powersave" to 
> refer to the previous "ondemand" case, which is admittedly confusing. 
> But we may think of it this way: it has only two modes, the max performance
> mode and the ondemand mode. "ondemand" is the one that saves power (in a 
> more reasonable way than the previous "powersave", which simply sets 
> the CPU to run with the lowest performance state). Anyway, we can certainly 
> change the name if it causes confusion.

I think at the very least from a user interface perspective (e.g. the
xenpm tool) the meaning of the old governor names should be
retained as much as possible.

> The valid pct value range is 0 to 100. 

So what does 0% mean then? I.e. (wrt "powersave") what does
min_pct = max_pct = 0 result in?
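To make the question concrete, here is a minimal sketch of the clamping as I read the description above. It assumes the limits are plain percentages of the maximum P-state; `struct pstate_limits` and `clamp_target_pct()` are hypothetical names, not taken from the patch.

```c
/*
 * Sketch only: clamp a requested performance level (percent of the
 * maximum P-state, assumed) into the [min_pct, max_pct] limit interval.
 * struct pstate_limits and clamp_target_pct() are hypothetical names.
 */
struct pstate_limits {
    int min_pct; /* lower bound, 0..100 */
    int max_pct; /* upper bound, 0..100 */
};

int clamp_target_pct(int target_pct, const struct pstate_limits *lim)
{
    if (target_pct < lim->min_pct)
        return lim->min_pct;
    if (target_pct > lim->max_pct)
        return lim->max_pct;
    return target_pct;
}
```

With limits {0, 100} a request of 42 stays at 42; with {100, 100} every request becomes 100 (the "performance" case); with {0, 0} every request becomes 0, and what 0% then maps to in terms of an actual P-state is exactly what I'm asking about.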

>> Also, your calling "powersave" what supposedly is "ondemand"
>> makes me nervous that it may not immediately raise the CPU freq when load
>> increases, yet imo that's a fundamental requirement for server kind loads
>> where you don't want to run in "performance" mode. Can you clarify the
>> behavior here?
> 
> The timer fires every 10ms to update the CPU P-state according to the sampled 
> workload info.

But that doesn't say what action the timer initiates, i.e. under
what conditions it would effect a frequency change.
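As an illustration of the level of detail that would help, a minimal sketch of one possible 10ms step. The busy/total counters, the "target = busy percentage" rule and all names here are assumptions for illustration only, not what the patch does.

```c
/*
 * Sketch only: one hypothetical sampling step. The counters, the
 * "target = busy percent" rule and all names are assumptions.
 */
struct sample {
    unsigned long long busy;  /* busy time accumulated so far */
    unsigned long long total; /* elapsed time accumulated so far */
};

/* Busy percentage over the interval between two samples. */
int busy_pct(const struct sample *prev, const struct sample *cur)
{
    unsigned long long d_busy = cur->busy - prev->busy;
    unsigned long long d_total = cur->total - prev->total;

    if (d_total == 0)
        return 0;
    return (int)(d_busy * 100 / d_total);
}

/*
 * Compute the new target percentage, clamped into [min_pct, max_pct].
 * *changed is set to 1 only if the target differs from cur_pct, i.e.
 * only then would a frequency change be requested.
 */
int sample_step(const struct sample *prev, const struct sample *cur,
                int cur_pct, int min_pct, int max_pct, int *changed)
{
    int target = busy_pct(prev, cur);

    if (target < min_pct)
        target = min_pct;
    if (target > max_pct)
        target = max_pct;
    *changed = (target != cur_pct);
    return target;
}
```

Under this assumed rule, 80% busy in the last interval moves a CPU running at 50% straight up to 80% in one step, while a max_pct of 60 caps it there; whether the real driver raises the frequency that directly is the point that needs clarifying.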

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

