
Re: [Xen-devel] [PATCH 0/9] Porting the intel_pstate driver to Xen



On 25/04/2015 00:15, Konrad Rzeszutek Wilk wrote
> On Fri, Apr 24, 2015 at 03:42:40PM +0000, Wang, Wei W wrote:
> > On 24/04/2015 23:04, Jan Beulich wrote
> > > >>> On 24.04.15 at 16:56, <wei.w.wang@xxxxxxxxx> wrote:
> > > > On 24/04/2015 20:57, Jan Beulich wrote
> > > >> I'm not sure how else to express what I want (no matter how many
> > > >> internal governors the intel_pstate driver has).
> > > >>
> > > >> xenpm set-scaling-governor powersave
> > > >> xenpm set-scaling-governor ondemand
> > > >> xenpm set-scaling-governor performance
> > > >>
> > > >> each should switch the system into a respective state, no matter
> > > >> whether internally to the driver this means a change of governors
> > > >> or just a modification to {min,max}_pct.
> > > >>
> > > >> And obtaining the current state after any of the above should
> > > >> show the same governor in use that was set (and not "internal"),
> > > >> again no matter how this is being achieved internally to the driver.
> > > >
> > > > Thanks Jan, that's clear. But this raises another issue. For
> > > > example, suppose we set the scaling governor to "ondemand" and then
> > > > set min_pct = max_pct = 60%. The timer function may generate results
> > > > like 35%, 55%, 45%..., but the CPU just keeps running at 60%.
> > >
> > > So I must be misunderstanding something then: How can the driver do
> > > anything at all when told to run the system at 60%?
> >
> > The {min,max}_pct pair is a limit. The timer function computes a target
> > value from the sampled statistics, and this value is then clamped into
> > [min_pct, max_pct]. With [60%, 60%], whatever value the timer function
> > produces is adjusted to 60% and written to the perf_ctl register.
> >
> > > > Then this is not "ondemand" at all (I think this is another
> > > > reason why the intel_pstate driver does not call its governor
> > > > "ondemand").
> > > >
> > > > The intel_pstate driver in the Linux kernel has already dropped the
> > > > old governor convention. It lets users get what they want simply
> > > > by adjusting {min,max}_pct, which limits how the performance
> > > > level is selected.
> > >
> > > Adjusting the values individually to me very much looks like the
> > > userspace governor.
> >
> > Yeah, that example was like "userspace". Please take a look at this
> > example: with [min_pct=60%, max_pct=80%], if the timer generates 45%,
> > 55%, 65%, 70%, 75% and 90%, the final target values are 60%, 60%, 65%,
> > 70%, 75% and 80%, so they are not constant. The ones falling into the
> > limit interval (65%, 70%, 75%) behave like "ondemand"; the others do not.
> >
> > >
> > > > I think we can follow the kernel implementation regarding this
> > > > point, what do you think?
> > >
> > > Not sure - I'm not always convinced that what Linux does is the one
> > > and only and best way.
> >
> > Understood. But I think that usage model is good for supporting future
> > Intel processors (e.g. the hardware-controlled P-states on Skylake and
> > later), where {min,max}_pct needs to be exposed to users to set the
> > limits.
> 
> How will this affect AMD processors, which can use cpufreq? Would the
> ondemand feature go away?

No, this won't affect them. When "intel_pstate=disable" is added to the
boot parameter list, the old cpufreq driver is used, and everything,
including xenpm, works in the old style.

The new intel_pstate driver actually works in a mode similar to ondemand.
It could be called an "enhanced ondemand": the user can set a limit range
within which the ondemand-style selection operates.

Best,
Wei

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

