|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [PATCH v3 12/15] xen/x86: implement EPP support for the amd-cppc driver in active mode
[Public]
Hi,
> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: Tuesday, March 25, 2025 6:49 PM
> To: Penny, Zheng <penny.zheng@xxxxxxx>
> Cc: Huang, Ray <Ray.Huang@xxxxxxx>; Andrew Cooper
> <andrew.cooper3@xxxxxxxxxx>; Anthony PERARD <anthony.perard@xxxxxxxxxx>;
> Orzel, Michal <Michal.Orzel@xxxxxxx>; Julien Grall <julien@xxxxxxx>; Roger
> Pau Monné <roger.pau@xxxxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> xen-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v3 12/15] xen/x86: implement EPP support for the amd-cppc
> driver in active mode
>
> On 06.03.2025 09:39, Penny Zheng wrote:
> > amd-cppc has 2 operation modes: autonomous (active) mode,
> > non-autonomous (passive) mode.
> > In active mode, platform ignores the requestd done in the Desired
> > Performance Target register and takes into account only the values set
> > to the minimum, maximum and energy performance preference(EPP)
> > registers.
> > The EPP is used in the CCLK DPM controller to drive the frequency that
> > a core is going to operate during short periods of activity.
> > The SOC EPP targets are configured on a scale from 0 to 255 where 0
> > represents maximum performance and 255 represents maximum efficiency.
>
> So this is the other way around from "perf" values, where aiui 0xff is
> "highest"?
>
Yes, it is not the perf value. It is an arbitrary value on a scale from 0 to 255
> > @@ -261,7 +276,20 @@ static int cf_check amd_cppc_cpufreq_target(struct
> cpufreq_policy *policy,
> > return res;
> >
> > return amd_cppc_write_request(policy->cpu, data-
> >caps.lowest_nonlinear_perf,
> > - des_perf, data->caps.highest_perf);
> > + des_perf, data->caps.highest_perf,
> > + /* Pre-defined BIOS value for passive
> > mode */
> > + per_cpu(epp_init, policy->cpu)); }
> > +
> > +static int read_epp_init(void)
> > +{
> > + uint64_t val;
> > +
> > + if ( rdmsr_safe(MSR_AMD_CPPC_REQ, val) )
> > + return -EINVAL;
>
> I'm unconvinced of using rdmsr_safe() everywhere (i.e. this also goes for
> earlier
> patches). Unless you can give a halfway reasonable scenario under which by the
> time we get here there's still a chance that the MSR isn't implemented in the
> next
> lower layer (hardware or another hypervisor, just to explain what's meant,
> without
> me assuming that the driver should come into play in the first place when we
> run
> virtualized ourselves).
>
Correct me if I understand wrongly, we are concerning that the driver may not
always
have the privilege to directly access the MSR in all scenarios, so rdmsr_safe
with exception
handling isn't always suitable. Then maybe I shall switch them all into
rdmsrl() ?
> Furthermore you call this function unconditionally, i.e. if there was a
> chance for the
> MSR read to fail, CPU init would needlessly fail when in passive mode.
>
The reason why I also run read_epp_init() for passive mode is to avoid setting
epp with zero value
for MSR_AMD_CPPC_REQ in passive mode. I want to give it pre-defined BIOS value
in passive mode.
If we wrap read_epp_init() with active mode check, maybe we shall add extra
read before setting request register MSR_AMD_CPPC_REQ,
introducing MSR_AMD_CPPC_EPP_MASK to reserve original value for epp in passive
mode, or any better suggestion?
> > + {
> > + /* Force the epp value to be zero for performance policy */
> > + epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
> > + min_perf = max_perf;
> > + }
> > + else if ( policy->policy == CPUFREQ_POLICY_POWERSAVE )
> > + /* Force the epp value to be 0xff for powersave policy */
> > + /*
> > + * If set max_perf = min_perf = lowest_perf, we are putting
> > + * cpu cores in idle.
> > + */
>
> Nit: Such two successive comments want combining. (Same near the top of the
> function, as I notice only now.)
>
> Furthermore I'm in trouble with interpreting this comment: To me "lowest"
> doesn't mean "doing nothing" but "doing things as efficiently in terms of
> power use
> as possible". IOW that's not idle. Yet the comment reads as if it was meant
> to be an
> explanation of why we can't set max_perf from min_perf here. That is, not
> matter
> what's meant to be said, I think this needs re- wording (and possibly using
> subjunctive mood).
>
How about:
The lowest non-linear perf is equivalent as P2 frequency. Reducing performance
below this
point does not lead to total energy savings for a given computation (although
it reduces momentary power).
So we are not suggesting to set max_perf smaller than lowest non-linear perf,
or even the lowest perf.
> Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |