[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [PATCH v3 12/15] xen/x86: implement EPP support for the amd-cppc driver in active mode
[Public] Hi, > -----Original Message----- > From: Jan Beulich <jbeulich@xxxxxxxx> > Sent: Tuesday, March 25, 2025 6:49 PM > To: Penny, Zheng <penny.zheng@xxxxxxx> > Cc: Huang, Ray <Ray.Huang@xxxxxxx>; Andrew Cooper > <andrew.cooper3@xxxxxxxxxx>; Anthony PERARD <anthony.perard@xxxxxxxxxx>; > Orzel, Michal <Michal.Orzel@xxxxxxx>; Julien Grall <julien@xxxxxxx>; Roger > Pau Monné <roger.pau@xxxxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>; > xen-devel@xxxxxxxxxxxxxxxxxxxx > Subject: Re: [PATCH v3 12/15] xen/x86: implement EPP support for the amd-cppc > driver in active mode > > On 06.03.2025 09:39, Penny Zheng wrote: > > amd-cppc has 2 operation modes: autonomous (active) mode, > > non-autonomous (passive) mode. > > In active mode, platform ignores the requestd done in the Desired > > Performance Target register and takes into account only the values set > > to the minimum, maximum and energy performance preference(EPP) > > registers. > > The EPP is used in the CCLK DPM controller to drive the frequency that > > a core is going to operate during short periods of activity. > > The SOC EPP targets are configured on a scale from 0 to 255 where 0 > > represents maximum performance and 255 represents maximum efficiency. > > So this is the other way around from "perf" values, where aiui 0xff is > "highest"? > Yes, it is not the perf value. It is an arbitrary value on a scale from 0 to 255 > > @@ -261,7 +276,20 @@ static int cf_check amd_cppc_cpufreq_target(struct > cpufreq_policy *policy, > > return res; > > > > return amd_cppc_write_request(policy->cpu, data- > >caps.lowest_nonlinear_perf, > > - des_perf, data->caps.highest_perf); > > + des_perf, data->caps.highest_perf, > > + /* Pre-defined BIOS value for passive > > mode */ > > + per_cpu(epp_init, policy->cpu)); } > > + > > +static int read_epp_init(void) > > +{ > > + uint64_t val; > > + > > + if ( rdmsr_safe(MSR_AMD_CPPC_REQ, val) ) > > + return -EINVAL; > > I'm unconvinced of using rdmsr_safe() everywhere (i.e. this also goes for > earlier > patches). Unless you can give a halfway reasonable scenario under which by the > time we get here there's still a chance that the MSR isn't implemented in the > next > lower layer (hardware or another hypervisor, just to explain what's meant, > without > me assuming that the driver should come into play in the first place when we > run > virtualized ourselves). > Correct me if I understand wrongly, we are concerning that the driver may not always have the privilege to directly access the MSR in all scenarios, so rdmsr_safe with exception handling isn't always suitable. Then maybe I shall switch them all into rdmsrl() ? > Furthermore you call this function unconditionally, i.e. if there was a > chance for the > MSR read to fail, CPU init would needlessly fail when in passive mode. > The reason why I also run read_epp_init() for passive mode is to avoid setting epp with zero value for MSR_AMD_CPPC_REQ in passive mode. I want to give it pre-defined BIOS value in passive mode. If we wrap read_epp_init() with active mode check, maybe we shall add extra read before setting request register MSR_AMD_CPPC_REQ, introducing MSR_AMD_CPPC_EPP_MASK to reserve original value for epp in passive mode, or any better suggestion? > > + { > > + /* Force the epp value to be zero for performance policy */ > > + epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE; > > + min_perf = max_perf; > > + } > > + else if ( policy->policy == CPUFREQ_POLICY_POWERSAVE ) > > + /* Force the epp value to be 0xff for powersave policy */ > > + /* > > + * If set max_perf = min_perf = lowest_perf, we are putting > > + * cpu cores in idle. > > + */ > > Nit: Such two successive comments want combining. (Same near the top of the > function, as I notice only now.) > > Furthermore I'm in trouble with interpreting this comment: To me "lowest" > doesn't mean "doing nothing" but "doing things as efficiently in terms of > power use > as possible". IOW that's not idle. Yet the comment reads as if it was meant > to be an > explanation of why we can't set max_perf from min_perf here. That is, not > matter > what's meant to be said, I think this needs re- wording (and possibly using > subjunctive mood). > How about: The lowest non-linear perf is equivalent as P2 frequency. Reducing performance below this point does not lead to total energy savings for a given computation (although it reduces momentary power). So we are not suggesting to set max_perf smaller than lowest non-linear perf, or even the lowest perf. > Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |