[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 12/15] xen/x86: implement EPP support for the amd-cppc driver in active mode
On 28.03.2025 05:07, Penny, Zheng wrote: >> -----Original Message----- >> From: Jan Beulich <jbeulich@xxxxxxxx> >> Sent: Tuesday, March 25, 2025 6:49 PM >> >> On 06.03.2025 09:39, Penny Zheng wrote: >>> @@ -261,7 +276,20 @@ static int cf_check amd_cppc_cpufreq_target(struct >> cpufreq_policy *policy, >>> return res; >>> >>> return amd_cppc_write_request(policy->cpu, data- >>> caps.lowest_nonlinear_perf, >>> - des_perf, data->caps.highest_perf); >>> + des_perf, data->caps.highest_perf, >>> + /* Pre-defined BIOS value for passive >>> mode */ >>> + per_cpu(epp_init, policy->cpu)); } >>> + >>> +static int read_epp_init(void) >>> +{ >>> + uint64_t val; >>> + >>> + if ( rdmsr_safe(MSR_AMD_CPPC_REQ, val) ) >>> + return -EINVAL; >> >> I'm unconvinced of using rdmsr_safe() everywhere (i.e. this also goes for >> earlier >> patches). Unless you can give a halfway reasonable scenario under which by >> the >> time we get here there's still a chance that the MSR isn't implemented in >> the next >> lower layer (hardware or another hypervisor, just to explain what's meant, >> without >> me assuming that the driver should come into play in the first place when we >> run >> virtualized ourselves). >> > > Correct me if I understand wrongly, we are concerning that the driver may not > always > have the privilege to directly access the MSR in all scenarios, so rdmsr_safe > with exception > handling isn't always suitable. Then maybe I shall switch them all into > rdmsrl() ? There's no privilege question here - we're running at the highest possible privilege level. The only question in MSR access can concern the existence of these MSRs (on bare hardware) or improper emulation of MSRs by an underlying hypervisor. The latter case I think we can pretty much exclude for a driver like this one - the driver simply has no (real) use when running virtualized. Which leaves errata on hardware. Those would better be dealt with by checking once up front (and then disabling the driver if need be). IOW except for perhaps a single probing access early in driver init, I think these better would all be plain accesses. And even such an early probing access would likely only need switching to when a relevant erratum becomes known. >> Furthermore you call this function unconditionally, i.e. if there was a >> chance for the >> MSR read to fail, CPU init would needlessly fail when in passive mode. >> > > The reason why I also run read_epp_init() for passive mode is to avoid > setting epp with zero value > for MSR_AMD_CPPC_REQ in passive mode. I want to give it pre-defined BIOS > value in passive mode. > If we wrap read_epp_init() with active mode check, maybe we shall add extra > read before setting request register MSR_AMD_CPPC_REQ, > introducing MSR_AMD_CPPC_EPP_MASK to reserve original value for epp in > passive mode, or any better suggestion? Well, not using rdmsr_safe() here would make the function impossible to fail, and hence the question itself would be moot. Otherwise my suggestion would be to ignore the error (perhaps associated with a warning) in passive mode. >>> + { >>> + /* Force the epp value to be zero for performance policy */ >>> + epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE; >>> + min_perf = max_perf; >>> + } >>> + else if ( policy->policy == CPUFREQ_POLICY_POWERSAVE ) >>> + /* Force the epp value to be 0xff for powersave policy */ >>> + /* >>> + * If set max_perf = min_perf = lowest_perf, we are putting >>> + * cpu cores in idle. >>> + */ >> >> Nit: Such two successive comments want combining. (Same near the top of the >> function, as I notice only now.) >> >> Furthermore I'm in trouble with interpreting this comment: To me "lowest" >> doesn't mean "doing nothing" but "doing things as efficiently in terms of >> power use >> as possible". IOW that's not idle. Yet the comment reads as if it was meant >> to be an >> explanation of why we can't set max_perf from min_perf here. That is, not >> matter >> what's meant to be said, I think this needs re- wording (and possibly using >> subjunctive mood). > > How about: > The lowest non-linear perf is equivalent as P2 frequency. Reducing > performance below this > point does not lead to total energy savings for a given computation (although > it reduces momentary power). > So we are not suggesting to set max_perf smaller than lowest non-linear perf, > or even the lowest perf. In an abstract way I think I can follow this. In the context of the code being commented, however, I'm afraid I still can't make sense of it. Main point being that the code commented doesn't use any of the *_perf values. It only sets the "epp" local variable. Maybe the point of the comment is to explain why non of the *_perf are used here, but I can't read this out of either of the proposed texts. Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |