[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 12/15] xen/x86: implement EPP support for the amd-cppc driver in active mode


  • To: "Penny, Zheng" <penny.zheng@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 28 Mar 2025 08:18:16 +0100
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: "Huang, Ray" <Ray.Huang@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, "Orzel, Michal" <Michal.Orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 28 Mar 2025 07:18:25 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28.03.2025 05:07, Penny, Zheng wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@xxxxxxxx>
>> Sent: Tuesday, March 25, 2025 6:49 PM
>>
>> On 06.03.2025 09:39, Penny Zheng wrote:
>>> @@ -261,7 +276,20 @@ static int cf_check amd_cppc_cpufreq_target(struct
>> cpufreq_policy *policy,
>>>          return res;
>>>
>>>      return amd_cppc_write_request(policy->cpu, data-
>>> caps.lowest_nonlinear_perf,
>>> -                                  des_perf, data->caps.highest_perf);
>>> +                                  des_perf, data->caps.highest_perf,
>>> +                                  /* Pre-defined BIOS value for passive 
>>> mode */
>>> +                                  per_cpu(epp_init, policy->cpu)); }
>>> +
>>> +static int read_epp_init(void)
>>> +{
>>> +    uint64_t val;
>>> +
>>> +    if ( rdmsr_safe(MSR_AMD_CPPC_REQ, val) )
>>> +        return -EINVAL;
>>
>> I'm unconvinced of using rdmsr_safe() everywhere (i.e. this also goes for 
>> earlier
>> patches). Unless you can give a halfway reasonable scenario under which by 
>> the
>> time we get here there's still a chance that the MSR isn't implemented in 
>> the next
>> lower layer (hardware or another hypervisor, just to explain what's meant, 
>> without
>> me assuming that the driver should come into play in the first place when we 
>> run
>> virtualized ourselves).
>>
> 
> Correct me if I understand wrongly, we are concerning that the driver may not 
> always
> have the privilege to directly access the MSR in all scenarios, so rdmsr_safe 
> with exception
> handling isn't always suitable. Then maybe I shall switch them all into 
> rdmsrl() ?

There's no privilege question here - we're running at the highest possible 
privilege
level. The only question in MSR access can concern the existence of these MSRs 
(on
bare hardware) or improper emulation of MSRs by an underlying hypervisor. The 
latter
case I think we can pretty much exclude for a driver like this one - the driver
simply has no (real) use when running virtualized. Which leaves errata on 
hardware.
Those would better be dealt with by checking once up front (and then disabling 
the
driver if need be). IOW except for perhaps a single probing access early in 
driver
init, I think these better would all be plain accesses. And even such an early
probing access would likely only need switching to when a relevant erratum 
becomes
known.

>> Furthermore you call this function unconditionally, i.e. if there was a 
>> chance for the
>> MSR read to fail, CPU init would needlessly fail when in passive mode.
>>
> 
> The reason why I also run read_epp_init() for passive mode is to avoid 
> setting epp with zero value
> for MSR_AMD_CPPC_REQ in passive mode. I want to give it pre-defined BIOS 
> value in passive mode.
> If we wrap read_epp_init() with active mode check, maybe we shall add extra 
> read before setting request register MSR_AMD_CPPC_REQ,
> introducing MSR_AMD_CPPC_EPP_MASK to reserve original value for epp in 
> passive mode, or any better suggestion?

Well, not using rdmsr_safe() here would make the function impossible to fail, 
and
hence the question itself would be moot. Otherwise my suggestion would be to 
ignore
the error (perhaps associated with a warning) in passive mode.

>>> +    {
>>> +        /* Force the epp value to be zero for performance policy */
>>> +        epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
>>> +        min_perf = max_perf;
>>> +    }
>>> +    else if ( policy->policy == CPUFREQ_POLICY_POWERSAVE )
>>> +        /* Force the epp value to be 0xff for powersave policy */
>>> +        /*
>>> +         * If set max_perf = min_perf = lowest_perf, we are putting
>>> +         * cpu cores in idle.
>>> +         */
>>
>> Nit: Such two successive comments want combining. (Same near the top of the
>> function, as I notice only now.)
>>
>> Furthermore I'm in trouble with interpreting this comment: To me "lowest"
>> doesn't mean "doing nothing" but "doing things as efficiently in terms of 
>> power use
>> as possible". IOW that's not idle. Yet the comment reads as if it was meant 
>> to be an
>> explanation of why we can't set max_perf from min_perf here. That is, not 
>> matter
>> what's meant to be said, I think this needs re- wording (and possibly using
>> subjunctive mood).
> 
> How about:
> The lowest non-linear perf is equivalent as P2 frequency. Reducing 
> performance below this
> point does not lead to total energy savings for a given computation (although 
> it reduces momentary power).
> So we are not suggesting to set max_perf smaller than lowest non-linear perf, 
> or even the lowest perf.

In an abstract way I think I can follow this. In the context of the code being
commented, however, I'm afraid I still can't make sense of it. Main point being
that the code commented doesn't use any of the *_perf values. It only sets the
"epp" local variable. Maybe the point of the comment is to explain why non of
the *_perf are used here, but I can't read this out of either of the proposed
texts.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.