[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v2 05/11] xen/x86: introduce a new amd cppc driver for cpufreq scaling


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: "Penny, Zheng" <penny.zheng@xxxxxxx>
  • Date: Thu, 27 Feb 2025 06:53:53 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass header.d=amd.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=gVkitbJsKQoQLTTlQPgS7n8Cmo78g4N7NvRa0cmyOv4=; b=GUAKkfDlfV9J+bGb0AgB/iKeLIJQ9+aTVpId8lDbiz9VqUatvx7mdpP2ZIOY5d0vTeDzYSYtJiAj2KPcInoTHz11f51JgIIJkVBNQOsTVrvOhJvKoz86luFxvEB0ys3khpjnTcah57QEPhTihIahcg2qy7YJN5u24o5dWeQQ/vSLiBBpFLuMOUj8DjolWo1F8Hb24cqOelIA1HAacN1HbSTUbLaOMAqBMSfBdLTqO962PFD5UrzR2/qaBuHdB+ZbpnKod9ajsqdKUex3zPcisexc2jkfWDCJNa/YvB43VXpxIS7Cs5Hg8EVQUaxa/KcOYXMYLOqnPnupRGr3wC3ahA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=N5NtBCPM9ry2gvjTCBZROTYJD2GVmDIlWv4bNIdZe1XefRbyUgrcGp6XDnvx/w2fT9YX+fZ4OLQ9U0XI66cN0JDqjITIyCBzrcKolKjnCP2/d/I3Z/8ds/rh+kTB21Iy9gYyYfSxvlvs5Tuu0/+oU3FBI7DZwQhPkdJk1iZkZ3DabsvKGEsnHHiYNaiczdyDcs7oixnsyCFVdbfINfUa/Wfou7/bz+kv6hFi9CQUYZdlVyfJtvIT8caaIzIY7fob43CRnoy1RpNJOJI7gMuRjxMb/8LDq/XPt3gvqT+TvVqZXn6web86Y6vk9MvXGy3+Hc9qP8EgijVBOJe9GAwXyg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=amd.com;
  • Cc: "Huang, Ray" <Ray.Huang@xxxxxxx>, "Andryuk, Jason" <Jason.Andryuk@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 27 Feb 2025 06:54:16 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Msip_labels: MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_ActionId=058cab87-2639-4e39-b779-31cdd51ff91a;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_ContentBits=0;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_Enabled=true;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_Method=Privileged;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_Name=Open Source;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_SetDate=2025-02-27T06:23:10Z;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_SiteId=3dd8961f-e488-4e60-8e11-a82d994e183d;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_Tag=10, 0, 1, 1;
  • Thread-index: AQHbeHHP6s77OhtOyESwm9UCfT4CR7NCV/0AgBh3PRA=
  • Thread-topic: [PATCH v2 05/11] xen/x86: introduce a new amd cppc driver for cpufreq scaling

[Public]

Hi,

Sorry for the confidential header before. I finally learned how to remove 
them...

> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: Wednesday, February 12, 2025 12:46 AM
> To: Penny, Zheng <penny.zheng@xxxxxxx>
> Cc: Huang, Ray <Ray.Huang@xxxxxxx>; Andryuk, Jason
> <Jason.Andryuk@xxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>;
> Roger Pau Monné <roger.pau@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 05/11] xen/x86: introduce a new amd cppc driver for
> cpufreq scaling
>
> On 06.02.2025 09:32, Penny Zheng wrote:
> > --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > +}
> > +
> > +static int cf_check amd_cppc_write_request(int cpu, uint8_t min_perf,
> > +                                           uint8_t des_perf, uint8_t
> > +max_perf) {
> > +    struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
> > +    uint64_t prev = data->req.raw;
> > +
> > +    data->req.min_perf = min_perf;
> > +    data->req.max_perf = max_perf;
> > +    data->req.des_perf = des_perf;
> > +
> > +    if ( prev == data->req.raw )
> > +        return 0;
> > +
> > +    on_selected_cpus(cpumask_of(cpu), amd_cppc_write_request_msrs,
> > + data, 1);
> > +
> > +    return data->err;
> > +}
>
> ... in this function. Then the field would be written to (and the cacheline 
> change
> ownership) only in the error case.
>
> As to the function's parameters - why's there a plain int?
>

Are you asking why "int cpu" here?

> > +    /* Package level MSR */
> > +    if ( rdmsr_safe(MSR_AMD_CPPC_ENABLE, val) )
> > +    {
> > +        amd_cppc_err(policy->cpu, "rdmsr_safe(MSR_AMD_CPPC_ENABLE)\n");
> > +        goto err;
> > +    }
> > +
> > +    /*
> > +     * Only when Enable bit is on, the hardware will calculate the 
> > processor’s
> > +     * performance capabilities and initialize the performance level 
> > fields in
> > +     * the CPPC capability registers.
> > +     */
> > +    if ( !(val & AMD_CPPC_ENABLE) )
> > +    {
> > +        val |= AMD_CPPC_ENABLE;
> > +        if ( wrmsr_safe(MSR_AMD_CPPC_ENABLE, val) )
> > +        {
> > +            amd_cppc_err(policy->cpu,
> "wrmsr_safe(MSR_AMD_CPPC_ENABLE, %lx)\n", val);
> > +            goto err;
> > +        }
> > +    }
> > +
> > +    if ( rdmsr_safe(MSR_AMD_CPPC_CAP1, data->caps.raw) )
> > +    {
> > +        amd_cppc_err(policy->cpu, "rdmsr_safe(MSR_AMD_CPPC_CAP1)\n");
> > +        goto err;
> > +    }
> > +
> > +    if ( data->caps.highest_perf == 0 || data->caps.lowest_perf == 0 ||
> > +         data->caps.nominal_perf == 0 || data->caps.lowest_nonlinear_perf 
> > == 0 )
> > +    {
> > +        amd_cppc_err(policy->cpu,
> > +                     "Platform malfunction, read CPPC highest_perf: %u,
> lowest_perf: %u, nominal_perf: %u, lowest_nonlinear_perf: %u zero value\n",
> > +                     data->caps.highest_perf, data->caps.lowest_perf,
> > +                     data->caps.nominal_perf, 
> > data->caps.lowest_nonlinear_perf);
> > +        goto err;
> > +    }
> > +
> > +    data->err = amd_get_min_freq(data, &min_freq);
> > +    if ( data->err )
> > +        return;
> > +
> > +    data->err = amd_get_nominal_freq(data, &nominal_freq);
> > +    if ( data->err )
> > +        return;
> > +
> > +    data->err = amd_get_max_freq(data, &max_freq);
> > +    if ( data->err )
> > +        return;
> > +
> > +    if ( min_freq > max_freq )
> > +    {
> > +        amd_cppc_err(policy->cpu, "min_freq(%u) or max_freq(%u) value is
> incorrect\n",
> > +                     min_freq, max_freq);
> > +        goto err;
> > +    }
>
> And nominal freq not being between the two is okay?
>
> > +    policy->min = min_freq;
> > +    policy->max = max_freq;
> > +
> > +    policy->cpuinfo.min_freq = min_freq;
> > +    policy->cpuinfo.max_freq = max_freq;
> > +    policy->cpuinfo.perf_freq = nominal_freq;
> > +    policy->cur = nominal_freq;
>
> How do you know this is correct? Or does it simply not matter at this point?
>

I need policy->cur to be set for correctly using ondemand governor.
As we don't have any MSR register to reflect runtime perf/freq value under CPPC 
control,
I will suggest to use APERF/MPERF average frequency as current freq here.

> > +    /* Initial processor data capability frequencies */
> > +    data->min_freq = min_freq;
> > +    data->nominal_freq = nominal_freq;
> > +    data->max_freq = max_freq;
>
> Is this data duplication actually needed? Can't data, if necessary, simply 
> have a
> back pointer to the policy for the CPU?
>
> Actually, aren't two of the three values already accessible through the 
> cppc_data
> pointer hanging off of data? Which in turn makes me wonder why you go through
> the effort of calculating those values.
>

cppc_data->lowest/nominal frequency comes from ACPI _CPC entry, and they are
optional fields. So we need to calculate them when unavailable.
We need them to set policy->min/max, which are referred in ondemand governor.
But you're right, at least in this patch, we are not using 
data->min/max/nominal anywhere

> > + err:
> > +    data->err = -EINVAL;
> > +}
>
> At this point you may have set the enable bit already, which you can't undo.
> What effect will this have on the system when the error path is taken here?
> Especially if we then try to fall back to another driver?
>

On current code logic, when the error path is taken, amd-cppc cpufreq driver 
fails
to initialize. And we will also not fall back to another driver.
As we could register *only one* cpufreq driver. If amd-cppc is registered 
properly
before, then legacy P-states will not get registered.
In amd-cppc registration code:
```
+int __init amd_cppc_register_driver(void)
+{
+    if ( !cpu_has_cppc )
+        return -ENODEV;
+
+    return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+}
```
CPPC feature MSR gets checked before the registration, which is to check 
whether the
hardware has CPPC msr support.

> Jan

Many thanks,
Penny

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.