
[xen master] xen/cpufreq: implement amd-cppc-epp driver for CPPC in active mode



commit 94411808fbaaa8185993d9c29a2eb551e23cc148
Author:     Penny Zheng <Penny.Zheng@xxxxxxx>
AuthorDate: Thu Sep 25 09:20:52 2025 +0200
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Thu Sep 25 09:20:52 2025 +0200

    xen/cpufreq: implement amd-cppc-epp driver for CPPC in active mode
    
    amd-cppc has two operation modes: autonomous (active) mode and
    non-autonomous (passive) mode.
    In active mode, no Xen governor is needed to calculate and tune the CPU
    frequency. Instead, the hardware's built-in CPPC power algorithm evaluates
    the runtime workload and adjusts core frequency automatically according to
    power supply, thermal, core voltage and other hardware conditions.
    In active mode, CPPC ignores requests made through the desired performance
    field and takes into account only the values set in the minimum
    performance, maximum performance, and energy performance preference
    registers.
    
    A new field, EPP (energy performance preference), is introduced in the CPPC
    request register. The CCLK DPM controller uses it to drive the frequency at
    which a core operates during short periods of activity, called the minimum
    active frequency. It can hold any value from 0 to 0xff.
    An EPP of zero sets the min active frequency to the maximum frequency, while
    an EPP of 0xff sets the min active frequency to approximately the idle
    frequency.
    
    We implement a new AMD CPU frequency driver, `amd-cppc-epp`, for active mode.
    Users must explicitly select active mode with the `active` tag on the Xen
    command line.
    The `amd-cppc-epp` driver hooks ->setpolicy() rather than ->target(), as
    it does not depend on a Xen governor for performance tuning.
    
    We also introduce a new field "policy" (CPUFREQ_POLICY_xxx) to represent the
    performance policy. It currently supports three values:
    CPUFREQ_POLICY_PERFORMANCE for maximum performance, CPUFREQ_POLICY_POWERSAVE
    for the least power consumption, and CPUFREQ_POLICY_ONDEMAND for no
    preference. These correspond to the "performance", "powersave" and
    "ondemand" Xen governors, so users can re-use "governor" on the Xen command
    line to select the performance policy they want to apply.
    
    Signed-off-by: Penny Zheng <Penny.Zheng@xxxxxxx>
    Acked-by: Jan Beulich <jbeulich@xxxxxxxx>
---
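The policy handling described in the commit message can be illustrated outside of Xen. The following is a minimal, standalone sketch, not the driver code itself: `struct perf_caps`, `struct cppc_limits` and `prepare_policy()` are hypothetical names chosen for illustration, and the "balance" EPP value of 0x80 is an assumption (the commit message only fixes the endpoints 0 and 0xff).

```c
#include <stdint.h>

/* Policy values as defined by the patch in cpufreq.h. */
#define CPUFREQ_POLICY_UNKNOWN      0
#define CPUFREQ_POLICY_POWERSAVE    1
#define CPUFREQ_POLICY_PERFORMANCE  2
#define CPUFREQ_POLICY_ONDEMAND     3

/* EPP endpoints follow the commit message; the midpoint is an assumption. */
#define EPP_MAX_PERFORMANCE 0x00
#define EPP_BALANCE         0x80 /* assumed "medium" value */
#define EPP_MAX_POWERSAVE   0xff

/* Hypothetical stand-in for the per-CPU capabilities the driver reads. */
struct perf_caps {
    uint8_t highest_perf;
    uint8_t lowest_nonlinear_perf;
};

/* Hypothetical container for the three values active mode honours. */
struct cppc_limits {
    uint8_t min_perf, max_perf, epp;
};

static struct cppc_limits prepare_policy(const struct perf_caps *caps,
                                         unsigned int policy,
                                         uint8_t epp_init)
{
    /* Default: scale across the whole P-state range, keep the initial EPP. */
    struct cppc_limits l = {
        .min_perf = caps->lowest_nonlinear_perf,
        .max_perf = caps->highest_perf,
        .epp      = epp_init,
    };

    switch ( policy )
    {
    case CPUFREQ_POLICY_PERFORMANCE:
        /* Pin the floor to the ceiling and prefer performance. */
        l.epp = EPP_MAX_PERFORMANCE;
        l.min_perf = l.max_perf;
        break;

    case CPUFREQ_POLICY_POWERSAVE:
        /* Pin the ceiling to the floor and prefer power saving. */
        l.epp = EPP_MAX_POWERSAVE;
        l.max_perf = l.min_perf;
        break;

    case CPUFREQ_POLICY_ONDEMAND:
        /* Full range, no preference either way. */
        l.epp = EPP_BALANCE;
        break;

    default:
        /* Unknown policy: keep the defaults set above. */
        break;
    }

    return l;
}
```

With capabilities of, say, highest_perf = 200 and lowest_nonlinear_perf = 60, the performance policy yields min = max = 200, the powersave policy yields min = max = 60, and the ondemand policy leaves the range at [60, 200].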
 docs/misc/xen-command-line.pandoc    |   9 ++-
 xen/arch/x86/acpi/cpufreq/amd-cppc.c | 135 +++++++++++++++++++++++++++++++++--
 xen/drivers/cpufreq/utility.c        |  15 ++++
 xen/include/acpi/cpufreq/cpufreq.h   |  18 +++++
 xen/include/public/sysctl.h          |   1 +
 5 files changed, 173 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 518e42d965..28a98321c7 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -515,7 +515,7 @@ If set, force use of the performance counters for oprofile, rather than detectin
 available support.
 
 ### cpufreq
-> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | amd-cppc[:[verbose]]`
+> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | amd-cppc[:[active][,verbose]]`
 
 > Default: `xen`
 
@@ -537,6 +537,13 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
 * `amd-cppc` selects ACPI Collaborative Performance and Power Control (CPPC)
   on supported AMD hardware to provide finer grained frequency control
   mechanism. The default is disabled.
+* `active` is a boolean to enable the amd-cppc driver in active (autonomous)
+  mode. In this mode, users don't rely on a Xen governor to do performance
+  monitoring and tuning. The hardware's built-in CPPC power algorithm
+  evaluates the runtime workload and adjusts core frequency automatically
+  according to power supply, thermal, core voltage and other hardware
+  conditions. The default is disabled, and the option only applies when
+  `amd-cppc` is enabled.
 
 There is also support for `;`-separated fallback options:
 `cpufreq=hwp;xen,verbose`.  This first tries `hwp` and falls back to `xen` if
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
index 5b99b86fb7..bb7f4e4a9e 100644
--- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -67,9 +67,14 @@
  * max_perf.
  * Field des_perf conveys performance level Xen governor is requesting. And it
  * may be set to any performance value in the range [min_perf, max_perf],
- * inclusive.
+ * inclusive. In active mode, des_perf must be zero.
  * Field epp represents energy performance preference, which only has meaning
- * when active mode is enabled.
+ * when active mode is enabled. The EPP is used in the CCLK DPM controller
+ * to drive the frequency that a core is going to operate during short periods
+ * of activity, called minimum active frequency. It can contain a range of
+ * values from 0 to 0xff. An EPP of zero sets the min active frequency to
+ * maximum frequency, while an EPP of 0xff sets the min active frequency to
+ * approximately Idle frequency.
  */
 struct amd_cppc_drv_data
 {
@@ -106,6 +111,12 @@ static DEFINE_PER_CPU_READ_MOSTLY(struct amd_cppc_drv_data *,
  */
 static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, pxfreq_mhz);
 static DEFINE_PER_CPU_READ_MOSTLY(uint8_t, epp_init);
+#ifndef NDEBUG
+static bool __ro_after_init opt_active_mode;
+#else
+static bool __initdata opt_active_mode;
+#endif
+
 
 static bool __init amd_cppc_handle_option(const char *s, const char *end)
 {
@@ -118,6 +129,13 @@ static bool __init amd_cppc_handle_option(const char *s, const char *end)
         return true;
     }
 
+    ret = parse_boolean("active", s, end);
+    if ( ret >= 0 )
+    {
+        opt_active_mode = ret;
+        return true;
+    }
+
     return false;
 }
 
@@ -270,6 +288,7 @@ static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
 
     data->req.min_perf = min_perf;
     data->req.max_perf = max_perf;
+    ASSERT(!opt_active_mode || !des_perf);
     data->req.des_perf = des_perf;
     data->req.epp = epp;
 
@@ -417,7 +436,7 @@ static int cf_check amd_cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
     return 0;
 }
 
-static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+static int amd_cppc_cpufreq_init_perf(struct cpufreq_policy *policy)
 {
     unsigned int cpu = policy->cpu;
     struct amd_cppc_drv_data *data;
@@ -450,12 +469,103 @@ static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 
     amd_cppc_boost_init(policy, data);
 
+    return 0;
+}
+
+static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+    int ret;
+
+    ret = amd_cppc_cpufreq_init_perf(policy);
+    if ( ret )
+        return ret;
+
     amd_cppc_verbose(policy->cpu,
                      "CPU initialized with amd-cppc passive mode\n");
 
     return 0;
 }
 
+static int cf_check amd_cppc_epp_cpu_init(struct cpufreq_policy *policy)
+{
+    int ret;
+
+    ret = amd_cppc_cpufreq_init_perf(policy);
+    if ( ret )
+        return ret;
+
+    policy->policy = cpufreq_policy_from_governor(policy->governor);
+
+    amd_cppc_verbose(policy->cpu,
+                     "CPU initialized with amd-cppc active mode\n");
+
+    return 0;
+}
+
+static void amd_cppc_prepare_policy(struct cpufreq_policy *policy,
+                                    uint8_t *max_perf, uint8_t *min_perf,
+                                    uint8_t *epp)
+{
+    const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
+                                                   policy->cpu);
+
+    /*
+     * By default, set min_perf to lowest_nonlinear_perf, and max_perf
+     * to highest_perf, to ensure performance scaling in P-states range.
+     */
+    *max_perf = data->caps.highest_perf;
+    *min_perf = data->caps.lowest_nonlinear_perf;
+
+    /*
+     * In policy CPUFREQ_POLICY_PERFORMANCE, increase min_perf to
+     * highest_perf to achieve utmost performance.
+     * In policy CPUFREQ_POLICY_POWERSAVE, decrease max_perf to
+     * lowest_nonlinear_perf to achieve utmost power saving.
+     * Set governor only to help print proper policy info to users.
+     */
+    switch ( policy->policy )
+    {
+    case CPUFREQ_POLICY_PERFORMANCE:
+        /* Force the epp value to be zero for performance policy */
+        *epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
+        *min_perf = *max_perf;
+        policy->governor = &cpufreq_gov_performance;
+        break;
+
+    case CPUFREQ_POLICY_POWERSAVE:
+        /* Force the epp value to be 0xff for powersave policy */
+        *epp = CPPC_ENERGY_PERF_MAX_POWERSAVE;
+        *max_perf = *min_perf;
+        policy->governor = &cpufreq_gov_powersave;
+        break;
+
+    case CPUFREQ_POLICY_ONDEMAND:
+        /*
+         * Set epp with medium value to show no preference over performance
+         * or powersave
+         */
+        *epp = CPPC_ENERGY_PERF_BALANCE;
+        policy->governor = &cpufreq_gov_dbs;
+        break;
+
+    default:
+        *epp = per_cpu(epp_init, policy->cpu);
+        break;
+    }
+}
+
+static int cf_check amd_cppc_epp_set_policy(struct cpufreq_policy *policy)
+{
+    uint8_t max_perf, min_perf, epp;
+
+    amd_cppc_prepare_policy(policy, &max_perf, &min_perf, &epp);
+
+    amd_cppc_write_request(policy->cpu, min_perf,
+                           0 /* no des_perf in active mode */,
+                           max_perf, epp);
+    return 0;
+}
+
 static const struct cpufreq_driver __initconst_cf_clobber
 amd_cppc_cpufreq_driver =
 {
@@ -466,10 +576,27 @@ amd_cppc_cpufreq_driver =
     .exit   = amd_cppc_cpufreq_cpu_exit,
 };
 
+static const struct cpufreq_driver __initconst_cf_clobber
+amd_cppc_epp_driver =
+{
+    .name       = XEN_AMD_CPPC_EPP_DRIVER_NAME,
+    .verify     = amd_cppc_cpufreq_verify,
+    .setpolicy  = amd_cppc_epp_set_policy,
+    .init       = amd_cppc_epp_cpu_init,
+    .exit       = amd_cppc_cpufreq_cpu_exit,
+};
+
 int __init amd_cppc_register_driver(void)
 {
+    int ret;
+
     if ( !cpu_has_cppc )
         return -ENODEV;
 
-    return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+    if ( opt_active_mode )
+        ret = cpufreq_register_driver(&amd_cppc_epp_driver);
+    else
+        ret = cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+
+    return ret;
 }
diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index 987c3b5929..e2cc9ff2af 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -250,6 +250,7 @@ int __cpufreq_set_policy(struct cpufreq_policy *data,
     data->min = policy->min;
     data->max = policy->max;
     data->limits = policy->limits;
+    data->policy = policy->policy;
     if (cpufreq_driver.setpolicy)
         return alternative_call(cpufreq_driver.setpolicy, data);
 
@@ -281,3 +282,17 @@ int __cpufreq_set_policy(struct cpufreq_policy *data,
 
     return __cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
 }
+
+unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov)
+{
+    if ( !strncmp(gov->name, "performance", CPUFREQ_NAME_LEN) )
+        return CPUFREQ_POLICY_PERFORMANCE;
+
+    if ( !strncmp(gov->name, "powersave", CPUFREQ_NAME_LEN) )
+        return CPUFREQ_POLICY_POWERSAVE;
+
+    if ( !strncmp(gov->name, "ondemand", CPUFREQ_NAME_LEN) )
+        return CPUFREQ_POLICY_ONDEMAND;
+
+    return CPUFREQ_POLICY_UNKNOWN;
+}
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 5d4881eea8..9ef7c4683a 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -81,6 +81,7 @@ struct cpufreq_policy {
     int8_t              turbo;  /* tristate flag: 0 for unsupported
                                  * -1 for disable, 1 for enabled
                                  * See CPUFREQ_TURBO_* below for defines */
+    unsigned int        policy; /* CPUFREQ_POLICY_* */
 };
 DECLARE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_policy);
 
@@ -131,6 +132,23 @@ extern int cpufreq_register_governor(struct cpufreq_governor *governor);
 extern struct cpufreq_governor *__find_governor(const char *governor);
 #define CPUFREQ_DEFAULT_GOVERNOR &cpufreq_gov_dbs
 
+/*
+ * Performance Policy
+ * If cpufreq_driver->target() exists, the ->governor decides what frequency
+ * within the limits is used. If cpufreq_driver->setpolicy() exists, these
+ * following policies are available:
+ * CPUFREQ_POLICY_PERFORMANCE represents maximum performance
+ * CPUFREQ_POLICY_POWERSAVE represents least power consumption
+ * CPUFREQ_POLICY_ONDEMAND represents no preference over performance or
+ * powersave
+ */
+#define CPUFREQ_POLICY_UNKNOWN      0
+#define CPUFREQ_POLICY_POWERSAVE    1
+#define CPUFREQ_POLICY_PERFORMANCE  2
+#define CPUFREQ_POLICY_ONDEMAND     3
+
+unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov);
+
 /* pass a target to the cpufreq driver */
 extern int __cpufreq_driver_target(struct cpufreq_policy *policy,
                                    unsigned int target_freq,
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index aa29a5401c..eb3a23b038 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -454,6 +454,7 @@ struct xen_set_cppc_para {
 };
 
 #define XEN_AMD_CPPC_DRIVER_NAME "amd-cppc"
+#define XEN_AMD_CPPC_EPP_DRIVER_NAME "amd-cppc-epp"
 #define XEN_HWP_DRIVER_NAME "hwp"
 
 /*
--
generated by git-patchbot for /home/xen/git/xen.git#master
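
The commit states that active mode honours only the minimum performance, maximum performance and EPP values, and the patch asserts that des_perf is zero in that mode. As a rough illustration of what such a request looks like once packed, here is a minimal sketch. The helper name `cppc_pack_active_request()` is hypothetical, and the byte layout (max_perf in bits 7:0, min_perf in bits 15:8, des_perf in bits 23:16, EPP in bits 31:24) is an assumption taken from AMD's public description of the CPPC request register; it is not shown in this patch.

```c
#include <stdint.h>

/*
 * Pack an active-mode CPPC request into its low 32 bits.
 * Assumed layout (not part of this patch):
 *   bits  7:0  max_perf
 *   bits 15:8  min_perf
 *   bits 23:16 des_perf (always zero in active mode)
 *   bits 31:24 epp
 */
static uint32_t cppc_pack_active_request(uint8_t min_perf, uint8_t max_perf,
                                         uint8_t epp)
{
    const uint8_t des_perf = 0; /* no desired performance in active mode */

    return (uint32_t)max_perf |
           ((uint32_t)min_perf << 8) |
           ((uint32_t)des_perf << 16) |
           ((uint32_t)epp << 24);
}
```

Under the assumed layout, a request with min_perf = 60 (0x3c), max_perf = 200 (0xc8) and a balanced EPP of 0x80 packs to 0x80003cc8, with the des_perf byte left at zero.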



 

