|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v5 06/15] cpufreq: Add Hardware P-State (HWP) driver
On Mon, Jul 10, 2023 at 9:13 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 06.07.2023 20:54, Jason Andryuk wrote:
> > @@ -510,6 +510,22 @@ choice of `dom0-kernel` is deprecated and not
> > supported by all Dom0 kernels.
> > * `<maxfreq>` and `<minfreq>` are integers which represent max and min
> > processor frequencies
> > respectively.
> > * `verbose` option can be included as a string or also as
> > `verbose=<integer>`
> > + for `xen`. It is a boolean for `hwp`.
> > +* `hwp` selects Hardware-Controlled Performance States (HWP) on supported
> > Intel
> > + hardware. HWP is a Skylake+ feature which provides better CPU power
> > + management. The default is disabled. If `hwp` is selected, but hardware
> > + support is not available, Xen will fallback to cpufreq=xen.
> > +* `<hdc>` is a boolean to enable Hardware Duty Cycling (HDC). HDC enables
> > the
> > + processor to autonomously force physical package components into idle
> > state.
> > + The default is enabled, but the option only applies when `hwp` is
> > enabled.
> > +
> > +There is also support for `;`-separated fallback options:
> > +`cpufreq=hwp,verbose;xen`. This first tries `hwp` and falls back to `xen`
> > +if unavailable.
>
> In the given example, does "verbose" also apply to the fallback case? If so,
> perhaps better "cpufreq=hwp;xen,verbose", to eliminate that ambiguity?
Yes, "verbose" is applied to both. I can make the change. I
mentioned it in the commit message, but I'll mention it here as well.
> > --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
> > +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
> > @@ -642,7 +642,24 @@ static int __init cf_check cpufreq_driver_init(void)
> > switch ( boot_cpu_data.x86_vendor )
> > {
> > case X86_VENDOR_INTEL:
> > - ret = cpufreq_register_driver(&acpi_cpufreq_driver);
> > + unsigned int i;
>
> At the moment we still don't mix declarations and statements, i.e. all
> declarations have to be at the top of a block/scope. What iirc we do use
> in a couple of places (and what hence you may want to do here as well) is
> ...
>
> > + ret = -ENOENT;
> > +
> > + for ( i = 0; i < cpufreq_xen_cnt; i++ )
>
> ... declare the induction variable inside the loop header.
Sounds good, thanks.
> > + {
> > + switch ( cpufreq_xen_opts[i] )
> > + {
> > + case CPUFREQ_xen:
> > + ret = cpufreq_register_driver(&acpi_cpufreq_driver);
> > + break;
> > + case CPUFREQ_hwp:
> > + ret = hwp_register_driver();
> > + break;
> > + }
> > +
> > + if ( ret == 0 )
> > + break;
> > + }
> > break;
>
> In this model any kind of failure results in the fallback to be tried
> (and the fallback's error to be returned to the caller rather than
> the primary one). This may or may not be what we actually want;
> personally I would have expected
>
> if ( ret != -ENODEV )
> break;
>
> or some such instead.
I guess this comes back to our fruit preferences. :)
I can switch it around like that, and make hwp_register_driver()
return -ENODEV for hwp_available() returning false.
> > +static bool hwp_handle_option(const char *s, const char *end)
> > +{
> > + int ret;
> > +
> > + ret = parse_boolean("verbose", s, end);
> > + if ( ret >= 0 ) {
>
> Nit: Style (brace placement).
>
> > + cpufreq_verbose = ret;
> > + return true;
> > + }
> > +
> > + ret = parse_boolean("hdc", s, end);
> > + if ( ret >= 0 ) {
>
> Same here.
Thanks. Sorry about those.
> > + opt_cpufreq_hdc = ret;
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +int __init hwp_cmdline_parse(const char *s, const char *e)
> > +{
> > + do
> > + {
> > + const char *end = strpbrk(s, ",;");
> > +
> > + if ( s && !hwp_handle_option(s, end) )
>
> This check of s not being NULL comes too late, as strpbrk() would have
> de-referenced it already. Considering ...
>
> > + {
> > + printk(XENLOG_WARNING "cpufreq/hwp: option '%s' not
> > recognized\n",
> > + s);
> > +
> > + return -1;
> > + }
> > +
> > + s = end ? ++end : end;
> > + } while ( s && s < e );
>
> ... this it probably wants to move even ahead of the loop.
I'll switch from do/while to just while and then the NULL check will
be covered. In practice, this function is never called with s ==
NULL.
> > +static int hdc_set_pkg_hdc_ctl(unsigned int cpu, bool val)
> > +{
> > + uint64_t msr;
> > +
> > + if ( rdmsr_safe(MSR_PKG_HDC_CTL, msr) )
> > + {
> > + hwp_err(cpu, "rdmsr_safe(MSR_PKG_HDC_CTL)\n");
> > + return -1;
> > + }
> > +
> > + if ( val )
> > + msr |= PKG_HDC_CTL_HDC_PKG_ENABLE;
> > + else
> > + msr &= ~PKG_HDC_CTL_HDC_PKG_ENABLE;
> > +
> > + if ( wrmsr_safe(MSR_PKG_HDC_CTL, msr) )
> > + {
> > + hwp_err(cpu, "wrmsr_safe(MSR_PKG_HDC_CTL): %016lx\n", msr);
> > + return -1;
> > + }
> > +
> > + return 0;
> > +}
>
> Please can you use either boolean return values or proper 0 / -errno
> ones? (Same again then in the subsequent function.)
Sure, I'll use booleans.
> > +static void cf_check hwp_init_msrs(void *info)
> > +{
> > + struct cpufreq_policy *policy = info;
> > + struct hwp_drv_data *data = this_cpu(hwp_drv_data);
> > + uint64_t val;
> > +
> > + /*
> > + * Package level MSR, but we don't have a good idea of packages here,
> > so
> > + * just do it everytime.
> > + */
> > + if ( rdmsr_safe(MSR_PM_ENABLE, val) )
> > + {
> > + hwp_err(policy->cpu, "rdmsr_safe(MSR_PM_ENABLE)\n");
> > + data->curr_req.raw = -1;
> > + return;
> > + }
> > +
> > + /* Ensure we don't generate interrupts */
> > + if ( feature_hwp_notification )
> > + wrmsr_safe(MSR_HWP_INTERRUPT, 0);
> > +
> > + hwp_verbose("CPU%u: MSR_PM_ENABLE: %016lx\n", policy->cpu, val);
> > + if ( !(val & PM_ENABLE_HWP_ENABLE) )
> > + {
> > + val |= PM_ENABLE_HWP_ENABLE;
> > + if ( wrmsr_safe(MSR_PM_ENABLE, val) )
> > + {
> > + hwp_err(policy->cpu, "wrmsr_safe(MSR_PM_ENABLE, %lx)\n", val);
> > + data->curr_req.raw = -1;
> > + return;
> > + }
> > + }
> > +
> > + if ( rdmsr_safe(MSR_HWP_CAPABILITIES, data->hwp_caps) )
> > + {
> > + hwp_err(policy->cpu, "rdmsr_safe(MSR_HWP_CAPABILITIES)\n");
> > + goto error;
> > + }
> > +
> > + if ( rdmsr_safe(MSR_HWP_REQUEST, data->curr_req.raw) )
> > + {
> > + hwp_err(policy->cpu, "rdmsr_safe(MSR_HWP_REQUEST)\n");
> > + goto error;
> > + }
> > +
> > + /*
> > + * Check for APERF/MPERF support in hardware
> > + * also check for boost/turbo support
> > + */
> > + intel_feature_detect(policy);
> > +
> > + if ( feature_hdc )
> > + {
> > + if ( hdc_set_pkg_hdc_ctl(policy->cpu, opt_cpufreq_hdc) ||
> > + hdc_set_pm_ctl1(policy->cpu, opt_cpufreq_hdc) ) {
>
> Please can these two if()s be joined and the well-placed brace be
> retained?
Sure.
> > + hwp_err(policy->cpu, "Disabling HDC support\n");
> > + feature_hdc = false;
> > + goto error;
>
> Why? Can't you continue just with HDC turned off?
Yes, that is what I intended to implement after your earlier review,
but I failed to actually delete the goto.
> > +static void cf_check hwp_write_request(void *info)
> > +{
> > + const struct cpufreq_policy *policy = info;
> > + struct hwp_drv_data *data = this_cpu(hwp_drv_data);
> > + union hwp_request hwp_req = data->curr_req;
> > +
> > + data->ret = 0;
> > +
> > + BUILD_BUG_ON(sizeof(union hwp_request) != sizeof(hwp_req.raw));
>
> You changed only the right side to not be sizeof(<type>).
Updated. I was just focused on removing the uint64_t from your earlier comment.
> > +static int cf_check hwp_cpufreq_cpu_init(struct cpufreq_policy *policy)
> > +{
> > + unsigned int cpu = policy->cpu;
> > + struct hwp_drv_data *data;
> > +
> > + data = xzalloc(struct hwp_drv_data);
> > + if ( !data )
> > + return -ENOMEM;
> > +
> > + policy->governor = &cpufreq_gov_hwp;
> > +
> > + per_cpu(hwp_drv_data, cpu) = data;
> > +
> > + on_selected_cpus(cpumask_of(cpu), hwp_init_msrs, policy, 1);
>
> Could I talk you into moving the helper function immediately ahead of
> this (sole) one using it, much like you have it for hwp_cpufreq_target()
> and hwp_write_request()?
Yes. sounds good. I'll move hdc_set_pkg_hdc_ctl(), hdc_set_pm_ctl1(),
hwp_get_cpu_speeds() as well since they are all called by
hwp_init_msrs().
> > + if ( data->curr_req.raw == -1 )
> > + {
> > + hwp_err(cpu, "Could not initialize HWP properly\n");
> > + per_cpu(hwp_drv_data, cpu) = NULL;
> > + xfree(data);
> > + return -ENODEV;
> > + }
> > +
> > + data->minimum = data->curr_req.min_perf;
> > + data->maximum = data->curr_req.max_perf;
> > + data->desired = data->curr_req.desired;
> > + data->energy_perf = data->curr_req.energy_perf;
> > + data->activity_window = data->curr_req.activity_window;
> > +
> > + if ( cpu == 0 )
> > + hwp_verbose("CPU%u: HWP_CAPABILITIES: %016lx\n", cpu,
> > data->hwp_caps);
>
> While I'm fine with this (perhaps apart from you using "cpu == 0",
> which is an idiom we're trying to get rid of), ...
Oh, I didn't know that. What is the preferred way to identify the
BSP? This doesn't necessarily run on the BSP, so "cpu"/"policy->cpu"
is all we have to make a determination.
> > + hwp_verbose("CPU%u: rdmsr HWP_REQUEST %016lx\n", cpu,
> > data->curr_req.raw);
>
> ... this once-per-CPU message still looks to verbose to me. Perhaps
> for both:
> - print for the BSP,
> - print when AP value differs from BSP (albeit I don't know how
> [un]likely that is)?
On my test systems, the values have all been identical. But your
differing values idea seems good.
> > +static int cf_check hwp_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> > +{
> > + struct hwp_drv_data *data = per_cpu(hwp_drv_data, policy->cpu);
> > + per_cpu(hwp_drv_data, policy->cpu) = NULL;
> > + xfree(data);
>
> Nit: Style (blank line between declaration(s) and statement(s) please.
> (Also at least once again below.)
>
> > --- a/xen/drivers/cpufreq/cpufreq.c
> > +++ b/xen/drivers/cpufreq/cpufreq.c
> > @@ -63,12 +63,18 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
> > /* set xen as default cpufreq */
> > enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
> >
> > -static int __init cpufreq_cmdline_parse(const char *s);
> > +enum cpufreq_xen_opt cpufreq_xen_opts[2] = { CPUFREQ_xen, };
> > +unsigned int cpufreq_xen_cnt = 1;
>
> Looks like both can be __initdata?
Yes, thanks.
> As to the array initializer: For one Misra won't like the 2nd slot not
> initialized. Plus the implicit 0 there is nothing else than CPUFREQ_xen,
> which also ends up a little fragile. Perhaps 0 wants to stand for
> CPUFREQ_none (or whatever name you deem appropriate)?
:) I had a CPUFREQ_none originally, but dropped it as there was no
need for one with cpufreq_xen_cnt controlling the iteration. I'll add
it back. (gcc 12 at least complains that the switch in
cpufreq_driver_init() needs to handle CPUFREQ_none, so I'll just have
it return 0 in that case.)
Thanks for the review.
Regards,
Jason
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |