|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
> On 27.04.2022 14:45, Roger Pau Monné wrote:
> > On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
> >> From: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx>
> >>
> >> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
> >> exclusive - only one of them can be enabled. By default, 'intel_idle'
> >> driver
> >> enables C1 and disables C1E. However, some users prefer to use C1E instead
> >> of
> >> C1, because it saves more energy.
> >>
> >> This patch adds a new module parameter ('preferred_cstates') for enabling
> >> C1E
> >> and disabling C1. Here is the idea behind it.
> >>
> >> 1. This option has effect only for "mutually exclusive" C-states like C1
> >> and
> >> C1E on SPR.
> >> 2. It does not have any effect on independent C-states, which do not
> >> require
> >> other C-states to be disabled (most states on most platforms as of
> >> today).
> >> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a
> >> reasonable default, such as enabling C1 on SPR by default. On other
> >> platforms, the default may be different.
> >> 4. Users can override the default using the 'preferred_cstates' parameter.
> >> 5. The parameter accepts the preferred C-states bit-mask, similarly to the
> >> existing 'states_off' parameter.
> >> 6. This parameter is not limited to C1/C1E, and leaves room for supporting
> >> other mutually exclusive C-states, if they come in the future.
> >>
> >> Today 'intel_idle' can only be compiled-in, which means that on SPR, in
> >> order
> >> to disable C1 and enable C1E, users should boot with the following kernel
> >> argument: intel_idle.preferred_cstates=4
> >>
> >> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >> da0e58c038e6
> >>
> >> Enable C1E (if requested) not only on the BSP's socket / package.
> >
> > Maybe we should also add a note here that the command line option for
> > Xen is preferred-cstates instead of intel_idle.preferred_cstates?
> >
> > I think this is a bad interface however, we should have a more generic
> > option (ie: cstate-mode = 'performance | powersave') so that users
> > don't have to fiddle with model specific C state masks.
>
> Performance vs powersave doesn't cover it imo, especially if down
> the road more states would appear which can be controlled this way.
> I don't think there's a way around providing _some_ way to control
> things one a per-state level. When porting this over, I too didn't
> like this interface very much, but I had no good replacement idea.
I think it's fine to have this more fine grained control of C states,
but it doesn't seem practical from a user (or distro) PoV. But then I
also wonder how much of a difference this will make regarding power
consumption.
> >> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
> >> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
> >> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
> >>
> >> static unsigned int mwait_substates;
> >>
> >> +/*
> >> + * Some platforms come with mutually exclusive C-states, so that if one is
> >> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> >> + * Sapphire Rapids platform. This parameter allows for selecting the
> >> + * preferred C-states among the groups of mutually exclusive C-states -
> >> the
> >> + * selected C-states will be registered, the other C-states from the
> >> mutually
> >> + * exclusive group won't be registered. If the platform has no mutually
> >> + * exclusive C-states, this parameter has no effect.
> >> + */
> >> +static unsigned int __ro_after_init preferred_states_mask;
> >> +integer_param("preferred-cstates", preferred_states_mask);
> >> +
> >> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
> >> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
> >> static unsigned int lapic_timer_reliable_states = (1 << 1);
> >> @@ -96,6 +108,7 @@ struct idle_cpu {
> >> unsigned long auto_demotion_disable_flags;
> >> bool byt_auto_demotion_disable_flag;
> >> bool disable_promotion_to_c1e;
> >> + bool enable_promotion_to_c1e;
> >
> > I'm confused by those fields, shouldn't we just have:
> > promotion_to_c1e = true | false?
> >
> > As one field is the negation of the other:
> > enable_promotion_to_c1e = !disable_promotion_to_c1e
> >
> > I know this is code from Linux, but would like to understand why two
> > fields are needed.
>
> This really is a tristate; Linux is now changing their global variable
> to an enum, but we don't have an equivalent of that global variable.
So it would be: leave default, disable C1E promotion, enable C1E
promotion.
And Linux is leaving the {disable,enable}_promotion_to_c1e in
idle_cpu?
I guess there's not much we can do unless we want to diverge from
upstream.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |