
Re: [PATCH] x86/cpuidle: split the max_cstate variable


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 24 Apr 2026 11:33:08 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Teddy Astie <teddy.astie@xxxxxxxxxx>, Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 24 Apr 2026 09:33:24 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Apr 21, 2026 at 09:34:32AM +0200, Jan Beulich wrote:
> On 20.04.2026 18:14, Roger Pau Monné wrote:
> > On Wed, Apr 08, 2026 at 01:34:43PM +0200, Jan Beulich wrote:
> >> @@ -690,18 +694,18 @@ static void cf_check acpi_processor_idle
> >>      u32 exp = 0, pred = 0;
> >>      u32 irq_traced[4] = { 0 };
> >>  
> >> -    if ( max_cstate > 0 && power &&
> >> +    if ( max_cstate() > 0 && power &&
> >>           (next_state = cpuidle_current_governor->select(power)) > 0 )
> >>      {
> >>          unsigned int max_state = sched_has_urgent_vcpu() ? ACPI_STATE_C1
> >> -                                                         : max_cstate;
> >> +                                                         : max_cstate();
> >>  
> >>          do {
> >>              cx = &power->states[next_state];
> >>          } while ( (cx->type > max_state ||
> >>                     cx->entry_method == ACPI_CSTATE_EM_NONE ||
> >>                     (cx->entry_method == ACPI_CSTATE_EM_FFH &&
> >> -                    cx->type == max_cstate &&
> >> +                    cx->type == max_allowed_cstate &&
> > 
> > I'm afraid I'm missing why this uses max_allowed_cstate instead of
> > max_state.
> 
> max_allowed_cstate is what needs using along with ...
> 
> >>                      (cx->address & MWAIT_SUBSTATE_MASK) > max_csubstate)) &&
> 
> ... max_csubstate, as both are driven by the "max_cstate=" command line
> option. Renaming max_csubstate to max_allowed_csubstate would be an
> option, but would incur yet more churn.
> 
> >> --- a/xen/arch/x86/cpu/mwait-idle.c
> >> +++ b/xen/arch/x86/cpu/mwait-idle.c
> >> @@ -1045,15 +1045,16 @@ static void cf_check mwait_idle(void)
> >>    u64 before, after;
> >>    u32 exp = 0, pred = 0, irq_traced[4] = { 0 };
> >>  
> >> -  if (max_cstate > 0 && power &&
> >> +  if (max_cstate() > 0 && power &&
> >>        (next_state = cpuidle_current_governor->select(power)) > 0) {
> >>            unsigned int max_state = sched_has_urgent_vcpu() ? ACPI_STATE_C1
> >> -                                                           : max_cstate;
> >> +                                                           : max_cstate();
> >>  
> >>            do {
> >>                    cx = &power->states[next_state];
> >> -          } while ((cx->type > max_state || (cx->type == max_cstate &&
> >> -                    MWAIT_HINT2SUBSTATE(cx->address) > max_csubstate)) &&
> >> +          } while ((cx->type > max_state ||
> >> +                          (cx->type == max_allowed_cstate &&
> > 
> > Indentation is weird for the above line IMO, you should use 3 hard
> > tabs plus spaces afterwards, like the surrounding indentation?
> 
> Ouch, indeed.
> 
> >> +                     MWAIT_HINT2SUBSTATE(cx->address) > max_csubstate)) &&
> >>                     --next_state);
> >>            if (!next_state)
> >>                    cx = NULL;
> > 
> > Seeing max_cstate() is used in multiple places here, you might want to
> > introduce a local max_cstate variable?
> 
> Except that MISRA doesn't like such naming, and any other name would feel
> odd to use.
> 
> >> --- a/xen/include/xen/acpi.h
> >> +++ b/xen/include/xen/acpi.h
> >> @@ -142,30 +142,33 @@ int acpi_gsi_to_irq (u32 gsi, unsigned i
> >>  
> >>  #ifdef    CONFIG_ACPI_CSTATE
> >>  /*
> >> - * max_cstate sets the highest legal C-state.
> >> - * max_cstate = 0: C0 okay, but not C1
> >> - * max_cstate = 1: C1 okay, but not C2
> >> - * max_cstate = 2: C2 okay, but not C3 etc.
> >> -
> >> - * max_csubstate sets the highest legal C-state sub-state. Only applies to the
> >> - * highest legal C-state.
> >> - * max_cstate = 1, max_csubstate = 0 ==> C0, C1 okay, but not C1E
> >> - * max_cstate = 1, max_csubstate = 1 ==> C0, C1 and C1E okay, but not C2
> >> - * max_cstate = 2, max_csubstate = 0 ==> C0, C1, C1E, C2 okay, but not C3
> >> - * max_cstate = 2, max_csubstate = 1 ==> C0, C1, C1E, C2 okay, but not C3
> >> + * max_{allowed,usable}_cstate sets the highest allowed / usable C-state, where
> >> + * "allowed" is command line / sysctl based.
> > 
> > Hm, this is a bit misleading, because max_usable_cstate is also
> > command line based (plus system errata).  What about:
> > 
> > "max_{allowed,usable}_cstate sets the highest allowed / usable C-state.
> > max_usable_cstate can only be set from the command line, while
> > max_allowed_cstate can be set from both command line and sysctl."
> 
> Well. While I think I get your point, what I'm trying to get across is that
> max_usable_cstate is internally controlled (bounded by command line setting
> of max_allowed_cstate, but possibly forced lower than that internally). So
> maybe
> 
> "max_{allowed,usable}_cstate sets the highest allowed / usable C-state.
>  max_usable_cstate, while affected by the command line, is internally driven,
>  whereas max_allowed_cstate can be set from both command line and sysctl."
> 
> ?

Sure, LGTM.
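
For reference, the semantics described by that comment could be modelled
roughly as below. This is only a standalone sketch, not the Xen code: the
UINT_MAX defaults, min_uint() and cap_usable_cstate() are assumptions made
up for illustration, and max_cstate() is written as a function rather than
the macro from the patch. Only the two variable names and the "lower of the
two" combination are taken from the patch itself.

```c
#include <limits.h>

/* Stand-ins for the two limits; the UINT_MAX defaults are an assumption. */
static unsigned int max_allowed_cstate = UINT_MAX; /* command line / sysctl */
static unsigned int max_usable_cstate = UINT_MAX;  /* internally driven */

/* Hypothetical helper standing in for the hypervisor's min(). */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Effective limit: the lower of the two bounds, as the patch's macro does. */
static unsigned int max_cstate(void)
{
    return min_uint(max_usable_cstate, max_allowed_cstate);
}

/*
 * Hypothetical helper: internal code (e.g. an erratum workaround) lowering
 * the usable limit.  It can only ever lower the limit, never raise it.
 */
static void cap_usable_cstate(unsigned int cap)
{
    max_usable_cstate = min_uint(max_usable_cstate, cap);
}
```

In this model, setting max_allowed_cstate to 5 and then calling
cap_usable_cstate(2) leaves max_cstate() at 2; lowering max_allowed_cstate
to 1 afterwards brings it down to 1, while raising it to some very large
value simply leaves the internal cap of 2 in effect.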

> >> + * max_*_cstate = 0: C0 okay, but not C1
> >> + * max_*_cstate = 1: C1 okay, but not C2
> >> + * max_*_cstate = 2: C2 okay, but not C3 etc.
> >> + *
> >> + * max_csubstate sets the highest allowed C-state sub-state. Only applies to
> >> + * the highest allowed C-state.
> >> + * max_allowed_cstate = 1, max_csubstate = 0 ==> C0, C1 okay, but not C1E
> >> + * max_allowed_cstate = 1, max_csubstate = 1 ==> C0, C1 and C1E okay, but not C2
> >> + * max_allowed_cstate = 2, max_csubstate = 0 ==> C0, C1, C1E, C2 okay, but not C3
> >> + * max_allowed_cstate = 2, max_csubstate = 1 ==> C0, C1, C1E, C2 okay, but not C3
> >>   */
> >>  
> >> -extern unsigned int max_cstate;
> >> +extern unsigned int max_usable_cstate;
> >> +extern unsigned int max_allowed_cstate;
> >>  extern unsigned int max_csubstate;
> >>  
> >> +#define max_cstate() min(max_usable_cstate, max_allowed_cstate)
> > 
> > I would be tempted to drop the ending parenthesis so that you don't
> > need to adjust callers, but that's likely misleading, as then it would
> > need to be uppercase MAX_CSTATE.
> 
> I deliberately want to have the parentheses, to make sure all uses of
> max_cstate (without the parentheses) have been covered (by converting in
> whatever appropriate way). Which extends to possible backports. In a
> subsequent, not to be backported commit we could drop them again if so
> desired.
> 
> >>  static inline unsigned int acpi_get_cstate_limit(void)
> >>  {
> >> -  return max_cstate;
> >> +  return max_allowed_cstate;
> >>  }
> >>  static inline void acpi_set_cstate_limit(unsigned int new_limit)
> >>  {
> >> -  max_cstate = new_limit;
> >> -  return;
> >> +  max_allowed_cstate = new_limit;
> > 
> > Do we want to check the new limit doesn't exceed max_usable_cstate and
> > return -ERANGE or similar on failure?
> > 
> > After this change it's a bit weird to silently ignore invalid values
> > IMO.
> 
> I disagree. Those values may be valid, just not usable (i.e. they are
> still a valid upper bound, but we'd never go as high up). If people wanted
> to use the same settings across their fleet, undue (and confusing) errors
> might result on some of their systems if we did as you suggest. Plus we
> have always accepted arbitrarily large (and hence entirely meaningless)
> values anyway.

OK, fair enough.  With the minor adjustments above (indentation and
comment):

Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Thanks, Roger.
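
As a footnote on the max_allowed_cstate vs max_state question earlier in
this mail, here is a small standalone model of the state selection loop.
It is a sketch under assumptions, not the hypervisor code: struct cstate,
its substate field (standing in for the value decoded from cx->address)
and pick_state() are hypothetical names, and slot 0 of the array is
assumed to be unused. The loop condition itself mirrors the one in the
patch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a C-state table entry. */
struct cstate {
    unsigned int type;     /* 1 for C1/C1E, 2 for C2, ... */
    unsigned int substate; /* stands in for the MWAIT sub-state hint */
};

/*
 * Walk down from the governor's pick until a permitted state is found.
 * max_state is the effective ceiling (possibly lowered to C1), while the
 * sub-state limit only applies to states at the command line ceiling.
 * Returns NULL when nothing is permitted.
 */
static const struct cstate *pick_state(const struct cstate *states,
                                       unsigned int next_state,
                                       unsigned int max_state,
                                       unsigned int max_allowed_cstate,
                                       unsigned int max_csubstate)
{
    const struct cstate *cx;

    if (!next_state)
        return NULL;

    do {
        cx = &states[next_state];
    } while ((cx->type > max_state ||
              (cx->type == max_allowed_cstate &&
               cx->substate > max_csubstate)) &&
             --next_state);

    return next_state ? cx : NULL;
}
```

With a table holding C1, C1E, C2 and C3 in slots 1 to 4, a ceiling of C1
together with max_allowed_cstate = 1 and max_csubstate = 0 yields plain C1.
Forcing max_state to C1 while max_allowed_cstate is 3 still yields C1E,
because the sub-state limit is tied to the command line ceiling rather
than to the temporarily lowered one.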



 

