[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 5/5] x86/mwait-idle: adjust the SKX C6 parameters if PC6 is disabled
On Mon, Sep 06, 2021 at 03:02:12PM +0200, Jan Beulich wrote: > From: Chen Yu <yu.c.chen@xxxxxxxxx> > > Because cpuidle assumes worst-case C-state parameters, PC6 parameters > are used for describing C6, which is worst-case for requesting CC6. > When PC6 is enabled, this is appropriate. But if PC6 is disabled > in the BIOS, the exit latency and target residency should be adjusted > accordingly. > > Exit latency: > Previously the C6 exit latency was measured as the PC6 exit latency. > With PC6 disabled, the C6 exit latency should be the one of CC6. > > Target residency: > With PC6 disabled, the idle duration within [CC6, PC6) would make the > idle governor choose C1E over C6. This would cause low energy-efficiency. > We should lower the bar to request C6 when PC6 is disabled. > > To fill this gap, check if PC6 is disabled in the BIOS in the > MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency > for C6 and set target_residency to 3 times of the new exit latency. [This > is consistent with how intel_idle driver uses _CST to calculate the > target_residency.] As a result, the OS would be more likely to choose C6 > over C1E when PC6 is disabled, which is reasonable, because if C6 is > enabled, it implies that the user cares about energy, so choosing C6 more > frequently makes sense. > > The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC > wakeup as the 99.99th percentile. Also CLX and CPX both have the same CPU > model number as SkX, but their CC6 exit latencies are similar to the SKX > one, 96us and 89us respectively, so reuse the SKX value for them. > > There is a concern that it might be better to use a more generic approach > instead of optimizing every platform. However, if the required code > complexity and different PC6 bit interpretation on different platforms > are taken into account, tuning the code per platform seems to be an > acceptable tradeoff. > > Link: > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel.github.io%2Fwult%2F&data=04%7C01%7Croger.pau%40citrix.com%7Ccdf115e71eb14429868408d97136a902%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637665301851513469%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=D9sIoe%2BFAvia3OuRsT19VAxkTqrlnPHnKqTSKiW6pRM%3D&reserved=0 > # [1] > Suggested-by: Len Brown <len.brown@xxxxxxxxx> > Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx> > Reviewed-by: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx> > [ rjw: Subject and changelog edits ] > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > [Linux commit: 64233338499126c5c31e07165735ab5441c7e45a] > > Pull in Linux'es MSR_PKG_CST_CONFIG_CONTROL. Alongside the dropping of > "const" from skx_cstates[] add __read_mostly, and extend that to other > similar non-const tables. > > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> Acked-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > --- a/xen/arch/x86/cpu/mwait-idle.c > +++ b/xen/arch/x86/cpu/mwait-idle.c > @@ -484,7 +484,7 @@ static const struct cpuidle_state bdw_cs > {} > }; > > -static struct cpuidle_state skl_cstates[] = { > +static struct cpuidle_state __read_mostly skl_cstates[] = { > { > .name = "C1-SKL", > .flags = MWAIT2flg(0x00), > @@ -536,7 +536,7 @@ static struct cpuidle_state skl_cstates[ > {} > }; > > -static const struct cpuidle_state skx_cstates[] = { > +static struct cpuidle_state __read_mostly skx_cstates[] = { > { > .name = "C1-SKX", > .flags = MWAIT2flg(0x00), > @@ -674,7 +674,7 @@ static const struct cpuidle_state knl_cs > {} > }; > > -static struct cpuidle_state bxt_cstates[] = { > +static struct cpuidle_state __read_mostly bxt_cstates[] = { > { > .name = "C1-BXT", > .flags = MWAIT2flg(0x00), > @@ -870,9 +870,9 @@ static void auto_demotion_disable(void * > { > u64 msr_bits; > > - rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits); > + rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits); > msr_bits &= ~(icpu->auto_demotion_disable_flags); > - wrmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits); > + wrmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits); > } > > static void byt_auto_demotion_disable(void *dummy) > @@ -1141,7 +1141,7 @@ static void __init sklh_idle_state_table > if ((mwait_substates & (MWAIT_CSTATE_MASK << 28)) == 0) > return; > > - rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr); > + rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr); > > /* PC10 is not enabled in PKG C-state limit */ > if ((msr & 0xF) != 8) > @@ -1161,6 +1161,36 @@ static void __init sklh_idle_state_table > } > > /* > + * skx_idle_state_table_update - Adjust the Sky Lake/Cascade Lake > + * idle states table. > + */ > +static void __init skx_idle_state_table_update(void) > +{ > + unsigned long long msr; > + > + rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr); > + > + /* > + * 000b: C0/C1 (no package C-state support) > + * 001b: C2 > + * 010b: C6 (non-retention) > + * 011b: C6 (retention) > + * 111b: No Package C state limits. > + */ > + if ((msr & 0x7) < 2) { I wouldn't mind adding this mask field to msr-index.h. Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |