[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 5/5] x86/mwait-idle: adjust the SKX C6 parameters if PC6 is disabled
On 18.01.2022 11:48, Roger Pau Monné wrote: > On Mon, Sep 06, 2021 at 03:02:12PM +0200, Jan Beulich wrote: >> From: Chen Yu <yu.c.chen@xxxxxxxxx> >> >> Because cpuidle assumes worst-case C-state parameters, PC6 parameters >> are used for describing C6, which is worst-case for requesting CC6. >> When PC6 is enabled, this is appropriate. But if PC6 is disabled >> in the BIOS, the exit latency and target residency should be adjusted >> accordingly. >> >> Exit latency: >> Previously the C6 exit latency was measured as the PC6 exit latency. >> With PC6 disabled, the C6 exit latency should be the one of CC6. >> >> Target residency: >> With PC6 disabled, the idle duration within [CC6, PC6) would make the >> idle governor choose C1E over C6. This would cause low energy-efficiency. >> We should lower the bar to request C6 when PC6 is disabled. >> >> To fill this gap, check if PC6 is disabled in the BIOS in the >> MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency >> for C6 and set target_residency to 3 times of the new exit latency. [This >> is consistent with how intel_idle driver uses _CST to calculate the >> target_residency.] As a result, the OS would be more likely to choose C6 >> over C1E when PC6 is disabled, which is reasonable, because if C6 is >> enabled, it implies that the user cares about energy, so choosing C6 more >> frequently makes sense. >> >> The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC >> wakeup as the 99.99th percentile. Also CLX and CPX both have the same CPU >> model number as SkX, but their CC6 exit latencies are similar to the SKX >> one, 96us and 89us respectively, so reuse the SKX value for them. >> >> There is a concern that it might be better to use a more generic approach >> instead of optimizing every platform. However, if the required code >> complexity and different PC6 bit interpretation on different platforms >> are taken into account, tuning the code per platform seems to be an >> acceptable tradeoff. >> >> Link: >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel.github.io%2Fwult%2F&data=04%7C01%7Croger.pau%40citrix.com%7Ccdf115e71eb14429868408d97136a902%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637665301851513469%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=D9sIoe%2BFAvia3OuRsT19VAxkTqrlnPHnKqTSKiW6pRM%3D&reserved=0 >> # [1] >> Suggested-by: Len Brown <len.brown@xxxxxxxxx> >> Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx> >> Reviewed-by: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx> >> [ rjw: Subject and changelog edits ] >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> >> [Linux commit: 64233338499126c5c31e07165735ab5441c7e45a] >> >> Pull in Linux'es MSR_PKG_CST_CONFIG_CONTROL. Alongside the dropping of >> "const" from skx_cstates[] add __read_mostly, and extend that to other >> similar non-const tables. >> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> > > Acked-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> Thanks. >> @@ -1161,6 +1161,36 @@ static void __init sklh_idle_state_table >> } >> >> /* >> + * skx_idle_state_table_update - Adjust the Sky Lake/Cascade Lake >> + * idle states table. >> + */ >> +static void __init skx_idle_state_table_update(void) >> +{ >> + unsigned long long msr; >> + >> + rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr); >> + >> + /* >> + * 000b: C0/C1 (no package C-state support) >> + * 001b: C2 >> + * 010b: C6 (non-retention) >> + * 011b: C6 (retention) >> + * 111b: No Package C state limits. >> + */ >> + if ((msr & 0x7) < 2) { > > I wouldn't mind adding this mask field to msr-index.h. This wouldn't buy us much since - as per the original Linux change - the manifest constant then still wouldn't be used here. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |