[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 4/5] x86/mwait-idle: disable IBRS during long idle
On Thu, Aug 18, 2022 at 03:04:51PM +0200, Jan Beulich wrote: > From: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > > Having IBRS enabled while the SMT sibling is idle unnecessarily slows > down the running sibling. OTOH, disabling IBRS around idle takes two > MSR writes, which will increase the idle latency. > > Therefore, only disable IBRS around deeper idle states. Shallow idle > states are bounded by the tick in duration, since NOHZ is not allowed > for them by virtue of their short target residency. > > Only do this for mwait-driven idle, since that keeps interrupts disabled > across idle, which makes disabling IBRS vs IRQ-entry a non-issue. > > Note: C6 is a random threshold, most importantly C1 probably shouldn't > disable IBRS, benchmarking needed. > > Suggested-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> > Signed-off-by: Borislav Petkov <bp@xxxxxxx> > Reviewed-by: Josh Poimboeuf <jpoimboe@xxxxxxxxxx> > Signed-off-by: Borislav Petkov <bp@xxxxxxx> > Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > bf5835bcdb96 > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> Acked-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> One unrelated comment below. > --- > v3: New. > > --- a/xen/arch/x86/cpu/mwait-idle.c > +++ b/xen/arch/x86/cpu/mwait-idle.c > @@ -141,6 +141,12 @@ static const struct cpuidle_state { > #define CPUIDLE_FLAG_TLB_FLUSHED 0x10000 > > /* > + * Disable IBRS across idle (when KERNEL_IBRS), is exclusive vs IRQ_ENABLE > + * above. > + */ > +#define CPUIDLE_FLAG_IBRS 0x20000 > + > +/* > * MWAIT takes an 8-bit "hint" in EAX "suggesting" > * the C-state (top nibble) and sub-state (bottom nibble) > * 0x00 means "MWAIT(C1)", 0x10 means "MWAIT(C2)" etc. > @@ -530,31 +536,31 @@ static struct cpuidle_state __read_mostl > }, > { > .name = "C6", > - .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED, > + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | > CPUIDLE_FLAG_IBRS, > .exit_latency = 85, > .target_residency = 200, > }, > { > .name = "C7s", > - .flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED, > + .flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED | > CPUIDLE_FLAG_IBRS, > .exit_latency = 124, > .target_residency = 800, > }, > { > .name = "C8", > - .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED, > + .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED | > CPUIDLE_FLAG_IBRS, > .exit_latency = 200, > .target_residency = 800, > }, > { > .name = "C9", > - .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TLB_FLUSHED, > + .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TLB_FLUSHED | > CPUIDLE_FLAG_IBRS, > .exit_latency = 480, > .target_residency = 5000, > }, > { > .name = "C10", > - .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED, > + .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED | > CPUIDLE_FLAG_IBRS, > .exit_latency = 890, > .target_residency = 5000, > }, > @@ -576,7 +582,7 @@ static struct cpuidle_state __read_mostl > }, > { > .name = "C6", > - .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED, > + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | > CPUIDLE_FLAG_IBRS, > .exit_latency = 133, > .target_residency = 600, > }, > @@ -906,6 +912,7 @@ static const struct cpuidle_state snr_cs > static void cf_check mwait_idle(void) > { > unsigned int cpu = smp_processor_id(); > + struct cpu_info *info = get_cpu_info(); > struct acpi_processor_power *power = processor_powers[cpu]; > struct acpi_processor_cx *cx = NULL; > unsigned int next_state; > @@ -932,8 +939,6 @@ static void cf_check mwait_idle(void) > pm_idle_save(); > else > { > - struct cpu_info *info = get_cpu_info(); > - > spec_ctrl_enter_idle(info); > safe_halt(); > spec_ctrl_exit_idle(info); Do we need to disable speculation just for the hlt if there's no C state change? It would seem to me like the MSR writes could add a lot of latency when there's no C state change. Maybe I'm confused. Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |