Re: [PATCH 4/6] x86/idle: Implement a new MWAIT IPI-elision algorithm
On 03/07/2025 5:36 pm, Roger Pau Monné wrote:
> On Wed, Jul 02, 2025 at 03:41:19PM +0100, Andrew Cooper wrote:
>> In order to elide IPIs, we must be able to identify whether a target CPU is
>> in MWAIT at the point it is woken up, i.e. the store to wake it up must
>> also identify the state.
>>
>> Create a new in_mwait variable beside __softirq_pending, so we can use a
>> CMPXCHG to set the softirq while also observing the status safely. Implement
>> an x86 version of arch_pend_softirq() which does this.
>>
>> In mwait_idle_with_hints(), advertise in_mwait, with an explanation of
>> precisely what it means. X86_BUG_MONITOR can be accounted for simply by not
>> advertising in_mwait.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> ---
>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> CC: Anthony PERARD <anthony.perard@xxxxxxxxxx>
>> CC: Michal Orzel <michal.orzel@xxxxxxx>
>> CC: Julien Grall <julien@xxxxxxx>
>> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>>
>> This is modelled after Linux's TIF_NEED_RESCHED (single-bit equivalent of
>> all of __softirq_pending), and TIF_POLLING_NRFLAG (arch-neutral "in_mwait").
>>
>> In Linux, they're both in the same flags field, which adds complexity.  In
>> Xen, __softirq_pending is already unsigned long for everything other than
>> x86, so adding an arch-neutral "in_mwait" would need wider changes.
>> ---
>> xen/arch/x86/acpi/cpu_idle.c | 20 +++++++++++++++++-
>> xen/arch/x86/include/asm/hardirq.h | 14 +++++++++++-
>> xen/arch/x86/include/asm/softirq.h | 34 ++++++++++++++++++++++++++++++
>> 3 files changed, 66 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
>> index caa0fef0b3b1..13040ef467a0 100644
>> --- a/xen/arch/x86/acpi/cpu_idle.c
>> +++ b/xen/arch/x86/acpi/cpu_idle.c
>> @@ -439,7 +439,21 @@ __initcall(cpu_idle_key_init);
>>  void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>>  {
>>      unsigned int cpu = smp_processor_id();
>> -    const unsigned int *this_softirq_pending = &softirq_pending(cpu);
>> +    irq_cpustat_t *stat = &irq_stat[cpu];
>> +    const unsigned int *this_softirq_pending = &stat->__softirq_pending;
>> +
>> +    /*
>> +     * By setting in_mwait, we promise to other CPUs that we'll notice
>> +     * changes to __softirq_pending without being sent an IPI.  We achieve
>> +     * this by either not going to sleep, or by having hardware notice on
>> +     * our behalf.
>> +     *
>> +     * Some errata exist where MONITOR doesn't work properly, and the
>> +     * workaround is to force the use of an IPI.  Cause this to happen by
>> +     * simply not advertising ourselves as being in_mwait.
>> +     */
>> +    alternative_io("movb $1, %[in_mwait]",
>> +                   "", X86_BUG_MONITOR,
>> +                   [in_mwait] "=m" (stat->in_mwait));
>>
>>      monitor(this_softirq_pending, 0, 0);
>>      smp_mb();
>> @@ -452,6 +466,10 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>>          mwait(eax, ecx);
>>          spec_ctrl_exit_idle(info);
>>      }
>> +
>> +    alternative_io("movb $0, %[in_mwait]",
>> +                   "", X86_BUG_MONITOR,
>> +                   [in_mwait] "=m" (stat->in_mwait));
> Isn't it a bit overkill to use alternatives here when you could have a
> conditional based on a global variable to decide whether to set/clear
> in_mwait?
I view it differently.  Why should the common case suffer overhead (an extra
memory read and conditional branch) in a hot path, to work around 3 buggy
pieces of hardware?
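
For comparison, the conditional form being suggested might look like the
sketch below, where "monitor_is_broken" is a hypothetical global standing in
for the X86_BUG_MONITOR state (illustrative only, not from the patch):

    extern bool monitor_is_broken;  /* hypothetical flag for X86_BUG_MONITOR */

    static inline void set_in_mwait(irq_cpustat_t *stat, bool val)
    {
        /*
         * Unlike the alternative_io() form, this costs a memory read and a
         * conditional branch on every idle transition, even on hardware
         * unaffected by the errata.
         */
        if ( !monitor_is_broken )
            ACCESS_ONCE(stat->in_mwait) = val;
    }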
>> }
>>
>> static void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
>> diff --git a/xen/arch/x86/include/asm/hardirq.h b/xen/arch/x86/include/asm/hardirq.h
>> index f3e93cc9b507..1647cff04dc8 100644
>> --- a/xen/arch/x86/include/asm/hardirq.h
>> +++ b/xen/arch/x86/include/asm/hardirq.h
>> @@ -5,7 +5,19 @@
>> #include <xen/types.h>
>>
>>  typedef struct {
>> -    unsigned int __softirq_pending;
>> +    /*
>> +     * The layout is important.  Any CPU can set bits in __softirq_pending,
>> +     * but in_mwait is a status bit owned by the CPU.  softirq_mwait_raw
>> +     * must cover both, and must be in a single cacheline.
>> +     */
>> +    union {
>> +        struct {
>> +            unsigned int __softirq_pending;
>> +            bool in_mwait;
> Given the usage in assembly where you store 0 and 1, this might better
> be a uint8_t then?
We have loads of asm code which accesses bools like this.  C guarantees
that sizeof(bool) >= sizeof(char).
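
As an illustrative aside (not part of the patch), the layout requirements the
union relies on could be written down as build-time checks:

    /* in_mwait must be the byte immediately after __softirq_pending ... */
    BUILD_BUG_ON(offsetof(irq_cpustat_t, in_mwait) !=
                 offsetof(irq_cpustat_t, __softirq_pending) +
                 sizeof(unsigned int));

    /* ... and softirq_mwait_raw must be wide enough to cover both. */
    BUILD_BUG_ON(sizeof(((irq_cpustat_t *)NULL)->softirq_mwait_raw) <
                 sizeof(unsigned int) + sizeof(bool));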
>
>> +        };
>> +        uint64_t softirq_mwait_raw;
>> +    };
> This could be a named union type ...
>
>> +
>>      unsigned int __local_irq_count;
>>      unsigned int nmi_count;
>>      unsigned int mce_count;
>> diff --git a/xen/arch/x86/include/asm/softirq.h b/xen/arch/x86/include/asm/softirq.h
>> index e4b194f069fb..069e5716a68d 100644
>> --- a/xen/arch/x86/include/asm/softirq.h
>> +++ b/xen/arch/x86/include/asm/softirq.h
>> @@ -1,6 +1,8 @@
>>  #ifndef __ASM_SOFTIRQ_H__
>>  #define __ASM_SOFTIRQ_H__
>>
>> +#include <asm/system.h>
>> +
>>  #define NMI_SOFTIRQ            (NR_COMMON_SOFTIRQS + 0)
>>  #define TIME_CALIBRATE_SOFTIRQ (NR_COMMON_SOFTIRQS + 1)
>>  #define VCPU_KICK_SOFTIRQ      (NR_COMMON_SOFTIRQS + 2)
>> @@ -9,4 +11,36 @@
>>  #define HVM_DPCI_SOFTIRQ       (NR_COMMON_SOFTIRQS + 4)
>>  #define NR_ARCH_SOFTIRQS       5
>>
>> +/*
>> + * Ensure softirq @nr is pending on @cpu.  Return true if an IPI can be
>> + * skipped, false if the IPI cannot be skipped.
>> + *
>> + * We use a CMPXCHG covering both __softirq_pending and in_mwait, in order
>> + * to set softirq @nr while also observing in_mwait in a race-free way.
>> + */
>> +static always_inline bool arch_pend_softirq(unsigned int nr, unsigned int cpu)
>> +{
>> +    uint64_t *ptr = &irq_stat[cpu].softirq_mwait_raw;
>> +    uint64_t old, new;
> ... so that you also use it here?
No, it can't.  The use of softirq_pending() in common code requires no
intermediate field names, and I'm not untangling that mess in a series
wanting backporting.
~Andrew
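
For reference, the quoted hunk ends before the body of arch_pend_softirq().
A rough sketch of how the CMPXCHG loop described above might look, assuming
Xen's cmpxchg() and ACCESS_ONCE() helpers (illustrative, not the patch
verbatim):

    static always_inline bool arch_pend_softirq(unsigned int nr,
                                                unsigned int cpu)
    {
        uint64_t *ptr = &irq_stat[cpu].softirq_mwait_raw;
        uint64_t prev, old = ACCESS_ONCE(*ptr);

        for ( ;; )
        {
            uint64_t new = old | (1UL << nr);

            if ( new == old )
                return true;  /* Softirq already pending; nothing to signal. */

            prev = cmpxchg(ptr, old, new);
            if ( prev == old )
                break;        /* Our update landed. */

            old = prev;       /* Raced with another updater; retry. */
        }

        /*
         * in_mwait occupies byte 4 of the raw word, i.e. bit 32.  If the
         * target CPU advertised it, MONITOR observes the store and the IPI
         * can be elided.
         */
        return old & (1ULL << 32);
    }

The single 64-bit read-modify-write is what makes "set the softirq and
observe in_mwait" race-free: there is no window between the two in which the
target could enter MWAIT unnoticed.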