|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/i8259: do not assume interrupts always target CPU0
On 23.10.2023 14:46, Roger Pau Monne wrote:
> Sporadically we have seen the following during AP bringup on AMD platforms
> only:
>
> microcode: CPU59 updated from revision 0x830107a to 0x830107a, date =
> 2023-05-17
> microcode: CPU60 updated from revision 0x830104d to 0x830107a, date =
> 2023-05-17
> CPU60: No irq handler for vector 27 (IRQ -2147483648)
> microcode: CPU61 updated from revision 0x830107a to 0x830107a, date =
> 2023-05-17
>
> This is similar to the issue raised on Linux commit 36e9e1eab777e, where they
> observed i8259 (active) vectors getting delivered to CPUs different than 0.
>
> On AMD or Hygon platforms adjust the target CPU mask of i8259 interrupt
> descriptors to contain all possible CPUs, so that APs will reserve the vector
> at startup if any legacy IRQ is still delivered through the i8259. Note that
> if the IO-APIC takes over those interrupt descriptors the CPU mask will be
> reset.
>
> Spurious i8259 interrupt vectors however (IRQ7 and IRQ15) can be injected even
> when all i8259 pins are masked, and hence would need to be handled on all
> CPUs.
>
> Do not reserve the PIC spurious vectors on all CPUs, but do check for such
> spurious interrupts on all CPUs if the vendor is AMD or Hygon.
The first half of this sentence reads as if it was describing a change,
but aiui there's no change to existing behavior in this regard anymore.
Maybe use something like "continue to reserve PIC vectors on CPU0 only"?
> Note that once
> the vectors get used by devices detecting PIC spurious interrupts will no
> longer be possible, however the device should be able to cope with spurious
> interrupt. Such PIC spurious interrupts occurring when the vector is in use
> by
> a local APIC routed source will lead to an extra EOI, which might
> unintentionally clear a different vector from ISR. Note this is already the
> current behavior, so assume it's infrequent enough to not cause real issues.
>
> Finally, adjust the printed message to display the CPU where the spurious
> interrupt has been received, so it looks like:
>
> microcode: CPU1 updated from revision 0x830107a to 0x830107a, date =
> 2023-05-17
> cpu1: spurious 8259A interrupt: IRQ7
> microcode: CPU2 updated from revision 0x830104d to 0x830107a, date =
> 2023-05-17
>
> Fixes: 3fba06ba9f8b ('x86/IRQ: re-use legacy vector ranges on APs')
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Changes since v1:
> - Do not reserved spurious PIC vectors on APs, but still check for spurious
> PIC interrupts.
> - Reword commit message.
> ---
> Not sure if the Fixes tag is the most appropriate here, since AFAICT this is a
> hardware glitch, but it makes it easier to see to which versions the fix
> should
> be backported, because Xen previous behavior was to reserve all legacy vectors
> on all CPUs.
I'm inclined to suggest to (informally) invent an Amends: tag.
> --- a/xen/arch/x86/i8259.c
> +++ b/xen/arch/x86/i8259.c
> @@ -37,6 +37,15 @@ static bool _mask_and_ack_8259A_irq(unsigned int irq);
>
> bool bogus_8259A_irq(unsigned int irq)
> {
> + if ( smp_processor_id() &&
> + !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> + /*
> + * For AMD/Hygon do spurious PIC interrupt detection on all CPUs, as
> it
> + * has been observed that during unknown circumstances spurious PIC
> + * interrupts have been delivered to CPUs different than the BSP.
> + */
> + return false;
> +
> return !_mask_and_ack_8259A_irq(irq);
> }
I don't think this function should be changed; imo the adjustment belongs
at the call site.
> @@ -349,7 +359,22 @@ void __init init_IRQ(void)
> continue;
> desc->handler = &i8259A_irq_type;
> per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq;
> - cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
> +
> + /*
> + * The interrupt affinity logic never targets interrupts to offline
> + * CPUs, hence it's safe to use cpumask_all here.
> + *
> + * Legacy PIC interrupts are only targeted to CPU0, but depending on
> + * the platform they can be distributed to any online CPU in
> hardware.
> + * Note this behavior has only been observed on AMD hardware. In
> order
> + * to cope install all active legacy vectors on all CPUs.
> + *
> + * IO-APIC will change the destination mask if/when taking ownership
> of
> + * the interrupt.
> + */
> + cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor &
> + (X86_VENDOR_AMD |
> X86_VENDOR_HYGON) ?
> + &cpumask_all : cpumask_of(cpu));
Nit: Imo this would better be wrapped as
cpumask_copy(desc->arch.cpu_mask,
boot_cpu_data.x86_vendor &
(X86_VENDOR_AMD | X86_VENDOR_HYGON) ?
&cpumask_all : cpumask_of(cpu));
or (considering how we often prefer to wrap conditional operators)
cpumask_copy(desc->arch.cpu_mask,
boot_cpu_data.x86_vendor &
(X86_VENDOR_AMD | X86_VENDOR_HYGON)
? &cpumask_all : cpumask_of(cpu));
or
cpumask_copy(desc->arch.cpu_mask,
boot_cpu_data.x86_vendor &
(X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all
: cpumask_of(cpu));
, possibly with the argument spanning three lines additionally
parenthesized as a whole.
(Of course an amd_like() construct like we have in the emulator would
further help readability here, but the way that works it cannot be
easily generalized for use outside of the emulator.)
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |