[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/i8259: do not assume interrupts always target CPU0
On Tue, Oct 24, 2023 at 11:33:21AM +0200, Jan Beulich wrote: > On 23.10.2023 14:46, Roger Pau Monne wrote: > > Sporadically we have seen the following during AP bringup on AMD platforms > > only: > > > > microcode: CPU59 updated from revision 0x830107a to 0x830107a, date = > > 2023-05-17 > > microcode: CPU60 updated from revision 0x830104d to 0x830107a, date = > > 2023-05-17 > > CPU60: No irq handler for vector 27 (IRQ -2147483648) > > microcode: CPU61 updated from revision 0x830107a to 0x830107a, date = > > 2023-05-17 > > > > This is similar to the issue raised on Linux commit 36e9e1eab777e, where > > they > > observed i8259 (active) vectors getting delivered to CPUs different than 0. > > > > On AMD or Hygon platforms adjust the target CPU mask of i8259 interrupt > > descriptors to contain all possible CPUs, so that APs will reserve the > > vector > > at startup if any legacy IRQ is still delivered through the i8259. Note > > that > > if the IO-APIC takes over those interrupt descriptors the CPU mask will be > > reset. > > > > Spurious i8259 interrupt vectors however (IRQ7 and IRQ15) can be injected > > even > > when all i8259 pins are masked, and hence would need to be handled on all > > CPUs. > > > > Do not reserve the PIC spurious vectors on all CPUs, but do check for such > > spurious interrupts on all CPUs if the vendor is AMD or Hygon. > > The first half of this sentence reads as if it was describing a change, > but aiui there's no change to existing behavior in this regard anymore. > Maybe use something like "continue to reserve PIC vectors on CPU0 only"? > > > Note that once > > the vectors get used by devices detecting PIC spurious interrupts will no > > longer be possible, however the device should be able to cope with spurious > > interrupt. Such PIC spurious interrupts occurring when the vector is in > > use by > > a local APIC routed source will lead to an extra EOI, which might > > unintentionally clear a different vector from ISR. Note this is already the > > current behavior, so assume it's infrequent enough to not cause real issues. > > > > Finally, adjust the printed message to display the CPU where the spurious > > interrupt has been received, so it looks like: > > > > microcode: CPU1 updated from revision 0x830107a to 0x830107a, date = > > 2023-05-17 > > cpu1: spurious 8259A interrupt: IRQ7 > > microcode: CPU2 updated from revision 0x830104d to 0x830107a, date = > > 2023-05-17 > > > > Fixes: 3fba06ba9f8b ('x86/IRQ: re-use legacy vector ranges on APs') > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > --- > > Changes since v1: > > - Do not reserved spurious PIC vectors on APs, but still check for spurious > > PIC interrupts. > > - Reword commit message. > > --- > > Not sure if the Fixes tag is the most appropriate here, since AFAICT this > > is a > > hardware glitch, but it makes it easier to see to which versions the fix > > should > > be backported, because Xen previous behavior was to reserve all legacy > > vectors > > on all CPUs. > > I'm inclined to suggest to (informally) invent an Amends: tag. > > > --- a/xen/arch/x86/i8259.c > > +++ b/xen/arch/x86/i8259.c > > @@ -37,6 +37,15 @@ static bool _mask_and_ack_8259A_irq(unsigned int irq); > > > > bool bogus_8259A_irq(unsigned int irq) > > { > > + if ( smp_processor_id() && > > + !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) > > ) > > + /* > > + * For AMD/Hygon do spurious PIC interrupt detection on all CPUs, > > as it > > + * has been observed that during unknown circumstances spurious PIC > > + * interrupts have been delivered to CPUs different than the BSP. > > + */ > > + return false; > > + > > return !_mask_and_ack_8259A_irq(irq); > > } > > I don't think this function should be changed; imo the adjustment belongs > at the call site. It makes the call site much more complex to follow, in fact I was considering pulling the PIC vector range checks into bogus_8259A_irq(). Would that convince you into placing the check here rather than in the caller context? > > @@ -349,7 +359,22 @@ void __init init_IRQ(void) > > continue; > > desc->handler = &i8259A_irq_type; > > per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq; > > - cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu)); > > + > > + /* > > + * The interrupt affinity logic never targets interrupts to offline > > + * CPUs, hence it's safe to use cpumask_all here. > > + * > > + * Legacy PIC interrupts are only targeted to CPU0, but depending > > on > > + * the platform they can be distributed to any online CPU in > > hardware. > > + * Note this behavior has only been observed on AMD hardware. In > > order > > + * to cope install all active legacy vectors on all CPUs. > > + * > > + * IO-APIC will change the destination mask if/when taking > > ownership of > > + * the interrupt. > > + */ > > + cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor & > > + (X86_VENDOR_AMD | > > X86_VENDOR_HYGON) ? > > + &cpumask_all : cpumask_of(cpu)); > > Nit: Imo this would better be wrapped as > > cpumask_copy(desc->arch.cpu_mask, > boot_cpu_data.x86_vendor & > (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? > &cpumask_all : cpumask_of(cpu)); > > or (considering how we often prefer to wrap conditional operators) > > cpumask_copy(desc->arch.cpu_mask, > boot_cpu_data.x86_vendor & > (X86_VENDOR_AMD | X86_VENDOR_HYGON) > ? &cpumask_all : cpumask_of(cpu)); > > or > > cpumask_copy(desc->arch.cpu_mask, > boot_cpu_data.x86_vendor & > (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all > : cpumask_of(cpu)); > > , possibly with the argument spanning three lines additionally > parenthesized as a whole. > > (Of course an amd_like() construct like we have in the emulator would > further help readability here, but the way that works it cannot be > easily generalized for use outside of the emulator.) I think the last one could be my preferred indentation, will adjust to that. Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |