
Re: [PATCH v2 0/5] xen/x86: prevent local APIC errors at shutdown



On Tue, Feb 11, 2025 at 07:39:12AM +0100, Jan Beulich wrote:
> On 06.02.2025 16:06, Roger Pau Monne wrote:
> > The following series aims to prevent local APIC errors from stalling the
> > shutdown process.  On XenServer testing we have seen reports of AMD
> > boxes sporadically getting stuck in a spam of:
> > 
> > APIC error on CPU0: 00(08), Receive accept error
> > 
> > Messages during shutdown, as a result of device interrupts targeting
> > CPUs that are offline (and have the local APIC disabled).
> 
> One more thought here: Have you/we perhaps discovered the reason why there
> was that 1ms delay at the end of fixup_irqs() that was badly commented,
> and that you removed in e2bb28d62158 ("x86/irq: forward pending interrupts
> to new destination in fixup_irqs()")? May be worth mentioning that by way
> of a Fixes: tag.

Hm, so you think the delay was added there as a way to ensure any
pending interrupts would get drained (i.e. serviced) on the old target?

I'm maybe a bit confused, but I don't think the delay would help much
with preventing the local APIC errors?  Regardless of the wait, if the
interrupts target offline CPUs there's a chance receive accept errors
will be triggered on AMD.
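
For reference, a minimal sketch of how the error value in the quoted
message maps to its name, assuming the figure in parentheses (08) is the
local APIC Error Status Register contents.  Bit 3 pairs with "Receive
accept error" as in the log line above; the other names follow the usual
ESR bit layout as I understand it, and esr_decode() is a hypothetical
helper for illustration, not a function taken from the Xen tree.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Names for ESR bits 0-7, lowest bit first.  Only bit 3 is confirmed by
 * the quoted log line; the remaining entries are an assumption based on
 * the architectural ESR layout.
 */
static const char *const esr_names[8] = {
    "Send CS error",
    "Receive CS error",
    "Send accept error",
    "Receive accept error",
    "Redirectable IPI",
    "Send illegal vector",
    "Received illegal vector",
    "Illegal register address",
};

/*
 * Hypothetical helper: write a comma-separated list of the error names
 * set in 'esr' into 'buf' (at most 'len' bytes, always NUL-terminated
 * when len > 0).  Returns the number of names written.
 */
static int esr_decode(unsigned int esr, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    if (len)
        buf[0] = '\0';

    for (unsigned int bit = 0; bit < 8; bit++) {
        int n;

        if (!(esr & (1u << bit)))
            continue;

        n = snprintf(buf + used, len - used, "%s%s",
                     count ? ", " : "", esr_names[bit]);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output would be truncated, stop here */

        used += (size_t)n;
        count++;
    }

    return count;
}
```

Calling esr_decode(0x08, buf, sizeof(buf)) fills buf with "Receive
accept error", i.e. the error is raised on the receiving side, which
fits the point above that it depends on where the interrupt is
targeted rather than on how long we wait.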

Thanks, Roger.



 

