Re: [PATCH for-4.20 v3 0/5] xen/x86: prevent local APIC errors at shutdown
On Wed, Feb 12, 2025 at 09:51:16AM +0100, Jan Beulich wrote:
> On 12.02.2025 09:33, Oleksii Kurochko wrote:
> >
> > On 2/11/25 7:39 PM, Roger Pau Monné wrote:
> >> On Tue, Feb 11, 2025 at 12:02:04PM +0100, Roger Pau Monne wrote:
> >>> Hello,
> >>>
> >>> The following series aims to prevent local APIC errors from stalling the
> >>> shtudown process. On XenServer testing we have seen reports of AMD
> >>> boxes sporadically getting stuck in a spam of:
> >>>
> >>> APIC error on CPU0: 00(08), Receive accept error
> >>>
> >>> Messages during shutdown, as a result of device interrupts targeting
> >>> CPUs that are offline (and have the local APIC disabled).
> >>>
> >>> First patch strictly solves the issue of shutdown getting stuck, further
> >>> patches aim to quiesce interrupts from all devices (known by Xen) as an
> >>> attempt to prevent a spurious "APIC error on CPU0: 00(00)" plus also
> >>> make kexec more reliable.
> >>>
> >>> Thanks, Roger.
> >>>
> >>> Roger Pau Monne (5):
> >>>   x86/shutdown: offline APs with interrupts disabled on all CPUs
> >>>   x86/irq: drop fixup_irqs() parameters
> >>>   x86/smp: perform disabling on interrupts ahead of AP shutdown
> >>>   x86/pci: disable MSI(-X) on all devices at shutdown
> >>>   x86/iommu: disable interrupts at shutdown
> >> This is now fully reviewed, can I get your opinion (and
> >> release-acked-by) on which patches we should take for 4.20?
> >
> > If my understanding is correct to unblock shutdown process, it is enough just
> > to have only first patch merged, correct? So the first patch should be merged.
> >
> > As second patch doesn't have functional changes, IMO, it could be merged to
> > despite of the fact we have Hard code freeze period.
> >
> > All other patches, I would like to ask additional opinion (as I am an expert in x86),
> > at first glance it looks like an absence of these patches in staging branch will
> > lead only to triggering "Receive accept error" which I believe won't block shutdown
> > process, so these patches could be postponed until 4.21. On other side, if it is
> > low-risk fixes then we could consider to merge them now.

I expect the following patches might make kexec'ing from Xen a bit
more reliable, as the kexec'ed kernel should find an environment with
interrupts from all Xen known devices quiesced.

> I'm not Roger, but as a data point: While I'm uncertain about patch 2, all
> others in this series will very likely be backported anyway.

I plan to backport the series to the XenServer patch queue also when
it goes in.

Thanks, Roger.