
Re: [Xen-devel] [stable-4.11] Heads-up: c719519 (x86/SMP: don't try to stop already stopped CPUs) causes 100% kexec/kdump failure



Hi,

 

On Monday, 28 October 2019, 18:30:12 CET, Stonehouse, Robert wrote:

> This is a heads-up as I have observed that the following commit (backported onto an Amazon 4.11 tree) causes kexec (and hence kdump) to fail.

> ========

> commit c719519a4183d0630121f6abeba420f49dbc3229

> Author: Jan Beulich <jbeulich@xxxxxxxx>

> AuthorDate: Fri Jul 5 10:32:41 2019 +0200

> Commit: Jan Beulich <jbeulich@xxxxxxxx>

> CommitDate: Fri Jul 5 10:32:41 2019 +0200

>

> x86/SMP: don't try to stop already stopped CPUs

>

> In particular with an enabled IOMMU (but not really limited to this

> case), trying to invoke fixup_irqs() after having already done

> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:

> ========

>

> The test was to run "echo c > /proc/sysrq-trigger" in dom0; the loaded crash kernel showed no sign of starting. This is the end of the Xen console ...

> ========

> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.

> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

> <machine hangs here then reboots via the BIOS after 5 seconds>

> ========

> The expected behaviour is that the kdump kernel loads immediately and then performs the crash dump.

 

I can confirm this behavior, but with the Xen version (4.11.0_08-1) from

SuSE SLES12 SP4, which doesn't contain the said commit

c719519a4183d0630121f6abeba420f49dbc3229.

However, I see this only on systems with newer Intel CPUs such as

"Intel(R) Xeon(R) Gold 6242 CPU".


Dietmar.

 

>

> I'm sorry that I have not yet had time to check if this affects vanilla stable-4.11 or master. I just wanted to be certain that you don't have the same issue.

>

>

> Reverting one hunk via the following commit fixes things for me (this is an experiment and not at all a proposed fix):

> ========

> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -303,15 +303,15 @@ static void stop_this_cpu(void *dummy)
>  void smp_send_stop(void)
>  {
>      unsigned int cpu = smp_processor_id();
> +
> +    local_irq_disable();
> +    fixup_irqs(cpumask_of(cpu), 0);
> +    local_irq_enable();
>
>      if ( num_online_cpus() > 1 )
>      {
>          int timeout = 10;
>
> -        local_irq_disable();
> -        fixup_irqs(cpumask_of(cpu), 0);
> -        local_irq_enable();
> -
>          smp_call_function(stop_this_cpu, NULL, 0);
>
>          /* Wait 10ms for all other CPUs to go offline. */

> ========

>

> Regards

> Rob

>

> _______________________________________________

> Xen-devel mailing list

> Xen-devel@xxxxxxxxxxxxxxxxxxxx

> https://lists.xenproject.org/mailman/listinfo/xen-devel

 


 

