Hi,
On Monday, 28 October 2019, 18:30:12 CET, Stonehouse, Robert wrote:
> This is a heads-up as I have observed that the following commit (backported onto an Amazon 4.11 tree) causes kexec (and hence kdump) to fail.
> ========
> commit c719519a4183d0630121f6abeba420f49dbc3229
> Author: Jan Beulich <jbeulich@xxxxxxxx>
> AuthorDate: Fri Jul 5 10:32:41 2019 +0200
> Commit: Jan Beulich <jbeulich@xxxxxxxx>
> CommitDate: Fri Jul 5 10:32:41 2019 +0200
>
> x86/SMP: don't try to stop already stopped CPUs
>
> In particular with an enabled IOMMU (but not really limited to this
> case), trying to invoke fixup_irqs() after having already done
> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
> ========
>
> The test was performing "echo c > /proc/sysrq-trigger" in dom0 and the loaded crash kernel fails to show any signs of starting. This is the end of the Xen console ...
> ========
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
> <machine hangs here then reboots via the BIOS after 5 seconds>
> ========
> Expected behaviour is that the kdump kernel immediately loads and then performs the crash dump
I can confirm this behavior, but with the Xen version (4.11.0_08-1) from
SUSE SLES12 SP4, which does not contain the said commit
c719519a4183d0630121f6abeba420f49dbc3229. However, I see it only on systems with newer Intel CPUs such as
"Intel(R) Xeon(R) Gold 6242 CPU".
Dietmar.
>
> I'm sorry that I have not yet had time to check if this affects vanilla stable-4.11 or master. I just wanted to be certain that you don't have the same issue.
>
>
> Reverting one hunk via the following commit fixes things for me (this is an experiment and not at all a proposed fix)
> ========
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -303,15 +303,15 @@ static void stop_this_cpu(void *dummy)
>  void smp_send_stop(void)
>  {
>      unsigned int cpu = smp_processor_id();
> +
> +    local_irq_disable();
> +    fixup_irqs(cpumask_of(cpu), 0);
> +    local_irq_enable();
>
>      if ( num_online_cpus() > 1 )
>      {
>          int timeout = 10;
>
> -        local_irq_disable();
> -        fixup_irqs(cpumask_of(cpu), 0);
> -        local_irq_enable();
> -
>          smp_call_function(stop_this_cpu, NULL, 0);
>
>          /* Wait 10ms for all other CPUs to go offline. */
> ========
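To make the effect of the quoted hunk easier to follow, here is a minimal standalone model of the two variants of smp_send_stop(). It is only a sketch and not Xen code: fixup_irqs_model(), send_stop_with_commit(), send_stop_with_revert() and count_fixups() are hypothetical names, and the real fixup_irqs() is replaced by a call counter. It only illustrates that the two variants differ when a single CPU is online; it says nothing about why the crash kernel fails to start.

```c
/* Hypothetical stand-in for Xen's fixup_irqs(); it only counts invocations. */
static unsigned int fixup_calls;

static void fixup_irqs_model(void)
{
    fixup_calls++;
}

/*
 * Models smp_send_stop() with commit c719519a applied:
 * the IRQ fixup runs only if more than one CPU is online.
 */
static void send_stop_with_commit(unsigned int online_cpus)
{
    if ( online_cpus > 1 )
        fixup_irqs_model();
}

/*
 * Models smp_send_stop() with the experimental hunk revert:
 * the IRQ fixup runs regardless of the number of online CPUs.
 */
static void send_stop_with_revert(unsigned int online_cpus)
{
    (void)online_cpus; /* unused: the fixup is unconditional here */
    fixup_irqs_model();
}

/* Resets the counter, runs one variant, and returns how often the fixup ran. */
static unsigned int count_fixups(void (*variant)(unsigned int),
                                 unsigned int online_cpus)
{
    fixup_calls = 0;
    variant(online_cpus);
    return fixup_calls;
}
```

In this model, count_fixups(send_stop_with_commit, 1) returns 0 while count_fixups(send_stop_with_revert, 1) returns 1; for any CPU count above 1 both variants run the fixup once.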
>
> Regards
> Rob
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxx
> https://lists.xenproject.org/mailman/listinfo/xen-devel