[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart

To: Sean Christopherson <seanjc@xxxxxxxxxx>
From: "Guilherme G. Piccoli" <gpiccoli@xxxxxxxxxx>
Date: Tue, 10 May 2022 17:11:08 -0300
Cc: akpm@xxxxxxxxxxxxxxxxxxxx, bhe@xxxxxxxxxx, pmladek@xxxxxxxx, kexec@xxxxxxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, bcm-kernel-feedback-list@xxxxxxxxxxxx, linuxppc-dev@xxxxxxxxxxxxxxxx, linux-alpha@xxxxxxxxxxxxxxx, linux-edac@xxxxxxxxxxxxxxx, linux-hyperv@xxxxxxxxxxxxxxx, linux-leds@xxxxxxxxxxxxxxx, linux-mips@xxxxxxxxxxxxxxx, linux-parisc@xxxxxxxxxxxxxxx, linux-pm@xxxxxxxxxxxxxxx, linux-remoteproc@xxxxxxxxxxxxxxx, linux-s390@xxxxxxxxxxxxxxx, linux-tegra@xxxxxxxxxxxxxxx, linux-um@xxxxxxxxxxxxxxxxxxx, linux-xtensa@xxxxxxxxxxxxxxxx, netdev@xxxxxxxxxxxxxxx, openipmi-developer@xxxxxxxxxxxxxxxxxxxxx, rcu@xxxxxxxxxxxxxxx, sparclinux@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, x86@xxxxxxxxxx, kernel-dev@xxxxxxxxxx, kernel@xxxxxxxxxxxx, halves@xxxxxxxxxxxxx, fabiomirmar@xxxxxxxxx, alejandro.j.jimenez@xxxxxxxxxx, andriy.shevchenko@xxxxxxxxxxxxxxx, arnd@xxxxxxxx, bp@xxxxxxxxx, corbet@xxxxxxx, d.hatayama@xxxxxxxxxxxxxx, dave.hansen@xxxxxxxxxxxxxxx, dyoung@xxxxxxxxxx, feng.tang@xxxxxxxxx, gregkh@xxxxxxxxxxxxxxxxxxx, mikelley@xxxxxxxxxxxxx, hidehiro.kawai.ez@xxxxxxxxxxx, jgross@xxxxxxxx, john.ogness@xxxxxxxxxxxxx, keescook@xxxxxxxxxxxx, luto@xxxxxxxxxx, mhiramat@xxxxxxxxxx, mingo@xxxxxxxxxx, paulmck@xxxxxxxxxx, peterz@xxxxxxxxxxxxx, rostedt@xxxxxxxxxxx, senozhatsky@xxxxxxxxxxxx, stern@xxxxxxxxxxxxxxxxxxx, tglx@xxxxxxxxxxxxx, vgoyal@xxxxxxxxxx, vkuznets@xxxxxxxxxx, will@xxxxxxxxxx, "David P . Reed" <dpreed@xxxxxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>
Delivery-date: Tue, 10 May 2022 20:13:27 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 09/05/2022 12:52, Sean Christopherson wrote:
> I find the shortlog to be very confusing, the bug has nothing to do with 
> disabling
> VMX and I distinctly remember wrapping VMXOFF with exception fixup to prevent 
> doom
> if VMX is already disabled :-).  The issue is really that 
> nmi_shootdown_cpus() doesn't
> play nice with being called twice.
> 

Hey Sean, OK - I agree with you, the issue is really about the double
list addition.

> [...]
> 
> Call stacks for the two callers would be very, very helpful.
> [...]

> This feels like were just adding more duct tape to the mess.  nmi_shootdown() 
> is
> still unsafe for more than one caller, and it takes a _lot_ of staring and 
> searching
> to understand that crash_smp_send_stop() is invoked iff CONFIG_KEXEC_CORE=y, 
> i.e.
> that it will call smp_ops.crash_stop_other_cpus() and not just 
> smp_send_stop().
> 
> Rather than shared a flag between two relatively unrelated functions, what if 
> we
> instead disabling virtualization in crash_nmi_callback() and then turn the 
> reboot
> call into a nop if an NMI shootdown has already occurred?  That will also add 
> a
> bit of documentation about multiple shootdowns not working.
> 
> And I believe there's also a lurking bug in 
> native_machine_emergency_restart() that
> can be fixed with cleanup.  SVM can also block INIT and so should be disabled 
> during
> an emergency reboot.
> 
> The attached patches are compile tested only.  If they seem sane, I'll post an
> official mini series.

Thanks Sean, it makes sense - my patch is more a "band-aid" whereas
yours fixes it in a more generic way. Confess I found the logic of your
patch complex, but as you said, it requires a *lot* of code analysis to
understand these multiple shutdown patches, the problem is complicated
by nature heh

I've tested your patch 0001 and it works well for all cases [0], so go
ahead and submit the miniseries, feel free to add:

Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@xxxxxxxxxx>


I've read patch 0002 and it makes sense to me as well, a good proactive
bug fix =)

With that said, I'll of course drop this one from V2 of this series.
Cheers,


Guilherme




[0]
A summary of my tests and the code paths that the panic shutdown take
depending on some conditions:

New function that disables VMX/SVM: cpu_crash_disable_virtualization()
[should be executed in every online CPU on shutdown)

The panic path triggers the following call stacks depending on kdump and
post_notifiers:


(1) kexec/kdump + !crash_kexec_post_notifiers
->machine_crash_shutdown()
----.crash_shutdown() <custom handler>
------native_machine_crash_shutdown() [all custom handlers except Xen PV
call the native generic function]
--------crash_smp_send_stop()
----------kdump_nmi_shootdown_cpus()
------------nmi_shootdown_cpus(kdump_nmi_callback)
--------------crash_nmi_callback()
----------------kdump_nmi_callback()
------------------cpu_crash_disable_virtualization()


(2) kexec/kdump + crash_kexec_post_notifiers
->crash_smp_send_stop()
----kdump_nmi_shootdown_cpus()
------nmi_shootdown_cpus(kdump_nmi_callback)
--------crash_nmi_callback()
----------kdump_nmi_callback()
------------cpu_crash_disable_virtualization()

After this path, will execute machine_crash_shutdown() but
crash_smp_send_stop()
is guarded against double execution. Also, emergency restart calls
emergency_vmx_disable_all() .


(3) !kexec/kdump + crash_kexec_post_notifiers

Same as (2)


(4) !kexec/kdump + !crash_kexec_post_notifiers
-> smp_send_stop()
----native_stop_other_cpus()
------apic_send_IPI_allbutself(REBOOT_VECTOR)
--------sysvec_reboot
----------cpu_emergency_vmxoff() <if the IPI approach succeeded, CPU
stopped here>

If not:
------register_stop_handler()
--------apic_send_IPI_allbutself(NMI_VECTOR)
----------smp_stop_nmi_callback()
------------cpu_emergency_vmxoff()

After that, emergency_vmx_disable_all() gets called in the emergency
restart path as well.

References:
- Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart
  - From: Sean Christopherson

Prev by Date: [ovmf test] 170308: regressions - FAIL
Next by Date: [ovmf test] 170309: regressions - FAIL
Previous by thread: Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart
Next by thread: Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.