|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()
On 05/04/2013 20:52, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
> During automated testing of the Xen crash path via fatal NMIs, we discovered
> the following deadlock:
>
> [22:47:04 UTC] (XEN) ****************************************
> [22:47:04 UTC] (XEN) Panic on CPU 0:
> [22:47:04 UTC] (XEN) FATAL TRAP: vector = 2 (nmi)
> [22:47:04 UTC] (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
> [22:47:04 UTC] (XEN) ****************************************
> [22:47:04 UTC] (XEN)
> [22:47:04 UTC] (XEN) Reboot in five seconds...
> [22:47:05 UTC] (XEN) DMAR_IQA_REG = 839b34002
> [22:47:05 UTC] (XEN) DMAR_IQH_REG = 0
> [22:47:05 UTC] (XEN) DMAR_IQT_REG = 26b0
> [22:47:05 UTC] (XEN)
> [22:47:05 UTC] (XEN) ****************************************
> [22:47:05 UTC] (XEN) Panic on CPU 0:
> [22:47:05 UTC] (XEN) queue invalidate wait descriptor was not executed
> [22:47:05 UTC] (XEN) ****************************************
> [22:47:05 UTC] (XEN)
> [22:47:05 UTC] (XEN) Reboot in five seconds...
> [23:09:54 UTC] The server is not powered on. The Virtual Serial Port is not
> available.
>
> The problem is quite difficult to reproduce (seen twice, unable to reproduce
> manually), but inspection of kexec_crash() path shows a certain deadlock
> because of one_cpu_only().
>
>
> This patch replaces the use of one_cpu_only() with a recursive spinlock which
> provides exclusion between different CPUs racing into kexec_crash(), but
> prevents deadlock if the same CPU reenters kexec_crash() again.
Does kexec_crash() not mind being reentered in this way?
-- Keir
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>
> diff -r 996fbd7657bb -r a559ec5225fc xen/common/kexec.c
> --- a/xen/common/kexec.c
> +++ b/xen/common/kexec.c
> @@ -234,13 +234,6 @@ void __init set_kexec_crash_area_size(u6
> }
> }
>
> -static void one_cpu_only(void)
> -{
> - /* Only allow the first cpu to continue - force other cpus to spin */
> - if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
> - for ( ; ; ) ;
> -}
> -
> /* Save the registers in the per-cpu crash note buffer. */
> void kexec_crash_save_cpu(void)
> {
> @@ -294,19 +287,25 @@ static void kexec_common_shutdown(void)
> watchdog_disable();
> console_start_sync();
> spin_debug_disable();
> - one_cpu_only();
> acpi_dmar_reinstate();
> }
>
> void kexec_crash(void)
> {
> + static DEFINE_SPINLOCK(crash_lock);
> int pos;
>
> pos = (test_bit(KEXEC_FLAG_CRASH_POS, &kexec_flags) != 0);
> if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
> return;
>
> + /* In a cascade failure, it is possible to re-enter kexec_crash on the
> same
> + * CPU. This spinlock is deliberately never unlocked, to allow
> reentrance
> + * on the same CPU, but provides excusion from different CPUs racing. */
> + spin_lock_recursive(&crash_lock);
> +
> kexecing = TRUE;
> + set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
>
> kexec_common_shutdown();
> kexec_crash_save_cpu();
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |