[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.



In some cases, such as suffering a queued-invalidation timeout while
performing an iommu_crash_shutdown(), Xen can end up reentering the crash
path. Previously, this would result in a deadlock in one_cpu_only(), as the
test_and_set_bit() would fail.

The crash path is not reentrant, and even if it could be made to be so, it is
almost certain that we would fall over the same reentry condition again.

The new code can distinguish a reentry case from multiple cpus racing down the
crash path.  In the case that a reentry is detected, return back out to the
nested panic() call, which will maybe_reboot() on our behalf.  This requires a
bit of return plumbing back up to kexec_crash().

While fixing this deadlock, also fix up an minor niggle seen recently from a
XenServer crash report.  The report was from a Bank 8 MCE, which had managed
to crash on all cpus at once.  The result was a lot of stack traces with cpus
in kexec_common_shutdown(), which was infact the inlined version of
one_cpu_only().  The kexec crash path is not a hotpath, so we can easily
afford to prevent inlining for the sake of clarity in the stack traces.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CC: Keir Fraser <keir@xxxxxxx>
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Tim Deegan <tim@xxxxxxx>
CC: David Vrabel <david.vrabel@xxxxxxxxxx>
---
 xen/common/kexec.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 17f3ed7..481b0c2 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -233,11 +233,39 @@ void __init set_kexec_crash_area_size(u64 system_ram)
     }
 }
 
-static void one_cpu_only(void)
+/*
+ * Only allow one cpu to continue on the crash path, forcing others to spin.
+ * Racing on the crash path from here will end in misery.  If we reenter,
+ * something has very gone wrong and retrying will (almost certainly) be
+ * futile.  Return up to our nested panic() to try and reboot.
+ *
+ * This is noinline to make it obvious in stack traces which cpus have lost
+ * the race (as opposed to being somewhere in kexec_common_shutdown())
+ */
+static int noinline one_cpu_only(void)
 {
-    /* Only allow the first cpu to continue - force other cpus to spin */
-    if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
-        for ( ; ; ) ;
+    static unsigned int crashing_cpu = -1;
+    unsigned int cpu = smp_processor_id();
+
+    if ( cmpxchg(&crashing_cpu, -1, cpu) != -1 )
+    {
+        /* Not the first entry into one_cpu_only(). */
+        if ( crashing_cpu == cpu )
+        {
+            printk("Reentered the crash path.  Something is very broken\n");
+            return -EBUSY;
+        }
+
+        /*
+         * Another cpu has beaten us to this point.  Wait here patiently for
+         * it to kill us.
+         */
+        for ( ; ; )
+            halt();
+    }
+
+    set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
+    return 0;
 }
 
 /* Save the registers in the per-cpu crash note buffer. */
@@ -288,13 +316,20 @@ crash_xen_info_t *kexec_crash_save_info(void)
     return out;
 }
 
-static void kexec_common_shutdown(void)
+static int kexec_common_shutdown(void)
 {
+    int ret;
+
+    ret = one_cpu_only();
+    if ( ret )
+        return ret;
+
     watchdog_disable();
     console_start_sync();
     spin_debug_disable();
-    one_cpu_only();
     acpi_dmar_reinstate();
+
+    return 0;
 }
 
 void kexec_crash(void)
@@ -309,7 +344,9 @@ void kexec_crash(void)
 
     kexecing = TRUE;
 
-    kexec_common_shutdown();
+    if ( kexec_common_shutdown() != 0 )
+        return;
+
     kexec_crash_save_cpu();
     machine_crash_shutdown();
     machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.