[Xen-devel] [PATCH] x86/watchdog: Use real timestamps for watchdog timeout



Do not assume that we will only receive interrupts at a rate of nmi_hz.  On a
test system being debugged, I observed a PCI SERR being continuously asserted
without the SERR bit being set.  The result was Xen "exceeding" a 300 second
timeout within 1 second.

Change the nmi_watchdog_tick() time accounting to use timestamps rather than a
tick count scaled by nmi_hz (which itself has been seen to deviate on some
systems due to Turbo/P-states).
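
As an illustration only (not part of the patch), the difference between the
two schemes can be sketched in standalone C.  The s_time_t typedef and
SECONDS() macro below are stand-ins assumed to be in nanoseconds, and the
*_trip_time() helpers and the selfcheck function are hypothetical harness code
that replays NMIs at a fixed period while the IRQ sum never changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for Xen's s_time_t and SECONDS(); nanosecond units assumed. */
typedef int64_t s_time_t;
#define SECONDS(s) ((s_time_t)(s) * 1000000000LL)

/* Old scheme: one count per NMI, assuming NMIs arrive at exactly nmi_hz. */
static inline bool tick_scheme_stuck(unsigned int *alert_counter,
                                     unsigned int timeout_s,
                                     unsigned int nmi_hz)
{
    return ++*alert_counter == timeout_s * nmi_hz;
}

/* New scheme: compare real elapsed time since the IRQ sum last changed. */
static inline bool stamp_scheme_stuck(s_time_t last_irq_change, s_time_t now,
                                      unsigned int timeout_s)
{
    return (now - last_irq_change) > SECONDS(timeout_s);
}

/*
 * Hypothetical harness: an NMI arrives every period_ns and the IRQ sum
 * never changes.  Returns the simulated time of the first "stuck" report.
 */
static inline s_time_t tick_trip_time(s_time_t period_ns,
                                      unsigned int timeout_s,
                                      unsigned int nmi_hz)
{
    unsigned int alert_counter = 0;
    s_time_t now = 0;

    do {
        now += period_ns;
    } while ( !tick_scheme_stuck(&alert_counter, timeout_s, nmi_hz) );

    return now;
}

/* Same harness for the timestamp scheme; the last change is at time 0. */
static inline s_time_t stamp_trip_time(s_time_t period_ns,
                                       unsigned int timeout_s)
{
    s_time_t now = 0;

    do {
        now += period_ns;
    } while ( !stamp_scheme_stuck(0, now, timeout_s) );

    return now;
}

/* NMI storm at 1ms intervals, 300s timeout, nominal nmi_hz of 1. */
static inline void watchdog_storm_selfcheck(void)
{
    assert(tick_trip_time(1000000, 300, 1) < SECONDS(1));
    assert(stamp_trip_time(1000000, 300) > SECONDS(300));
}
```

With a 300 second timeout, a nominal nmi_hz of 1 and an NMI every millisecond,
the counting scheme reports a stuck CPU after 300ms of simulated time, whereas
the timestamp scheme reports it only once more than 300 seconds have elapsed.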

Also, move the comment to a more appropriate place (as we would expect to
enter the old if() block once a second anyway), and fix up two trailing
whitespace issues.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

diff -r c6fb586f83a0 -r ebb0070be9fd xen/arch/x86/nmi.c
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -396,7 +396,7 @@ static struct notifier_block cpu_nmi_nfb
 };
 
 static DEFINE_PER_CPU(unsigned int, last_irq_sums);
-static DEFINE_PER_CPU(unsigned int, alert_counter);
+static DEFINE_PER_CPU(s_time_t, last_irq_change);
 
 static atomic_t watchdog_disable_count = ATOMIC_INIT(1);
 
@@ -432,23 +432,22 @@ void nmi_watchdog_tick(struct cpu_user_r
     if ( (this_cpu(last_irq_sums) == sum) &&
          !atomic_read(&watchdog_disable_count) )
     {
-        /*
-         * Ayiee, looks like this CPU is stuck ... wait for the timeout
-         * before doing the oops ...
-         */
-        this_cpu(alert_counter)++;
-        if ( this_cpu(alert_counter) == opt_watchdog_timeout*nmi_hz )
+        s_time_t last_change = this_cpu(last_irq_change);
+
+        if ( (NOW() - last_change) > SECONDS(opt_watchdog_timeout) )
         {
+            /* Ayiee, looks like this CPU is stuck. */
+
             console_force_unlock();
             printk("Watchdog timer detects that CPU%d is stuck!\n",
                    smp_processor_id());
             fatal_trap(TRAP_nmi, regs);
         }
-    } 
-    else 
+    }
+    else
     {
         this_cpu(last_irq_sums) = sum;
-        this_cpu(alert_counter) = 0;
+        this_cpu(last_irq_change) = NOW();
     }
 
     if ( nmi_perfctr_msr )

