
[xen staging] x86/boot: attempt to print trace and panic on AP bring up stall



commit a3a607f33f81c3ce3e3e0eaa2fe97d263278b464
Author:     Roger Pau Monne <roger.pau@xxxxxxxxxx>
AuthorDate: Mon Apr 14 15:57:55 2025 +0200
Commit:     Roger Pau Monne <roger.pau@xxxxxxxxxx>
CommitDate: Tue May 27 09:11:37 2025 +0200

    x86/boot: attempt to print trace and panic on AP bring up stall
    
    With the current AP bring up code, Xen can get stuck indefinitely if an AP
    freezes during boot after the 'callin' step.  Introduce a 5s timeout while
    waiting for APs to finish startup.
    
    On failure of an AP to complete startup, send an NMI to trigger the
    printing of a stack backtrace on the stuck AP and panic on the BSP.
    
    This patch was done while investigating the issue caused by Intel erratum
    ICX143.  It wasn't helpful in that case, but it's still an improvement
    when debugging AP bring up related issues.
    
    Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
    Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
 xen/arch/x86/smpboot.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index adcaec6899..a90caf45a5 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1375,6 +1375,7 @@ int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm)
 int __cpu_up(unsigned int cpu)
 {
     int apicid, ret;
+    s_time_t start;
 
     if ( (apicid = x86_cpu_to_apicid[cpu]) == BAD_APICID )
         return -ENODEV;
@@ -1393,10 +1394,18 @@ int __cpu_up(unsigned int cpu)
     time_latch_stamps();
 
     set_cpu_state(CPU_STATE_ONLINE);
+    start = NOW();
     while ( !cpu_online(cpu) )
     {
         cpu_relax();
         process_pending_softirqs();
+        if ( (NOW() - start) > SECONDS(5) )
+        {
+            /* AP is stuck, send NMI and panic. */
+            show_execution_state_nmi(cpumask_of(cpu), true);
+            panic("APIC ID %#x (CPU%u) stuck while starting up\n",
+                  apicid, cpu);
+        }
     }
 
     return 0;
--
generated by git-patchbot for /home/xen/git/xen.git#staging