Re: [Xen-devel] [PATCH 1/2] x86/crash: Indicate how well nmi_shootdown_cpus() managed to do.
>>> On 24.09.13 at 21:56, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> Having nmi_shootdown_cpus() report which pcpus failed to be shot down is a
> useful debugging hint as to what possibly went wrong (especially when the
> crash logs seem to indicate that an NMI timeout occurred while waiting for one
> of the problematic pcpus to perform an action).
>
> This is achieved by swapping an atomic_t count of unreported pcpus with a
> cpumask.  In the case that the 1 second timeout occurs, use the cpumask to
> identify the problematic pcpus.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: Keir Fraser <keir@xxxxxxx>
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Tim Deegan <tim@xxxxxxx>
>
> ---
>
> We in XenServer have seen a few crashes like this recently, and having an
> extra bit of debugging on the serial console or in the conring is
> substantially more helpful than trying to piece the crash together
> after-the-fact based on what information is missing.
> ---
>  xen/arch/x86/crash.c |   20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
> index 0a807d1..5f0f07c 100644
> --- a/xen/arch/x86/crash.c
> +++ b/xen/arch/x86/crash.c
> @@ -22,6 +22,7 @@
>  #include <xen/perfc.h>
>  #include <xen/kexec.h>
>  #include <xen/sched.h>
> +#include <xen/keyhandler.h>
>  #include <public/xen.h>
>  #include <asm/shared.h>
>  #include <asm/hvm/support.h>
> @@ -30,7 +31,7 @@
>  #include <xen/iommu.h>
>  #include <asm/hpet.h>
>
> -static atomic_t waiting_for_crash_ipi;
> +static cpumask_t waiting_to_crash;
>  static unsigned int crashing_cpu;
>  static DEFINE_PER_CPU_READ_MOSTLY(bool_t, crash_save_done);
>
> @@ -65,7 +66,7 @@ void __attribute__((noreturn)) do_nmi_crash(struct cpu_user_regs *regs)
>          __stop_this_cpu();
>
>          this_cpu(crash_save_done) = 1;
> -        atomic_dec(&waiting_for_crash_ipi);
> +        cpumask_clear_cpu(cpu, &waiting_to_crash);
>      }
>
>      /* Poor mans self_nmi().  __stop_this_cpu() has reverted the LAPIC
> @@ -122,7 +123,8 @@ static void nmi_shootdown_cpus(void)
>      crashing_cpu = cpu;
>      local_irq_count(crashing_cpu) = 0;
>
> -    atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
> +    cpumask_copy(&waiting_to_crash, &cpu_online_map);
> +    cpumask_clear_cpu(cpu, &waiting_to_crash);

    cpumask_andnot(&waiting_to_crash, &cpu_online_map, cpumask_of(cpu));

Jan

>
>      /* Change NMI trap handlers.  Non-crashing pcpus get nmi_crash which
>       * invokes do_nmi_crash (above), which cause them to write state and
> @@ -162,12 +164,22 @@ static void nmi_shootdown_cpus(void)
>      smp_send_nmi_allbutself();
>
>      msecs = 1000; /* Wait at most a second for the other cpus to stop */
> -    while ( (atomic_read(&waiting_for_crash_ipi) > 0) && msecs )
> +    while ( (cpumask_weight(&waiting_to_crash) > 0) && msecs )
>      {
>          mdelay(1);
>          msecs--;
>      }
>
> +    /* Leave a hint of how well we did trying to shoot down the other cpus */
> +    if ( msecs )
> +        printk("Shot down all cpus\n");
> +    else
> +    {
> +        cpulist_scnprintf(keyhandler_scratch, sizeof keyhandler_scratch,
> +                          &waiting_to_crash);
> +        printk("Failed to shoot down cpus {%s}\n", keyhandler_scratch);
> +    }
> +
>      /* Crash shutdown any IOMMU functionality as the crashdump kernel is not
>       * happy when booting if interrupt/dma remapping is still enabled */
>      iommu_crash_shutdown();
> --
> 1.7.10.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
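
For readers following the review comment, here is a minimal sketch of why the
single cpumask_andnot() call suggested in the reply should yield the same mask
as the cpumask_copy() + cpumask_clear_cpu() pair in the patch. It does not use
Xen's real cpumask API: toy_mask_t and every toy_* helper are hypothetical
stand-ins built on a 64-bit integer, and toy_list_snprintf() is a simplified
formatter that lists each CPU individually rather than mimicking the exact
output of cpulist_scnprintf().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_mask_t;

static inline toy_mask_t toy_mask_of(unsigned int cpu)
{
    return UINT64_C(1) << cpu;
}

static inline void toy_copy(toy_mask_t *dst, const toy_mask_t *src)
{
    *dst = *src;
}

static inline void toy_clear_cpu(unsigned int cpu, toy_mask_t *mask)
{
    *mask &= ~toy_mask_of(cpu);
}

static inline void toy_andnot(toy_mask_t *dst, const toy_mask_t *a,
                              const toy_mask_t *b)
{
    *dst = *a & ~*b;
}

/* Two-step form, as in the patch: copy the online map, then clear ourselves. */
static inline toy_mask_t waiting_two_step(toy_mask_t online, unsigned int self)
{
    toy_mask_t waiting;

    toy_copy(&waiting, &online);
    toy_clear_cpu(self, &waiting);
    return waiting;
}

/* One-step form, as in the review comment: online AND NOT {self}. */
static inline toy_mask_t waiting_one_step(toy_mask_t online, unsigned int self)
{
    toy_mask_t waiting, me = toy_mask_of(self);

    toy_andnot(&waiting, &online, &me);
    return waiting;
}

/* Write the set bits of mask into buf as a comma-separated list of CPU
 * numbers.  Returns the number of characters that were (or would have
 * been) produced before the buffer ran out. */
static inline int toy_list_snprintf(char *buf, size_t len, toy_mask_t mask)
{
    int n = 0;

    if ( len )
        buf[0] = '\0';

    for ( unsigned int cpu = 0; cpu < 64 && (size_t)n < len; cpu++ )
        if ( mask & toy_mask_of(cpu) )
            n += snprintf(buf + n, len - n, n ? ",%u" : "%u", cpu);

    return n;
}
```

With CPUs 0-3 online (mask 0x0F) and CPU 2 crashing, both forms leave 0x0B,
i.e. CPUs 0, 1 and 3 still to be waited for. If the timeout then expired with
none of them having cleared its bit, the toy formatter would report "0,1,3".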