[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86: show remote CPU state upon fatal NMI
>>> On 14.06.16 at 17:03, <andrew.cooper3@xxxxxxxxxx> wrote: > On 14/06/16 15:33, Jan Beulich wrote: >> @@ -525,6 +526,35 @@ void vcpu_show_execution_state(struct vc >> vcpu_unpause(v); >> } >> >> +static cpumask_t nmi_show_state_mask; >> +static bool_t opt_nmi_show_all; >> + >> +static int __init get_nmi_show_all(void) >> +{ >> + const char *s = strchr(opt_nmi, ','); >> + >> + if ( s && !strcmp(s + 1, "show-all") ) >> + opt_nmi_show_all = 1; >> + >> + return 0; >> +} >> +presmp_initcall(get_nmi_show_all); >> + >> +static int nmi_show_execution_state(const struct cpu_user_regs *regs, int >> cpu) >> +{ >> + if ( !cpumask_test_cpu(cpu, &nmi_show_state_mask) ) >> + return 0; >> + >> + if ( opt_nmi_show_all ) >> + show_execution_state(regs); >> + else >> + printk(XENLOG_ERR "CPU%d @ %04x:%08lx (%pS)\n", cpu, regs->cs, >> regs->rip, >> + guest_mode(regs) ? _p(regs->rip) : NULL); >> + cpumask_clear_cpu(cpu, &nmi_show_state_mask); > > I would clear the mask before printing state. Given the nature of this > handler, it liable to contend sufficiently on the console lock to induce > the further watchdog timeout. I had it that way for a brief period of time, but it's wrong: It would let the master continue its bringing down of the host before all CPUs managed to print their stuff. And I don't see any issue with spin lock contention here at all - performance is certainly not of any concern. >> @@ -570,6 +600,15 @@ void fatal_trap(const struct cpu_user_re >> printk("Faulting linear address: %p\n", _p(cr2)); >> show_page_walk(cr2); >> } >> + else if ( trapnr == TRAP_nmi ) >> + { >> + cpumask_andnot(&nmi_show_state_mask, &cpu_online_map, >> + cpumask_of(smp_processor_id())); >> + set_nmi_callback(nmi_show_execution_state); >> + smp_send_nmi_allbutself(); > > This would cause far less spinlock contention as > > for_each_cpu( cpu, nmi_show_state_mask ) > smp_send_nmi(cpu); > > I realise this involves introducing a new smp function, but it would > substantially reduce contention on the console lock. Again, I don't see why lock contention would matter here. And then I also don't see how sending the IPIs individually would make matters significantly better: The sending will surely finish much faster than the printing. > I would recommend moving this clause into nmi_watchdog_tick(), so that > it doesn't get involved for non-watchdog NMIs. IOCK/SERR NMIs won't > have anything interesting to print from here. I would also recommend > disabling the watchdog before IPI'ing. And indeed I would have wanted it there, but I can't see how it can reasonably be put there: fatal_trap() doesn't return, so we can't put it after. And we definitely want to get state of the local CPU out before we try to log state of any of the remote CPUs. So the only options I see would be to - somehow specially flag the regs structure, but that feels hackish (among other aspects nmi_watchdog_tick() has that parameter const qualified for the very reason that it isn't supposed to fiddle with it), - introduce a global flag. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |