[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86: show remote CPU state upon fatal NMI
On 15/06/16 13:59, Jan Beulich wrote: >>>> On 15.06.16 at 13:03, <andrew.cooper3@xxxxxxxxxx> wrote: >> On 15/06/16 08:55, Jan Beulich wrote: >>>>> @@ -570,6 +600,15 @@ void fatal_trap(const struct cpu_user_re >>>>> printk("Faulting linear address: %p\n", _p(cr2)); >>>>> show_page_walk(cr2); >>>>> } >>>>> + else if ( trapnr == TRAP_nmi ) >>>>> + { >>>>> + cpumask_andnot(&nmi_show_state_mask, &cpu_online_map, >>>>> + cpumask_of(smp_processor_id())); >>>>> + set_nmi_callback(nmi_show_execution_state); >>>>> + smp_send_nmi_allbutself(); >>>> This would cause far less spinlock contention as >>>> >>>> for_each_cpu( cpu, nmi_show_state_mask ) >>>> smp_send_nmi(cpu); >>>> >>>> I realise this involves introducing a new smp function, but it would >>>> substantially reduce contention on the console lock. >>> Again, I don't see why lock contention would matter here. And then >>> I also don't see how sending the IPIs individually would make matters >>> significantly better: The sending will surely finish much faster than >>> the printing. >> Contention is a problem because you have replaced the NMI callback, and >> the watchdog is still running. Especially if sync_console is in effect, >> you are liable to incur a further timeout, queueing up more NMIs. >> >> Although now I think of it, that won't matter so long as the NMIs don't >> nest. >> >> The one advantage of sending the NMIs in order is that the information >> dump will happen in order, which is slightly more use than having them >> in a random order on a large machine. > How that? All the NMIs will still arrive at about the same time, so > while some low numbered CPUs may indeed get their state printed > in order, higher numbered ones may still make it into the lock region > in any order. (And no, building upon ticket locks making randomness > much less likely is neither an option, nor would it really help: Just > think of a lower numbered CPU first having to come out of a deep > C-state or running at a much lower P-state than a higher numbered > one.) Hmm true. There isn't a acknowledgement of the start of the NMI handler, and putting one in sounds like far more effort and fragility than it is worth. > >>>> I would recommend moving this clause into nmi_watchdog_tick(), so that >>>> it doesn't get involved for non-watchdog NMIs. IOCK/SERR NMIs won't >>>> have anything interesting to print from here. I would also recommend >>>> disabling the watchdog before IPI'ing. >>> And indeed I would have wanted it there, but I can't see how it can >>> reasonably be put there: fatal_trap() doesn't return, so we can't put >>> it after. And we definitely want to get state of the local CPU out >>> before we try to log state of any of the remote CPUs. So the only >>> options I see would be to >>> - somehow specially flag the regs structure, but that feels hackish >>> (among other aspects nmi_watchdog_tick() has that parameter >>> const qualified for the very reason that it isn't supposed to fiddle >>> with it), >>> - introduce a global flag. >> How about a boolean flag to fatal_trap()? It doesn't have many callers, >> and this kind of printing might also be useful for some MCEs. > Ah, right, there indeed aren't that many. Can you qualify "some" > a bit better, so that maybe I can have the patch pass true there > right away? Any MCE which is in-practice synchronous might benefit. I wouldn't worry about updating the MCE callers as part of this. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |