[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH V5] x86/kexec: Change NMI and MCE handling on kexec path
At 11:35 +0000 on 12 Dec (1355312134), Andrew Cooper wrote: > diff -r ef8c1b607b10 -r 96b068439bc4 xen/arch/x86/crash.c > --- a/xen/arch/x86/crash.c > +++ b/xen/arch/x86/crash.c > @@ -32,41 +32,127 @@ > > static atomic_t waiting_for_crash_ipi; > static unsigned int crashing_cpu; > +static DEFINE_PER_CPU_READ_MOSTLY(bool_t, crash_save_done); > > -static int crash_nmi_callback(struct cpu_user_regs *regs, int cpu) > +/* This becomes the NMI handler for non-crashing CPUs, when Xen is crashing. > */ > +void __attribute__((noreturn)) do_nmi_crash(struct cpu_user_regs *regs) > { > - /* Don't do anything if this handler is invoked on crashing cpu. > - * Otherwise, system will completely hang. Crashing cpu can get > - * an NMI if system was initially booted with nmi_watchdog parameter. > + int cpu = smp_processor_id(); > + > + /* nmi_shootdown_cpus() should ensure that this assertion is correct. */ > + ASSERT( cpu != crashing_cpu ); nmi_shootdown_cpus() has a go, but I think it would have to do an atomic compare-exchange on crashing_cpu to actually be sure that only one CPU is crashing at a time. If two CPUs try to lead crashes at the same time, it will deadlock here, with NMIs disabled on all CPUs. The old behaviour was also not entirely race-free, but a little more likey to make progress, as in most cases at least one CPU would see cpu == crashing_cpu here and return. Using cmpxchg in nmi_shootdown_cpus() would have the drawback that if one CPU tried to kexec and wedged, another CPU couldn't then have a go. Not sure which of all these unlikely scenarios is worst. :) > diff -r ef8c1b607b10 -r 96b068439bc4 xen/arch/x86/x86_64/entry.S > --- a/xen/arch/x86/x86_64/entry.S > +++ b/xen/arch/x86/x86_64/entry.S > @@ -635,11 +635,45 @@ ENTRY(nmi) > movl $TRAP_nmi,4(%rsp) > jmp handle_ist_exception > > +ENTRY(nmi_crash) > + cli Aren't interrupts disabled already because we came in through an interrupt gate? > + pushq $0 > + movl $TRAP_nmi,4(%rsp) > + SAVE_ALL > + movq %rsp,%rdi Why? do_nmi_crash() doesn't actually use any of its arguments. AFAICT, we could pretty much use do_nmi_crash explicitly in the IDT entry. > + callq do_nmi_crash /* Does not return */ > + ud2 > + > ENTRY(machine_check) > pushq $0 > movl $TRAP_machine_check,4(%rsp) > jmp handle_ist_exception > > +/* Enable NMIs. No special register assumptions. Only %rax is not > preserved. */ > +ENTRY(enable_nmis) > + movq %rsp, %rax /* Grab RSP before pushing */ > + > + /* Set up stack frame */ > + pushq $0 /* SS */ > + pushq %rax /* RSP */ > + pushfq /* RFLAGS */ > + pushq $__HYPERVISOR_CS /* CS */ > + leaq 1f(%rip),%rax > + pushq %rax /* RIP */ > + > + iretq /* Disable the hardware NMI latch */ > +1: > + retq This could be made a bit smaller by something like: popq %rcx /* Remember return address */ movq %rsp, %rax /* Grab RSP before pushing */ /* Set up stack frame */ pushq $0 /* SS */ pushq %rax /* RSP */ pushfq /* RFLAGS */ pushq $__HYPERVISOR_CS /* CS */ pushq %rcx /* RIP */ iretq /* Disable the hardware NMI latch and return */ which also keeps the call/return counts balanced. But tbh I'm not sure it's worth it. And I suspect you'll tell me why it's wrong. :) > + > +/* No op trap handler. Required for kexec crash path. This is not > + * declared with the ENTRY() macro to avoid wasted alignment space. > + */ > +.globl trap_nop > +trap_nop: > + iretq The commit message says this reuses the iretq in enable_nmis(). Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |