[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 1 of 4 RFC] x86/kexec: Change NMI and MCE handling on kexec path
>>> On 04.12.12 at 19:16, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: > --- a/xen/arch/x86/crash.c > +++ b/xen/arch/x86/crash.c > @@ -32,41 +32,97 @@ > > static atomic_t waiting_for_crash_ipi; > static unsigned int crashing_cpu; > +static DEFINE_PER_CPU_READ_MOSTLY(bool_t, crash_save_done); > > -static int crash_nmi_callback(struct cpu_user_regs *regs, int cpu) > +/* This becomes the NMI handler for non-crashing CPUs. */ > +void __attribute__((noreturn)) do_nmi_crash(struct cpu_user_regs *regs) > { > - /* Don't do anything if this handler is invoked on crashing cpu. > - * Otherwise, system will completely hang. Crashing cpu can get > - * an NMI if system was initially booted with nmi_watchdog parameter. > + /* nmi_shootdown_cpus() should ensure that this assertion is correct. */ > + ASSERT( crashing_cpu != smp_processor_id() ); > + > + /* Save crash information and shut down CPU. Attempt only once. */ > + if ( ! this_cpu(crash_save_done) ) > + { > + kexec_crash_save_cpu(); > + __stop_this_cpu(); > + > + this_cpu(crash_save_done) = 1; > + } > + > + /* Poor mans self_nmi(). __stop_this_cpu() has reverted the LAPIC > + * back to its boot state, so we are unable to rely on the regular > + * apic_* functions, due to 'x2apic_enabled' being possibly wrong. > + * (The likely senario is that we have reverted from x2apic mode to > + * xapic, at which point #GPFs will occur if we use the apic_* > + * functions) > + * > + * The ICR and APIC ID of the LAPIC are still valid even during > + * software disable (Intel SDM Vol 3, 10.4.7.2), which allows us to > + * queue up another NMI, to force us back into this loop if we exit. > */ > - if ( cpu == crashing_cpu ) > - return 1; > - local_irq_disable(); > + switch ( current_local_apic_mode() ) > + { > + u32 apic_id; > > - kexec_crash_save_cpu(); > + case APIC_MODE_X2APIC: > + apic_id = apic_rdmsr(APIC_ID); > > - __stop_this_cpu(); > + apic_wrmsr(APIC_ICR, APIC_DM_NMI | APIC_DEST_PHYSICAL | > ((u64)apic_id << 32)); As in the description you talk about the LAPIC being (possibly) disabled here - did you check that this would not #GP in that case? > + break; > > - atomic_dec(&waiting_for_crash_ipi); > + case APIC_MODE_XAPIC: > + apic_id = GET_xAPIC_ID(apic_mem_read(APIC_ID)); > + > + while ( apic_mem_read(APIC_ICR) & APIC_ICR_BUSY ) > + cpu_relax(); > + > + apic_mem_write(APIC_ICR2, apic_id << 24); > + apic_mem_write(APIC_ICR, APIC_DM_NMI | APIC_DEST_PHYSICAL); > + break; > + > + default: > + break; > + } > > for ( ; ; ) > halt(); > - > - return 1; > } >... > ENTRY(machine_check) > pushq $0 > movl $TRAP_machine_check,4(%rsp) > jmp handle_ist_exception > > +/* No op trap handler. Required for kexec path. */ > +ENTRY(trap_nop) I'd prefer you to re-use an existing IRETQ (e.g. the one you add below) here - this is not performance critical, and hence there's no need for it to be aligned. Jan > + iretq > + > +/* Enable NMIs. No special register assumptions, and all preserved. */ > +ENTRY(enable_nmis) > + push %rax > + movq %rsp, %rax /* Grab RSP before pushing */ > + > + /* Set up stack frame */ > + pushq $0 /* SS */ > + pushq %rax /* RSP */ > + pushfq /* RFLAGS */ > + pushq $__HYPERVISOR_CS /* CS */ > + leaq 1f(%rip),%rax > + pushq %rax /* RIP */ > + > + iretq /* Disable the hardware NMI latch */ > +1: > + popq %rax > + retq > + > .section .rodata, "a", @progbits > > ENTRY(exception_table) _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |