[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2 of 3] KEXEC: Allocate crash notes on boot v6



>>> On 22.12.11 at 18:36, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> Currently, the buffers for crash notes are allocated per CPU when a
> KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in
> question.
> 
> Although it certainly works in general, there are a few edge case problems:
> 
> 1) There appears to be no guarentee that dom0 will make this hypercall
>    for each pcpu on the system.  There appears to be variable
>    behaviour depending on how many cpus dom0 is compiled for, and how
>    many vcpus Xen gives to dom0.
> 
> 2) The allocation of these buffers occur at the whim of dom0.  While
>    this is typically very early on dom0 boot, but not guarenteed.
> 
> 3) It is possible (although not sensible) for a crash
>    kernel to be loaded without these (or some of these) hypercalls
>    being made. Under these circumstances, a crash would cause the
>    crash note code path will suffer a slew of null pointer deferences.
> 
> 4) If the hypercalls are made after late in the day, it is possible
>    for the hypercall to return -ENOMEM.  As code tends to be more
>    fragile once memory is enhausted, the likelyhood of us needing the
>    crash kernel is greater.

Also for the record, I continue to believe that none of these warrant
the changes made here. Better error/NULL checking may instead be
required in a few places, plus possibly some Dom0 code adjustments
(see my recent 2.6.18 kernel changes, on top of which pCPU hot add
support was implemented [which can't go into the 2.6.18 kernel as it
lacks general pCPU hot add code]).

> In addition, my forthcoming code to support 32bit kdump kernels on
> 64bit Xen on large (>64GB) boxes will require some guarentees as to
> where the crash note buffers are actually allocated in physical
> memory.  This is far easier to sort out at boot time, rather than
> after dom0 has been booted and potentially using the physical memory
> required.

This one is the only part of the rationale for these changes that I
view as half way reasonable, though with allocation in Xen going
from top down, not being able to allocate these is about the same
as e.g. Dom0 not being able to set up its swiotlb. That's what the
(by default) 128M reservation is for. So if anything, the number
there may want some tweaking for the case of very many CPUs.

Jan

> Therefore, allocate the crash note buffers at boot time.
> 
> Changes since v5:
>  *  Introduce sizeof_cpu_notes to move calculation of note size into a
>     separate location.
>  *  Tweak sizeof_note() to return size_t rather than int, as it is a
>     function based upon sizeof().
> 
> Changes since v4:
> 
>  *  Replace the current cpu crash note scheme of using void pointers
>     and hand calculating the size each time is needed, by a range
>     structure containing a pointer and a size.  This removes duplicate
>     times where the size is calculated.
>  *  Tweak kexec_get_cpu().  Don't fail if a cpu is offline because it
>     may already have crash notes, and may be up by the time a crash
>     happens.  Split the error conditions up to return ERANGE for an
>     out-of-range cpu request rather than EINVAL.  Finally, returning a
>     range of zeros is acceptable, so do this in preference to failing.
> 
> Changes since v3:
> 
>  *  Alter the spinlocks to avoid calling xmalloc/xfree while holding
>     the lock.
>  *  Tidy up the coding style used.
> 
> Changes since v2:
> 
>  *  Allocate crash_notes dynamically using nr_cpu_ids at boot time,
>     rather than statically using NR_CPUS.
>  *  Fix the incorrect use of signed integers for cpu id.
>  *  Fix collateral damage to do_crashdump_trigger() and
>     crashdump_trigger_handler caused by reordering sizeof_note() and
>     setup_note()
>  *  Tweak the issue about returing -ENOMEM from kexec_init_cpu_note().
>     No functional change.
>  *  Change kexec_get_cpu() to attempt to allocate crash note buffers
>     in case we have more free memory now than when the pcpu came up.
>  *  Now that there are two codepaths possibly allocating crash notes,
>     protect the allocation itself with a spinlock.
> 
> Changes since v1:
> 
>  *  Use cpu hotplug notifiers to handle allocating of the notes
>     buffers rather than assuming the boot state of cpus will be the
>     same as the crash state.
>  *  Move crash_notes from being per_cpu.  This is because the kdump
>     kernel elf binary put in the crash area is hard coded to physical
>     addresses which the dom0 kernel typically obtains at boot time.
>     If a cpu is offlined, its buffer should not be deallocated because
>     the kdump kernel would read junk when trying to get the crash
>     notes.  Similarly, the same problem would occur if the cpu was
>     re-onlined later and its crash notes buffer was allocated elsewhere.
>  *  Only attempt to allocate buffers if a crash area has been
>     specified.  Else, allocating crash note buffers is a waste of
>     space.  Along with this, change the test in kexec_get_cpu to
>     return -EINVAL if no buffers have been allocated.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> 
> diff -r f571dc8e4368 -r 3da37c68284f xen/common/kexec.c
> --- a/xen/common/kexec.c
> +++ b/xen/common/kexec.c
> @@ -25,13 +25,19 @@
>  #include <xen/kexec.h>
>  #include <public/elfnote.h>
>  #include <xsm/xsm.h>
> +#include <xen/cpu.h>
>  #ifdef CONFIG_COMPAT
>  #include <compat/kexec.h>
>  #endif
>  
>  bool_t kexecing = FALSE;
>  
> -static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
> +/* Memory regions to store the per cpu register state etc. on a crash. */
> +typedef struct { Elf_Note * start; size_t size; } crash_note_range_t;
> +static crash_note_range_t * crash_notes;
> +
> +/* Lock to prevent race conditions when allocating the crash note buffers. 
> */
> +static DEFINE_SPINLOCK(crash_notes_lock);
>  
>  static Elf_Note *xen_crash_note;
>  
> @@ -173,13 +179,17 @@ static void one_cpu_only(void)
>  void kexec_crash_save_cpu(void)
>  {
>      int cpu = smp_processor_id();
> -    Elf_Note *note = per_cpu(crash_notes, cpu);
> +    Elf_Note *note;
>      ELF_Prstatus *prstatus;
>      crash_xen_core_t *xencore;
>  
> +    BUG_ON ( ! crash_notes );
> +
>      if ( cpumask_test_and_set_cpu(cpu, &crash_saved_cpus) )
>          return;
>  
> +    note = crash_notes[cpu].start;
> +
>      prstatus = (ELF_Prstatus *)ELFNOTE_DESC(note);
>  
>      note = ELFNOTE_NEXT(note);
> @@ -265,13 +275,6 @@ static struct keyhandler crashdump_trigg
>      .desc = "trigger a crashdump"
>  };
>  
> -static __init int register_crashdump_trigger(void)
> -{
> -    register_keyhandler('C', &crashdump_trigger_keyhandler);
> -    return 0;
> -}
> -__initcall(register_crashdump_trigger);
> -
>  static void setup_note(Elf_Note *n, const char *name, int type, int descsz)
>  {
>      int l = strlen(name) + 1;
> @@ -281,13 +284,144 @@ static void setup_note(Elf_Note *n, cons
>      n->type = type;
>  }
>  
> -static int sizeof_note(const char *name, int descsz)
> +static size_t sizeof_note(const char *name, int descsz)
>  {
>      return (sizeof(Elf_Note) +
>              ELFNOTE_ALIGN(strlen(name)+1) +
>              ELFNOTE_ALIGN(descsz));
>  }
>  
> +static size_t sizeof_cpu_notes(const unsigned long cpu)
> +{
> +    /* All CPUs present a PRSTATUS and crash_xen_core note. */
> +    size_t bytes =
> +        + sizeof_note("CORE", sizeof(ELF_Prstatus)) +
> +        + sizeof_note("Xen", sizeof(crash_xen_core_t));
> +
> +    /* CPU0 also presents the crash_xen_info note. */
> +    if ( ! cpu )
> +        bytes = bytes +
> +            sizeof_note("Xen", sizeof(crash_xen_info_t));
> +
> +    return bytes;
> +}
> +
> +/* Allocate a crash note buffer for a newly onlined cpu. */
> +static int kexec_init_cpu_notes(const unsigned long cpu)
> +{
> +    Elf_Note * note = NULL;
> +    int ret = 0;
> +    int nr_bytes = 0;
> +
> +    BUG_ON( cpu >= nr_cpu_ids || ! crash_notes );
> +
> +    /* If already allocated, nothing to do. */
> +    if ( crash_notes[cpu].start )
> +        return ret;
> +
> +    nr_bytes = sizeof_cpu_notes(cpu);
> +    note = xmalloc_bytes(nr_bytes);
> +
> +    /* Protect the write into crash_notes[] with a spinlock, as this 
> function
> +     * is on a hotplug path and a hypercall path. */
> +    spin_lock(&crash_notes_lock);
> +
> +    /* If we are racing with another CPU and it has beaten us, give up
> +     * gracefully. */
> +    if ( crash_notes[cpu].start )
> +    {
> +        spin_unlock(&crash_notes_lock);
> +        /* Always return ok, because whether we successfully allocated or 
> not,
> +         * another CPU has successfully allocated. */
> +        if ( note )
> +            xfree(note);
> +    }
> +    else
> +    {
> +        crash_notes[cpu].start = note;
> +        crash_notes[cpu].size = nr_bytes;
> +        spin_unlock(&crash_notes_lock);
> +
> +        /* If the allocation failed, and another CPU did not beat us, give
> +         * up with ENOMEM. */
> +        if ( ! note )
> +            ret = -ENOMEM;
> +        /* else all is good so lets set up the notes. */
> +        else
> +        {
> +            /* Set up CORE note. */
> +            setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
> +            note = ELFNOTE_NEXT(note);
> +
> +            /* Set up Xen CORE note. */
> +            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS,
> +                       sizeof(crash_xen_core_t));
> +
> +            if ( ! cpu )
> +            {
> +                /* Set up Xen Crash Info note. */
> +                xen_crash_note = note = ELFNOTE_NEXT(note);
> +                setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO,
> +                           sizeof(crash_xen_info_t));
> +            }
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +static int cpu_callback(
> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
> +{
> +    unsigned long cpu = (unsigned long)hcpu;
> +
> +    /* Only hook on CPU_UP_PREPARE because once a crash_note has been 
> reported
> +     * to dom0, it must keep it around in case of a crash, as the crash 
> kernel
> +     * will be hard coded to the original physical address reported. */
> +    switch ( action )
> +    {
> +    case CPU_UP_PREPARE:
> +        /* Ignore return value.  If this boot time, -ENOMEM will cause all
> +         * manner of problems elsewhere very soon, and if it is during 
> runtime,
> +         * then failing to allocate crash notes is not a good enough reason 
> to
> +         * fail the CPU_UP_PREPARE */
> +        kexec_init_cpu_notes(cpu);
> +        break;
> +    default:
> +        break;
> +    }
> +    return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block cpu_nfb = {
> +    .notifier_call = cpu_callback
> +};
> +
> +static int __init kexec_init(void)
> +{
> +    void *cpu = (void *)(unsigned long)smp_processor_id();
> +
> +    /* If no crash area, no need to allocate space for notes. */
> +    if ( !kexec_crash_area.size )
> +        return 0;
> +
> +    register_keyhandler('C', &crashdump_trigger_keyhandler);
> +
> +    crash_notes = xmalloc_array(crash_note_range_t, nr_cpu_ids);
> +    if ( ! crash_notes )
> +        return -ENOMEM;
> +
> +    memset(crash_notes, 0, sizeof(crash_note_range_t) * nr_cpu_ids);
> +
> +    cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
> +    register_cpu_notifier(&cpu_nfb);
> +    return 0;
> +}
> +/* The reason for this to be a presmp_initcall as opposed to a regular
> + * __initcall is to allow the setup of the cpu hotplug handler before APs 
> are
> + * brought up. */
> +presmp_initcall(kexec_init);
> +
>  static int kexec_get_reserve(xen_kexec64_range_t *range)
>  {
>      if ( kexec_crash_area.size > 0 && kexec_crash_area.start > 0) {
> @@ -302,44 +436,29 @@ static int kexec_get_reserve(xen_kexec64
>  static int kexec_get_cpu(xen_kexec64_range_t *range)
>  {
>      int nr = range->nr;
> -    int nr_bytes = 0;
>  
> -    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) )
> +    if ( nr < 0 || nr >= nr_cpu_ids )
> +        return -ERANGE;
> +
> +    if ( ! crash_notes )
>          return -EINVAL;
>  
> -    nr_bytes += sizeof_note("CORE", sizeof(ELF_Prstatus));
> -    nr_bytes += sizeof_note("Xen", sizeof(crash_xen_core_t));
> +    /* Try once again to allocate room for the crash notes.  It is just 
> possible
> +     * that more space has become available since we last tried.  If space 
> has
> +     * already been allocated, kexec_init_cpu_notes() will return early 
> with 0.
> +     */
> +    kexec_init_cpu_notes(nr);
>  
> -    /* The Xen info note is included in CPU0's range. */
> -    if ( nr == 0 )
> -        nr_bytes += sizeof_note("Xen", sizeof(crash_xen_info_t));
> +    /* In the case of still not having enough memory to allocate buffer 
> room,
> +     * returning a range of 0,0 is still valid. */
> +    if ( crash_notes[nr].start )
> +    {
> +        range->start = __pa(crash_notes[nr].start);
> +        range->size = crash_notes[nr].size;
> +    }
> +    else
> +        range->start = range->size = 0;
>  
> -    if ( per_cpu(crash_notes, nr) == NULL )
> -    {
> -        Elf_Note *note;
> -
> -        note = per_cpu(crash_notes, nr) = xmalloc_bytes(nr_bytes);
> -
> -        if ( note == NULL )
> -            return -ENOMEM;
> -
> -        /* Setup CORE note. */
> -        setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
> -
> -        /* Setup Xen CORE note. */
> -        note = ELFNOTE_NEXT(note);
> -        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS, 
> sizeof(crash_xen_core_t));
> -
> -        if (nr == 0)
> -        {
> -            /* Setup system wide Xen info note. */
> -            xen_crash_note = note = ELFNOTE_NEXT(note);
> -            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO, 
> sizeof(crash_xen_info_t));
> -        }
> -    }
> -
> -    range->start = __pa((unsigned long)per_cpu(crash_notes, nr));
> -    range->size = nr_bytes;
>      return 0;
>  }
>  
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx 
> http://lists.xensource.com/xen-devel 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.