[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 44/51] xen, balloon: Fix CPU hotplug callback registration



----- srivatsa.bhat@xxxxxxxxxxxxxxxxxx wrote:

> Subsystems that want to register CPU hotplug callbacks, as well as
> perform
> initialization for the CPUs that are already online, often do it as
> shown
> below:
> 
>       get_online_cpus();
> 
>       for_each_online_cpu(cpu)
>               init_cpu(cpu);
> 
>       register_cpu_notifier(&foobar_cpu_notifier);
> 
>       put_online_cpus();
> 
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running
> concurrently
> with CPU hotplug operations).
> 
> Interestingly, the balloon code in xen can actually prevent double
> initialization and hence can use the following simplified form of
> callback
> registration:
> 
>       register_cpu_notifier(&foobar_cpu_notifier);
> 
>       get_online_cpus();
> 
>       for_each_online_cpu(cpu)
>               init_cpu(cpu);
> 
>       put_online_cpus();
> 
> A hotplug operation that occurs between registering the notifier and
> calling
> get_online_cpus(), won't disrupt anything, because the code takes care
> to
> perform the memory allocations only once.
> 
> So reorganize the balloon code in xen this way to fix the deadlock
> with
> callback registration.
> 
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
> ---
> 
>  drivers/xen/balloon.c |   35 +++++++++++++++++++++++------------
>  1 file changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 37d06ea..afe1a3f 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -592,19 +592,29 @@ static void __init balloon_add_region(unsigned
> long start_pfn,
>       }
>  }
>  
> +static int alloc_balloon_scratch_page(int cpu)
> +{
> +     if (per_cpu(balloon_scratch_page, cpu) != NULL)
> +             return 0;
> +
> +     per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> +     if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> +             pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n",
> cpu);
> +             return -ENOMEM;
> +     }
> +
> +     return 0;
> +}
> +
> +
>  static int balloon_cpu_notify(struct notifier_block *self,
>                                   unsigned long action, void *hcpu)
>  {
>       int cpu = (long)hcpu;
>       switch (action) {
>       case CPU_UP_PREPARE:
> -             if (per_cpu(balloon_scratch_page, cpu) != NULL)
> -                     break;
> -             per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> -             if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> -                     pr_warn("Failed to allocate balloon_scratch_page for 
> cpu %d\n",
> cpu);
> +             if (alloc_balloon_scratch_page(cpu))
>                       return NOTIFY_BAD;
> -             }
>               break;
>       default:
>               break;
> @@ -624,15 +634,16 @@ static int __init balloon_init(void)
>               return -ENODEV;
>  
>       if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> -             for_each_online_cpu(cpu)
> -             {
> -                     per_cpu(balloon_scratch_page, cpu) = 
> alloc_page(GFP_KERNEL);
> -                     if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> -                             pr_warn("Failed to allocate 
> balloon_scratch_page for cpu %d\n",
> cpu);
> +             register_cpu_notifier(&balloon_cpu_notifier);
> +
> +             get_online_cpus();
> +             for_each_online_cpu(cpu) {
> +                     if (alloc_balloon_scratch_page(cpu)) {
> +                             put_online_cpus();
>                               return -ENOMEM;


Not that original code was doing a particularly thorough job of cleaning up on 
allocation failure but if it couldn't get memory it would not register the 
notifier. So perhaps you should unregister it before returning here.

I am also not sure how we were susceptible to the deadlock here since we didn't 
call get_online_cpus(). (We probably should have but then commit description 
should say it).

-boris

>                       }
>               }
> -             register_cpu_notifier(&balloon_cpu_notifier);
> +             put_online_cpus();
>       }
>  
>       pr_info("Initialising balloon driver\n");

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.