Re: [Xen-devel] [PATCH 44/51] xen, balloon: Fix CPU hotplug callback registration
----- srivatsa.bhat@xxxxxxxxxxxxxxxxxx wrote:
> Subsystems that want to register CPU hotplug callbacks, as well as perform
> initialization for the CPUs that are already online, often do it as shown
> below:
>
> 	get_online_cpus();
>
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
>
> 	register_cpu_notifier(&foobar_cpu_notifier);
>
> 	put_online_cpus();
>
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
> with CPU hotplug operations).
>
> Interestingly, the balloon code in xen can actually prevent double
> initialization and hence can use the following simplified form of callback
> registration:
>
> 	register_cpu_notifier(&foobar_cpu_notifier);
>
> 	get_online_cpus();
>
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
>
> 	put_online_cpus();
>
> A hotplug operation that occurs between registering the notifier and calling
> get_online_cpus(), won't disrupt anything, because the code takes care to
> perform the memory allocations only once.
>
> So reorganize the balloon code in xen this way to fix the deadlock with
> callback registration.
>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
> ---
>
>  drivers/xen/balloon.c | 35 +++++++++++++++++++++++------------
>  1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 37d06ea..afe1a3f 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -592,19 +592,29 @@ static void __init balloon_add_region(unsigned long start_pfn,
>  	}
>  }
>
> +static int alloc_balloon_scratch_page(int cpu)
> +{
> +	if (per_cpu(balloon_scratch_page, cpu) != NULL)
> +		return 0;
> +
> +	per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> +	if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> +		pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +
>  static int balloon_cpu_notify(struct notifier_block *self,
> 				    unsigned long action, void *hcpu)
>  {
>  	int cpu = (long)hcpu;
>  	switch (action) {
>  	case CPU_UP_PREPARE:
> -		if (per_cpu(balloon_scratch_page, cpu) != NULL)
> -			break;
> -		per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> -		if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> -			pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		if (alloc_balloon_scratch_page(cpu))
>  			return NOTIFY_BAD;
> -		}
>  		break;
>  	default:
>  		break;
> @@ -624,15 +634,16 @@ static int __init balloon_init(void)
>  		return -ENODEV;
>
>  	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> -		for_each_online_cpu(cpu)
> -		{
> -			per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> -			if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> -				pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		register_cpu_notifier(&balloon_cpu_notifier);
> +
> +		get_online_cpus();
> +		for_each_online_cpu(cpu) {
> +			if (alloc_balloon_scratch_page(cpu)) {
> +				put_online_cpus();
>  				return -ENOMEM;

Not that the original code was doing a particularly thorough job of cleaning
up on allocation failure, but if it couldn't get memory it would not register
the notifier. So perhaps you should unregister it before returning here.
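For illustration only, a minimal sketch of that error path — alloc_balloon_scratch_page() and balloon_cpu_notifier come from the patch above, and unregister_cpu_notifier() is the existing counterpart to register_cpu_notifier(); this is just one way the cleanup could look:

	register_cpu_notifier(&balloon_cpu_notifier);

	get_online_cpus();
	for_each_online_cpu(cpu) {
		if (alloc_balloon_scratch_page(cpu)) {
			/*
			 * Drop the hotplug read lock first, then undo the
			 * registration done above.  (Scratch pages already
			 * allocated for earlier CPUs are still leaked here,
			 * as in the original code.)
			 */
			put_online_cpus();
			unregister_cpu_notifier(&balloon_cpu_notifier);
			return -ENOMEM;
		}
	}
	put_online_cpus();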
I am also not sure how we were susceptible to the deadlock here, since we
didn't call get_online_cpus(). (We probably should have, but then the commit
description should say so.)

-boris

>  			}
>  		}
> -		register_cpu_notifier(&balloon_cpu_notifier);
> +		put_online_cpus();
>  	}
>
>  	pr_info("Initialising balloon driver\n");
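For reference, the ABBA ordering named in the commit message comes from the fact that, in kernels of this era, register_cpu_notifier() takes cpu_add_remove_lock (via cpu_maps_update_begin()), while a concurrent cpu_up()/cpu_down() takes cpu_add_remove_lock first and then waits in cpu_hotplug_begin() for all get_online_cpus() readers to drop the read side. A simplified, illustrative interleaving (the call names are real; the scenario and the notifier block "nb" are only a sketch):

	/* Task A: the registration pattern the commit message warns about */
	get_online_cpus();		/* holds the cpu_hotplug.lock read side    */
	register_cpu_notifier(&nb);	/* blocks waiting for cpu_add_remove_lock  */

	/* Task B: a concurrent CPU hotplug operation (kernel/cpu.c) */
	cpu_maps_update_begin();	/* holds cpu_add_remove_lock               */
	cpu_hotplug_begin();		/* waits for Task A's put_online_cpus();   */
					/* neither task can progress: deadlock     */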