Re: [Xen-devel] [PATCH v4 2/6] xen/rcu: don't use stop_machine_run() for rcu_barrier()



On 10.03.2020 08:28, Juergen Gross wrote:
> @@ -143,51 +143,75 @@ static int qhimark = 10000;
>  static int qlowmark = 100;
>  static int rsinterval = 1000;
>  
> -struct rcu_barrier_data {
> -    struct rcu_head head;
> -    atomic_t *cpu_count;
> -};
> +/*
> + * rcu_barrier() handling:
> + * cpu_count holds the number of cpu required to finish barrier handling.
> + * Cpus are synchronized via softirq mechanism. rcu_barrier() is regarded to
> + * be active if cpu_count is not zero. In case rcu_barrier() is called on
> + * multiple cpus it is enough to check for cpu_count being not zero on entry
> + * and to call process_pending_softirqs() in a loop until cpu_count drops to
> + * zero, as syncing has been requested already and we don't need to sync
> + * multiple times.
> + * In order to avoid hangs when rcu_barrier() is called multiple times on the
> + * same cpu in fast sequence and a slave cpu couldn't drop out of the
> + * barrier handling fast enough a second counter done_count is needed.
> + */
> +static atomic_t cpu_count = ATOMIC_INIT(0);
> +static atomic_t done_count = ATOMIC_INIT(0);

From its use below this looks more like "todo_count" or
"pending_count".

> +void rcu_barrier(void)
>  {
> -    atomic_t cpu_count = ATOMIC_INIT(0);
> -    return stop_machine_run(rcu_barrier_action, &cpu_count, NR_CPUS);
> +    unsigned int n_cpus;
> +
> +    while ( !get_cpu_maps() )
> +    {
> +        process_pending_softirqs();
> +        if ( !atomic_read(&cpu_count) )
> +            return;
> +
> +        cpu_relax();
> +    }
> +
> +    n_cpus = num_online_cpus();
> +
> +    if ( atomic_cmpxchg(&cpu_count, 0, n_cpus) == 0 )
> +    {
> +        atomic_add(n_cpus, &done_count);
> +        cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
> +    }
> +
> +    while ( atomic_read(&done_count) )

Don't you leave a window for races here, in that done_count
gets set to non-zero only after setting cpu_count? A CPU
losing the cmpxchg attempt above may observe done_count
still being zero, and hence exit without waiting for the
count to actually _drop_ to zero.
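
To make the window concrete, here is a minimal user-space sketch that replays the interleaving on a single thread. It is only an illustration under stated assumptions: C11 atomics stand in for Xen's atomic_t helpers, and the helper names (claim_barrier, publish_done_count, wait_loop_would_exit, loser_exits_early) are hypothetical and not part of the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-ins for the two counters in the patch (assumption). */
static atomic_int cpu_count;
static atomic_int done_count;

/* Initiator, step 1: try to claim the barrier (cmpxchg 0 -> n_cpus). */
static bool claim_barrier(int n_cpus)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&cpu_count, &expected, n_cpus);
}

/* Initiator, step 2: only now does done_count become non-zero. */
static void publish_done_count(int n_cpus)
{
    atomic_fetch_add(&done_count, n_cpus);
}

/* Exit test of the final wait loop: true means "stop waiting". */
static bool wait_loop_would_exit(void)
{
    return atomic_load(&done_count) == 0;
}

/*
 * Replay this ordering:
 *   CPU A: cmpxchg succeeds
 *   CPU B: cmpxchg fails, then B tests done_count
 *   CPU A: atomic_add on done_count
 * Returns true if B would leave the wait loop while the barrier
 * (cpu_count != 0) is still in progress.
 */
bool loser_exits_early(int n_cpus)
{
    atomic_store(&cpu_count, 0);
    atomic_store(&done_count, 0);

    bool a_won = claim_barrier(n_cpus);       /* CPU A */
    bool b_won = claim_barrier(n_cpus);       /* CPU B loses */
    bool b_exits = wait_loop_would_exit();    /* CPU B sees 0 */
    publish_done_count(n_cpus);               /* CPU A, too late for B */

    return a_won && !b_won && b_exits && atomic_load(&cpu_count) != 0;
}
```

With this ordering, loser_exits_early() returns true for any n_cpus greater than zero: CPU B sees done_count == 0 and skips the wait although the barrier has only just been started.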

Jan