
Re: [Xen-devel] [PATCH 3/5] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.



On Thu, 27 Jul 2017, Dario Faggioli wrote:
> Xen is a tickless (micro-)kernel. This means that, when a CPU
> becomes idle, we stop all the activity on it, including any
> periodic tick or timer.
> 
> When we imported RCU from Linux, Linux (x86) was a ticking
> kernel, i.e., there was a periodic timer tick always running,
> even on totally idle CPUs. This was bad from a power efficiency
> perspective, but it's what made it possible to monitor the
> quiescent states of all the CPUs, and hence tell when an RCU
> grace period ends.
> 
> In Xen, that is impossible, and that's particularly problematic
> when the system is idle (or lightly loaded), as CPUs that
> are idle may never have the chance to tell RCU about their
> quiescence, and grace periods could extend indefinitely!
> 
> This has led, on x86, to long (and unpredictable) delays between
> RCU callback queueing and invocation. On ARM, we actually see
> infinite grace periods (e.g., complete_domain_destroy() may
> never be actually invoked on an idle system). See here:
> 
>  https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg02454.html
> 
> The first step for fixing this situation is for RCU to record,
> at the beginning of a grace period, which CPUs are already idle.
> In fact, being idle, they can't be in the middle of any read-side
> critical section, and we don't have to wait for them before
> declaring a grace period finished.
> 
> This is tracked in a cpumask, in a way that is very similar to
> how Linux was achieving the same on s390 --which indeed was
> tickless already, even back then-- and to what it started doing
> for x86 from 2.6.21 onward (see commit 79bf2bb3 "tick-management:
> dyntick / highres functionality").
> 
> While there, also adopt the memory barrier introduced by Linux
> commit c3f59023 ("Fix RCU race in access of nohz_cpu_mask").
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> ---
> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> Cc: Julien Grall <julien.grall@xxxxxxx>
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
>  xen/arch/arm/domain.c         |    2 ++
>  xen/arch/x86/acpi/cpu_idle.c  |   25 +++++++++++++++++--------
>  xen/arch/x86/cpu/mwait-idle.c |    9 ++++++++-
>  xen/arch/x86/domain.c         |    8 +++++++-
>  xen/common/rcupdate.c         |   28 ++++++++++++++++++++++++++--
>  xen/include/xen/rcupdate.h    |    3 +++
>  6 files changed, 63 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index fce29cb..666b7ef 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -50,8 +50,10 @@ static void do_idle(void)
>      local_irq_disable();
>      if ( cpu_is_haltable(cpu) )
>      {
> +        rcu_idle_enter(cpu);
>          dsb(sy);
>          wfi();
> +        rcu_idle_exit(cpu);
>      }
>      local_irq_enable();
>  
> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> index 8cc5a82..f0fdc87 100644
> --- a/xen/common/rcupdate.c
> +++ b/xen/common/rcupdate.c
> @@ -52,7 +52,8 @@ static struct rcu_ctrlblk {
>      int  next_pending;  /* Is the next batch already waiting?         */
>  
>      spinlock_t  lock __cacheline_aligned;
> -    cpumask_t   cpumask; /* CPUs that need to switch in order    */
> +    cpumask_t   cpumask; /* CPUs that need to switch in order ... */
> +    cpumask_t   idle_cpumask; /* ... unless they are already idle */
>      /* for current batch to proceed.        */
>  } __cacheline_aligned rcu_ctrlblk = {
>      .cur = -300,
> @@ -248,7 +249,14 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
>          smp_wmb();
>          rcp->cur++;
>  
> -        cpumask_copy(&rcp->cpumask, &cpu_online_map);
> +       /*
> +        * Accessing idle_cpumask before incrementing rcp->cur needs a
> +        * Barrier  Otherwise it can cause tickless idle CPUs to be
                    ^ otherwise


> +        * included in rcp->cpumask, which will extend graceperiods
> +        * unnecessarily.
> +        */

It doesn't look like this comment applies to this code: we are accessing
idle_cpumask after incrementing rcp->cur here. Unless you meant
"Accessing idle_cpumask *after* incrementing rcp->cur."

Also, could you please add a pointer to the other barrier in the pair?
(Barriers always go in pairs; for example, I think the smp_wmb() above
in rcu_start_batch() is matched by the smp_rmb() in
__rcu_process_callbacks().)
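
As a sketch of the sort of wording I have in mind (where exactly the
pairing barrier lives is an assumption on my part, since
rcu_idle_enter()/rcu_idle_exit() as introduced here have no barrier of
their own):

    /*
     * Make sure the increment of rcp->cur above is visible before we
     * sample idle_cpumask: the full smp_mb() orders that store against
     * the load below.  This is meant to pair with a barrier on the
     * idle entry/exit path, so that a CPU going idle is either seen
     * here (and left out of rcp->cpumask), or sees the new rcp->cur
     * and reports a quiescent state for this batch.
     */
    smp_mb();
    cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);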


> +        smp_mb();
> +        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);
>      }
>  }
>  
> @@ -474,7 +482,23 @@ static struct notifier_block cpu_nfb = {
>  void __init rcu_init(void)
>  {
>      void *cpu = (void *)(long)smp_processor_id();
> +
> +    cpumask_setall(&rcu_ctrlblk.idle_cpumask);
>      cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
>      register_cpu_notifier(&cpu_nfb);
>      open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
>  }
> +
> +/*
> + * The CPU is becoming idle, so no more read side critical
> + * sections, and one more step toward grace period.
> + */
> +void rcu_idle_enter(unsigned int cpu)
> +{
> +    cpumask_set_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
> +}
> +
> +void rcu_idle_exit(unsigned int cpu)
> +{
> +    cpumask_clear_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
> +}
> diff --git a/xen/include/xen/rcupdate.h b/xen/include/xen/rcupdate.h
> index 557a7b1..561ac43 100644
> --- a/xen/include/xen/rcupdate.h
> +++ b/xen/include/xen/rcupdate.h
> @@ -146,4 +146,7 @@ void call_rcu(struct rcu_head *head,
>  
>  int rcu_barrier(void);
>  
> +void rcu_idle_enter(unsigned int cpu);
> +void rcu_idle_exit(unsigned int cpu);
> +
>  #endif /* __XEN_RCUPDATE_H */
> 
