Xen project Mailing List

Re: [Xen-devel] [PATCH 5/5] xen: RCU: avoid busy waiting until the end of grace period.

From: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Date: Wed, 9 Aug 2017 19:34:05 +0200

Cc: andrew.cooper3@xxxxxxxxxx, julien.grall@xxxxxxx, sstabellini@xxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Wed, 09 Aug 2017 17:34:33 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Mon, 2017-08-07 at 02:54 -0600, Jan Beulich wrote: > > > > Dario Faggioli <dario.faggioli@xxxxxxxxxx> 07/27/17 10:01 AM > > > > >>> > > > > Instead of having the CPU where a callback is queued, busy > > looping on rcu_pending(), use a timer. > > Isn't this rcu_needs_cpu(), according to patch 4? > I was referring to the rcu_pending() in do_softirq(). In fact, if this (roughly) is idle_loop: for ( ; ; ) { if ( unlikely(tasklet_work_to_do(cpu)) ) do_tasklet(); else pm_idle(); do_softirq(); check_for_livepatch_work(); } we don't have tasklet work to do, so we call pm_idle(). However, if we have a callback queued, rcu_needs_cpu() returns true (without calling rcu_pending()), which means cpu_is_haltable() returns false, and hence we exit pm_idle() without actually going idle, and we call do_softirq(), which does: if ( rcu_pending(cpu) ) rcu_check_callbacks(cpu); in a rather tight loop. IAC, I certainly can rephrase the sentence above, and make all this more clear. > > --- a/xen/arch/x86/acpi/cpu_idle.c > > +++ b/xen/arch/x86/acpi/cpu_idle.c > > @@ -576,10 +576,10 @@ static void acpi_processor_idle(void) > > return; > > } > > > > > + rcu_idle_timer_start(); > > cpufreq_dbs_timer_suspend(); > > Is the ordering wrt cpufreq handling here arbitrary? > Yes. > If so, wouldn't it be > better to suspend cpufreq handling before starting the idle timer > (and > also invert the ordering when going back to normal operation below)? > Well, as said above, it's arbitrary. Therefore, yes, I'm ok with doing things like you suggest. > > - > > sched_tick_suspend(); > > At which point I'd then wonder whether this couldn't be integrated > into > sched_tick_{suspend,resume}(), avoiding arch-specific code to be > altered. > Sure, I'm all for it. It's easier, cleaner, and makes a lot of sense! Only problem may be if some new arch decide not/forget to call sched_tick_suspend() and resume(). Perhaps I can add some checking in place, to make sure that this can't happen (in debug builds only, of course). > > @@ -756,6 +756,7 @@ static void mwait_idle(void) > > local_irq_enable(); > > sched_tick_resume(); > > cpufreq_dbs_timer_resume(); > > + rcu_idle_timer_stop(); > > Indentation looks odd here, but I can't exclude this being an effect > produced > by my mail web frontend. > Yep, it was indeed wrong. Fixed. Sorry. Thanks. :-) > > --- a/xen/common/rcupdate.c > > +++ b/xen/common/rcupdate.c > > @@ -84,8 +84,14 @@ struct rcu_data { > > > > +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10) > > If I'm not mistaken someone else had already commented on this: If > this > is (mostly) arbitrary, please say so in a comment. > Indeed there is being a long 'debate'. :-) For v2, I'm changing this to something more dynamic. And I'll sure add a comment explaining the adopted solution. > > @@ -402,7 +408,48 @@ int rcu_needs_cpu(int cpu) > > { > > struct rcu_data *rdp = &per_cpu(rcu_data, cpu); > > > > > - return (!!rdp->curlist || rcu_pending(cpu)); > > + return (!!rdp->curlist || rcu_pending(cpu)) && !rdp- > > >idle_timer_active; > > Please take the opportunity and drop the pointless !! here (unless > it's needed > for better matching up with the Linux original). > It's in the Linux code, indeed: http://elixir.free-electrons.com/linux/v2.6.21/source/kernel/rcupdate.c#L517 However, this very line is already different (because Linux has rdp_bh->curlist, which we don't, and we have rdp->idle_timer_active). So I guess I can drop '!!'. Any patch touching this line coming from Linux will need manual adjustment anyway. > > +/* > > + * Timer for making sure the CPU where a callback is queued does > > + * periodically poke rcu_pedning(), so that it will invoke the > > callback > > + * not too late after the end of the grace period. > > + */ > > +void rcu_idle_timer_start() > > +{ > > + struct rcu_data *rdp = &this_cpu(rcu_data); > > + > > + if (likely(!rdp->curlist)) > > + return; > > I would have expected this to be the inverse of the original > condition in > rcu_needs_cpu() - why is there no rcu_pending() invocation here? > The original rcu_needs_cpu() condition is: rcu->curlist || rcu_pending(cpu) So, you're saying we need something like this: if (!rdp->curlist && !rcu_pending(cpu)) return; Well, if I do this, what happens is that the if evaluate to false (and hence we don't exit the function, and we arm the timer), on CPUs that: - does not have a calback queued (curlist == NULL) - are in the process of quiescing, by going idle. In fact, in that case, here's what happens (I compacted the trace, and slightly edited some of the lines, for better readability, and for keeping the highlighting the characteristics of the situation under investigation): * complete_domain_destroy callback is queued on CPU 0. The CPU quiesces (goes through softirq), but stay budy, so no timer is armed (that's not super relevant, but is what happens in this case): x-|- d0v12 rcu_call fn=complete_domain_destroy x-|- d0v12 rcu_pending? yes (ret=2): no pending entries but new entries x-|- d0v12 raise_softirq nr 3 x-|- d0v12 rcu_process_callbacks, rdp_curlist: null, rdp_nxtlist: yes x-|- d0v12 do_softirq x-|- d0v12 rcu_pending? yes (ret=5): waiting for CPU to quiesce (rdp_qs_pending=1) x-|- d0v12 raise_softirq nr 3 x-|- d0v12 rcu_process_callbacks, rdp_curlist: yes, rdp_nxtlist: null x-|- d0v12 rcu_check_quiesc_state, rdp_qs_pending: yes x-|- d0v12 rcu_cpu_quiet, rcp_cpumask=0x00004100 x-|- d0v12 rcu_pending? no * the vCPU running on CPU 2, which participates in the grace period, blocks, and CPU 2 goes idle. That means that CPU 2 quiesces (goes through softirq), and we can forget about it. Surely we don't need the timer to fire on it, as the callback is not queued there... |-x- d0v7 vcpu_block d0v7 |-x- d0v7 raise_softirq nr 1 |-x- d0v7 do_softirq |-x- d0v7 rcu_pending? yes (ret=4): waiting for CPU to quiesce (rdp_qs_pending=0) |-x- d0v7 raise_softirq nr 3 |-x- d0v7 softirq_handler nr 1 |-x- d0v7 runstate_change d0v7 running->blocked |-x- d?v? runstate_change d32767v14 runnable->running Now, this is cpufreq_dbs_timer_suspend(): |-x- d32767v14 timer_stop t=do_dbs_timer And this is sched_tick_suspend(), which now calls rcu_idle_timer_start(), gated by the condition suggested above. |-x- d32767v14 rcu_pending? yes (ret=4): waiting for CPU to quiesce (rdp_qs_pending=0) |-x- d32767v14 timer_set t=rcu_idle_timer_handler, expires_in=9787us So, the CPU is actually quiescing, and since it has no callback queues, as we said, we wouldn't have needed the timer. But the condition is false, because, at this stage, rcu_pending() was still true. So, we don't bail early from rcu_idle_timer_start(), and we actually have armed the timer. |-x- d32767v14 rcu_pending? yes (ret=4): waiting for CPU to quiesce (rdp_qs_pending=0) |-x- d32767v14 raise_softirq nr 3 |-x- d32767v14 rcu_process_callbacks, rdp_curlist: null, rdp_nxtlist: null |-x- d32767v14 rcu_check_quiesc_state, rdp_qs_pending: no |-x- d32767v14 rcu_process_callbacks, rdp_curlist: null, rdp_nxtlist: null |-x- d32767v14 rcu_check_quiesc_state, rdp_qs_pending: yes |-x- d32767v14 rcu_cpu_quiet, rcp_cpumask=0x00000100 |-x- d32767v14 pm_idle_start c2 * The timer fires, for no useful reason, and cause a spurious wakeup: |-x- d32767v14 raise_softirq nr 0 |-x- d32767v14 pm_idle_end c2 |-x- d32767v14 irq_enter |-x- d32767v14 irq_direct, vec=0xfa, handler=apic_timer_interrupt |-x- d32767v14 raise_softirq nr 0 |-x- d32767v14 irq_exit, in_irq = 0 |-x- d32767v14 softirq_handler nr 0 |-x- d32767v14 timer_exec t=rcu_idle_timer_handler |-x- d32767v14 timer_reprogr deadline=954.566us |-x- d32767v14 rcu_pending? no |-x- d32767v14 pm_idle_start c3 So, this is why I don't want rcu_pending() in that if. If we don't like this, I can see about moving around a bit the timer starting and stopping helpers (I've a couple of ideas in mind already, but I need to try). Actually, it's entirely possible that it is having rcu_pending(cpu) in rcu_needs_cpu() is, for us, redundant. In fact, although it does make sense in Linux, both code inspection and some investigation I've just done, makes me believe that there won't be cases where a CPU is denied going offline because it sees rcu_pending() returning 1. In fact, when we call rcu_pending(), within cpu_is_haltable(), we have already gone through it before. And if there were pending work, we've raised the softirq and dealt with it. If there weren't, neither there is now. I'm therefore leaning toward removing rcu_pending() from the rcu_needs_cpu() check as well. At that point, we'll indeed have the check inside rcu_start_idle_timer(), be the opposite of the original check in rcu_needs_cpu(). :-) > > @@ -451,6 +500,7 @@ static void rcu_init_percpu_data(int cpu, > > struct rcu_ctrlblk *rcp, > > rdp->qs_pending = 0; > > rdp->cpu = cpu; > > rdp->blimit = blimit; > > + init_timer(&rdp->idle_timer, rcu_idle_timer_handler, (void*) > > rdp, cpu); > > Again, unless it is this bogus way in the Linux original, please drop > the > pointless cast, or at least correct its style. > Ah, no, this one, I can kill it. Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.