
Re: [Xen-devel] [PATCH 5/5] xen: RCU: avoid busy waiting until the end of grace period.

On Mon, 2017-07-31 at 16:58 -0700, Stefano Stabellini wrote:
> On Tue, 1 Aug 2017, Dario Faggioli wrote:
> > On Mon, 2017-07-31 at 14:20 -0700, Stefano Stabellini wrote:
> > > On Thu, 27 Jul 2017, Dario Faggioli wrote:
> > > > 
> > > > diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> > > > index f0fdc87..4586f2a 100644
> > > > --- a/xen/common/rcupdate.c
> > > > +++ b/xen/common/rcupdate.c
> > > > @@ -84,8 +84,14 @@ struct rcu_data {
> > > >      int cpu;
> > > >      struct rcu_head barrier;
> > > >      long            last_rs_qlen;     /* qlen during the last
> > > > resched */
> > > > +
> > > > +    /* 3) idle CPUs handling */
> > > > +    struct timer idle_timer;
> > > > +    bool idle_timer_active;
> > > >  };
> > > >  
> > > 
> > > Isn't this a bit too short? How is it chosen?
> > > 
> > 
> > What makes you think it's short?
> In terms of power saving and CPU sleep states, 10ms is not much to
> sleep for. I wonder if there are any power saving benefits in
> sleeping for only 10ms (especially on x86, where entering and
> exiting CPU sleep states takes longer; to be confirmed).
I *think* we should be fine with, say, 100ms. But that is, again, a
guess/rule of thumb, nothing more than that. And, TBH, I'm not even
sure what a good experiment/benchmark would be for assessing whether a
particular value is good or bad. :-/

>   We might as well do the thing we need
> to do immediately? I guess we cannot do that?
You're guessing correctly, we can't. It's indeed a bit tricky, and it
took me a little while as well to figure it all out properly.

Basically, let's say that, at time t1, on CPU1, someone calls
call_rcu(). The situation on the other CPUs is: CPU0 busy; CPU2 idle
(no timer pending); CPU3 busy.

So, a new grace period starts, and its exact end will be when CPU0,
CPU1 and CPU3 have quiesced once (in Xen, quiescence means: "going
through do_softirq()").

But RCU is a passive mechanism, i.e., we rely on each CPU coming to
the RCU core logic and saying <<hey, btw, I've quiesced>>.
Let's say that CPU0 quiesces at time t2 > t1. CPU1 quiesces at time
t3 > t2, and goes fully idle (no timer pending). CPU3 quiesces at time
t4 > t3. Only at time t4 can CPU1 actually invoke the callback that
was queued at time t1 by call_rcu().
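To make the timeline above concrete, here is a toy model of the
grace-period bookkeeping being described (the bitmask and helper names
are purely illustrative, not Xen's actual cpumask API):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model: a grace period tracks the set of CPUs it still waits on.
 * CPUs that were fully idle when it started are excluded up front.
 */
#define NR_CPUS 4

static unsigned int cpu_quiesce_mask;  /* CPUs the grace period waits on */

static void start_grace_period(unsigned int busy_cpus)
{
    /* At t1: only CPU0, CPU1 and CPU3 (busy) need to quiesce. */
    cpu_quiesce_mask = busy_cpus;
}

/* Called when a CPU quiesces, i.e., goes through do_softirq(). */
static bool quiesce(unsigned int cpu)
{
    cpu_quiesce_mask &= ~(1u << cpu);
    /* Returns true when the last CPU quiesces: end of grace period. */
    return cpu_quiesce_mask == 0;
}
```

So the grace period started at t1 ends exactly when the last of CPU0,
CPU1 and CPU3 calls quiesce(), i.e., at t4 in the example above.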

This patch series solves two problems of our current RCU implementation:

1) right now, we don't only wait for CPU0, CPU1 and CPU3, we also wait 
   for CPU2. But since CPU2 is fully idle, it just won't bother 
   telling RCU that it has quiesced (well, on x86, that would actually 
   happen at some point, while on ARM, it is really possible that this 
   never happens!). This is solved in patch 3, by introducing a way of 
   excluding CPUs that are fully idle from the grace period. 

2) now (after patch 3) we know that we just can avoid waiting for 
   CPU2. Good. But at time t4, when CPU3 quiesces, marking the end of
   the grace period, how would CPU1 --which is fully idle-- know that
   it can now safely invoke the callback? Again, RCU is passive, so it
   relies on CPU1 to figure that out on its own, next time it wakes
   up, e.g., because of the periodic tick timer. But we don't have a
   periodic tick timer! Well, this again means that, on x86, CPU1 will
   actually figure that out at some (unpredictable) point in future.
   On ARM, not so much. The purpose of the timer in this patch is to
   make sure it always will.
   In fact, with patches 4 and 5 applied, at time t3, we let CPU1 go 
   idle, but we program the timer. It will fire at t3+T (with T=10ms, 
   right now). When this happens, if t3+T > t4, CPU1 invokes the
   callback, and we're done. If not (and CPU1 is still idle) we retry
   in another T.
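The retry logic in point 2 can be sketched roughly as follows. This is
a hedged, flattened illustration, not the actual patch: the names and
the simplified state are made up, and the real code arms a struct
timer via Xen's timer API rather than being driven this way.

```c
#include <assert.h>
#include <stdbool.h>

#define IDLE_TIMER_PERIOD_MS 10   /* T, currently 10ms */

static bool grace_period_ended;   /* becomes true at t4 */
static bool callback_invoked;
static unsigned int rearms;       /* how many times we had to retry */

/*
 * Handler of the per-CPU idle_timer: first fires T ms after the CPU
 * went idle (i.e., at t3+T), then every further T ms, until it finds
 * that the grace period has ended.
 */
static void idle_timer_handler(void)
{
    if ( grace_period_ended )
        callback_invoked = true;  /* t3+n*T > t4: safe to invoke */
    else
        rearms++;                 /* still waiting (e.g., for CPU3):
                                   * re-arm and retry in another T */
}
```

I.e., if the timer fires while CPU3 has not quiesced yet, we just
re-arm it; the first firing after t4 runs the callback.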

So, when you say "immediately", *when* do you actually mean?

We can't invoke the callback at t3 (i.e., immediately after CPU1
quiesces), because we need to wait for CPU3 to do the same.

We can't invoke the callback when CPU3 quiesces, at t4 (i.e.,
immediately after the grace period ends) either, because the callback
is on CPU1, not on CPU3.

Linux's solution is not to stop the tick for CPU1 at time t3. It will
be stopped only after the first firing of the tick itself at time
t > t4 (if CPU1 is still idle, of course). This timer thing is, IMO,
pretty similar. The only difference is that we don't have a tick to not
stop... So I had to make up a very special one. :-D

TL;DR, yes, I also would have loved the world (or even just this
problem) to be simple. It's not! ;-P

Thanks and Regards,
<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

