[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4] xen: sched: convert RTDS from time to event driven model



Hi Dario and Tianyang,

I will just comment on the parts Tianyang hasn't replied.

On Tue, Feb 2, 2016 at 10:08 AM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
>
> On Sun, 2016-01-31 at 23:32 -0500, Tianyang Chen wrote:
> > v4 is meant for discussion on the addition of replq.
> >
> In which cases, you should always put something like RFC in the
> subject, so people knows what the intent is even by just skimming their
> inboxes.
>
> > Changes since v3:
> >       removed running queue.
> >       added repl queue to keep track of repl events.
> >       timer is now per scheduler.
> >       timer is init on a valid cpu in a cpupool.
> >
> > Bugs to be fixed: Cpupool and locks. When a pcpu is removed from a
> > pool and added to another, the lock equality assert in free_pdata()
> > fails when Pool-0 is using rtds.
> >
> Known issue. I will fix it for both Credit2 and RTDS, so just ignore
> it.
>
> > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> > index 2e5430f..c36e5de 100644
> > --- a/xen/common/sched_rt.c
> > +++ b/xen/common/sched_rt.c
> > @@ -16,6 +16,7 @@
> >  #include <xen/delay.h>
> >  #include <xen/event.h>
> >  #include <xen/time.h>
> > +#include <xen/timer.h>
> >  #include <xen/perfc.h>
> >  #include <xen/sched-if.h>
> >  #include <xen/softirq.h>
> > @@ -87,7 +88,7 @@
> >  #define RTDS_DEFAULT_BUDGET     (MICROSECS(4000))
> >
> >  #define UPDATE_LIMIT_SHIFT      10
> > -#define MAX_SCHEDULE            (MILLISECS(1))
> > +
> >
> Unrelated to the topic. Can we have a dedicated patch that adds a
> comment above UPDATE_LIMIT_SHIFT, explainig what it is and how it's
> used.
>
> This is something low priority, of course.
>
> > @@ -160,6 +166,7 @@ struct rt_private {
> >   */
> >  struct rt_vcpu {
> >      struct list_head q_elem;    /* on the runq/depletedq list */
> > +    struct list_head p_elem;    /* on the repl event list */
> >
> Mmm... 'p_elem' because the previous one was 'q_elem', and 'p' follows
> 'q', I guess. It feels a bit obscure, though, and I'd prefer a more
> 'talking' name.
>
> For now, and at this level, I guess I can live with it. But in other
> places, things need to be improved (see below).
>
> > @@ -213,8 +220,14 @@ static inline struct list_head
> > *rt_depletedq(const struct scheduler *ops)
> >      return &rt_priv(ops)->depletedq;
> >  }
> >
> > +static inline struct list_head *rt_replq(const struct scheduler
> > *ops)
> > +{
> > +    return &rt_priv(ops)->replq;
> > +}
> > +
> >  /*
> > - * Queue helper functions for runq and depletedq
> > + * Queue helper functions for runq, depletedq
> > + * and repl event q
> >   */
> >
> So, in comments, can we use full words as much as possible. It makes
> them a lot easier to read (so, e.g., "replenishment events queue" or
> "queue of the replenishment events").
>
> I know this very file's style is not like that, from this point of
> view... and, in fact, this is something I would like to change, when we
> get the chance (i.e., when touching the code for other reasons, like in
> this case).
>
> > @@ -228,6 +241,18 @@ __q_elem(struct list_head *elem)
> >      return list_entry(elem, struct rt_vcpu, q_elem);
> >  }
> >
> > +static struct rt_vcpu *
> > +__p_elem(struct list_head *elem)
> > +{
> > +    return list_entry(elem, struct rt_vcpu, p_elem);
> > +}
> > +
> So, while this may well still be fine...
>
> > +static int
> > +__vcpu_on_p(const struct rt_vcpu *svc)
> > +{
> > +   return !list_empty(&svc->p_elem);
> > +}
> > +
> >
> ...here I really would like to go for something that will make it much
> more obvious, just by giving a look at the code, where we are actually
> checking the vcpu to be, without having to remember that p is the
> replenishment queue.
>
> So, I don't know, maybe something like vcpu_on_replq(), or
> vcpu_needs_replenish() ?
>
> > @@ -387,6 +412,13 @@ __q_remove(struct rt_vcpu *svc)
> >          list_del_init(&svc->q_elem);
> >  }
> >
> > +static inline void
> > +__p_remove(struct rt_vcpu *svc)
> > +{
> > +    if ( __vcpu_on_p(svc) )
> > +        list_del_init(&svc->p_elem);
> > +}
> > +
> replq_remove(), or vcpu_repl_cancel() (or something else) ?
>
> > @@ -421,6 +453,32 @@ __runq_insert(const struct scheduler *ops,
> > struct rt_vcpu *svc)
> >  }
> >
> >  /*
> > + * Insert svc into the repl even list:
> > + * vcpus that needs to be repl earlier go first.
> > + */
> > +static void
> > +__replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> > +{
> >
> This is exactly what I've been trying to say above: __replq_insert() is
> ok, much better than __q_insert()! Do the same everywhere else. :-)
>
> > +    struct rt_private *prv = rt_priv(ops);
> > +    struct list_head *replq = rt_replq(ops);
> > +    struct list_head *iter;
> > +
> > +    ASSERT( spin_is_locked(&prv->lock) );
> > +
> Is this really useful? Considering where this is called, I don't think
> it is.
>
> > +    ASSERT( !__vcpu_on_p(svc) );
> > +
> > +    list_for_each(iter, replq)
> > +    {
> > +        struct rt_vcpu * iter_svc = __p_elem(iter);
> > +        if ( svc->cur_deadline <= iter_svc->cur_deadline )
> > +                break;
> > +    }
> > +
> > +    list_add_tail(&svc->p_elem, iter);
> > +}
> This looks ok. The list_for_each()+list_ad_tail() part would be
> basically identical to what we have inside __runq_insert(), thgough,
> wouldn't it?
>
> It may make sense to define an _ordered_queue_insert() (or
> _deadline_queue_insert(), since it's alwas the deadline that is
> compared) utility function that we can then call in both cases.


I will prefer the _deadline_queue_insert() because it is based on the
EDF scheduling policy.


>
>
> > @@ -449,6 +507,7 @@ rt_init(struct scheduler *ops)
> >      INIT_LIST_HEAD(&prv->sdom);
> >      INIT_LIST_HEAD(&prv->runq);
> >      INIT_LIST_HEAD(&prv->depletedq);
> > +    INIT_LIST_HEAD(&prv->replq);
> >
> Is there a reason why you don't do at least the allocation of the timer
> here? Because, to me, this looks the best place for that.
>
> >      cpumask_clear(&prv->tickled);
> >
> > @@ -473,6 +532,9 @@ rt_deinit(const struct scheduler *ops)
> >          xfree(_cpumask_scratch);
> >          _cpumask_scratch = NULL;
> >      }
> > +
> > +    kill_timer(prv->repl_timer);
> > +
> >      xfree(prv);
> >
> And here, you're freeing prv, without having freed prv->timer, which is
> therefore leaked.
>
> > @@ -632,6 +699,17 @@ rt_vcpu_insert(const struct scheduler *ops,
> > struct vcpu *vc)
> >      vcpu_schedule_unlock_irq(lock, vc);
> >
> >      SCHED_STAT_CRANK(vcpu_insert);
> > +
> > +    if( prv->repl_timer == NULL )
> > +    {
> > +        /* vc->processor has been set in schedule.c */
> > +        cpu = vc->processor;
> > +
> > +        repl_timer = xzalloc(struct timer);
> > +
> > +        prv->repl_timer = repl_timer;
> > +        init_timer(repl_timer, repl_handler, (void *)ops, cpu);
> > +    }
> >  }
> >
> This belong to rt_init, IMO.
>
> > @@ -841,7 +881,6 @@ rt_schedule(const struct scheduler *ops, s_time_t
> > now, bool_t tasklet_work_sched
> >      /* burn_budget would return for IDLE VCPU */
> >      burn_budget(ops, scurr, now);
> >
> > -    __repl_update(ops, now);
> >
> __repl_update() going away is of course fine. But we need to enact the
> budget enforcement logic somewhere in here, don't we?
>
> > @@ -924,6 +967,10 @@ rt_vcpu_sleep(const struct scheduler *ops,
> > struct vcpu *vc)
> >          __q_remove(svc);
> >      else if ( svc->flags & RTDS_delayed_runq_add )
> >          clear_bit(__RTDS_delayed_runq_add, &svc->flags);
> > +
> > +    /* stop tracking the repl time of this vcpu */
> > +   /* if( __vcpu_on_p(svc) )
> > +       __p_remove(svc); */
> >  }
> >
> Is it ok to kill the replenishment in this case?
>
> This is a genuine question. What does the dynamic DS algorithm that
> you're trying to implement here says about this? Meng, maybe you can
> help here?


Based on the DS algorithm (in theory), each VCPU should have its
budget replenished at the beginning of its next period.

However, we are thinking that when a VCPU is put to sleep, no one is
supposed to use it. Do we really need to keep updating the VCPU's
budget? Can we just update the VCPU's budget when it is waken up
later? This could potentially save some implementation overhead, IMHO.
That's why we decided not to update the budget of VCPUs that are put into sleep.


>
>
> Is it ok to do this _because_ you then handle the situation of a
> replenishment having to had happened while the vcpu was asleep in
> rt_vcpu_wake (the '(now>=svc->cur_deadline)' thing)? Yes... it probably
> is ok for that reason...


Yes, exactly. We hope this could reduce the frequency of invoking the
replenishment timer when the system is (kind of) idle.


>
>
> > @@ -1027,23 +1074,21 @@ rt_vcpu_wake(const struct scheduler *ops,
> > struct vcpu *vc)
> >      struct rt_vcpu * const svc = rt_vcpu(vc);
> >      s_time_t now = NOW();
> >      struct rt_private *prv = rt_priv(ops);
> > -    struct rt_vcpu *snext = NULL; /* highest priority on RunQ */
> > -    struct rt_dom *sdom = NULL;
> > -    cpumask_t *online;
> > +    struct timer *repl_timer = prv->repl_timer;
> >
> >      BUG_ON( is_idle_vcpu(vc) );
> >
> >      if ( unlikely(curr_on_cpu(vc->processor) == vc) )
> >      {
> >          SCHED_STAT_CRANK(vcpu_wake_running);
> > -        return;
> > +        goto out;
> >      }
> >
> >      /* on RunQ/DepletedQ, just update info is ok */
> >      if ( unlikely(__vcpu_on_q(svc)) )
> >      {
> >          SCHED_STAT_CRANK(vcpu_wake_onrunq);
> > -        return;
> > +        goto out;
> >      }
> >
> >      if ( likely(vcpu_runnable(vc)) )
> >
> Mmm.. no. At least the first two cases, they should really just be
> 'return'-s.
>
> As you say yourself in the comment further down, right below the 'out:'
> label "a newly waken-up vcpu could xxx". If this is running on a pCPU,
> or it is in a runqueue already, it is not a newly woken-up vcpu.
>
> In fact, we need to check for this cases, but let's at least made them
> act like NOPs, as all other schedulers do and as it's correct.
>
> > @@ -1058,25 +1103,39 @@ rt_vcpu_wake(const struct scheduler *ops,
> > struct vcpu *vc)
> >      if ( unlikely(svc->flags & RTDS_scheduled) )
> >      {
> >          set_bit(__RTDS_delayed_runq_add, &svc->flags);
> > -        return;
> > +        goto out;
> >      }
> >
> In this case, I'm less sure. I think this should just return as well,
> but it may depend on how other stuff are done (like budget
> enforcement), so it's hard to think at it right now, but consider this
> when working on next version.
>
> > +    /* budget repl here is needed before inserting back to runq. If
> > so,
> > +     * it should be taken out of replq and put back. This can
> > potentially
> > +     * cause the timer handler to replenish no vcpu when it triggers
> > if
> > +     * the replenishment is done here already.
> > +     */
> >
> Coding style for long comments wants them to have both "wings" (you're
> missing the opening one, '/*').


BTW, each line should have less than 80 characters.


>
>
> >      if ( now >= svc->cur_deadline)
> > +    {
> >          rt_update_deadline(now, svc);
> > +        if( __vcpu_on_p(svc) )
> > +            __p_remove(svc);
> > +    }
> >
> Well, then maybe the __p_remove() function can be made smart enough to:
>  - check whether the one being removed is the first element of the
>    queue;
>  - if it is, disarm the timer
>  - if there still are elements in the queue, rearm the timer to fire
> at
>    the new first's deadline
>
> This will be useful at least in another case (rt_vcpu_sleep()).
>
> >      /* insert svc to runq/depletedq because svc is not in queue now
> > */
> >      __runq_insert(ops, svc);
> >
> > -    __repl_update(ops, now);
> > -
> > -    ASSERT(!list_empty(&prv->sdom));
> > -    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> > -    online = cpupool_domain_cpumask(sdom->dom);
> > -    snext = __runq_pick(ops, online); /* pick snext from ALL valid
> > cpus */
> > -
> > -    runq_tickle(ops, snext);
> > +    runq_tickle(ops, svc);
> >
> > -    return;
> > +out:
> > +    /* a newly waken-up vcpu could have an earlier release time
> > +     * or it could be the first to program the timer
> > +     */
> > +    if( repl_timer->expires == 0 || repl_timer->expires > svc-
> > >cur_deadline )
> > +        set_timer(repl_timer,svc->cur_deadline);
> > +
> And again, can't this logic be part of __rqplq_insert()? I.e.:
>  - do the insertion
>  - if the inserted element is the new head of the queue, reprogram the
>    timer
>
> > +    /* start tracking the repl time of this vcpu here
> > +    * it stays on the replq unless it goes to sleep
> > +    * or marked as not-runnable
> > +    */
> > +    if( !__vcpu_on_p(svc) )
> > +       __replq_insert(ops, svc);
> >  }
>
> > @@ -1168,6 +1216,80 @@ rt_dom_cntl(
> >      return rc;
> >  }
> >
> > +/* The replenishment timer handler picks vcpus
> > + * from the replq and does the actual replenishment
> > + * the replq only keeps track of runnable vcpus
> > + */
> > +static void repl_handler(void *data){
> > +    unsigned long flags;
> > +    s_time_t now = NOW();
> > +    s_time_t t_next = LONG_MAX; /* the next time timer should be
> > triggered */
> > +    struct scheduler *ops = data;
> > +    struct rt_private *prv = rt_priv(ops);
> > +    struct list_head *replq = rt_replq(ops);
> > +    struct timer *repl_timer = prv->repl_timer;
> > +    struct list_head *iter, *tmp;
> > +    struct rt_vcpu *svc = NULL;
> > +
> > +    stop_timer(repl_timer);
> > +
> > +    spin_lock_irqsave(&prv->lock, flags);
> > +
> > +    list_for_each_safe(iter, tmp, replq)
> > +    {
> > +        svc = __p_elem(iter);
> > +
> > +        if ( now >= svc->cur_deadline )
> > +        {
> > +            rt_update_deadline(now, svc);
> > +
> > +            if( t_next > svc->cur_deadline )
> > +                t_next = svc->cur_deadline;
> > +
> Why do you need to do this? The next time at which the timer needs to
> be reprogrammed is, after you'll have done all the replenishments that
> needed to be done, here in this loop, the deadline of the new head of
> the replenishment queue (if there still are elements in there).
>
> > +            /* when the replenishment happens
> > +             * svc is either on a pcpu or on
> > +             * runq/depletedq
> > +             */
> > +            if( __vcpu_on_q(svc) )
> > +            {
> > +                /* put back to runq */
> > +                __q_remove(svc);
> > +                __runq_insert(ops, svc);
> > +                runq_tickle(ops, svc);
> > +            }
> > +
> Aha! So, it looks the budget enforcement logic ended up here? Mmm...
> no, I think that doing it in rt_schedule(), as we agreed, would make
> things simpler and better looking, both here and there.
>
> This is the replenishment timer handler, and it really should do one
> thins: handle replenishment events. That's it! ;-P
>
> > +            /* resort replq, keep track of this
> > +             * vcpu if it's still runnable. If
> > +             * at this point no vcpu are, then
> > +             * the timer waits for wake() to
> > +             * program it.
> > +             */
> > +            __p_remove(svc);
> > +
> > +            if( vcpu_runnable(svc->vcpu) )
> > +                __replq_insert(ops, svc);
> > +        }
> > +
> > +        else
> > +        {
> > +            /* Break out of the loop if a vcpu's
> > +             * cur_deadline is bigger than now.
> > +             * Also when some replenishment is done
> > +             * in wake(), the code could end up here
> > +             * if the time fires earlier than it should
> > +            */
> > +            if( t_next > svc->cur_deadline )
> > +                t_next = svc->cur_deadline;
> > +            break;
> > +        }
> > +    }
> > +
> > +    set_timer(repl_timer, t_next);
> > +
> Well, what if there are no more replenishments to be performed? You're
> still reprogramming the timer, either at t_next or at cur_deadline,
> which looks arbitrary.
>
> If there are no replenishments, let's just keep the timer off.
>
> So, all in all, this is a very very good first attempt at implementing
> what we agreed upon in previous email... Good work, can't wait to see
> v5! :-D
>
> Let me know if there is anything unclear with any of the comments.


Thanks and Best Regards,

Meng



-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.