
Re: [Xen-devel] [PATCH v4] xen: sched: convert RTDS from time to event driven model



On Sun, 2016-01-31 at 23:32 -0500, Tianyang Chen wrote:
> v4 is meant for discussion on the addition of replq.
> 
In which case, you should always put something like RFC in the
subject, so people know what the intent is even by just skimming their
inboxes.

> Changes since v3:
>       removed running queue.
>       added repl queue to keep track of repl events.
>       timer is now per scheduler.
>       timer is init on a valid cpu in a cpupool.
> 
> Bugs to be fixed: Cpupool and locks. When a pcpu is removed from a
> pool and added to another, the lock equality assert in free_pdata()
> fails when Pool-0 is using rtds.
> 
Known issue. I will fix it for both Credit2 and RTDS, so just ignore
it.

> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 2e5430f..c36e5de 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -16,6 +16,7 @@
>  #include <xen/delay.h>
>  #include <xen/event.h>
>  #include <xen/time.h>
> +#include <xen/timer.h>
>  #include <xen/perfc.h>
>  #include <xen/sched-if.h>
>  #include <xen/softirq.h>
> @@ -87,7 +88,7 @@
>  #define RTDS_DEFAULT_BUDGET     (MICROSECS(4000))
> 
>  #define UPDATE_LIMIT_SHIFT      10
> -#define MAX_SCHEDULE            (MILLISECS(1))
> +
>
Unrelated to the topic: can we have a dedicated patch that adds a
comment above UPDATE_LIMIT_SHIFT, explaining what it is and how it's
used?

This is something low priority, of course.

> @@ -160,6 +166,7 @@ struct rt_private {
>  */
>  struct rt_vcpu {
>      struct list_head q_elem;    /* on the runq/depletedq list */
> +    struct list_head p_elem;    /* on the repl event list */
> 
Mmm... 'p_elem' because the previous one was 'q_elem', and 'p' follows
'q', I guess. It feels a bit obscure, though, and I'd prefer a more
'talking' name.

For now, and at this level, I guess I can live with it. But in other
places, things need to be improved (see below).

> @@ -213,8 +220,14 @@ static inline struct list_head *rt_depletedq(const struct scheduler *ops)
>      return &rt_priv(ops)->depletedq;
>  }
> 
> +static inline struct list_head *rt_replq(const struct scheduler
> *ops)
> +{
> +    return &rt_priv(ops)->replq;
> +}
> +
>  /*
> - * Queue helper functions for runq and depletedq
> + * Queue helper functions for runq, depletedq
> + * and repl event q
>  */
>
So, in comments, can we use full words as much as possible? It makes
them a lot easier to read (e.g., "replenishment events queue" or
"queue of the replenishment events").

I know this very file's style is not like that, from this point of
view... and, in fact, this is something I would like to change, when we
get the chance (i.e., when touching the code for other reasons, like in
this case).

> @@ -228,6 +241,18 @@ __q_elem(struct list_head *elem)
>      return list_entry(elem, struct rt_vcpu, q_elem);
>  }
> 
> +static struct rt_vcpu *
> +__p_elem(struct list_head *elem)
> +{
> +    return list_entry(elem, struct rt_vcpu, p_elem);
> +}
> +
So, while this may well still be fine...

> +static int
> +__vcpu_on_p(const struct rt_vcpu *svc)
> +{
> +    return !list_empty(&svc->p_elem);
> +}
> +
>
...here I really would like to go for something that will make it much
more obvious, just by giving a look at the code, where we are actually
checking the vcpu to be, without having to remember that p is the
replenishment queue.

So, I don't know, maybe something like vcpu_on_replq(), or
vcpu_needs_replenish() ?

> @@ -387,6 +412,13 @@ __q_remove(struct rt_vcpu *svc)
>          list_del_init(&svc->q_elem);
>  }
> 
> +static inline void
> +__p_remove(struct rt_vcpu *svc)
> +{
> +    if ( __vcpu_on_p(svc) )
> +        list_del_init(&svc->p_elem);
> +}
> +
replq_remove(), or vcpu_repl_cancel() (or something else) ?

> @@ -421,6 +453,32 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
>  }
> 
>  /*
> + * Insert svc into the repl even list:
> + * vcpus that needs to be repl earlier go first.
> + */
> +static void
> +__replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
>
This is exactly what I've been trying to say above: __replq_insert() is
ok, much better than __q_insert()! Do the same everywhere else. :-)

> +    struct rt_private *prv = rt_priv(ops);
> +    struct list_head *replq = rt_replq(ops);
> +    struct list_head *iter;
> +
> +    ASSERT( spin_is_locked(&prv->lock) );
> +
Is this really useful? Considering where this is called, I don't think
it is.

> +    ASSERT( !__vcpu_on_p(svc) );
> +
> +    list_for_each(iter, replq)
> +    {
> +        struct rt_vcpu * iter_svc = __p_elem(iter);
> +        if ( svc->cur_deadline <= iter_svc->cur_deadline )
> +            break;
> +    }
> +
> +    list_add_tail(&svc->p_elem, iter);
> +}
This looks ok. The list_for_each()+list_add_tail() part would be
basically identical to what we have inside __runq_insert(), though,
wouldn't it?

It may make sense to define an _ordered_queue_insert() (or
_deadline_queue_insert(), since it's always the deadline that is
compared) utility function that we can then call in both cases.
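For reference, such a shared helper could look like the following standalone sketch. This uses simplified stand-ins for Xen's `struct list_head` and `struct rt_vcpu`, and `_deadline_queue_insert` is just the name proposed above, so treat it as an illustration rather than the actual sched_rt.c code:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for Xen's intrusive list and rt_vcpu. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

/* Insert n right before h (like Xen's list_add_tail()). */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}

struct rt_vcpu {
    struct list_head q_elem;
    long cur_deadline;
};

#define q_entry(p) \
    ((struct rt_vcpu *)((char *)(p) - offsetof(struct rt_vcpu, q_elem)))

/*
 * Insert svc into the deadline-ordered queue q, earliest deadline
 * first. Both __runq_insert() and __replq_insert() could share this.
 */
static void _deadline_queue_insert(struct rt_vcpu *svc, struct list_head *q)
{
    struct list_head *iter;

    for ( iter = q->next; iter != q; iter = iter->next )
        if ( svc->cur_deadline <= q_entry(iter)->cur_deadline )
            break;

    list_add_tail(&svc->q_elem, iter);
}
```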

> @@ -449,6 +507,7 @@ rt_init(struct scheduler *ops)
>      INIT_LIST_HEAD(&prv->sdom);
>      INIT_LIST_HEAD(&prv->runq);
>      INIT_LIST_HEAD(&prv->depletedq);
> +    INIT_LIST_HEAD(&prv->replq);
> 
Is there a reason why you don't do at least the allocation of the timer
here? Because, to me, this looks like the best place for that.

>      cpumask_clear(&prv->tickled);
> 
> @@ -473,6 +532,9 @@ rt_deinit(const struct scheduler *ops)
>          xfree(_cpumask_scratch);
>          _cpumask_scratch = NULL;
>      }
> +
> +    kill_timer(prv->repl_timer);
> +
>      xfree(prv);
>
And here, you're freeing prv without having freed prv->repl_timer,
which is therefore leaked.
> @@ -632,6 +699,17 @@ rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
>      vcpu_schedule_unlock_irq(lock, vc);
> 
>      SCHED_STAT_CRANK(vcpu_insert);
> +
> +    if( prv->repl_timer == NULL )
> +    {
> +        /* vc->processor has been set in schedule.c */
> +        cpu = vc->processor;
> +
> +        repl_timer = xzalloc(struct timer);
> +
> +        prv->repl_timer = repl_timer;
> +        init_timer(repl_timer, repl_handler, (void *)ops, cpu);
> +    }
>  }
> 
This belongs to rt_init, IMO.

> @@ -841,7 +881,6 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>      /* burn_budget would return for IDLE VCPU */
>      burn_budget(ops, scurr, now);
> 
> -    __repl_update(ops, now);
> 
__repl_update() going away is of course fine. But we need to enact the
budget enforcement logic somewhere in here, don't we?

> @@ -924,6 +967,10 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
>          __q_remove(svc);
>      else if ( svc->flags & RTDS_delayed_runq_add )
>          clear_bit(__RTDS_delayed_runq_add, &svc->flags);
> +
> +    /* stop tracking the repl time of this vcpu */
> +    /* if( __vcpu_on_p(svc) )
> +        __p_remove(svc); */
>  }
> 
Is it ok to kill the replenishment in this case?

This is a genuine question. What does the dynamic DS algorithm that
you're trying to implement here says about this? Meng, maybe you can
help here?

Is it ok to do this _because_ you then handle, in rt_vcpu_wake(), the
case of a replenishment that should have happened while the vcpu was
asleep (the '(now >= svc->cur_deadline)' thing)? Yes... it probably is
ok for that reason...

> @@ -1027,23 +1074,21 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>      struct rt_vcpu * const svc = rt_vcpu(vc);
>      s_time_t now = NOW();
>      struct rt_private *prv = rt_priv(ops);
> -    struct rt_vcpu *snext = NULL; /* highest priority on RunQ */
> -    struct rt_dom *sdom = NULL;
> -    cpumask_t *online;
> +    struct timer *repl_timer = prv->repl_timer;
> 
>      BUG_ON( is_idle_vcpu(vc) );
> 
>      if ( unlikely(curr_on_cpu(vc->processor) == vc) )
>      {
>          SCHED_STAT_CRANK(vcpu_wake_running);
> -        return;
> +        goto out;
>      }
> 
>      /* on RunQ/DepletedQ, just update info is ok */
>      if ( unlikely(__vcpu_on_q(svc)) )
>      {
>          SCHED_STAT_CRANK(vcpu_wake_onrunq);
> -        return;
> +        goto out;
>      }
> 
>      if ( likely(vcpu_runnable(vc)) )
>
Mmm.. no. At least the first two cases, they should really just be
'return'-s.

As you say yourself in the comment further down, right below the 'out:'
label "a newly waken-up vcpu could xxx". If this is running on a pCPU,
or it is in a runqueue already, it is not a newly woken-up vcpu.

In fact, we need to check for these cases, but let's at least make
them act like NOPs, as all other schedulers do and as is correct.

> @@ -1058,25 +1103,39 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>      if ( unlikely(svc->flags & RTDS_scheduled) )
>      {
>          set_bit(__RTDS_delayed_runq_add, &svc->flags);
> -        return;
> +        goto out;
>      }
>
In this case, I'm less sure. I think this should just return as well,
but it may depend on how other stuff is done (like budget enforcement),
so it's hard to think about it right now; but consider this when
working on the next version.

> +    /* budget repl here is needed before inserting back to runq. If so,
> +     * it should be taken out of replq and put back. This can potentially
> +     * cause the timer handler to replenish no vcpu when it triggers if
> +     * the replenishment is done here already.
> +     */
>
Coding style for long comments wants them to have both "wings" (you're
missing the opening one, '/*').

>      if ( now >= svc->cur_deadline)
> +    {
>          rt_update_deadline(now, svc);
> +        if( __vcpu_on_p(svc) )
> +            __p_remove(svc);
> +    }
> 
Well, then maybe the __p_remove() function can be made smart enough to:
 - check whether the one being removed is the first element of the
   queue;
 - if it is, disarm the timer;
 - if there still are elements in the queue, rearm the timer to fire at
   the new first's deadline.

This will be useful at least in another case (rt_vcpu_sleep()).
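To illustrate the idea, here is a standalone sketch; `mock_timer` stands in for Xen's `struct timer`, `replq_remove` is just a possible name, and the surrounding list/vcpu types are simplified stand-ins, not the real sched_rt.c code:

```c
#include <assert.h>
#include <stddef.h>

/* Mock timer standing in for Xen's struct timer. */
struct mock_timer { int armed; long expires; };
static void mock_set_timer(struct mock_timer *t, long when)
{ t->armed = 1; t->expires = when; }
static void mock_stop_timer(struct mock_timer *t)
{ t->armed = 0; t->expires = 0; }

struct list_head { struct list_head *prev, *next; };
static void list_init(struct list_head *h) { h->prev = h->next = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }
static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}

struct rt_vcpu { struct list_head p_elem; long cur_deadline; };
#define p_entry(p) \
    ((struct rt_vcpu *)((char *)(p) - offsetof(struct rt_vcpu, p_elem)))

/*
 * Remove svc from the replenishment queue and keep the timer
 * consistent: if svc was the head, either rearm the timer at the new
 * head's deadline, or disarm it if the queue is now empty.
 */
static void replq_remove(struct rt_vcpu *svc, struct list_head *replq,
                         struct mock_timer *t)
{
    int was_head = (replq->next == &svc->p_elem);

    list_del_init(&svc->p_elem);

    if ( was_head )
    {
        if ( list_empty(replq) )
            mock_stop_timer(t);
        else
            mock_set_timer(t, p_entry(replq->next)->cur_deadline);
    }
}
```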

>      /* insert svc to runq/depletedq because svc is not in queue now */
>      __runq_insert(ops, svc);
> 
> -    __repl_update(ops, now);
> -
> -    ASSERT(!list_empty(&prv->sdom));
> -    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> -    online = cpupool_domain_cpumask(sdom->dom);
> -    snext = __runq_pick(ops, online); /* pick snext from ALL valid cpus */
> -
> -    runq_tickle(ops, snext);
> +    runq_tickle(ops, svc);
> 
> -    return;
> +out:
> +    /* a newly waken-up vcpu could have an earlier release time
> +     * or it could be the first to program the timer
> +     */
> +    if( repl_timer->expires == 0 || repl_timer->expires > svc->cur_deadline )
> +        set_timer(repl_timer,svc->cur_deadline);
> +
And again, can't this logic be part of __replq_insert()? I.e.:
 - do the insertion;
 - if the inserted element is the new head of the queue, reprogram the
   timer.
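Sketched out (again standalone, with a mocked timer and hypothetical names; the real function would also take the scheduler and hold prv->lock):

```c
#include <assert.h>
#include <stddef.h>

struct mock_timer { int armed; long expires; };
static void mock_set_timer(struct mock_timer *t, long when)
{ t->armed = 1; t->expires = when; }

struct list_head { struct list_head *prev, *next; };
static void list_init(struct list_head *h) { h->prev = h->next = h; }
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}

struct rt_vcpu { struct list_head p_elem; long cur_deadline; };
#define p_entry(p) \
    ((struct rt_vcpu *)((char *)(p) - offsetof(struct rt_vcpu, p_elem)))

/*
 * Deadline-ordered insertion into the replenishment queue. If the
 * inserted vcpu becomes the new head, reprogram the timer: its
 * deadline is now the earliest replenishment event.
 */
static void replq_insert(struct rt_vcpu *svc, struct list_head *replq,
                         struct mock_timer *t)
{
    struct list_head *iter;

    for ( iter = replq->next; iter != replq; iter = iter->next )
        if ( svc->cur_deadline <= p_entry(iter)->cur_deadline )
            break;
    list_add_tail(&svc->p_elem, iter);

    if ( replq->next == &svc->p_elem )
        mock_set_timer(t, svc->cur_deadline);
}
```

With this in place, the wake-up path would not need to touch the timer at all.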

> +    /* start tracking the repl time of this vcpu here
> +     * it stays on the replq unless it goes to sleep
> +     * or marked as not-runnable
> +     */
> +    if( !__vcpu_on_p(svc) )
> +        __replq_insert(ops, svc);
>  }

> @@ -1168,6 +1216,80 @@ rt_dom_cntl(
>      return rc;
>  }
> 
> +/* The replenishment timer handler picks vcpus
> + * from the replq and does the actual replenishment
> + * the replq only keeps track of runnable vcpus
> + */
> +static void repl_handler(void *data){
> +    unsigned long flags;
> +    s_time_t now = NOW();
> +    s_time_t t_next = LONG_MAX; /* the next time timer should be triggered */
> +    struct scheduler *ops = data;
> +    struct rt_private *prv = rt_priv(ops);
> +    struct list_head *replq = rt_replq(ops);
> +    struct timer *repl_timer = prv->repl_timer;
> +    struct list_head *iter, *tmp;
> +    struct rt_vcpu *svc = NULL;
> +
> +    stop_timer(repl_timer);
> +
> +    spin_lock_irqsave(&prv->lock, flags);
>
> +    list_for_each_safe(iter, tmp, replq)
> +    {
> +        svc = __p_elem(iter);
> +
> +        if ( now >= svc->cur_deadline )
> +        {
> +            rt_update_deadline(now, svc);
> +
> +            if( t_next > svc->cur_deadline )
> +                t_next = svc->cur_deadline;
> +
Why do you need to do this? After all the replenishments that needed
to be done have been performed, here in this loop, the next time at
which the timer needs to be reprogrammed is just the deadline of the
new head of the replenishment queue (if there still are elements in
there).

> +            /* when the replenishment happens
> +             * svc is either on a pcpu or on
> +             * runq/depletedq
> +             */
> +            if( __vcpu_on_q(svc) )
> +            {
> +                /* put back to runq */
> +                __q_remove(svc);
> +                __runq_insert(ops, svc);
> +                runq_tickle(ops, svc);
> +            }
> +
Aha! So, it looks like the budget enforcement logic ended up here?
Mmm... no, I think that doing it in rt_schedule(), as we agreed, would
make things simpler and better looking, both here and there.

This is the replenishment timer handler, and it really should do one
thing: handle replenishment events. That's it! ;-P

> +            /* resort replq, keep track of this
> +             * vcpu if it's still runnable. If
> +             * at this point no vcpu are, then
> +             * the timer waits for wake() to
> +             * program it.
> +             */
> +            __p_remove(svc);
> +
> +            if( vcpu_runnable(svc->vcpu) )
> +                __replq_insert(ops, svc);
> +        }
> +
> +        else
> +        {
> +            /* Break out of the loop if a vcpu's
> +             * cur_deadline is bigger than now.
> +             * Also when some replenishment is done
> +             * in wake(), the code could end up here
> +             * if the time fires earlier than it should
> +             */
> +            if( t_next > svc->cur_deadline )
> +                t_next = svc->cur_deadline;
> +            break;
> +        }
> +    }
> +
> +    set_timer(repl_timer, t_next);
> +
Well, what if there are no more replenishments to be performed? You're
still reprogramming the timer, either at t_next or at cur_deadline,
which looks arbitrary.

If there are no replenishments, let's just keep the timer off.
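Putting these suggestions together, the handler's control flow could look like the following standalone sketch (mocked timer and list, hypothetical names, and a simplified `cur_deadline += period` in place of rt_update_deadline(); the real code would also tickle pcpus and take prv->lock):

```c
#include <assert.h>
#include <stddef.h>

struct mock_timer { int armed; long expires; };
static void mock_set_timer(struct mock_timer *t, long when)
{ t->armed = 1; t->expires = when; }
static void mock_stop_timer(struct mock_timer *t)
{ t->armed = 0; t->expires = 0; }

struct list_head { struct list_head *prev, *next; };
static void list_init(struct list_head *h) { h->prev = h->next = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }
static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}

struct rt_vcpu {
    struct list_head p_elem;
    long cur_deadline, period;
    int runnable;
};
#define p_entry(p) \
    ((struct rt_vcpu *)((char *)(p) - offsetof(struct rt_vcpu, p_elem)))

/* Deadline-ordered insertion, earliest replenishment first. */
static void replq_insert(struct rt_vcpu *svc, struct list_head *replq)
{
    struct list_head *iter;

    for ( iter = replq->next; iter != replq; iter = iter->next )
        if ( svc->cur_deadline <= p_entry(iter)->cur_deadline )
            break;
    list_add_tail(&svc->p_elem, iter);
}

/*
 * Replenish every vcpu whose deadline has passed, then rearm the timer
 * at the new head's deadline, or keep it off if the queue is empty.
 */
static void repl_handler_sketch(struct list_head *replq,
                                struct mock_timer *t, long now)
{
    mock_stop_timer(t);

    while ( !list_empty(replq) )
    {
        struct rt_vcpu *svc = p_entry(replq->next);

        /* The queue is ordered: the first non-due vcpu ends the scan. */
        if ( now < svc->cur_deadline )
            break;

        /* Replenish: push the deadline one period into the future. */
        svc->cur_deadline += svc->period;

        /* Re-sort; keep tracking it only if it is still runnable. */
        list_del_init(&svc->p_elem);
        if ( svc->runnable )
            replq_insert(svc, replq);
    }

    if ( !list_empty(replq) )
        mock_set_timer(t, p_entry(replq->next)->cur_deadline);
    /* else: timer stays off until a wake-up inserts a new event */
}
```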

So, all in all, this is a very very good first attempt at implementing
what we agreed upon in previous email... Good work, can't wait to see
v5! :-D

Let me know if there is anything unclear with any of the comments.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
