Re: [Xen-devel] [PATCH V2 1/1] Improved RTDS scheduler
On Tue, Jan 19, 2016 at 11:35 AM, Tianyang Chen <tiche@xxxxxxxxxxxxxx> wrote:
>
>
>
> On 1/15/2016 10:33 AM, Meng Xu wrote:
>>
>> On Wed, Jan 6, 2016 at 2:57 AM, Tianyang Chen <tiche@xxxxxxxxxxxxxx> wrote:
>>>
>>>
>>>
>>>
>>> On 12/31/2015 8:44 AM, Meng Xu wrote:
>>>>
>>>>
>>>> On Thu, Dec 31, 2015 at 6:20 PM, Tianyang Chen <tiche@xxxxxxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Budget replenishment is now handled by a dedicated timer which is
>>>>> triggered at the most imminent release time of all runnable vcpus.
>>>>>
>>>>> Changes since V1:
>>>>> None
>>>>>
>>>>> Signed-off-by: Tianyang Chen <tiche@xxxxxxxxxxxxxx>
>>>>> Signed-off-by: Meng Xu <mengxu@xxxxxxxxxxxxx>
>>>>> Signed-off-by: Dagaen Golomb <dgolomb@xxxxxxxxxxxxxx>
>>>>> ---
>>>>> xen/common/sched_rt.c | 159
>>>>> +++++++++++++++++++++++++++++--------------------
>>>>> 1 file changed, 95 insertions(+), 64 deletions(-)
>>>>>
>>>>> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
>>>>> index 4372486..d522272 100644
>>>>> --- a/xen/common/sched_rt.c
>>>>> +++ b/xen/common/sched_rt.c
>>>>> @@ -16,6 +16,7 @@
>>>>> #include <xen/delay.h>
>>>>> #include <xen/event.h>
>>>>> #include <xen/time.h>
>>>>> +#include <xen/timer.h>
>>>>> #include <xen/perfc.h>
>>>>> #include <xen/sched-if.h>
>>>>> #include <xen/softirq.h>
>>>>> @@ -147,6 +148,16 @@ static unsigned int nr_rt_ops;
>>>>> * Global lock is referenced by schedule_data.schedule_lock from all
>>>>> * physical cpus. It can be grabbed via vcpu_schedule_lock_irq()
>>>>> */
>>>>> +
>>>>> +/* dedicated timer for replenishment */
>>>>> +static struct timer repl_timer;
>>>>> +
>>>>> +/* controls when to first start the timer*/
>>>>
>>>>
>>>>
>>>>
>>>> missing a space between * and / at the end of this line.
>>>>
>>>>
>>>>>
>>>>> +static int timer_started;
>>>>> +
>>>>> +/* handler for the replenishment timer */
>>>>> +static void repl_handler(void *data);
>>>>> +
>>>>> struct rt_private {
>>>>> spinlock_t lock; /* the global coarse grand lock */
>>>>> struct list_head sdom; /* list of availalbe domains, used for
>>>>> dump */
>>>>> @@ -426,6 +437,7 @@ __runq_insert(const struct scheduler *ops, struct
>>>>> rt_vcpu *svc)
>>>>> static int
>>>>> rt_init(struct scheduler *ops)
>>>>> {
>>>>> + const int cpu = smp_processor_id();
>>>>
>>>>
>>>>
>>>>
>>>> Is "cpu" actually used anywhere? If not, please remove it here.
>>>>
>>>>
>>>>>
>>>>> struct rt_private *prv = xzalloc(struct rt_private);
>>>>>
>>>>> printk("Initializing RTDS scheduler\n"
>>>>> @@ -454,6 +466,8 @@ rt_init(struct scheduler *ops)
>>>>>
>>>>> ops->sched_data = prv;
>>>>>
>>>>> + init_timer(&repl_timer, repl_handler, ops, 0);
>>>>> +
>>>>> return 0;
>>>>>
>>>>> no_mem:
>>>>> @@ -473,6 +487,9 @@ rt_deinit(const struct scheduler *ops)
>>>>> xfree(_cpumask_scratch);
>>>>> _cpumask_scratch = NULL;
>>>>> }
>>>>> +
>>>>> + kill_timer(&repl_timer);
>>>>> +
>>>>> xfree(prv);
>>>>> }
>>>>>
>>>>> @@ -635,6 +652,13 @@ rt_vcpu_insert(const struct scheduler *ops, struct
>>>>> vcpu *vc)
>>>>>
>>>>> /* add rt_vcpu svc to scheduler-specific vcpu list of the dom */
>>>>> list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);
>>>>> +
>>>>> + if(!timer_started)
>>>>
>>>>
>>>>
>>>>
>>>> A space should be added here.
>>>
>>>
>>>
>>> Thanks, Meng. I will fix that.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> + {
>>>>> + /* the first vcpu starts the timer for the first time*/
>>>>> + timer_started = 1;
>>>>> + set_timer(&repl_timer,svc->cur_deadline);
>>>>> + }
>>>>> }
>>>>>
>>>>> /*
>>>>> @@ -792,44 +816,6 @@ __runq_pick(const struct scheduler *ops, const
>>>>> cpumask_t *mask)
>>>>> }
>>>>>
>>>>> /*
>>>>> - * Update vcpu's budget and
>>>>> - * sort runq by insert the modifed vcpu back to runq
>>>>> - * lock is grabbed before calling this function
>>>>> - */
>>>>> -static void
>>>>> -__repl_update(const struct scheduler *ops, s_time_t now)
>>>>> -{
>>>>> - struct list_head *runq = rt_runq(ops);
>>>>> - struct list_head *depletedq = rt_depletedq(ops);
>>>>> - struct list_head *iter;
>>>>> - struct list_head *tmp;
>>>>> - struct rt_vcpu *svc = NULL;
>>>>> -
>>>>> - list_for_each_safe(iter, tmp, runq)
>>>>> - {
>>>>> - svc = __q_elem(iter);
>>>>> - if ( now < svc->cur_deadline )
>>>>> - break;
>>>>> -
>>>>> - rt_update_deadline(now, svc);
>>>>> - /* reinsert the vcpu if its deadline is updated */
>>>>> - __q_remove(svc);
>>>>> - __runq_insert(ops, svc);
>>>>> - }
>>>>> -
>>>>> - list_for_each_safe(iter, tmp, depletedq)
>>>>> - {
>>>>> - svc = __q_elem(iter);
>>>>> - if ( now >= svc->cur_deadline )
>>>>> - {
>>>>> - rt_update_deadline(now, svc);
>>>>> - __q_remove(svc); /* remove from depleted queue */
>>>>> - __runq_insert(ops, svc); /* add to runq */
>>>>> - }
>>>>> - }
>>>>> -}
>>>>> -
>>>>> -/*
>>>>> * schedule function for rt scheduler.
>>>>> * The lock is already grabbed in schedule.c, no need to lock here
>>>>> */
>>>>> @@ -848,7 +834,6 @@ rt_schedule(const struct scheduler *ops, s_time_t
>>>>> now, bool_t tasklet_work_sched
>>>>> /* burn_budget would return for IDLE VCPU */
>>>>> burn_budget(ops, scurr, now);
>>>>>
>>>>> - __repl_update(ops, now);
>>>>>
>>>>> if ( tasklet_work_scheduled )
>>>>> {
>>>>> @@ -889,7 +874,7 @@ rt_schedule(const struct scheduler *ops, s_time_t
>>>>> now, bool_t tasklet_work_sched
>>>>> }
>>>>> }
>>>>>
>>>>> - ret.time = MIN(snext->budget, MAX_SCHEDULE); /* sched quantum */
>>>>> + ret.time = snext->budget; /* invoke the scheduler next time */
>>>>> ret.task = snext->vcpu;
>>>>>
>>>>> /* TRACE */
>>>>> @@ -1033,10 +1018,6 @@ rt_vcpu_wake(const struct scheduler *ops, struct
>>>>> vcpu *vc)
>>>>> {
>>>>> struct rt_vcpu * const svc = rt_vcpu(vc);
>>>>> s_time_t now = NOW();
>>>>> - struct rt_private *prv = rt_priv(ops);
>>>>> - struct rt_vcpu *snext = NULL; /* highest priority on RunQ */
>>>>> - struct rt_dom *sdom = NULL;
>>>>> - cpumask_t *online;
>>>>>
>>>>> BUG_ON( is_idle_vcpu(vc) );
>>>>>
>>>>> @@ -1074,14 +1055,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct
>>>>> vcpu *vc)
>>>>> /* insert svc to runq/depletedq because svc is not in queue now */
>>>>> __runq_insert(ops, svc);
>>>>>
>>>>> - __repl_update(ops, now);
>>>>> -
>>>>> - ASSERT(!list_empty(&prv->sdom));
>>>>> - sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
>>>>> - online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
>>>>> - snext = __runq_pick(ops, online); /* pick snext from ALL valid cpus
>>>>> */
>>>>> -
>>>>> - runq_tickle(ops, snext);
>>>>> + runq_tickle(ops, svc);
>>>>>
>>>>> return;
>>>>> }
>>>>> @@ -1094,10 +1068,6 @@ static void
>>>>> rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
>>>>> {
>>>>> struct rt_vcpu *svc = rt_vcpu(vc);
>>>>> - struct rt_vcpu *snext = NULL;
>>>>> - struct rt_dom *sdom = NULL;
>>>>> - struct rt_private *prv = rt_priv(ops);
>>>>> - cpumask_t *online;
>>>>> spinlock_t *lock = vcpu_schedule_lock_irq(vc);
>>>>>
>>>>> clear_bit(__RTDS_scheduled, &svc->flags);
>>>>> @@ -1108,15 +1078,8 @@ rt_context_saved(const struct scheduler *ops,
>>>>> struct vcpu *vc)
>>>>> if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
>>>>> likely(vcpu_runnable(vc)) )
>>>>> {
>>>>> + /* only insert the pre-empted vcpu back*/
>>>>> __runq_insert(ops, svc);
>>>>> - __repl_update(ops, NOW());
>>>>> -
>>>>> - ASSERT(!list_empty(&prv->sdom));
>>>>> - sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
>>>>> - online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
>>>>> - snext = __runq_pick(ops, online); /* pick snext from ALL cpus */
>>>>> -
>>>>> - runq_tickle(ops, snext);
>>>>> }
>>>>> out:
>>>>> vcpu_schedule_unlock_irq(lock, vc);
>>>>> @@ -1167,6 +1130,74 @@ rt_dom_cntl(
>>>>> return rc;
>>>>> }
>>>>>
>>>>> +static void repl_handler(void *data){
>>>>> + unsigned long flags;
>>>>> + s_time_t now = NOW();
>>>>> + s_time_t min_repl = LONG_MAX; /* max time used in comparison*/
>>>>> + struct scheduler *ops = data;
>>>>> + struct rt_private *prv = rt_priv(ops);
>>>>> + struct list_head *runq = rt_runq(ops);
>>>>> + struct list_head *depletedq = rt_depletedq(ops);
>>>>> + struct list_head *iter;
>>>>> + struct list_head *tmp;
>>>>> + struct rt_vcpu *svc = NULL;
>>>>> +
>>>>> + spin_lock_irqsave(&prv->lock,flags);
>>>>> +
>>>>> + stop_timer(&repl_timer);
>>>>
>>>>
>>>>
>>>>
>>>> Move the stop_timer before spin_lock_irqsave() since no more than one
>>>> core will run this handler in the current design.
>>>
>>>
>>>
>>> Right. We want the critical section as short as possible.
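For what it's worth, the reordering being suggested would make the start of the
handler look roughly like this (a sketch of the intent only, not the final code):

static void repl_handler(void *data)
{
    unsigned long flags;
    struct scheduler *ops = data;
    struct rt_private *prv = rt_priv(ops);

    /* Only one instance of this handler runs at a time, so the timer can
     * be stopped before the global lock is taken; this keeps the critical
     * section shorter. */
    stop_timer(&repl_timer);

    spin_lock_irqsave(&prv->lock, flags);

    /* ... replenishment work, then reprogram the timer ... */

    spin_unlock_irqrestore(&prv->lock, flags);
}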
>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>> +
>>>>> + list_for_each_safe(iter, tmp, runq)
>>>>> + {
>>>>> + svc = __q_elem(iter);
>>>>> + if ( now < svc->cur_deadline )
>>>>> + break;
>>>>> + rt_update_deadline(now, svc);
>>>>> + /* scan the runq to find the min release time
>>>>> + * this happens when vcpus on runq miss deadline
>>>>> + */
>>>>> + if( min_repl> svc->cur_deadline )
>>>>> + {
>>>>> + min_repl = svc->cur_deadline;
>>>>> + }
>>>>
>>>>
>>>>
>>>> The logic here is incorrect. The comment you put is correct, but the
>>>> code didn't reflect the comment.
>>>>
>>>> When the timer's handler is called, now should be equal to min_repl,
>>>> shouldn't it?
>>>> If now = min_repl, this if statement will never be called because the
>>>> loop stops if ( now < svc->cur_deadline ).
>>>>
>>>
>>> Yes, the handler will be invoked when it was programmed to, but min_repl is
>>> set to LONG_MAX at the start of every invocation; it doesn't preserve its
>>> value across calls. Therefore, the if statement is reachable.
>>
>>
>>
>> Hmm, let's have a look at the logic...
>>
>> min_repl will only ever be updated to svc->cur_deadline when
>> now >= svc->cur_deadline, right?
>>
>> In other words, min_repl can only be updated to a time point before now.
>> Shouldn't min_repl be the earliest time *in the future* at which the handler
>> should be triggered? If not, what is the definition/meaning of min_repl here?
>>
>> At the end of the function, you did reprogram the timer to min_repl.
>> /* reprogram the timer using the most imminent release time*/
>> set_timer(&repl_timer, min_repl);
>>
>> So I think the logic is incorrect here...
>>
>> You should first update the budget and deadline of each VCPU on the runq,
>> on the depletedq and among the running VCPUs, and then find the earliest
>> time in the future at which the handler should be triggered.
>>
>
> Yeah. Although the next release time is only recorded after a vcpu has been
> replenished. If the runq still has vcpus pending but none missing a deadline,
> the break statement on the first iteration means their release times are
> never checked.
>
> Since the runq is sorted, it's easier to just pick the next one after both
> runq and depletedq are updated.
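As a rough sketch of that approach (assuming the depletedq is not kept sorted
and so has to be scanned, while the head of the sorted runq can be used
directly; runq, depletedq, iter, tmp, svc and now are the handler's existing
locals):

    s_time_t min_repl = LONG_MAX;

    /* first replenish every vcpu whose current period has already ended ... */

    /* runq is sorted by cur_deadline, so its head (if any) carries the
     * earliest future release time of the runnable vcpus */
    if ( !list_empty(runq) )
        min_repl = __q_elem(runq->next)->cur_deadline;

    /* depletedq may not be sorted, so scan it for the earliest release */
    list_for_each_safe( iter, tmp, depletedq )
    {
        svc = __q_elem(iter);
        if ( svc->cur_deadline < min_repl )
            min_repl = svc->cur_deadline;
    }

    /* reprogram the timer with the most imminent release time */
    set_timer(&repl_timer, min_repl);

With this, min_repl == LONG_MAX can only happen when both queues are empty,
which ties in with the fallback case discussed further down.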
>
>
>>>
>>>
>>>
>>>> Only after you have replenished the vcpus on runq and depletedq,
>>>> should you pick the earliest deadline of the front vcpus at the runq
>>>> and depletedq as the min_repl.
>>>>
>>>> Please correct me if I missed anything. :-)
>>>>
>>>>
>>>>>
>>>>> + /* reinsert the vcpu if its deadline is updated */
>>>>> + __q_remove(svc);
>>>>> + __runq_insert(ops, svc);
>>>>> + }
>>>>> +
>>>>> + list_for_each_safe(iter, tmp, depletedq)
>>>>> + {
>>>>> + svc = __q_elem(iter);
>>>>> + if ( now >= svc->cur_deadline )
>>>>> + {
>>>>> + rt_update_deadline(now, svc);
>>>>> +
>>>>> + /* scan the delp_q to find the minimal release time */
>>>>> + if(min_repl > svc->cur_deadline)
>>>>> + {
>>>>> + min_repl = svc->cur_deadline;
>>>>> + }
>>>>
>>>>
>>>>
>>>>
>>>> Please see the above comment. It may have the same issue here.
>>>>
>>>> In addition, the current running VCPUs on cores also need to replenish
>>>> their budgets at the beginning of their next periods. Those running
>>>> VCPUs are not in runq or depletedq in the current RTDS scheduler. So
>>>> this patch won't replenish the current running VCPUs' budget until
>>>> they are put back to runq or depletedq, which is incorrect. Think
>>>> about the following scenario: a VCPU with a very small period has a
>>>> task always running on it to keep this vcpu always runnable; this VCPU
>>>> has its period equal to its budget. Suppose this VCPU is the only VCPU
>>>> on a 4 core machine. This VCPU should keep running on one core and
>>>> never be put back to runq. In the current code, this VCPU won't have
>>>> its budget replenished. Right? :-)
>>>>
>>>> You can choose one of the following three approaches to fix it:
>>>> a) add the running VCPUs into the runq
>>>> b) use another queue to keep track of the current running VCPUs
>>>> c) just scan each core (in the cpupool) to get the current running VCPU on
>>>> it.
>>>> I personally prefer c) but I'm looking forward to others' opinion. :-)
>>>>
>
> I experimented a little bit with adding a runningq to keep track of all
> running vcpus. A vcpu will be inserted when it's picked as snext in
> rt_schedule(). It will be removed from runningq after the context switch is
> finished in context_saved(). Also, a currently running vcpu could be marked
> as RUNSTATE_offline in sleep() and added back to runq in wake(). Does this
> sound reasonable if we were to use a q to keep track of running vcpus?
I think so. The name of that queue can be runningq, IMHO.
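Just to sketch the bookkeeping that would imply (runningq and runningq_elem are
hypothetical names, and the snippets are untested):

/* in struct rt_vcpu */
struct list_head runningq_elem;  /* on runningq while running on a pcpu */

/* in struct rt_private; would need INIT_LIST_HEAD() in rt_init() */
struct list_head runningq;       /* vcpus currently running on pcpus */

/* rt_schedule(): after snext has been picked and dequeued from the runq */
if ( !is_idle_vcpu(snext->vcpu) )
    list_add_tail(&snext->runningq_elem, &rt_priv(ops)->runningq);

/* rt_context_saved(): once the context switch has finished */
list_del_init(&svc->runningq_elem);

The replenishment handler could then walk runningq in addition to runq and
depletedq.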
>
>
>
>>>>> + __q_remove(svc);
>>>>> + __runq_insert(ops, svc);
>>>>> + runq_tickle(ops, svc);
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + /* if timer was triggered but none of the vcpus
>>>>> + * need to be refilled, set the timer to be the
>>>>> + * default period + now
>>>>> + */
>>>>
>>>>
>>>>
>>>>
>>>> Hmm, when could this happen? If it could happen, please specify the
>>>> situation.
>>>> Otherwise, it may be better to print a warning (or cause crash?) when
>>>> you set the timer to be default period + now?
>>>>
>>>> I'm not quite sure which is better, warning or crash?
>>>>
>>>>
>>>>>
>>>>> + if(min_repl == LONG_MAX)
>>>>> + {
>>>>>
>>>>> + set_timer(&repl_timer, now + RTDS_DEFAULT_PERIOD);
>>>>> + }
>>>>
>>>>
>>>>
>>>>
>>>> Hmm,
>>>>
>>>> when could this happen? The only situation in my mind is that there is
>>>> no vcpus in the cpupool. But if that is the case, shouldn't the timer
>>>> never be triggered? Then this handler will never be called.
>>>
>>>
>>>
>>> This is actually still a question for me. When I was testing the code in a
>>> virtual machine, it happens when the system boots up. I am guessing it's
>>> related to the same problem in nested-virtualization (just guessing). When
>>> I tested it on a real machine, it locks up and won't even respond to any
>>> keyboard events.
>>
>>
>>
>>
>> First, we care about the real-machine scenario. It should always run
>> well on a real machine. The scheduler design didn't consider
>> nested virtualization, so it is OK for now to have performance
>> degradation or issues in the nested-virt case, IMHO.
>>
>> Second, the reason why it locks up is probably because the timer of
>> the replenish handler is not correctly set up as I pointed out above.
>> So try to fix the above issue first and see if this still occurs.
>>
> Right, sometimes there are no vcpus on the queues at all because, when the
> timer is programmed to trigger, a running vcpu hasn't been added back to a
> queue yet. In this situation, correct me if I'm wrong, repl_handler is called
> before the context switch is finished. However, scanning the currently
> running vcpus can fix this.
Yes.
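For reference, the per-pcpu scan (option c from earlier in the thread) could
look roughly like the following inside the handler; it assumes curr_on_cpu()
and cpupool_scheduler_cpumask() are usable in this context, and that prv, now
and min_repl are the handler's existing locals:

    struct rt_dom *sdom;
    cpumask_t *online;
    unsigned int cpu;

    ASSERT( !list_empty(&prv->sdom) );
    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);

    for_each_cpu ( cpu, online )
    {
        struct vcpu *vc = curr_on_cpu(cpu);
        struct rt_vcpu *cur;

        /* skip pcpus that are currently idle */
        if ( is_idle_vcpu(vc) )
            continue;

        cur = rt_vcpu(vc);
        if ( now >= cur->cur_deadline )
            rt_update_deadline(now, cur);
        if ( cur->cur_deadline < min_repl )
            min_repl = cur->cur_deadline;
    }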
Meng
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel