[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v1 1/4] xen: add real time scheduler rt



2014-09-08 6:33 GMT-04:00 Dario Faggioli <dario.faggioli@xxxxxxxxxx>:
> On sab, 2014-09-06 at 23:56 -0400, Meng Xu wrote:
>> Hi Dario,
>>
>> I have modified the code based on your comments in the previous email.
>>
> Ok. :-)
>
> Let me reply to some of your replies, then I'll go looking at the new
> version of the series.
>
>> >> +/*
>> >> + * Useful inline functions
>> >> + */
>> >> +static inline struct rt_private *RT_PRIV(const struct scheduler *_ops)
>> >> +{
>> >> +    return _ops->sched_data;
>> >> +}
>> >> +
>> >> +static inline struct rt_vcpu *RT_VCPU(const struct vcpu *_vcpu)
>> >> +{
>> >> +    return _vcpu->sched_priv;
>> >> +}
>> >> +
>> >> +static inline struct rt_dom *RT_DOM(const struct domain *_dom)
>> >> +{
>> >> +    return _dom->sched_priv;
>> >> +}
>> >> +
>> >> +static inline struct list_head *RUNQ(const struct scheduler *_ops)
>> >> +{
>> >> +    return &RT_PRIV(_ops)->runq;
>> >> +}
>> >> +
>> > I see what's happening, and I remember the suggestion of using static
>> > inline-s, with which I agree. At that point, however, I'd have the
>> > function names lower case-ed.
>> >
>> > It's probably not a big deal, and I don't think we have any official
>> > saying about this, but it looks more consistent to me.
>>
>> Because other schedulers uses the upper-case name, (because they are
>> macros instead of static inline,) in order to keep the style
>> consistent with other schedulers, I think I will keep the name in
>> upper-case.
>>
> All upper case is to stress the fact that these are macro. I find that
> making the reader aware of something like this (i.e., whether he's
> dealing with a macro or a function), is much more important than
> consistency between different and unrelated source files.
>
> I recommend lower casing these, although, I guess it's George's, and
> maybe Jan's, call to actually decide.
>
> George?

I saw George's reply. Now I will change them to lower-case.
>
>> I can send another patch set to change all other
>> schedulers'  macro definition of these functions to static inline, and
>> then change the name to lower-case. What do you think?
>>
> No need to do that.

Roger.

>
>> >> +    printk("cpupool=%s\n", cpustr);
>> >> +
>> >> +    /* TRACE */
>> >> +    {
>> >> +        struct {
>> >> +            unsigned dom:16,vcpu:16;
>> >> +            unsigned processor;
>> >> +            unsigned cur_budget_lo, cur_budget_hi, cur_deadline_lo, 
>> >> cur_deadline_hi;
>> >> +            unsigned is_vcpu_on_runq:16,is_vcpu_runnable:16;
>> >> +        } d;
>> >> +        d.dom = svc->vcpu->domain->domain_id;
>> >> +        d.vcpu = svc->vcpu->vcpu_id;
>> >> +        d.processor = svc->vcpu->processor;
>> >> +        d.cur_budget_lo = (unsigned) svc->cur_budget;
>> >> +        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
>> >> +        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
>> >> +        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
>> >> +        d.is_vcpu_on_runq = __vcpu_on_runq(svc);
>> >> +        d.is_vcpu_runnable = vcpu_runnable(svc->vcpu);
>> >> +        trace_var(TRC_RT_VCPU_DUMP, 1,
>> >> +                  sizeof(d),
>> >> +                  (unsigned char *)&d);
>> >> +    }
>> > Not a too big deal, but is it really useful to trace vcpu dumps? How?
>>
>> Actually, this function is called in other functions in this file.
>>
> Yeah, I saw that. And IIRC, I asked to remove pretty much all of the
> calls of this function from around the file, except then one from the
> handling of the debug key that is actually meant at dumping these info.

I see. Then I will remove them.

>
>> Whenever we want to look into the details of a vcpu (for example, when
>> we insert a vcpu to the RunQ), we call this function and it will trace
>> the vcpu information.
>>
> Which does not make sense. It does (I still find it very 'chatty',
> though), for developing and debugging reasons, but there should not be
> anything like this in an upstreamable patch series.
>
> Also, you're filling up the trace of TRC_RT_VCPU_DUMP events while what
> you're doing is not actually that, but rather, allocating a new vcpu,
> deallocating, etc.
>
> If you need to see these events, add specific tracing events, like
> you're doing already for a bunch of other ones. You've done the right
> thing (according to me, at least), introducing those tracing events and
> tracing points. Broaden it, if you need, instead of "abusing" dumping
> vcpus. :-)

I see. Then I will remove these dumps and add more trace events.

>
>> I extract this as a function to avoid writing
>> this trace again and again in all of those functions, such as
>> rt_alloc_vdata, rt_free_vdata, burn_budgets.
>>
> Ditto. BTW, burning budget seems to me to have its own trace point, as I
> am suggesting above, and not using this function. Doing the same
> everywhere you think you need is TRT(^TM).

OK. Now I understand and will remove those dumps.

>
>> In addition, we use xl sched-rt to dump the RunQ information to check
>> the RunQ is properly sorted for debug reason.
>>
>> When it is upstreamed, I will change this function to static inline,
>> remove the printf in this function and leave the trace there.
>>
> This makes even less sense! :-O
>
> Dumping info via printk() is a valid mean of gathering debug info,
> supported by all other schedulers, and by other Xen subsystem as well.
> That happens whey a debug key is sent to the Xen console, and that
> absolutely needs to stay.
>
> What you need to do is not removing the printk. It's rather removing the
> trace point from within here, and avoiding calling this function from
> other place than debug key handling.
>
> Also, from a methodology perspective, there is no "when upstream I will
> change ...". What you send to xen-devel should always be the code, that,
> if properly acked, gets committed upstream as it is, without any further
> modification, either right before or right after that.

Thank you very much for correcting my incorrect perspective. :-)

>
> If you need some more aggressive debugging for your day-to-day
> development (which is fine, I often do need something like that :-D),
> what you can do is have it in a separate patch, at the bottom of the
> series. Then, when sending the patches in, you just do not include that
> one, and you're done. This is how I do it, at least, and it works for
> me. Both plain git and tools like stgit or quilt/guilt makes it very
> easy to deal with this workflow.

This is a very good approach! I will do that!

>
>
>> >> +        list_add(&svc->runq_elem, &prv->flag_vcpu->runq_elem);
>> >>
>> > Mmm... flag_vcpu, eh? I missed it above where it's declared, but I
>> > really don't like the name. 'depleted_vcpus'? Or something else (sorry,
>> > not very creative in this very moment :-))... but flag_vcpu, I found it
>> > confusing.
>>
>> It's declared in struct rt_private. It's inited in rt_init function.
>>
>> The reason why I need this vcpu is:
>> The RunQ has two parts, the first part is the vcpus with budget and
>> the first part is sorted based on priority of vcpus. The second part
>> [...]
>>
> Wow, wo, woow... hold your horses! :-P :-P
>
> I saw where it lives, and I understood what's it for. All I was asking
> was to change the name of the variable. :-)

Haha. :-P

I also saw George's comment on this. It seems that both you and George
have the idea of using two separate queue instead of one RunQ with a
flag vcpu. Using two separate queue is easier for developers to
understand. So I will change it to use two separate queues and add
some simple helper functions.

>
>> >> + */
>> >> +static void *
>> >> +rt_alloc_pdata(const struct scheduler *ops, int cpu)
>> >> +{
>> >> +    struct rt_private *prv = RT_PRIV(ops);
>> >> +
>> >> +    cpumask_set_cpu(cpu, &prv->cpus);
>> >> +
>> >> +    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
>> >> +
>> >> +    printtime();
>> >> +    printk("%s total cpus: %d", __func__, cpumask_weight(&prv->cpus));
>> >> +    /* same as credit2, not a bogus pointer */
>> >> +    return (void *)1;
>> >>
>> > Can you put in the comment the reason why it is ok to do this, without
>> > referencing credit2?
>>
>> Explained in the comment in next patch. In schedule.c, they use the
>> return value to check if this function correctly allocate pdata by
>> checking if it returns 1. (Well, this is inconsistent with other code
>> which uses 0 to indicate success.) The function definition needs
>> return a void *, so we have to cast the 1 to void *. That's the
>> reason. :-P
>>
> Same here. I do see the reason. My point was: "Either state the reason
> in the comment, or remove it." The purpose of a comment like this should
> be to explain _why_ something is done in a certain way. Now, about this
> code, either one sees (or is able to quickly find out) the reason for
> the (void*)1, bu herself, in which case the comment is useless anyway.
> OTOH, if one needs help in understanding that, you're not helping him
> much by saying <<dude, this is fine, credit2 does the same thing!>>,
> i.e., the comments could have been useful in this case, but it's useless
> again.
>
> So, if I'd have to choose between an useless comment and no comment, I'd
> go for the latter. And that's what I'm saying: make it a useful comments
> for someone different than you (or even for you, e.g., in 5 years from
> now :-P), or kill it. :-)

I see the point! This is a very useful guideline of adding comments! :-)
In the version 2 patch set I sent, it adds the comments  /* 1
indicates alloc. succeed in schedule.c */, Is this ok? (I think this
could be useful for other developers because it tells them how the
return value will be used.)

>
>> >> +    svc->period = RT_DEFAULT_PERIOD;
>> >> +    if ( !is_idle_vcpu(vc) )
>> >> +        svc->budget = RT_DEFAULT_BUDGET;
>> >> +
>> >> +    count = (now/MICROSECS(svc->period)) + 1;
>> >> +    /* sync all VCPU's start time to 0 */
>> >> +    svc->cur_deadline += count * MICROSECS(svc->period);
>> >> +
>> > why "+="? What value do you expect cur_deadline to hold at this time?
>>
>> It should be =. This does not cause a bug because it's in the
>> rt_alloc_vdata() and the svc->cur_deadline is 0.  But "+=" is not
>> correct in logic. I modified it. Thank you!
>>
> I appreciate it does not harm, but it's harder to read, so thanks for
> changing it.
>
>> >
>> > Also, unless I'm missing something, or doing the math wrong, what you
>> > want to do here is place the deadline one period ahead of NOW().
>>
>> Ah, no. I think you are thinking about the CBS server mechanisms. (Now
>> I start to use some notations in real-time academic field) For the
>> deferrable server, if the release offset of a implicit-deadline
>> deferrable vcpu is O_i, its deadline will be O_i + p_i * k, where k is
>> a natural number. So we want to set deadline to the end of the period
>> during which NOW() is fall in.
>>
> Mmm... ok, yes, I have to admit I probably was not considering that. So,
> for the first time you're setting the deadline, which is what's
> happening here, I don't think I see a quick way to avoid the div+1, to
> emulate the ceiling.
>
> It probably can still be avoided in other places, though. In fact, in
> that case, you're advancing the deadline (perhaps more than one time)
> from the a previous one, and that should manage to get you to the end of
> the right period, shouldn't it. Anyway, I guess I'll comment more (if
> I'll find it necessary) about this directly on v2.
>
> I'm still convinced about the time conversion part of what I said about
> this code, though.

Sure! I will avoid using the division when advance the deadline.
As to time conversion part, all code in hypervisor is in nanosecond.
The only time when we do the conversion is in function rt_dom_cntl,
when we set/get vcpus' parameters (since the toolstack use us as time
unit and hypervisor uses ns as time unit).

>
>> >> +     * update deadline info: When deadline is in the past,
>> >> +     * it need to be updated to the deadline of the current period,
>> >> +     * and replenish the budget
>> >> +     */
>> >> +    delta = now - svc->cur_deadline;
>> >> +    if ( delta >= 0 )
>> >> +    {
>> >> +        count = ( delta/MICROSECS(svc->period) ) + 1;
>> >> +        svc->cur_deadline += count * MICROSECS(svc->period);
>> >> +        svc->cur_budget = svc->budget * 1000;
>> >> +
>> > Ditto, about time units and conversions.
>> >
>> > I also am sure I said before that I prefer an approach like:
>> >
>> >     while ( svc->cur_deadline < now )
>> >         svc->cur_deadline += svc->period;+/*
>>
>> I explained in the previous comment. :-) Now could be  several periods
>> late after the current deadline. And this is deferrable server. :-)
>>
> Are you sure this is not ok this time? As I said, I agree the count
> thing is right when assigning the first deadline. However:
>  1) I know it can be several periods away, that's the purpose of the
>     while()
>  2) advancing in steps of period, starting from the last set deadline,
>     should get you to the end of the right period.
>
> Or am I missing something else? :-)

Yes, you are right! I will change it to the while lool instead of
using the division. :-P


>
>> > "What's in priv->cpus, BTW? How is it different form the cpupool online
>> > mask (returned in 'online' by cpupool_scheduler_cpumask() )?"
>> >
>> > while I don't remember having received an answer, and I see it's still.
>> > here in the code. Am I missing something? If not, can you explain?
>>
>> prv->cpus is the online cpus for the scheduler.  It should have the
>> same value with the cpupool_scheduler_cpumask().
>>
> Should?
>
>> Should I remove this? (I plan to release the next version this
>> weekend, I will remove this after receiving the  command. :-))
>>
> Well, you tell me. :-) Is there a particular reason why you're keeping
> the same information in two places? If there's one, explain it to us. If
> there's no, well... :-)

Will remove in version 3. :-)

Thank you again!

Best,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.