
Re: [Xen-devel] [PATCH v1 1/4] xen: add real time scheduler rt



On Sat, 2014-09-06 at 23:56 -0400, Meng Xu wrote:
> Hi Dario,
> 
> I have modified the code based on your comments in the previous email.
>
Ok. :-)

Let me reply to some of your replies, then I'll go looking at the new
version of the series.

> >> +/*
> >> + * Useful inline functions
> >> + */
> >> +static inline struct rt_private *RT_PRIV(const struct scheduler *_ops)
> >> +{
> >> +    return _ops->sched_data;
> >> +}
> >> +
> >> +static inline struct rt_vcpu *RT_VCPU(const struct vcpu *_vcpu)
> >> +{
> >> +    return _vcpu->sched_priv;
> >> +}
> >> +
> >> +static inline struct rt_dom *RT_DOM(const struct domain *_dom)
> >> +{
> >> +    return _dom->sched_priv;
> >> +}
> >> +
> >> +static inline struct list_head *RUNQ(const struct scheduler *_ops)
> >> +{
> >> +    return &RT_PRIV(_ops)->runq;
> >> +}
> >> +
> > I see what's happening, and I remember the suggestion of using static
> > inline-s, with which I agree. At that point, however, I'd have the
> > function names lower case-ed.
> >
> > It's probably not a big deal, and I don't think we have any official
> > saying about this, but it looks more consistent to me.
> 
> Because other schedulers use the upper-case names (because they are
> macros instead of static inlines), in order to keep the style
> consistent with other schedulers, I think I will keep the names in
> upper-case.
>
All upper case is to stress the fact that these are macros. I find that
making the reader aware of something like this (i.e., whether they are
dealing with a macro or a function) is much more important than
consistency between different and unrelated source files.

I recommend lower-casing these, although I guess it's George's, and
maybe Jan's, call to actually decide.
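
Just to make the suggestion concrete, here's a minimal sketch of what I
mean (names purely illustrative, pick whatever you prefer):

    static inline struct rt_private *rt_priv(const struct scheduler *ops)
    {
        return ops->sched_data;
    }

    static inline struct rt_vcpu *rt_vcpu(const struct vcpu *vcpu)
    {
        return vcpu->sched_priv;
    }

    static inline struct rt_dom *rt_dom(const struct domain *dom)
    {
        return dom->sched_priv;
    }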

George?

> I can send another patch set to change all other
> schedulers' macro definitions of these functions to static inlines, and
> then change the names to lower-case. What do you think?
> 
No need to do that.

> >> +    printk("cpupool=%s\n", cpustr);
> >> +
> >> +    /* TRACE */
> >> +    {
> >> +        struct {
> >> +            unsigned dom:16,vcpu:16;
> >> +            unsigned processor;
> >> +            unsigned cur_budget_lo, cur_budget_hi, cur_deadline_lo, cur_deadline_hi;
> >> +            unsigned is_vcpu_on_runq:16,is_vcpu_runnable:16;
> >> +        } d;
> >> +        d.dom = svc->vcpu->domain->domain_id;
> >> +        d.vcpu = svc->vcpu->vcpu_id;
> >> +        d.processor = svc->vcpu->processor;
> >> +        d.cur_budget_lo = (unsigned) svc->cur_budget;
> >> +        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
> >> +        d.cur_deadline_lo = (unsigned) svc->cur_deadline;
> >> +        d.cur_deadline_hi = (unsigned) (svc->cur_deadline >> 32);
> >> +        d.is_vcpu_on_runq = __vcpu_on_runq(svc);
> >> +        d.is_vcpu_runnable = vcpu_runnable(svc->vcpu);
> >> +        trace_var(TRC_RT_VCPU_DUMP, 1,
> >> +                  sizeof(d),
> >> +                  (unsigned char *)&d);
> >> +    }
> > Not too big a deal, but is it really useful to trace vcpu dumps? How?
> 
> Actually, this function is called in other functions in this file.
>
Yeah, I saw that. And IIRC, I asked to remove pretty much all of the
calls of this function from around the file, except the one from the
handling of the debug key, which is actually meant for dumping this info.

> Whenever we want to look into the details of a vcpu (for example, when
> we insert a vcpu into the RunQ), we call this function and it will trace
> the vcpu information.
>
Which does not make sense. It does make sense for development and
debugging purposes (I still find it very 'chatty', though), but there
should not be anything like this in an upstreamable patch series.

Also, you're filling up the trace with TRC_RT_VCPU_DUMP events while what
you're actually doing is not dumping a vcpu but, rather, allocating a new
vcpu, deallocating one, etc.

If you need to see these events, add specific tracing events, like
you're doing already for a bunch of other ones. You've done the right
thing (according to me, at least) in introducing those tracing events and
trace points. Broaden that, if you need to, instead of "abusing" the
dumping of vcpus. :-)

> I extracted this as a function to avoid writing
> this trace again and again in all of those functions, such as
> rt_alloc_vdata, rt_free_vdata, burn_budgets.
> 
Ditto. BTW, burning budget seems to me to deserve its own trace point, as
I am suggesting above, rather than using this function. Doing the same
everywhere you think you need it is TRT(TM).
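
For instance (just a sketch: TRC_RT_BUDGET_BURN is a made-up event name,
and you may well want different fields), burn_budgets() could trace only
what it actually cares about:

    /* TRACE: budget burning (hypothetical event, sketch only) */
    {
        struct {
            unsigned dom:16, vcpu:16;
            unsigned cur_budget_lo, cur_budget_hi;
        } d;
        d.dom = svc->vcpu->domain->domain_id;
        d.vcpu = svc->vcpu->vcpu_id;
        d.cur_budget_lo = (unsigned) svc->cur_budget;
        d.cur_budget_hi = (unsigned) (svc->cur_budget >> 32);
        trace_var(TRC_RT_BUDGET_BURN, 1, sizeof(d), (unsigned char *)&d);
    }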

> In addition, we use xl sched-rt to dump the RunQ information to check
> that the RunQ is properly sorted, for debugging reasons.
> 
> When it is upstreamed, I will change this function to static inline,
> remove the printf in this function and leave the trace there.
> 
This makes even less sense! :-O

Dumping info via printk() is a valid means of gathering debug info,
supported by all other schedulers, and by other Xen subsystems as well.
That happens when a debug key is sent to the Xen console, and that
absolutely needs to stay.

What you need to do is not remove the printk. It's rather to remove the
trace point from within here, and to avoid calling this function from
any place other than debug key handling.

Also, from a methodology perspective, there is no "when upstream I will
change ...". What you send to xen-devel should always be the code that,
if properly acked, gets committed upstream as it is, without any further
modification either right before or right after that.

If you need some more aggressive debugging for your day-to-day
development (which is fine, I often do need something like that :-D),
what you can do is have it in a separate patch, at the bottom of the
series. Then, when sending the patches in, you just do not include that
one, and you're done. This is how I do it, at least, and it works for
me. Both plain git and tools like stgit or quilt/guilt make it very
easy to deal with this workflow.


> >> +        list_add(&svc->runq_elem, &prv->flag_vcpu->runq_elem);
> >>
> > Mmm... flag_vcpu, eh? I missed it above where it's declared, but I
> > really don't like the name. 'depleted_vcpus'? Or something else (sorry,
> > not very creative in this very moment :-))... but flag_vcpu, I found it
> > confusing.
> 
> It's declared in struct rt_private. It's inited in rt_init function.
> 
> The reason why I need this vcpu is:
> The RunQ has two parts: the first part holds the vcpus with budget, and
> that part is sorted based on the priority of the vcpus. The second part
> [...]
>
Wow, wo, woow... hold your horses! :-P :-P

I saw where it lives, and I understood what it's for. All I was asking
was to change the name of the variable. :-)

> >> + */
> >> +static void *
> >> +rt_alloc_pdata(const struct scheduler *ops, int cpu)
> >> +{
> >> +    struct rt_private *prv = RT_PRIV(ops);
> >> +
> >> +    cpumask_set_cpu(cpu, &prv->cpus);
> >> +
> >> +    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> >> +
> >> +    printtime();
> >> +    printk("%s total cpus: %d", __func__, cpumask_weight(&prv->cpus));
> >> +    /* same as credit2, not a bogus pointer */
> >> +    return (void *)1;
> >>
> > Can you put in the comment the reason why it is ok to do this, without
> > referencing credit2?
> 
> Explained in the comment in the next patch. In schedule.c, they use the
> return value to check whether this function correctly allocated pdata by
> checking if it returns 1. (Well, this is inconsistent with other code
> which uses 0 to indicate success.) The function definition needs to
> return a void *, so we have to cast the 1 to void *. That's the
> reason. :-P
> 
Same here. I do see the reason. My point was: "Either state the reason
in the comment, or remove it." The purpose of a comment like this should
be to explain _why_ something is done in a certain way. Now, about this
code, either one sees (or is able to quickly find out) the reason for
the (void*)1 by herself, in which case the comment is useless anyway.
OTOH, if one needs help in understanding that, you're not helping her
much by saying <<dude, this is fine, credit2 does the same thing!>>,
i.e., the comment could have been useful in this case, but as it stands
it's useless again.

So, if I had to choose between a useless comment and no comment, I'd
go for the latter. And that's what I'm saying: make it a useful comment
for someone different from you (or even for you, e.g., in 5 years from
now :-P), or kill it. :-)
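
E.g., something along these lines (wording is only a suggestion, double
check it against what schedule.c actually does with the return value):

    /*
     * schedule.c only checks this return value (e.g., against NULL) to
     * tell whether allocation succeeded; it never dereferences it. All
     * the state this scheduler needs lives in prv, so there is no real
     * per-cpu data to allocate, and any non-NULL token will do.
     */
    return (void *)1;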

> >> +    svc->period = RT_DEFAULT_PERIOD;
> >> +    if ( !is_idle_vcpu(vc) )
> >> +        svc->budget = RT_DEFAULT_BUDGET;
> >> +
> >> +    count = (now/MICROSECS(svc->period)) + 1;
> >> +    /* sync all VCPU's start time to 0 */
> >> +    svc->cur_deadline += count * MICROSECS(svc->period);
> >> +
> > why "+="? What value do you expect cur_deadline to hold at this time?
> 
> It should be =. This does not cause a bug because it's in
> rt_alloc_vdata() and svc->cur_deadline is 0. But "+=" is not
> logically correct. I modified it. Thank you!
> 
I appreciate that it does no harm, but it's harder to read, so thanks for
changing it.

> >
> > Also, unless I'm missing something, or doing the math wrong, what you
> > want to do here is place the deadline one period ahead of NOW().
> 
> Ah, no. I think you are thinking about the CBS server mechanisms. (Now
> I start to use some notation from the real-time academic field.) For the
> deferrable server, if the release offset of an implicit-deadline
> deferrable vcpu is O_i, its deadline will be O_i + p_i * k, where k is
> a natural number. So we want to set the deadline to the end of the period
> in which NOW() falls.
> 
Mmm... ok, yes, I have to admit I probably was not considering that. So,
when setting the deadline for the first time, which is what's
happening here, I don't think I see a quick way to avoid the div+1 to
emulate the ceiling.

It probably can still be avoided in other places, though. In fact, in
those cases you're advancing the deadline (perhaps more than one time)
from the previous one, and that should manage to get you to the end of
the right period, shouldn't it? Anyway, I guess I'll comment more (if
I find it necessary) about this directly on v2.
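
(Just to fix ideas with some numbers: with p_i = 10ms and NOW() = 25ms,
count = 25/10 + 1 = 3, so the first deadline lands at 3 * 10ms = 30ms,
i.e., at the end of the period NOW() falls in, which is indeed what a
deferrable server wants.)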

I'm still convinced about the time conversion part of what I said about
this code, though.

> >> +    /* Debug only: dump new vcpu's info */
> >> +    printtime();
> >> +    rt_dump_vcpu(svc);
> >> +
> > As said before, this has to go, quite possibly by means of replacing it
> > by some tracing.
> 
> When there are no more comments on this patch set, I will make
> rt_dump_vcpu a static inline function that only has the tracing
> inside. Is that OK? For now I just want to use some dumps to help me with
> some quick debugging. :-)
> 
Please don't. Do ditch all the occurrences of this, and add tracing if
required.

Personally, I think that in most of the places where you are now calling
rt_dump_vcpu(), you don't even need to add tracing. Perhaps do what I
said above about keeping a private debug patch. In this spot here, I
wouldn't mind having a trace point, one with its own specific
event and data, as setting the first deadline is quite an important
event to see in a trace, if you want to understand how things work/are
going.

> > Can you please re-look at
> > http://lists.xen.org/archives/html/xen-devel/2014-07/msg01624.html , and
> > either reply/explain, or cope with that? In summary, what I was asking
> > was whether adding/removing the vcpu to the list of vcpus of a domain
> > could be done somewhere or somehow else, as doing it like this means,
> > for some code-path, removing and then re-adding it right away.
> >
> > Looking more carefully, I do see that credit2 does the same thing but,
> > for one, that doesn't make it perfect (I still don't like it :-P), and
> > for two, credit2 does a bunch of other stuff there, that you don't.
> >
> > Looking even more carefully, this seems to be related to the TODO you
> > put (which may be a new addition wrt RFCv2, I can't remember). So, yes,
> > I seriously think you should take care of the fact that, when this
> > function is called for moving a domain from a cpupool to another, you
> > need to move the vcpus from the old cpupool's runqueue to the new one's
> > runqueue.
> >
> > Have you tested doing such operation (moving domains between cpupools)?
> > Because, with the code looking like this, it seems a bit unlikely...
> 
> Hmm, I actually did test this operation. I migrated a dom from Pool-o
> to the newly created cpupool "test" with the credit scheduler. That's also
> the scenario I showed in the cover page of this patch set. :-)
>
Yes, I saw it. TBF, you only show part of what allows one to tell whether
things are properly working after the migration: e.g., the output of `xl
vcpu-list', to see where the vcpus are actually running, is missing.

In any case, working or not, I think the right thing to do is having
some runq management bits in these functions.

> Now I have added the code that removes the vcpu from the RunQ, and tested
> that it is removed from the RunQ when I move the domU to the new cpupool.
> 
Perfect, thanks. Let's comment and try this then.

> As to removing the vcpu from the domain's vcpu list: I'm not operating
> on the vcpu list of the structure domain{}, which is used by other
> components in Xen.
>
I know.

> Actually, we have a scheduler-specific private
> structure rt_vcpu which uses its field runq_elem to link the vcpu to
> the RunQ of the scheduler; 
>
I know.

> We also have a scheduler-specific private
> structure rt_dom to record the scheduler-specific data for the domain
> (it's defined in sched_rt.c). 
>
I know.

> Inside the rt_dom, there is a vcpu list to
> record the scheduler-specific vcpus of this domain.
>
I know.

> When we move a
> domU from a cpupool to another cpupool (say, with rt scheduler), the
> domU's scheduler-specific vcpu should be moved to the destination
> cpupool's RunQ. So we need to remove the scheduler-specific vcpu from
> the source cpupool's RunQ by using this rt_vcpu_remove() and then
> insert it to the dest. cpupool's RunQ by using the rt_vcpu_insert().
> 
I know. And I think this is exactly what you are *NOT* doing in v1.
Since you often look at what credit2 does, you can see it includes a
call to runq_assign(), the equivalent of which is what you're missing.
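
IOW, I'd expect these hooks to have (at least) roughly this shape. This
is just a rough sketch built on the helpers I saw in the patch (locking
and error handling left out, and the helper names may not match your v2
exactly):

    static void
    rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
    {
        struct rt_vcpu *svc = RT_VCPU(vc);

        /* Hook the vcpu onto the runqueue of the (new) cpupool's scheduler */
        if ( !is_idle_vcpu(vc) && vcpu_runnable(vc) && !__vcpu_on_runq(svc) )
            __runq_insert(ops, svc);
    }

    static void
    rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
    {
        struct rt_vcpu *svc = RT_VCPU(vc);

        /* Detach the vcpu from the (old) cpupool's runqueue before the move */
        if ( __vcpu_on_runq(svc) )
            __runq_remove(svc);
    }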

> > Finally, the insert_vcpu hook is called from sched_init_vcpu() too,
> > still in schedule.c. And in fact, all the other schedulers use this
> > function to put a vcpu on its runqueue for the first time. Once you do
> > this, you get fine behaviour, cpupool-wise, almost for free. You
> > apparently don't. Why is that? If you'd go for a similar approach, you
> > will get the same benefit. Where is it that you insert a vcpu in the
> > RunQ for the first time? Again, why there and not here?
> 
> Added the runq_insert() statement in this function to add the vcpu to
> the RunQ now.
> In the old patch, the vcpu was inserted in the wake-up function. :-(
> 
Right, which may be what makes things (at least partially) work when
cpupools are involved as well.

> >> +     * update deadline info: When deadline is in the past,
> >> +     * it need to be updated to the deadline of the current period,
> >> +     * and replenish the budget
> >> +     */
> >> +    delta = now - svc->cur_deadline;
> >> +    if ( delta >= 0 )
> >> +    {
> >> +        count = ( delta/MICROSECS(svc->period) ) + 1;
> >> +        svc->cur_deadline += count * MICROSECS(svc->period);
> >> +        svc->cur_budget = svc->budget * 1000;
> >> +
> > Ditto, about time units and conversions.
> >
> > I also am sure I said before that I prefer an approach like:
> >
> >     while ( svc->cur_deadline < now )
> >         svc->cur_deadline += svc->period;
> 
> I explained in the previous comment. :-) NOW() could be several periods
> past the current deadline. And this is a deferrable server. :-)
> 
Are you sure this is not ok this time? As I said, I agree the count
thing is right when assigning the first deadline. However:
 1) I know it can be several periods away, that's the purpose of the 
    while()
 2) advancing in steps of period, starting from the last set deadline, 
    should get you to the end of the right period.

Or am I missing something else? :-)
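
I.e., for the replenishment path, I have something like this in mind
(just a sketch, leaving the us->ns conversion issue aside), and it should
land on the same deadline as the count-based version:

    /* Advance, one period at a time, until the deadline is in the future */
    if ( now >= svc->cur_deadline )
    {
        do
            svc->cur_deadline += MICROSECS(svc->period);
        while ( svc->cur_deadline <= now );

        svc->cur_budget = svc->budget * 1000;
    }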

> >> +    if ( delta < 0 )
> >> +    {
> >> +        printk("%s, ATTENTION: now is behind last_start! delta = %ld for ",
> >> +                __func__, delta);
> >> +        rt_dump_vcpu(svc);
> >> +        svc->last_start = now;  /* update last_start */
> >> +        svc->cur_budget = 0;   /* FIXME: should we recover like this? */
> >> +        return;
> >> +    }
> >> +
> > Bha, this just should not happen, and if it does, it's either a bug or
> > a hardware problem, isn't it? I don't think you should do much more
> > than this. I'd remove the rt_dump_vcpu() but, in this case, I'm fine
> > with the printk(), as it's something that really should not happen, and
> > we want to inform the sysadmin that something has gone very bad. :-)
> 
> When I run Xen in VirtualBox and configure 4 (virtual) cores for the
> VirtualBox VM, these 4 cores are four threads on my host machine
> (i.e., a MacBook Air). I'm guessing that, if the four virtual cores do
> not have good (or timely) time synchronization, when a vcpu migrates to
> another virtual core, the time might be behind its last_start
> time.
> 
That's likely what happens, yes.

> > "What's in priv->cpus, BTW? How is it different from the cpupool online
> > mask (returned in 'online' by cpupool_scheduler_cpumask() )?"
> >
> > while I don't remember having received an answer, and I see it's still
> > here in the code. Am I missing something? If not, can you explain?
> 
> prv->cpus is the set of online cpus for the scheduler. It should have the
> same value as cpupool_scheduler_cpumask().
> 
Should?

> Should I remove this? (I plan to release the next version this
> weekend; I will remove this after receiving the command. :-))
> 
Well, you tell me. :-) Is there a particular reason why you're keeping
the same information in two places? If there is one, explain it to us. If
there isn't, well... :-)
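
FWIW, if the two really do always hold the same bits, then wherever
prv->cpus is consulted, one could presumably just ask the cpupool
directly, e.g. (sketch only, I haven't gone through all the call sites;
vc here is a struct vcpu *):

    const cpumask_t *online = cpupool_scheduler_cpumask(vc->domain->cpupool);
    int cpu_is_ours = cpumask_test_cpu(vc->processor, online);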

> Thank you very much for your time and review!
> Hope I have addressed all of them, except the ones I asked about in the email.
> (I think I have solved all of them now. :-P )
> 
Thanks to you for taking care of them.

I'll have a look at v2 and let you know.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
