Xen project Mailing List

Re: [Xen-devel] [PATCH V2 1/1] Improved RTDS scheduler

To: Tianyang Chen <tiche@xxxxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxx>

From: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Date: Tue, 26 Jan 2016 11:52:29 +0000

Cc: george.dunlap@xxxxxxxxxx, Dagaen Golomb <dgolomb@xxxxxxxxxxxxxx>, Meng Xu <mengxu@xxxxxxxxxxxxx>

Delivery-date: Tue, 26 Jan 2016 11:52:55 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Mon, 2016-01-25 at 17:04 -0500, Tianyang Chen wrote:
> I have removed some of the Ccs so they won't get bothered as we
> discussed previously.
>
Yeah... I said you should have done that in the first place, and then
Cc-ed them myself! Sorry... :-P

> On 1/25/2016 4:00 AM, Dario Faggioli wrote:
> > On Thu, 2015-12-31 at 05:20 -0500, Tianyang Chen wrote:
> > >
> > So, there's always only one timer... Even if we have multiple
> > cpupool with RTDS as their scheduler, they share the replenishment
> > timer? I think it makes more sense to make this per-scheduler.
> >
> Yeah, I totally ignored the case for cpu-pools. It looks like when a
> cpu-pool is created, it copies the scheduler struct and calls
> rt_init() where a private field is initialized. So I assume the timer
> should be put inside the scheduler private struct?
>
Yes, I think it should be there. We certainly don't want different
cpupools to share the timer.

> Now that I think about it, the
> timer is hard-coded to run on cpu0.
>
It is. Well, in your patch it is "hard-coded" to cpu0. When considering
cpupools, you just hard-code it to one cpu of the pool. In fact, the
fact that the timer is sort-of pinned to a pcpu is a (potential) issue
(overhead on that pcpu, what happens if that pcpu goes offline), but
let's deal with all this later. For now, make the code cpupools-safe.

> If there're lots of cpu-pools but
> the replenishment can only be done on the same pcpu, would that be a
> problem? Should we keep track of all instances of schedulers
> (nr_rt_ops counts how many) and just put times on different pcpus?
>
One timer per cpupool is what we want, at least for now.

> > About the actual startup of the timer (no matter whether for first
> > time or not). Here, you were doing it in _vcpu_insert() and not in
> > _vcpu_wake(); in v3 you're doing it in _vcpu_wake() and not in
> > _runq_insert()... Which one is the proper way?
> >
>
> Correct me if I'm wrong, at the beginning of the boot process, all
> vcpus are put to sleep/not_runnable after insertions. Therefore, the
> timer should start when the first vcpu wakes up. I think the wake()
> in v3 should be correct.
>
Check when the insert_vcpu is called in schedule.c (hint, this also has
to do with cpupools()). I think that starting it in wake() is ok, but,
really, do double check (and, once you're ready for that, test things
by creating multiple pools and moving domains around between them).

> > Mmm... I'll think about this more and let you know... But out of
> > the top of my head, I think the tickling has to stay? You preempted
> > a vcpu from the pcpu where it was running, maybe some other pcpu is
> > either idle or running a vcpu with a later deadline, and should
> > come and pick this one up?
> >
> gEDF allows this but there is overhead and may not be worth it. I
> have no stats to support this but there are some papers on
> restricting what tasks can migrate. We can discuss more if we need
> extra logic here.
>
Ok (more on this in the reply to Meng's email).

> > Oh, and one thing: the use of the term "release time" is IMO a bit
> > misleading. Release of what? Typically, the release time of an RT
> > task (or job) is when the task (or job) is declared ready to run...
> > But I don't think it's used like this in here.
> >
> > I propose to just get rid of it.
> >
> The "release time" here means the next time when a deferrable server
> is released and ready to serve. It happens every period. Maybe the
> term "inter-release time" is more appropriate?
>
Perhaps, but I think this part of the DS algorithm can be implemented
in a smart enough way to avoid having to deal with this "explicitly"
(and in particular, having to scan the running or ready queues during
replenishment).
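
One way to avoid scanning the run or ready queues in the timer handler, sketched here purely as an illustration, is to keep a separate list ordered by next replenishment time, so that the handler only ever inspects the head. Nothing below comes from the patch or from the thread: the names sketch_vcpu, replq_insert() and replq_handle() are hypothetical, and the layout is far simpler than the real scheduler structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified vcpu record (NOT the real struct rt_vcpu). */
struct sketch_vcpu {
    int64_t next_repl;         /* absolute time of the next replenishment */
    int64_t period;            /* replenishment period, must be > 0 */
    int64_t budget;            /* full budget granted every period */
    int64_t cur_budget;        /* budget left in the current period */
    struct sketch_vcpu *next;  /* link in the replenishment list */
};

/* Insert v so that the list stays sorted by ascending next_repl. */
static void replq_insert(struct sketch_vcpu **head, struct sketch_vcpu *v)
{
    assert(head != NULL && v != NULL);
    while (*head != NULL && (*head)->next_repl <= v->next_repl)
        head = &(*head)->next;
    v->next = *head;
    *head = v;
}

/*
 * Body of a hypothetical timer handler: replenish every vcpu whose
 * replenishment time has passed, looking only at the head of the list.
 * Returns the time the timer should be re-armed for, or -1 if the list
 * is empty.
 */
static int64_t replq_handle(struct sketch_vcpu **head, int64_t now)
{
    assert(head != NULL);
    while (*head != NULL && (*head)->next_repl <= now) {
        struct sketch_vcpu *v = *head;

        assert(v->period > 0);
        *head = v->next;               /* pop the due entry */
        v->cur_budget = v->budget;     /* refill the budget */
        do {                           /* move to the first period after now */
            v->next_repl += v->period;
        } while (v->next_repl <= now);
        replq_insert(head, v);         /* queue its next replenishment */
    }
    return *head != NULL ? (*head)->next_repl : -1;
}
```

With a list like this the handler's cost depends on how many replenishments are due rather than on the length of the runqueue, and the returned value is the only thing needed to re-arm a single per-scheduler timer.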
> > > +        if( min_repl> svc->cur_deadline )
> > > +        {
> > > +            min_repl = svc->cur_deadline;
> > > +        }
> > > +        /* reinsert the vcpu if its deadline is updated */
> > > +        __q_remove(svc);
> > > +        __runq_insert(ops, svc);
> > >
> > One more proof of what I was trying to say. Is it really this
> > handler's job to --basically-- re-sort the runqueue? I don't think
> > so.
> >
> > What is the specific situation that you are trying to handle like
> > this?
> >
> Right, if we want to count deadline misses, it could be done when a
> vcpu is picked. However, when selecting the most imminent
> "inter-release time" of all runnable vcpu, the head of the runq could
> be missing its deadline and the cur-deadline could be in the past.
> How do we handle this situation? We still need to scan the runq
> right?
>
I'll do my best to avoid that we'll end up scanning the runqueue in the
replenishment timer handler, and in fact I still don't think this is
going to be necessary. Let's discuss more about this specific point
when replying to Meng's email.

> > But I don't think I understand. When a vcpu runs out of budget,
> > either:
> >  a. it needs an immediate replenishment
> >  b. it needs to go to depletedq, and a replenishment event for it
> >     programmed (which may or may not require re-arming the
> >     replenishment timer)
> >
> > Meng's example falls in a., AFAICT, and we can just deal with that
> > when we handle the 'budget exhausted' event (in rt_schedule(), in
> > this case, I think).
> >
> > The case you refer to in the comment above ("when vcpus on runq
> > miss deadline") can either fall in a. or in b., but in both cases
> > it seems to me that you can handle it when it happens, instead than
> > inside this timer handling routine.
> >
> This discussion was before I figured out things about idle_vcpu[] and
> tasklet.
> A vcpu could be preempted and put back to either runq or
> depletedq if a tasklet is scheduled. It could also end up in a
> depletedq in other situations. I guess Meng's point is this vcpu
> should be running constantly without being taken off if there is no
> tasklet, in an effort to follow EDF.
>
So, whatever it is the reason that triggered the call to schedule() --
either a tasklet or anything-- a vcpu that has exhausted its budget
should either:
 - be replenished immediately, and hence continue to run (as soon as
   possible), if it has a replenishment pending that has not been
   performed yet.
 - go to the depleted queue and wait for one.

Isn't this the case?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
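
The two outcomes for a vcpu that has exhausted its budget, as listed in the message above, can be sketched as a single decision taken at the point where the exhaustion is noticed. This is only an illustration under stated assumptions: it assumes the replenishment becomes due when the current deadline passes, and the names ds_vcpu, ds_queue and ds_budget_exhausted() are hypothetical rather than taken from the patch or from sched_rt.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue identifiers; the real scheduler uses list heads. */
enum ds_queue { DS_RUNQ, DS_DEPLETEDQ };

/* Hypothetical, simplified vcpu record (NOT the real struct rt_vcpu). */
struct ds_vcpu {
    int64_t cur_deadline;  /* end of the current period (assumed to be the
                              time at which the replenishment is due) */
    int64_t period;        /* must be > 0 */
    int64_t budget;        /* full budget granted every period */
    int64_t cur_budget;    /* budget left in the current period */
};

/*
 * Called when v is found to have run out of budget at time now, whatever
 * triggered the scheduling decision. Returns the queue v belongs to.
 */
static enum ds_queue ds_budget_exhausted(struct ds_vcpu *v, int64_t now)
{
    assert(v != NULL && v->period > 0);

    if (now >= v->cur_deadline) {
        /* A replenishment is pending but not yet performed:
         * do it right away so the vcpu stays runnable. */
        do {
            v->cur_deadline += v->period;
        } while (v->cur_deadline <= now);
        v->cur_budget = v->budget;
        return DS_RUNQ;
    }

    /* Otherwise: wait on the depleted queue for the next replenishment. */
    v->cur_budget = 0;
    return DS_DEPLETEDQ;
}
```

In the second case the caller would still need to make sure a replenishment event is programmed for v->cur_deadline, which may or may not require re-arming the timer.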

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
