
Re: [Xen-devel] [PATCH v4] xen: sched: convert RTDS from time to event driven model



On Fri, 2016-02-05 at 23:27 -0500, Tianyang Chen wrote:
> 
> On 2/5/2016 9:39 AM, Dario Faggioli wrote:
> > On Wed, 2016-02-03 at 21:23 -0500, Tianyang Chen wrote:
> > > 
> I see. So I'm just curious: what can cause a spurious wakeup? Does it
> only happen to a running vcpu (currently on a pcpu) or a vcpu that is
> on either the runq or the depletedq?
> 
Hehe, "that's virt, baby!" :-P

No, seriously: a vcpu being already running or already in a queue is
indeed what makes me call a wakeup a "spurious" wakeup. You can look at
the performance counters that we update in those cases, to see when
they happen most. In my experience, on Credit, I've seen them happening
mostly when a vcpu is running. For instance:

root@Zhaman:~# xenperf |grep vcpu_wake
sched: vcpu_wake_running            T=        39
sched: vcpu_wake_onrunq             T=         0
sched: vcpu_wake_runnable           T=     59875
sched: vcpu_wake_not_runnable       T=         0
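
(Those counters are bumped by the early checks in the various
vcpu_wake hooks. Sketching from memory, and modulo per-scheduler
details, the pattern is more or less the one below, which is also what
your rt_vcpu_wake() draft further down does:)

/* a wakeup is "spurious" if the vcpu is already running on a pcpu,
 * or is already queued as runnable: nothing to do but count it */
if ( unlikely(curr_on_cpu(vc->processor) == vc) )
{
    SCHED_STAT_CRANK(vcpu_wake_running);
    return;
}
if ( unlikely(__vcpu_on_q(svc)) )
{
    SCHED_STAT_CRANK(vcpu_wake_onrunq);
    return;
}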

> > So we realized that this is just a spurious wakeup, and get back to
> > what we were doing.
> > 
> > What's wrong with this I just said?
> > 
> 
> The reason why I wanted to add a vcpu back to the replq is because it
> could be taken off in the timer handler. Now, because of the spurious
> wakeup, I think the timer shouldn't take any vcpus off of the replq,
> in which case wake() should do nothing, just like the other schedulers
> do when it's a spurious wakeup. Also, only sleep() should remove
> events from the replq.
> 
Err... well, that depends on how the code ends up looking, and it's
starting to become difficult to reason about it like this.

Perhaps, considering that it looks to me that we are in agreement on
all the important aspects, you can draft a new version and we'll reason
and comment on top of that?

> > Mmm... no, I think we should know/figure out whether a
> > replenishment is
> > pending already and we are ok with it, or if we need a new one.
> > 
> 
> Just to make sure: spurious sleep/wakeup are for a vcpu that is on a
> pcpu or on any queue, right?
> 
Yes, "spurious wakeup" is how I'm calling wakeups that need no action
from the scheduler, because the vcpu is already awake (and either
running or runnable).

I don't think there is such a thing as spurious sleep... When sleep is
called, we go to sleep. There are cases where the sleep-->wakeup
turnaround is really really fast, but I would not call them "spurious
sleeps", and they should not need much special treatment, not in
sched_rt.c at least.

Oh, maybe you mean the cases where you wake up and you find out that
context_saved() has not run yet (this just occurred to me by looking at
the code below)? Well, yes... But I wouldn't call those "spurious
sleeps" either.
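
Just so we're talking about the same thing: the path I have in mind is
the one going through __RTDS_delayed_runq_add, i.e., roughly the below
(quoting from memory, so modulo details, and leaving aside whatever
replenishment/tickling handling your patch adds there):

static void
rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
{
    struct rt_vcpu *svc = rt_vcpu(vc);
    spinlock_t *lock = vcpu_schedule_lock_irq(vc);

    clear_bit(__RTDS_scheduled, &svc->flags);
    /* idle vcpus never go on the runq */
    if ( is_idle_vcpu(vc) )
        goto out;

    /* the vcpu was woken up while we were still saving its context:
     * this is where the delayed queueing actually happens */
    if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
         likely(vcpu_runnable(vc)) )
        __runq_insert(ops, svc);

 out:
    vcpu_schedule_unlock_irq(lock, vc);
}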

> If a real sleep happens, the vcpu should be taken off the replq.
> However, in a spurious wakeup (which I assume corresponds to a
> spurious sleep?), it shouldn't be taken off the replq. Adding back to
> the replq should only happen for those vcpus which were taken off
> because of a "real sleep".
> 
I don't really follow, but I have the feeling that you're making it
sound more complicated than it is in reality... :-)

> So here is a summary to make sure everyone's on the same page :)
> This is basically your code with a slight modification before
> replq_insert().
> 
> static void
> rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> {
>     struct rt_vcpu * const svc = rt_vcpu(vc);
>     s_time_t now = NOW();
> 
>     BUG_ON( is_idle_vcpu(vc) );
> 
>     if ( unlikely(curr_on_cpu(vc->processor) == vc) )
>     {
>         /*
>          * spurious wakeup so do nothing as it is
>          * not removed from replq
>          */
>
Kill the comment (and below too), it's pretty useless.

>         SCHED_STAT_CRANK(vcpu_wake_running);
>         return;
>     }
> 
>     /* on RunQ/DepletedQ, just update info is ok */
>     if ( unlikely(__vcpu_on_q(svc)) )
>     {
>         /*
>          * spurious wakeup so do nothing as it is
>          * not removed from replq
>          */
>         SCHED_STAT_CRANK(vcpu_wake_onrunq);
>         return;
>     }
> 
>     if ( likely(vcpu_runnable(vc)) )
>         SCHED_STAT_CRANK(vcpu_wake_runnable);
>     else
>         SCHED_STAT_CRANK(vcpu_wake_not_runnable);
> 
> 
>     /*
>      * sleep() might have taken it off the replq before
>      * context_saved(). Three cases:
>      * 1) sleep on pcpu --> context_saved() --> wake()
>      * 2) sleep on pcpu --> wake() --> context_saved()
>      * 3) sleep before context_saved() --> wake().
>      * In the first two cases sleep() doesn't remove this vcpu,
>      * so add a check before inserting.
>      */
>
I'm not sure why you're saying that, in cases 1) and 2), sleep should
not remove the replenishment event from the queue...

>     if ( now >= svc->cur_deadline )
>     {
>         rt_update_deadline(now, svc);
>         __replq_remove(svc);
>     }
> 
>     if ( !__vcpu_on_replq(svc) )
>         __replq_insert(svc);
> 
And despite the comment above being a bit unclear to me, I *think* this
code is ok. :-)

>     /* If context hasn't been saved for this vcpu yet, we can't put
>      * it on the Runqueue/DepletedQ. Instead, we set a flag so that
>      * it will be put on the Runqueue/DepletedQ after the context
>      * has been saved.
>      */
>     if ( unlikely(svc->flags & RTDS_scheduled) )
>     {
>         set_bit(__RTDS_delayed_runq_add, &svc->flags);
>         return;
>     }
> 
>     /* insert svc to runq/depletedq because svc is not in queue now */
>     __runq_insert(ops, svc);
>     runq_tickle(ops, svc);
> 
>     return;
> }
> 
> 
> static void
> rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
> {
>     struct rt_vcpu * const svc = rt_vcpu(vc);
> 
>     BUG_ON( is_idle_vcpu(vc) );
>     SCHED_STAT_CRANK(vcpu_sleep);
> 
>     /*
>      * If a vcpu is not sleeping on a pcpu or a queue,
>      * it should be taken off the replenishment queue.
>      * Case 1: sleep on a pcpu
>      * Case 2: sleep on a queue
>      * Case 3: sleep before context switch is finished
>      */
>     if ( curr_on_cpu(vc->processor) == vc )
>         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
>     else if ( __vcpu_on_q(svc) )
>         __q_remove(svc);
>     else if ( svc->flags & RTDS_delayed_runq_add )
>     {
>         clear_bit(__RTDS_delayed_runq_add, &svc->flags);
> 
>         if ( __vcpu_on_replq(svc) )
>             __replq_remove(svc);
>     }
> }
> 
Yep, as I was saying above: why are you removing the vcpu from the
replenishment queue only in the last case?

For instance, if a vcpu is running on a pcpu, and it goes to sleep (the
first if), do you want the timer to fire while it is still asleep or
not?

My impression is that, no, you don't want that, and you compensate for
it in rt_vcpu_wake(). Well, if this is the case, then you should remove
the replenishment event (and, potentially, stop or reprogram the timer)
in any of these cases.

Don't you think?

Note that the if-s here are not to be matched with the ones in _wake().
As I said, when you sleep you sleep, and there's no such thing as a
spurious sleep. The reason why we have these if-s is that different
actions need to be undertaken depending on whether the vcpu that is
going to sleep was running, in a queue, or waiting to be re-queued.

For instance, you want the pcpu to go through schedule() again only if
the vcpu was running. Also, you want to be sure that the vcpu is not
queued back in context_saved(), if it was flagged to be. Etc.
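
IOW, what I'd expect is something along these lines (just a sketch,
untested, reusing the helpers from your draft; stopping or
reprogramming the replenishment timer can be reasoned about
separately):

static void
rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
{
    struct rt_vcpu * const svc = rt_vcpu(vc);

    BUG_ON( is_idle_vcpu(vc) );
    SCHED_STAT_CRANK(vcpu_sleep);

    if ( curr_on_cpu(vc->processor) == vc )
        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
    else if ( __vcpu_on_q(svc) )
        __q_remove(svc);
    else if ( svc->flags & RTDS_delayed_runq_add )
        clear_bit(__RTDS_delayed_runq_add, &svc->flags);

    /* in all three cases, a sleeping vcpu does not need replenishments:
     * take it off the replenishment queue, and let wake() put it back
     * when (if) it becomes runnable again */
    if ( __vcpu_on_replq(svc) )
        __replq_remove(svc);
}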

> > Before, this was done in __repl_update(), which is (and that's a
> > good
> > thing) no longer there.
> > 
> > But maybe it's me that missed something...
> > 
> 
> __repl_update() only scans the runq and depletedq for replenishments,
> right?
> And right after burn_budget(), if the current vcpu runs out of
> budget,
> snext will be picked here because of:
> 
> /* if scurr has higher priority and budget, still pick scurr */
> if ( !is_idle_vcpu(current) && vcpu_runnable(current) &&
>      scurr->cur_budget > 0 &&
>      ( is_idle_vcpu(snext->vcpu) ||
>        scurr->cur_deadline <= snext->cur_deadline ) )
>     snext = scurr;
> 
> if scurr->cur_budget == 0, snext will be picked immediately.
> Is that what you're looking for?
> 
Ok, but then scurr needs to go to the depleted queue. Double checking,
I see that this happens in __runq_insert(), so, yes, this code also
looks fine.
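
(For the records, the bit of __runq_insert() I was double checking is,
give or take the exact names, the one below: a vcpu with budget left
goes on the runq, in deadline order, while one with cur_budget == 0
ends up on the depletedq.)

/* rough sketch of the queue selection in __runq_insert(), from memory */
struct list_head *iter;
struct list_head *runq = rt_runq(ops);
struct rt_private *prv = rt_priv(ops);

if ( svc->cur_budget > 0 )
{
    /* the runq is kept sorted by increasing cur_deadline */
    list_for_each ( iter, runq )
    {
        struct rt_vcpu * iter_svc = __q_elem(iter);
        if ( svc->cur_deadline <= iter_svc->cur_deadline )
            break;
    }
    list_add_tail(&svc->q_elem, iter);
}
else
    list_add(&svc->q_elem, &prv->depletedq);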

Thanks for bearing with me. :-D

> > This all sounds fine to me.
> > 
> Great. I will put together v5 soon, if we all agree on rt_wake() and
> rt_sleep().
> 
We're probably quite close. :-P

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
