
[Xen-devel] About vcpu wakeup and runq tickling in credit

Hi George, Everyone,

While reworking a bit my NUMA aware scheduling patches I figured I'm not
sure I understand what __runq_tickle() (in xen/common/sched_credit.c, of
course) does.

Here's the thing. Upon every vcpu wakeup we put the new vcpu in a runq
and then call __runq_tickle(), passing the waking vcpu via 'new'. Let's
call the vcpu that just woke up v_W, and the vcpu that is currently
running on the cpu where that happens v_C. Let's also call the CPU where
all is happening P.

As far as I've understood, in __runq_tickle(), we:

static inline void
__runq_tickle(unsigned int cpu, struct csched_vcpu *new)
{
    cpumask_t mask;
    ...
    /* If strictly higher priority than current VCPU, signal the CPU */
    if ( new->pri > cur->pri )
        cpumask_set_cpu(cpu, &mask);

--> Make sure we put the CPU we are on (P) in 'mask', in case the woken
--> vcpu (v_W) has higher priority than the currently running one (v_C).

    /*
     * If this CPU has at least two runnable VCPUs, we tickle any idlers to
     * let them know there is runnable work in the system...
     */
    if ( cur->pri > CSCHED_PRI_IDLE )
    {
        if ( !cpumask_empty(prv->idlers) )
        {
            cpumask_t idle_mask;

            cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
            if ( !cpumask_empty(&idle_mask) )
            {
                if ( opt_tickle_one_idle )
                    cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
                else
                    cpumask_or(&mask, &mask, &idle_mask);
            }
            cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
        }
    }

--> Make sure we include in 'mask' one or more CPUs (depending on
--> opt_tickle_one_idle) that are both idle and part of v_W's CPU-affinity.


    /* Send scheduler interrupts to designated CPUs */
    if ( !cpumask_empty(&mask) )
        cpumask_raise_softirq(&mask, SCHEDULE_SOFTIRQ);

--> Ask all the CPUs in 'mask' to reschedule. That means all the idlers
--> in v_W's CPU-affinity and, possibly, "ourself" (P). The effect is
--> that all/some of the CPUs v_W has affinity with _and_ (let's assume
--> so) P will go through schedule() as quickly as possible.
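Just to make sure I'm reading the mask-building right, here is a
stand-alone sketch of the logic above, using a plain unsigned long as a
stand-in for cpumask_t (the function name build_tickle_mask and the flat
parameter list are made up for illustration; real priorities and masks
come from csched_vcpu and csched_private, of course):

```c
#include <assert.h>

typedef unsigned long cpumask;  /* bit i set == CPU i in the set */

/*
 * Hypothetical flattening of __runq_tickle()'s mask building.
 * cpu:           the CPU (P) where the wakeup happens
 * new_pri/cur_pri: priority of the waking vcpu (v_W) / running one (v_C)
 * idlers:        currently idle CPUs
 * new_affinity:  v_W's cpu_affinity
 */
cpumask build_tickle_mask(unsigned int cpu, int new_pri, int cur_pri,
                          cpumask idlers, cpumask new_affinity,
                          int tickle_one_idle)
{
    cpumask mask = 0;

    /* If strictly higher priority than the current vcpu, signal P itself. */
    if ( new_pri > cur_pri )
        mask |= 1UL << cpu;

    /* Only idlers that v_W can actually run on are candidates. */
    cpumask idle_mask = idlers & new_affinity;
    if ( idle_mask )
    {
        if ( tickle_one_idle )
            mask |= idle_mask & -idle_mask; /* pick one idler (lowest bit) */
        else
            mask |= idle_mask;              /* tickle all suitable idlers */
    }
    return mask;
}
```

(The one-idler case in the real code cycles last_tickle_cpu through
idle_mask rather than always picking the lowest bit; the lowest-bit pick
here is just to keep the sketch deterministic.)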


Is the above right?

If yes, here's my question: is it right to always tickle v_W's affine
CPUs, and only them?

I'm asking because a possible scenario, at least as I see it, is that P
schedules very quickly after this and, since prio(v_W)>prio(v_C), it
selects v_W and leaves v_C in its runq. At that point, one of the
tickled CPUs (say P') enters schedule(), sees that P is not idle, and
tries to steal a vcpu from its runq. Now, we know that P' has affinity
with v_W, but v_W is no longer there, while v_C is; and if P' is not in
v_C's affinity, we've forced P' to reschedule for nothing.
Also, there might now be another CPU (or even a number of CPUs) where
v_C could run that stays idle, as it has not been tickled.
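To make the scenario concrete, here is a tiny self-contained check under
assumed affinities (the CPU names and the helper scenario_leaves_idler()
are hypothetical, chosen only to reproduce the situation described
above: v_W runnable on {P, P'}, v_C on {P, P''}, with P' and P'' idle):

```c
#include <assert.h>

/*
 * Returns 1 if, with the assumed affinities below, tickling only the
 * idlers in v_W's affinity (a) wakes a CPU that cannot steal v_C, and
 * (b) leaves idle a CPU that could have run v_C.
 */
int scenario_leaves_idler(void)
{
    unsigned long P  = 1UL << 0;  /* where v_W wakes up and v_C runs */
    unsigned long P1 = 1UL << 1;  /* P': idle, in v_W's affinity only */
    unsigned long P2 = 1UL << 2;  /* P'': idle, in v_C's affinity only */

    unsigned long vW_aff = P | P1;
    unsigned long vC_aff = P | P2;
    unsigned long idlers = P1 | P2;

    /* __runq_tickle() only considers idlers in v_W's affinity... */
    unsigned long tickled = idlers & vW_aff;                 /* just P' */

    /* ...so P'', idle and able to run v_C, is never tickled. */
    unsigned long untickled_useful = (idlers & vC_aff) & ~tickled;

    /* P' was tickled for nothing (cannot run v_C), P'' stays idle. */
    return (tickled & vC_aff) == 0 && untickled_useful != 0;
}
```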

So, if the above is true, it seems we leave some room for sub-optimal
CPU utilization, as well as for non-work-conserving windows.
Of course, it is very hard to tell how frequently this actually happens.

As for possible solutions, tickling all the CPUs in both v_W's and
v_C's affinity masks could, for instance, solve this, but it would also
potentially increase the overhead (by asking _a_lot_ of CPUs to
reschedule), and again, it's hard to say if/when that is worth it.

Actually, going all the way round, i.e., tickling only the CPUs in
v_C's affinity (in this case) looks more reasonable, under the
assumption that v_W is going to be scheduled on P soon enough. In
general, that would mean tickling the CPUs in the affinity mask of the
vcpu with the lower priority, but I've not yet checked how that would
interact with the rest of the scheduling logic.
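In code, the alternative I have in mind would amount to something like
the following (idlers_to_tickle() is a made-up helper, again with flat
bitmask parameters instead of the real cpumask_t/csched_vcpu types):

```c
/*
 * Hypothetical alternative: choose whose affinity to tickle idlers
 * from, based on which vcpu will actually be left needing a CPU.
 */
unsigned long idlers_to_tickle(int new_pri, int cur_pri,
                               unsigned long idlers,
                               unsigned long new_aff,
                               unsigned long cur_aff)
{
    /*
     * If the waking vcpu (v_W) will preempt, it is the current vcpu
     * (v_C) that ends up queued, so tickle idlers in *its* affinity;
     * otherwise v_W is the one waiting, so use v_W's affinity as now.
     */
    unsigned long aff = (new_pri > cur_pri) ? cur_aff : new_aff;
    return idlers & aff;
}
```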

If I got things wrong and/or there's something I missed or overlooked,
please, accept my apologies. :-)

Thanks and Regards,

<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
