
[Xen-devel] About vcpu wakeup and runq tickling in credit



Hi George, Everyone,

While reworking my NUMA-aware scheduling patches a bit, I realized I'm
not sure I understand what __runq_tickle() (in xen/common/sched_credit.c,
of course) does.

Here's the thing. Upon every vcpu wakeup we put the new vcpu in a runq
and then call __runq_tickle(), passing the waking vcpu via 'new'. Let's
call the vcpu that just woke up v_W, and the vcpu that is currently
running on the CPU where that happens v_C. Let's also call the CPU where
all this is happening P.

As far as I understand, in __runq_tickle() we:


static inline void
__runq_tickle(unsigned int cpu, struct csched_vcpu *new)
{
    [...]
    cpumask_t mask;

    cpumask_clear(&mask);

    /* If strictly higher priority than current VCPU, signal the CPU */
    if ( new->pri > cur->pri )
    {
        [...]
        cpumask_set_cpu(cpu, &mask);
    }

--> Make sure we put the CPU we are on (P) in 'mask', in case the woken
--> vcpu (v_W) has higher priority than the currently running one (v_C).

    /*
     * If this CPU has at least two runnable VCPUs, we tickle any idlers to
     * let them know there is runnable work in the system...
     */
    if ( cur->pri > CSCHED_PRI_IDLE )
    {
        if ( cpumask_empty(prv->idlers) )
        [...]
        else
        {
            cpumask_t idle_mask;

            cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
            if ( !cpumask_empty(&idle_mask) )
            {
                [...]
                if ( opt_tickle_one_idle )
                {
                    [...]
                    cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
                }
                else
                    cpumask_or(&mask, &mask, &idle_mask);
            }
            cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);

--> Make sure we include one or more (depending on opt_tickle_one_idle)
--> CPUs that are both idle and part of v_W's CPU-affinity in 'mask'.

        }
    }

    /* Send scheduler interrupts to designated CPUs */
    if ( !cpumask_empty(&mask) )
        cpumask_raise_softirq(&mask, SCHEDULE_SOFTIRQ);

--> Ask all the CPUs in 'mask' to reschedule. That would mean all the
--> idlers from v_W's CPU-affinity and, possibly, "ourselves" (P). The
--> effect will be that all/some of the CPUs v_W has affinity with
--> _and_ (let's assume so) P will go through scheduling as quickly as
--> possible.

}

Is the above right?

If so, here's my question: is it right to always tickle v_W's affine
CPUs, and only them?

I'm asking because one possible scenario, at least as I see it, is that
P schedules very quickly after this and, as prio(v_W) > prio(v_C), it
selects v_W and leaves v_C in its runq. At that point, one of the
tickled CPUs (say P') enters the scheduler, sees that P is not idle, and
tries to steal a vcpu from P's runq. We know that P' has affinity with
v_W, but v_W is no longer there while v_C is, and if P' is not in v_C's
affinity, we've forced P' to reschedule for nothing.
Also, there might now be one or more other CPUs where v_C could run that
stay idle, since they have not been tickled.

So, if that is true, it seems we leave some room for sub-optimal CPU
utilization, as well as some non-work-conserving windows.
Of course, it is very hard to tell how frequently this actually happens.

As for possible solutions, I think that, for instance, tickling all
the CPUs in both v_W's and v_C's affinity masks could solve this, but
that would also potentially increase the overhead (by asking _a_lot_ of
CPUs to reschedule), and again, it's hard to say if/when it's worth
it...

Actually, going the other way round, i.e., tickling only the CPUs with
affinity with v_C (in this case), looks more reasonable, under the
assumption that v_W is going to be scheduled on P soon enough. In
general, that would mean tickling the CPUs in the affinity mask of the
vcpu with lower priority, but I've not yet checked how that would
interact with the rest of the scheduling logic.

If I got things wrong and/or there's something I missed or overlooked,
please, accept my apologies. :-)

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
