
Re: [Xen-devel] [PATCH v2] xen: sched: introduce hard and soft affinity in credit 2 scheduler



Hi Justin!

Glad to see you're already at the stage where you're producing patches,
and thanks for sharing them! ;-P

A quick note about timing, which is probably pretty bad. :-( This is
absolutely not your fault, but we are working on releasing Xen 4.4 at
the beginning of 2014, so, until then, most of the focus will be on
bugfixing, rather than on implementing and reviewing new features.

In particular, George, who is Xen's scheduler maintainer and the
main expert on Credit2 (he wrote it :-)), is really busy with that, as
he's the release coordinator, and he'll also be traveling to
conferences in January.

Add to that the Winter holidays, and I think you get the big
picture! :-(

That being said, about the code...

On Sat, 2013-12-14 at 08:15 -1000, Justin Weaver wrote:
> Modified function runq_candidate in the credit 2 scheduler to
> have it consider hard and soft affinity when choosing the next
> vCPU from the run queue to run on the given pCPU.
> 
Ok, so the question then is: is that enough to implement hard and
soft affinities? By 'that' I mean modifying runq_candidate. Or do we
need to do something else, in some other places as well?

Notice that I'm not saying things actually are one way or the other
(although I do think this is not enough: e.g., what about
choose_cpu()?). I'm rather saying that I think this information should
be present in the changelog. :-)

> Function now chooses the vCPU with the most credit that has hard affinity
> and maybe soft affinity for the given pCPU. If it does not have soft affinity
> and there is another vCPU that prefers to run on the given pCPU, then as long
> as it has at least a certain amount of credit (currently defined as half of
> CSCHED_CREDIT_INIT, but more testing is needed to determine the best value)
> then it is chosen instead.
> 
Ok, so, why this 'certain amount of credit' thing? I got the technical
details of it from the code below, but can you spend a few words on why
and how you think something like this would be required and/or useful?

Oh, and still about the process: no matter how simple it is or will turn
out to be, I'd send at least two patches, one for hard affinity and the
other one for soft affinity. That would make the whole thing a lot
easier both to review (right now) and to understand (in the future, when
looking at git log).

> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 4e68375..d337cdd 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -116,6 +116,10 @@
>   * ATM, set so that highest-weight VMs can only run for 10ms
>   * before a reset event. */
>  #define CSCHED_CREDIT_INIT          MILLISECS(10)
> +/* Minimum amount of credit needed for a vcpu with soft
> +   affinity for a given cpu to be picked from the run queue
> +   over a vcpu with more credit but only hard affinity. */
> +#define CSCHED_MIN_CREDIT_PREFER_SA MILLISECS(5)
>
As said above, what is this buying us? What's the big idea behind it?

>  /* Carryover: How much "extra" credit may be carried over after
>   * a reset. */
>  #define CSCHED_CARRYOVER_MAX        CSCHED_MIN_TIMER
> @@ -1615,6 +1619,7 @@ runq_candidate(struct csched_runqueue_data *rqd,
>  {
>      struct list_head *iter;
>      struct csched_vcpu *snext = NULL;
> +    bool_t found_snext_w_hard_affinity = 0;
>  
>      /* Default to current if runnable, idle otherwise */
>      if ( vcpu_runnable(scurr->vcpu) )
> @@ -1626,6 +1631,11 @@ runq_candidate(struct csched_runqueue_data *rqd,
>      {
>          struct csched_vcpu * svc = list_entry(iter, struct csched_vcpu, 
> runq_elem);
>  
> +        /* If this is not allowed to run on this processor based on its
> +         * hard affinity mask, continue to the next vcpu on the run queue */
> +        if ( !cpumask_test_cpu(cpu, &svc->cpu_hard_affinity) )
> +            continue;
> +
And, as mentioned above already, if we don't have hard affinity with
this pCPU, how did we get on this runqueue? Obviously, I know how we got
here in the present situation... Actually, that's exactly what I meant
when saying that there is probably more effort needed somewhere else, to
avoid, as much as possible, a vCPU landing in the runqueue of a pCPU
which is outside of its hard affinity (and soft affinity too, of course).
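
Just to make that a bit more concrete, here's a minimal standalone sketch
(simplified types and a made-up helper name, not actual sched_credit2.c
code) of what "only consider pCPUs in the hard affinity mask when deciding
where a vCPU goes" could look like:

/* Standalone illustration only: plain bitmasks stand in for Xen's
 * cpumask_t, and pick_cpu_hard_affinity() is a made-up name. */
#include <stdio.h>

#define NR_CPUS 8

typedef unsigned int fake_cpumask_t;   /* bit i set == pCPU i in the mask */

/* Pick a pCPU for a vCPU considering only the pCPUs it has hard affinity
 * with; return -1 if the hard affinity mask allows none of the online ones. */
static int pick_cpu_hard_affinity(fake_cpumask_t online, fake_cpumask_t hard)
{
    fake_cpumask_t allowed = online & hard;
    int cpu;

    for ( cpu = 0; cpu < NR_CPUS; cpu++ )
        if ( allowed & (1u << cpu) )
            return cpu;   /* a real implementation would also balance load */

    return -1;
}

int main(void)
{
    fake_cpumask_t online = 0xff;   /* pCPUs 0-7 online */
    fake_cpumask_t hard   = 0x30;   /* hard affinity: pCPUs 4 and 5 */

    printf("picked pCPU %d\n", pick_cpu_hard_affinity(online, hard));
    return 0;
}

If something along these lines happens at cpu/runqueue selection time, the
check you add in runq_candidate becomes just a safety net.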

>          /* If this is on a different processor, don't pull it unless
>           * its credit is at least CSCHED_MIGRATE_RESIST higher. */
>          if ( svc->vcpu->processor != cpu
> @@ -1633,13 +1643,29 @@ runq_candidate(struct csched_runqueue_data *rqd,
>              continue;
>  
>          /* If the next one on the list has more credit than current
> -         * (or idle, if current is not runnable), choose it. */
> -        if ( svc->credit > snext->credit )
> +         * (or idle, if current is not runnable), choose it. Only need
> +         * to do this once since run queue is in credit order. */
> +        if ( !found_snext_w_hard_affinity
> +             && svc->credit > snext->credit )
> +        {
> +            snext = svc;
> +            found_snext_w_hard_affinity = 1;
> +        }
> +
Ok, this is probably the right thing for hard affinity. However...

> +        /* Is there enough credit left in this vcpu to continue 
> +         * considering soft affinity? */ 
> +        if ( svc->credit < CSCHED_MIN_CREDIT_PREFER_SA )
> +            break;
> +
> +        /* Does this vcpu prefer to run on this cpu? */
> +        if ( !cpumask_full(svc->cpu_soft_affinity) 
> +             && cpumask_test_cpu(cpu, &svc->cpu_soft_affinity) )
>              snext = svc;
> +        else
> +            continue;  
>  
... No matter the effect of CSCHED_MIN_CREDIT_PREFER_SA, I wonder
whether we're interfering too much with the credit2 algorithm.

Consider for example the situation where all but one pCPU are busy, and
assume we have a bunch of vCPUs, at the head of the free pCPU's
runqueue, with a great amount of credit, but without soft affinity for
that pCPU. OTOH, there might be vCPUs with way less credit, but with
soft affinity with it, and we'd be letting the latter run while it's
the former that should have, wouldn't we?

Of course, it depends on their credit being greater than
CSCHED_MIN_CREDIT_PREFER_SA, but still, this does not look like the
right approach to me, at least not at this stage.
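
To put some (made-up) numbers on that scenario, here's a tiny standalone
model of roughly the pick the patched runq_candidate would make; it
ignores migration resistance and everything else, and the threshold value
is just the 5ms from the patch:

/* Standalone model of the soft-affinity override, not Xen code. */
#include <stdio.h>

#define MIN_CREDIT_PREFER_SA 5   /* stands in for CSCHED_MIN_CREDIT_PREFER_SA, in ms */

struct fake_vcpu {
    const char *name;
    int credit;          /* ms; the runqueue is assumed sorted on this, descending */
    int soft_affinity;   /* 1 if the vCPU has soft affinity with the pCPU */
};

/* Default to the highest-credit vCPU, but let a lower-credit vCPU with soft
 * affinity take the slot as long as its credit is above the threshold. */
static const struct fake_vcpu *pick(const struct fake_vcpu *rq, int n)
{
    const struct fake_vcpu *snext = &rq[0];
    int i;

    for ( i = 0; i < n; i++ )
    {
        if ( rq[i].credit < MIN_CREDIT_PREFER_SA )
            break;
        if ( rq[i].soft_affinity )
            return &rq[i];
    }
    return snext;
}

int main(void)
{
    struct fake_vcpu rq[] = {
        { "A", 9, 0 },   /* lots of credit, no soft affinity with this pCPU */
        { "B", 6, 1 },   /* less credit, but soft affinity with this pCPU */
    };

    printf("picked %s\n", pick(rq, 2)->name);   /* prints "B", not "A" */
    return 0;
}

That is, B runs even though A has 50% more credit, and that's exactly the
kind of deviation from the plain credit order I'm worried about.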

What I think I'd try to do is as follows:
 1) try as hard as possible to make sure that each vCPU is in a runqueue
    belonging to at least one of the pCPUs it has hard affinity with
 2) try hard (though a bit less hard than in 1) is fine) to make sure
    that each vCPU is in a runqueue belonging to at least one of the
    pCPUs it has soft affinity with
 3) when scheduling (in runq_candidate), scan the runqueue in credit
    order and pick the first vCPU that has hard affinity with the
    pCPU being considered (as you're also doing), but forgetting about
    soft affinity (see the sketch right after this list).
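
For 3), I'm thinking of something along these lines (again just a
standalone sketch with simplified types, not the real runq_candidate()):

/* Standalone sketch of point 3): scan the credit-ordered runqueue and take
 * the first vCPU that has hard affinity with this pCPU; soft affinity is
 * deliberately ignored here. Types and names are simplified, not Xen's. */
#include <stdio.h>

struct fake_vcpu {
    const char *name;
    int credit;                   /* runqueue assumed sorted on this, descending */
    unsigned int hard_affinity;   /* bitmask of pCPUs the vCPU may run on */
};

static const struct fake_vcpu *runq_candidate_hard(const struct fake_vcpu *rq,
                                                   int n, int cpu)
{
    int i;

    for ( i = 0; i < n; i++ )
        if ( rq[i].hard_affinity & (1u << cpu) )
            return &rq[i];        /* highest-credit vCPU allowed to run here */

    return NULL;                  /* nothing can run here: go idle */
}

int main(void)
{
    struct fake_vcpu rq[] = {
        { "A", 9, 0x2 },          /* hard affinity: pCPU 1 only */
        { "B", 6, 0x3 },          /* hard affinity: pCPUs 0 and 1 */
    };

    /* On pCPU 0, A is skipped and B is picked, despite having less credit. */
    printf("picked %s\n", runq_candidate_hard(rq, 2, 0)->name);
    return 0;
}

The point being that, if 1) and 2) do their job, the scan above almost
always stops at the very first element anyway.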

Once that is done, we could look at introducing something like
CSCHED_MIN_CREDIT_PREFER_SA, as an optimization, and see how it
performs.

Still as an optimization, we could try to do something clever wrt 1) and
2): e.g., instead of just making sure a vCPU lands in a runqueue belonging
to at least one pCPU in its affinity mask, we could try to put the vCPU in
the runqueue with the biggest intersection between its pCPUs and the
domain's affinity, to maximize the probability of the scheduling decision
being quick enough... But again, this can come later; a sketch of the idea
follows.
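
Something like the following is what I have in mind for the "biggest
intersection" part (once more, a standalone sketch with plain bitmasks
instead of cpumask_t and made-up names):

/* Standalone sketch: among the runqueues, pick the one whose set of pCPUs
 * overlaps the vCPU's affinity mask the most. Not Xen code. */
#include <stdio.h>

static int popcount(unsigned int x)
{
    int n = 0;

    for ( ; x; x &= x - 1 )
        n++;
    return n;
}

/* rq_cpus[i] is the bitmask of pCPUs served by runqueue i; return the index
 * of the runqueue with the biggest intersection, or -1 if none intersects. */
static int pick_runqueue(const unsigned int *rq_cpus, int nr_rqs,
                         unsigned int affinity)
{
    int i, best = -1, best_overlap = 0;

    for ( i = 0; i < nr_rqs; i++ )
    {
        int overlap = popcount(rq_cpus[i] & affinity);

        if ( overlap > best_overlap )
        {
            best = i;
            best_overlap = overlap;
        }
    }
    return best;
}

int main(void)
{
    unsigned int rq_cpus[] = { 0x0f, 0xf0 };   /* rq 0: pCPUs 0-3, rq 1: pCPUs 4-7 */
    unsigned int affinity  = 0x7c;             /* affinity: pCPUs 2-6 */

    /* rq 0 overlaps on 2 pCPUs, rq 1 on 3, so runqueue 1 is picked. */
    printf("picked runqueue %d\n", pick_runqueue(rq_cpus, 2, affinity));
    return 0;
}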

So, does all this make sense?

Thanks again for your work and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
