
Re: [Xen-devel] [PATCH v3 4/4] sched: credit2: consider per-vcpu soft affinity



On 03/26/2015 09:48 AM, Justin T. Weaver wrote:
> when making decisions for vcpus (run queue assignment, run queue migration,
> cpu assignment, and cpu tickling).
> 
> Added soft affinity balancing loops to...
>  * get_fallback_cpu
>  * runq_tickle (one for idle, but not tickled; one for non-idle, and not
>    tickled)
>  * choose_cpu
> 
> choose_cpu now tries to find the run queue with the most cpus in the given
> vcpu's soft affinity. It uses minimum run queue load as a tie breaker.
> 
> Added a function to determine the number of soft cpus gained (or lost) by a
> given vcpu if it is migrated from a given source run queue to a given
> destination run queue.
> 
> Modified algorithm in balance_load and consider...
>  * if the load on lrqd and/or orqd is less than the number of their active
>    cpus, balance_load will look for vcpus that would have their soft affinity
>    improved by being pushed and/or pulled. Load does not need to be considered
>    since a run queue receiving a pushed or pulled vcpu is not being fully
>    utilized. This returns vcpus that may have been migrated away from their
>    soft affinity due to load balancing back to their soft affinity.
>  * in consider, vcpus that might be picked for migration because pushing or
>    pulling them decreases the load delta are not picked if their current run
>    queue's load is less than its active cpu count and if that migration would
>    harm their soft affinity. There's no need to push/pull if the load is under
>    capacity, and the vcpu would lose access to some or all of its soft cpus.
>  * in consider, if a push/pull/swap migration decreases the load delta by a
>    similar amount to another push/pull/swap migration, then use soft cpu gain
>    as a tie breaker. This allows load to continue to balance across run queues,
>    but favors soft affinity gains if the load deltas are close.
> 
> Signed-off-by: Justin T. Weaver <jtweaver@xxxxxxxxxx>

First of all, thank you for doing the careful work to try to make this
happen.  The logic here is *very* complicated!

One minor thing re the changelog: You should probably mention somewhere
that you're also removing on-stack cpumask_t to use the per-cpu scratch
variable instead.

> ---
> Changes in v3:
>  * get_fallback_cpu: added balance loop to try to find a soft affinity cpu 
>  * runq_tickle: replaced use of local var mask with csched2_cpumask
>  * runq_tickle: added two balance loops, one for finding idle, but not
>    tickled, and other for finding non-idle with lowest credit
>  * choose_cpu: added balance loop to find cpu for given vcpu that has most
>    soft cpus (with run queue load being a tie breaker), or if none were found,
>    or not considering soft affinity, pick cpu from runq with least load
>  * balance_load / consider: removed code that ignored a migration if it meant
>    moving a vcpu away from its soft affinity; added migration of vcpus to
>    improve their soft affinity if the destination run queue was under load;
>    added check in consider, if current run queue is under load and migration
>    would hurt the vcpu's soft affinity, do not consider the migration; added
>    soft affinity tie breaker in consider if current load delta and consider
>    load delta are close
>  * added helper functions for soft affinity related changes to balance_load
> Changes in v2:
>  * Not submitted in version 2; focus was on the hard affinity patch
> ---
>  xen/common/sched_credit2.c |  344 ++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 313 insertions(+), 31 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index bbcfbf2..47d0bad 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -127,6 +127,14 @@
>  #define CSCHED2_CREDIT_RESET         0
>  /* Max timer: Maximum time a guest can be run for. */
>  #define CSCHED2_MAX_TIMER            MILLISECS(2)
> +/* Used in balance_load to specify migration direction. */
> +#define CSCHED2_PULL                 0
> +#define CSCHED2_PUSH                 1
> +/*
> + * Used in balance_load to decide if deltas are close enough to use soft
> + * affinity as a tie breaker.
> + */
> +#define CSCHED2_DIVIDE_BY_16         4
>  
>  
>  #define CSCHED2_IDLE_CREDIT                 (-(1<<30))
> @@ -288,15 +296,33 @@ struct csched2_dom {
>   */
>  static int get_fallback_cpu(struct csched2_vcpu *svc)

Don't forget to update the comment. :-)

>  {
> +    int balance_step;
> +
>      if ( likely(cpumask_test_cpu(svc->vcpu->processor,
>          svc->vcpu->cpu_hard_affinity)) )
>          return svc->vcpu->processor;
>  
> -    cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
> -        &svc->rqd->active);
> -    if ( cpumask_empty(csched2_cpumask) )
> -        cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
> -            VCPU2ONLINE(svc->vcpu));
> +    for_each_sched_balance_step( balance_step )
> +    {
> +        if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY
> +            && !__vcpu_has_soft_affinity(svc->vcpu,
> +                svc->vcpu->cpu_hard_affinity) )
> +            continue;
> +
> +        sched_balance_cpumask(svc->vcpu, balance_step, csched2_cpumask);
> +        cpumask_and(csched2_cpumask, csched2_cpumask, &svc->rqd->active);
> +        if ( !cpumask_empty(csched2_cpumask) )
> +            break;
> +        else
> +        {
> +            sched_balance_cpumask(svc->vcpu, balance_step, csched2_cpumask);
> +            cpumask_and(csched2_cpumask, csched2_cpumask,
> +                VCPU2ONLINE(svc->vcpu));
> +            if ( !cpumask_empty(csched2_cpumask) )
> +                break;
> +        }

I think we don't need the second half here to be in an "else" clause --
if the statement in the if() is true, then it will break out of the loop
and not execute any further.
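
Something like this, i.e., an untested sketch reusing the identifiers
from the patch, with the same logic just flattened:

```c
for_each_sched_balance_step( balance_step )
{
    if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY
        && !__vcpu_has_soft_affinity(svc->vcpu,
            svc->vcpu->cpu_hard_affinity) )
        continue;

    sched_balance_cpumask(svc->vcpu, balance_step, csched2_cpumask);
    cpumask_and(csched2_cpumask, csched2_cpumask, &svc->rqd->active);
    if ( !cpumask_empty(csched2_cpumask) )
        break;

    /* No active cpu in this step's affinity; fall back to online cpus. */
    sched_balance_cpumask(svc->vcpu, balance_step, csched2_cpumask);
    cpumask_and(csched2_cpumask, csched2_cpumask, VCPU2ONLINE(svc->vcpu));
    if ( !cpumask_empty(csched2_cpumask) )
        break;
}
```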

> +    }
> +
>      ASSERT( !cpumask_empty(csched2_cpumask) );
>  
>      return cpumask_any(csched2_cpumask);
> @@ -516,8 +542,8 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
>      int i, ipid=-1;
>      s_time_t lowest=(1<<30);
>      struct csched2_runqueue_data *rqd = RQD(ops, cpu);
> -    cpumask_t mask;
>      struct csched2_vcpu * cur;
> +    int balance_step;
>  
>      d2printk("rqt %pv curr %pv\n", new->vcpu, current);
>  
> @@ -534,26 +560,43 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
>          goto tickle;
>      }
>      
> +    for_each_sched_balance_step ( balance_step )
> +    {
> +        if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY
> +            && !__vcpu_has_soft_affinity(new->vcpu,
> +                new->vcpu->cpu_hard_affinity) )
> +            continue;
> +
>          /* Get a mask of idle, but not tickled, that new is allowed to run on. */

I think we want this comment moved above the for_each_ loop; and
probably say something like

/* Look for an idle, untickled cpu in the vcpu's soft affinity, then its
hard affinity. */

(Formatted properly, obviously)

> -        cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
> -        cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> -    
> +        sched_balance_cpumask(new->vcpu, balance_step, csched2_cpumask);
> +        cpumask_and(csched2_cpumask, csched2_cpumask, &rqd->idle);
> +        cpumask_andnot(csched2_cpumask, csched2_cpumask, &rqd->tickled);
> +
>          /* If it's not empty, choose one */
> -        i = cpumask_cycle(cpu, &mask);
> +        i = cpumask_cycle(cpu, csched2_cpumask);
>          if ( i < nr_cpu_ids )
>          {
>              ipid = i;
>              goto tickle;
>          }
> +    }
>  
>      /* Otherwise, look for the non-idle cpu with the lowest credit,
>       * skipping cpus which have been tickled but not scheduled yet,
>       * that new is allowed to run on. */
> -        cpumask_andnot(&mask, &rqd->active, &rqd->idle);
> -        cpumask_andnot(&mask, &mask, &rqd->tickled);
> -        cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> +    for_each_sched_balance_step ( balance_step )
> +    {
> +        if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY
> +            && !__vcpu_has_soft_affinity(new->vcpu,
> +                new->vcpu->cpu_hard_affinity) )
> +            continue;
> +
> +        sched_balance_cpumask(new->vcpu, balance_step, csched2_cpumask);
> +        cpumask_and(csched2_cpumask, csched2_cpumask, &rqd->active);
> +        cpumask_andnot(csched2_cpumask, csched2_cpumask, &rqd->idle);
> +        cpumask_andnot(csched2_cpumask, csched2_cpumask, &rqd->tickled);
>  
> -        for_each_cpu(i, &mask)
> +        for_each_cpu(i, csched2_cpumask)

So here you go through this loop looking for the lowest credit once for
soft affinity and once for hard affinity; but you don't break out of the
loop when you're doing soft affinity, so unless I'm missing something,
the effect is the same as just doing the hard affinity (since soft is
guaranteed to be a subset of hard).

Up till this point, we are favoring (in order):
 1. idle, non-tickled cpus in soft affinity
 2. idle, non-tickled cpus in hard affinity

By putting this down here, we're already favoring idle cpus in the hard
affinity over non-idle cpus in the soft affinity.  I think that makes
sense.

We could at this point just go through looking for the lowest credit in
the hard affinity, getting rid of the loop entirely.  That gives us:
 3: cpu whose currently running vcpu has the lowest credit

But it might be worth doing the loop for the "soft" step first, and then
checking whether the cpu with the lowest credit we've found yet is lower
than new's credit.  If so, it would allow the vcpu to run immediately,
while staying within its soft affinity.  Only if the lowest credit we've
seen is higher than the vcpu we're looking to place should we then go
find the lowest one in the whole hard affinity.

That would give us instead:
 3: cpu in the soft affinity whose currently running vcpu has the lowest
    credit, provided that credit is lower than the credit of the vcpu
    we're looking to place
 4: cpu in the hard affinity whose currently running vcpu has the lowest
    credit

Whatever we choose, we probably want to make sure there's a comment
explaining the high-level view.

---

OK, that's enough for one day; I'll get to the rest either tomorrow or
next week.

You know, you make 4 changes here; in get_fallback_cpu, runq_tickle,
choose_cpu, and balance_load.  Each of them is actually quite
independent, and many of them quite complicated.

I think it might be a good idea, when you resend them, to break it down
into four patches.  It's a bit unorthodox to send a patch which only
partially implements a feature like soft affinity; but in this case I
think it makes sense.  That way we can discuss, review, and sign-off on
each one individually.  And there will also be space in each changelog
to describe in more detail what's being done and why.

What do you (both Justin and Dario) think?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
