
Re: [Xen-devel] [PATCH v2] xen: sched: introduce hard and soft affinity in credit 2 scheduler



Dario,

Sorry for disappearing for so long ... I'm back and ready to continue working.

> A quick note about timing, which is probably pretty bad. :-( This is
> absolutely not your fault, but we are working on releasing Xen 4.4 at
> the beginning of 2014, so, until then, most of the focus will be on
> bugfixing, rather than implementing and reviewing new features.

Should be OK. I just hope I can get some of my code committed before
the end of my school semester.

> That being said, about the code...
>
> On sab, 2013-12-14 at 08:15 -1000, Justin Weaver wrote:
>> Modified function runq_candidate in the credit 2 scheduler to
>> have it consider hard and soft affinity when choosing the next
>> vCPU from the run queue to run on the given pCPU.
>>
> Ok, and the question is then, is that enough for implementing hard and
> soft affinities? By 'that' I mean, 'modifying runq_candidate'. Or do we
> need to do something else, in some other places?
>
> Notice that I'm not saying things actually are in one way or the other
> (although, I do think that this is not enough: e.g., what about
> choose_cpu() ?). I'm rather saying that I think this information should
> be present in the changelog. :-)

Other functions will need to change, but with only one run queue
currently being created, runq_candidate was the only one that had to.
I'll look through the others again with the mindset that we (or maybe
I) will fix the issue causing only one run queue to be created despite
multiple cores/sockets being available.

>> The function now chooses the vCPU with the most credit that has hard
>> affinity (and possibly soft affinity) for the given pCPU. If that vCPU
>> does not have soft affinity and another vCPU on the run queue prefers
>> to run on the given pCPU, the latter is chosen instead, as long as it
>> has at least a certain amount of credit (currently defined as half of
>> CSCHED_CREDIT_INIT; more testing is needed to determine the best value).
>>
> Ok, so, why this 'certain amount of credit' thing? I got the technical
> details of it from the code below, but can you spend a few words on why
> and how you think something like this would be required and/or useful?

Without something like this, I believe soft affinity would be ignored:
the next vCPU picked to run on the given pCPU would always be the one
on the run queue with the most credit (as it is now), whether it
prefers to run on that pCPU or not. My thinking is that, for example,
it might be better to pick the vCPU with the second-most credit that
actually prefers to run on the given pCPU over the one with more
credit but no soft affinity preference (or a preference for a
different pCPU).
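
To make that concrete, here is a small, self-contained sketch of the
selection rule (plain C with made-up types and a placeholder threshold,
not the actual scheduler code; the real patch operates on the credit2
run queue and cpumasks):

/* Illustration only: candidates[] is sorted by credit, highest first,
 * and already filtered down to vCPUs whose hard affinity allows them
 * to run on this pCPU. */
struct candidate {
    int credit;
    int prefers_this_pcpu;  /* 1 if the pCPU is in the vCPU's soft affinity */
};

#define MIN_CREDIT_PREFER_SA 5  /* stand-in for CSCHED_MIN_CREDIT_PREFER_SA */

static int pick_candidate(const struct candidate *c, int n)
{
    int i, pick = 0;  /* default: most credit, i.e. current behaviour */

    for ( i = 0; i < n; i++ )
    {
        /* Below the threshold, stop considering soft affinity at all. */
        if ( c[i].credit < MIN_CREDIT_PREFER_SA )
            break;

        /* First sufficiently-credited vCPU that prefers this pCPU wins. */
        if ( c[i].prefers_this_pcpu )
        {
            pick = i;
            break;
        }
    }

    return pick;  /* index of the vCPU to run next */
}

For example, with candidates {credit 10, no preference} and {credit 7,
prefers this pCPU}, the second one is picked as long as 7 is at least
the threshold; if its credit were below the threshold, the first would
run, as it does today.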

> Oh, and still about the process, no matter how simple it is or will turn
> out to be, I'd send at least two patches, one for hard affinity and the
> other one for soft affinity. That would make the whole thing a lot
> easier to both review (right now) and understand (in future, when
> looking at git log).

Understood, will do.

>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index 4e68375..d337cdd 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -116,6 +116,10 @@
>>   * ATM, set so that highest-weight VMs can only run for 10ms
>>   * before a reset event. */
>>  #define CSCHED_CREDIT_INIT          MILLISECS(10)
>> +/* Minimum amount of credit needed for a vcpu with soft
>> +   affinity for a given cpu to be picked from the run queue
>> +   over a vcpu with more credit but only hard affinity. */
>> +#define CSCHED_MIN_CREDIT_PREFER_SA MILLISECS(5)
>>
> As said above, what is this buying us? What's the big idea behind it?

I believe I answered above, but I can clarify further if necessary.

>
>>  /* Carryover: How much "extra" credit may be carried over after
>>   * a reset. */
>>  #define CSCHED_CARRYOVER_MAX        CSCHED_MIN_TIMER
>> @@ -1615,6 +1619,7 @@ runq_candidate(struct csched_runqueue_data *rqd,
>>  {
>>      struct list_head *iter;
>>      struct csched_vcpu *snext = NULL;
>> +    bool_t found_snext_w_hard_affinity = 0;
>>
>>      /* Default to current if runnable, idle otherwise */
>>      if ( vcpu_runnable(scurr->vcpu) )
>> @@ -1626,6 +1631,11 @@ runq_candidate(struct csched_runqueue_data *rqd,
>>      {
>>          struct csched_vcpu * svc = list_entry(iter, struct csched_vcpu, runq_elem);
>>
>> +        /* If this is not allowed to run on this processor based on its
>> +         * hard affinity mask, continue to the next vcpu on the run queue */
>> +        if ( !cpumask_test_cpu(cpu, &svc->cpu_hard_affinity) )
>> +            continue;
>> +
> And, as mentioned above already too, if we don't have hard affinity with
> this pCPU, how did we get on this runqueue? Obviously, I know how we got
> here in the present situation... Actually, that's exactly what I meant
> when saying that there is probably more effort needed somewhere else, to
> avoid, as much as possible, a vCPU landing in the runqueue of a pCPU
> which is outside of its hard affinity (and its soft affinity too, of course).

Right, and I'll further examine runq_assign knowing that the single
run queue issue will eventually be fixed.
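
For the assignment side, what I picture is roughly the following. This
is an untested sketch; the structure and field names (prv->active_queues,
rqd->active) are from my reading of sched_credit2.c and may not match
the tree exactly, and real code would use a per-CPU scratch mask instead
of a cpumask on the stack:

/* Sketch: pick the run queue whose pCPUs overlap the vCPU's hard
 * affinity the most; NULL means no run queue satisfies hard affinity. */
static struct csched_runqueue_data *
pick_runqueue(struct csched_private *prv, const cpumask_t *hard_affinity)
{
    struct csched_runqueue_data *best = NULL;
    unsigned int best_overlap = 0;
    unsigned int i;
    cpumask_t tmp;  /* a real implementation would avoid this on-stack mask */

    for_each_cpu(i, &prv->active_queues)
    {
        struct csched_runqueue_data *rqd = prv->rqd + i;
        unsigned int overlap;

        /* How many of this run queue's pCPUs are in the hard affinity? */
        cpumask_and(&tmp, &rqd->active, hard_affinity);
        overlap = cpumask_weight(&tmp);

        if ( overlap > best_overlap )
        {
            best = rqd;
            best_overlap = overlap;
        }
    }

    return best;
}

(This would also be a natural place for the "biggest intersection"
idea you mention further down.)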

>
>>          /* If this is on a different processor, don't pull it unless
>>           * its credit is at least CSCHED_MIGRATE_RESIST higher. */
>>          if ( svc->vcpu->processor != cpu
>> @@ -1633,13 +1643,29 @@ runq_candidate(struct csched_runqueue_data *rqd,
>>              continue;
>>
>>          /* If the next one on the list has more credit than current
>> -         * (or idle, if current is not runnable), choose it. */
>> -        if ( svc->credit > snext->credit )
>> +         * (or idle, if current is not runnable), choose it. Only need
>> +         * to do this once since run queue is in credit order. */
>> +        if ( !found_snext_w_hard_affinity
>> +             && svc->credit > snext->credit )
>> +        {
>> +            snext = svc;
>> +            found_snext_w_hard_affinity = 1;
>> +        }
>> +
> Ok, this is probably the right thing for hard affinity. However...
>
>> +        /* Is there enough credit left in this vcpu to continue
>> +         * considering soft affinity? */
>> +        if ( svc->credit < CSCHED_MIN_CREDIT_PREFER_SA )
>> +            break;
>> +
>> +        /* Does this vcpu prefer to run on this cpu? */
>> +        if ( !cpumask_full(svc->cpu_soft_affinity)
>> +             && cpumask_test_cpu(cpu, &svc->cpu_soft_affinity) )
>>              snext = svc;
>> +        else
>> +            continue;
>>
> ... No matter the effect of CSCHED_MIN_CREDIT_PREFER_SA, I wonder
> whether we're interfering too much with the credit2 algorithm.
>
> Consider, for example, the situation where all but one pCPU are busy, and
> assume we have a bunch of vCPUs at the head of the free pCPU's
> runqueue with a great amount of credit, but without soft affinity for
> that pCPU. OTOH, there might be vCPUs with far less credit, but with
> soft affinity for it, and we'd be letting the latter run while it is
> the former that should, wouldn't we?
>
> Of course, it depends on their credit being greater than
> CSCHED_MIN_CREDIT_PREFER_SA, but still, this does not look like the
> right approach to me, at least not at this stage.
>
> What I think I'd try to do is as follows:
>  1) try as hard as possible to make sure that each vCPU is in a runqueue
>     belonging to at least one of the pCPUs it has hard affinity with
>  2) try hard (but a bit less hard than 1) is fine) to make sure that
>     each vCPU is in a runqueue belonging to at least one of the pCPUs it
>     has soft affinity with
>  3) when scheduling (in runq_candidate), scan the runqueue in credit
>     order and pick up the first vCPU that has hard affinity with the
>     pCPU being considered (as you're also doing), but forgetting about
>     soft affinity.
>
> Once that is done, we could look at introducing something like
> CSCHED_MIN_CREDIT_PREFER_SA as an optimization, and see how it
> performs.
>
> Still as optimizations, we can try to do something clever wrt 1) and 2):
> e.g., instead of just making sure a vCPU lands in a runqueue containing
> at least one pCPU in its affinity mask, we could try to put the vCPU in
> the runqueue with the biggest intersection between its pCPUs and the
> domain's affinity, to maximize the probability of the scheduling being
> quick enough... But again, this can come later.
>
> So, does all this make sense?

Yes, that all makes sense. Thank you for the feedback. I think my work
will be more useful if I can also get the system to start creating
multiple run queues as it should, despite the comment in the credit2
code that, for now, there is only one queue.
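
Just to check my understanding of step 3, this is roughly what I would
do in runq_candidate (only a sketch of the loop, reusing the names from
the quoted hunk; the CSCHED_MIGRATE_RESIST check is left out for
brevity):

/* Step 3 sketch: scan the run queue in credit order and take the first
 * vcpu that has hard affinity with this cpu; soft affinity is ignored. */
list_for_each( iter, &rqd->runq )
{
    struct csched_vcpu * svc = list_entry(iter, struct csched_vcpu, runq_elem);

    /* Skip vcpus that are not allowed to run on this cpu at all. */
    if ( !cpumask_test_cpu(cpu, &svc->cpu_hard_affinity) )
        continue;

    /* The run queue is credit-ordered, so the first eligible vcpu is
     * the best one; take it if it beats the default choice and stop. */
    if ( svc->credit > snext->credit )
        snext = svc;
    break;
}

If that matches what you meant, I'll base the hard affinity patch on it
and keep the soft affinity threshold idea for a later optimization.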

> Thanks again for your work and Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

Justin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

