
Re: [Xen-devel] [PATCH v2] xen: sched: introduce hard and soft affinity in credit 2 scheduler



Hi Justin!

Glad to see you're already at the stage where you're producing patches,
and thanks for sharing them! ;-P

A quick note about timing, which is probably pretty bad. :-( This is
absolutely not your fault, but we are working on releasing Xen 4.4 at
the beginning of 2014, so, until then, most of the focus will be on
bugfixing, rather than on implementing and reviewing new features.

In particular, George, who is Xen's scheduler maintainer and the
main expert on Credit2 (he wrote it :-)), is really busy with that, as
he's the release coordinator, and he'll also be traveling to
conferences in January.

Add to that the Winter holidays, and I think you get the big
picture! :-(

That being said, about the code...

On Sat, 2013-12-14 at 08:15 -1000, Justin Weaver wrote:
> Modified function runq_candidate in the credit 2 scheduler to
> have it consider hard and soft affinity when choosing the next
> vCPU from the run queue to run on the given pCPU.
> 
Ok, so the question then is: is that enough to implement hard and
soft affinities? By 'that' I mean modifying runq_candidate. Or do we
need to do something else, in some other places as well?

Notice that I'm not saying things actually are one way or the other
(although I do think this is not enough: e.g., what about
choose_cpu()?). I'm rather saying that I think this information should
be present in the changelog. :-)

> Function now chooses the vCPU with the most credit that has hard affinity
> and maybe soft affinity for the given pCPU. If it does not have soft affinity
> and there is another vCPU that prefers to run on the given pCPU, then as long
> as it has at least a certain amount of credit (currently defined as half of
> CSCHED_CREDIT_INIT, but more testing is needed to determine the best value)
> then it is chosen instead.
> 
Ok, so, why this 'certain amount of credit' thing? I got the technical
details of it from the code below, but can you spend a few words on why
and how you think something like this would be required and/or useful?

Oh, and still about the process: no matter how simple it is or will turn
out to be, I'd send at least two patches, one for hard affinity and the
other one for soft affinity. That would make the whole thing a lot
easier both to review (right now) and to understand (in the future, when
looking at git log).

> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 4e68375..d337cdd 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -116,6 +116,10 @@
>   * ATM, set so that highest-weight VMs can only run for 10ms
>   * before a reset event. */
>  #define CSCHED_CREDIT_INIT          MILLISECS(10)
> +/* Minimum amount of credit needed for a vcpu with soft
> +   affinity for a given cpu to be picked from the run queue
> +   over a vcpu with more credit but only hard affinity. */
> +#define CSCHED_MIN_CREDIT_PREFER_SA MILLISECS(5)
>
As said above, what is this buying us? What's the big idea behind it?

>  /* Carryover: How much "extra" credit may be carried over after
>   * a reset. */
>  #define CSCHED_CARRYOVER_MAX        CSCHED_MIN_TIMER
> @@ -1615,6 +1619,7 @@ runq_candidate(struct csched_runqueue_data *rqd,
>  {
>      struct list_head *iter;
>      struct csched_vcpu *snext = NULL;
> +    bool_t found_snext_w_hard_affinity = 0;
>  
>      /* Default to current if runnable, idle otherwise */
>      if ( vcpu_runnable(scurr->vcpu) )
> @@ -1626,6 +1631,11 @@ runq_candidate(struct csched_runqueue_data *rqd,
>      {
>          struct csched_vcpu * svc = list_entry(iter, struct csched_vcpu, 
> runq_elem);
>  
> +        /* If this is not allowed to run on this processor based on its
> +         * hard affinity mask, continue to the next vcpu on the run queue */
> +        if ( !cpumask_test_cpu(cpu, &svc->cpu_hard_affinity) )
> +            continue;
> +
And, as mentioned above already, if we don't have hard affinity with
this pCPU, how did we get on this runqueue? Obviously, I know how we got
here in the present situation... Actually, that's exactly what I meant
when saying that there is probably more effort needed somewhere else, to
avoid, as much as possible, a vCPU landing in the runqueue of a pCPU
which is outside of its hard affinity (and soft affinity too, of course).
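
Just to make that a bit more concrete, here's a minimal standalone sketch
(simplified types and a made-up helper name, not actual sched_credit2.c
code) of what "only consider pCPUs in the hard affinity mask when deciding
where a vCPU goes" could look like:

/* Standalone illustration only: plain bitmasks stand in for Xen's
 * cpumask_t, and pick_cpu_hard_affinity() is a made-up name. */
#include <stdio.h>

#define NR_CPUS 8

typedef unsigned int fake_cpumask_t;   /* bit i set == pCPU i in the mask */

/* Pick a pCPU for a vCPU considering only the pCPUs it has hard affinity
 * with; return -1 if the hard affinity mask allows none of the online ones. */
static int pick_cpu_hard_affinity(fake_cpumask_t online, fake_cpumask_t hard)
{
    fake_cpumask_t allowed = online & hard;
    int cpu;

    for ( cpu = 0; cpu < NR_CPUS; cpu++ )
        if ( allowed & (1u << cpu) )
            return cpu;   /* a real implementation would also balance load */

    return -1;
}

int main(void)
{
    fake_cpumask_t online = 0xff;   /* pCPUs 0-7 online */
    fake_cpumask_t hard   = 0x30;   /* hard affinity: pCPUs 4 and 5 */

    printf("picked pCPU %d\n", pick_cpu_hard_affinity(online, hard));
    return 0;
}

If something along these lines happens at cpu/runqueue selection time, the
check you add in runq_candidate becomes just a safety net.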

>          /* If this is on a different processor, don't pull it unless
>           * its credit is at least CSCHED_MIGRATE_RESIST higher. */
>          if ( svc->vcpu->processor != cpu
> @@ -1633,13 +1643,29 @@ runq_candidate(struct csched_runqueue_data *rqd,
>              continue;
>  
>          /* If the next one on the list has more credit than current
> -         * (or idle, if current is not runnable), choose it. */
> -        if ( svc->credit > snext->credit )
> +         * (or idle, if current is not runnable), choose it. Only need
> +         * to do this once since run queue is in credit order. */
> +        if ( !found_snext_w_hard_affinity
> +             && svc->credit > snext->credit )
> +        {
> +            snext = svc;
> +            found_snext_w_hard_affinity = 1;
> +        }
> +
Ok, this is probably the right thing for hard affinity. However...

> +        /* Is there enough credit left in this vcpu to continue 
> +         * considering soft affinity? */ 
> +        if ( svc->credit < CSCHED_MIN_CREDIT_PREFER_SA )
> +            break;
> +
> +        /* Does this vcpu prefer to run on this cpu? */
> +        if ( !cpumask_full(svc->cpu_soft_affinity) 
> +             && cpumask_test_cpu(cpu, &svc->cpu_soft_affinity) )
>              snext = svc;
> +        else
> +            continue;  
>  
... No matter the effect of CSCHED_MIN_CREDIT_PREFER_SA, I wonder
whether we're interfering too much with the credit2 algorithm.

Consider for example the situation where all but one pCPU are busy, and
assume we have a bunch of vCPUs, at the head of the free pCPU's
runqueue, with a great amount of credit, but without soft affinity for
that pCPU. OTOH, there might be vCPUs with way less credit, but with
soft affinity with it, and we'd be letting the latter run while it's
the former that should have, wouldn't we?

Of course, it depends on their credit being greater than
CSCHED_MIN_CREDIT_PREFER_SA, but still, this does not look like the
right approach to me, at least not at this stage.
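
To put some (made-up) numbers on that scenario, here's a tiny standalone
model of roughly the pick the patched runq_candidate would make; it
ignores migration resistance and everything else, and the threshold value
is just the 5ms from the patch:

/* Standalone model of the soft-affinity override, not Xen code. */
#include <stdio.h>

#define MIN_CREDIT_PREFER_SA 5   /* stands in for CSCHED_MIN_CREDIT_PREFER_SA, in ms */

struct fake_vcpu {
    const char *name;
    int credit;          /* ms; the runqueue is assumed sorted on this, descending */
    int soft_affinity;   /* 1 if the vCPU has soft affinity with the pCPU */
};

/* Default to the highest-credit vCPU, but let a lower-credit vCPU with soft
 * affinity take the slot as long as its credit is above the threshold. */
static const struct fake_vcpu *pick(const struct fake_vcpu *rq, int n)
{
    const struct fake_vcpu *snext = &rq[0];
    int i;

    for ( i = 0; i < n; i++ )
    {
        if ( rq[i].credit < MIN_CREDIT_PREFER_SA )
            break;
        if ( rq[i].soft_affinity )
            return &rq[i];
    }
    return snext;
}

int main(void)
{
    struct fake_vcpu rq[] = {
        { "A", 9, 0 },   /* lots of credit, no soft affinity with this pCPU */
        { "B", 6, 1 },   /* less credit, but soft affinity with this pCPU */
    };

    printf("picked %s\n", pick(rq, 2)->name);   /* prints "B", not "A" */
    return 0;
}

That is, B runs even though A has 50% more credit, and that's exactly the
kind of deviation from the plain credit order I'm worried about.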

What I think I'd try to do is as follows:
 1) try as hard as possible to make sure that each vCPU is in a runqueue
    belonging to at least one of the pCPUs it has hard affinity with
 2) try hard (though a bit less hard than in 1) is fine) to make sure
    that each vCPU is in a runqueue belonging to at least one of the
    pCPUs it has soft affinity with
 3) when scheduling (in runq_candidate), scan the runqueue in credit
    order and pick the first vCPU that has hard affinity with the
    pCPU being considered (as you're also doing), but forgetting about
    soft affinity (see the sketch right after this list).
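
For 3), I'm thinking of something along these lines (again just a
standalone sketch with simplified types, not the real runq_candidate()):

/* Standalone sketch of point 3): scan the credit-ordered runqueue and take
 * the first vCPU that has hard affinity with this pCPU; soft affinity is
 * deliberately ignored here. Types and names are simplified, not Xen's. */
#include <stdio.h>

struct fake_vcpu {
    const char *name;
    int credit;                   /* runqueue assumed sorted on this, descending */
    unsigned int hard_affinity;   /* bitmask of pCPUs the vCPU may run on */
};

static const struct fake_vcpu *runq_candidate_hard(const struct fake_vcpu *rq,
                                                   int n, int cpu)
{
    int i;

    for ( i = 0; i < n; i++ )
        if ( rq[i].hard_affinity & (1u << cpu) )
            return &rq[i];        /* highest-credit vCPU allowed to run here */

    return NULL;                  /* nothing can run here: go idle */
}

int main(void)
{
    struct fake_vcpu rq[] = {
        { "A", 9, 0x2 },          /* hard affinity: pCPU 1 only */
        { "B", 6, 0x3 },          /* hard affinity: pCPUs 0 and 1 */
    };

    /* On pCPU 0, A is skipped and B is picked, despite having less credit. */
    printf("picked %s\n", runq_candidate_hard(rq, 2, 0)->name);
    return 0;
}

The point being that, if 1) and 2) do their job, the scan above almost
always stops at the very first element anyway.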

Once that is done, we could look at introducing something like
CSCHED_MIN_CREDIT_PREFER_SA, as an optimization, and see how it
performs.

Still as an optimization, we could try to do something clever wrt 1) and
2): e.g., instead of just making sure a vCPU lands in a runqueue belonging
to at least one pCPU in its affinity mask, we could try to put the vCPU in
the runqueue with the biggest intersection between its pCPUs and the
domain's affinity, to maximize the probability of the scheduling decision
being quick enough... But again, this can come later; a sketch of the idea
follows.
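
Something like the following is what I have in mind for the "biggest
intersection" part (once more, a standalone sketch with plain bitmasks
instead of cpumask_t and made-up names):

/* Standalone sketch: among the runqueues, pick the one whose set of pCPUs
 * overlaps the vCPU's affinity mask the most. Not Xen code. */
#include <stdio.h>

static int popcount(unsigned int x)
{
    int n = 0;

    for ( ; x; x &= x - 1 )
        n++;
    return n;
}

/* rq_cpus[i] is the bitmask of pCPUs served by runqueue i; return the index
 * of the runqueue with the biggest intersection, or -1 if none intersects. */
static int pick_runqueue(const unsigned int *rq_cpus, int nr_rqs,
                         unsigned int affinity)
{
    int i, best = -1, best_overlap = 0;

    for ( i = 0; i < nr_rqs; i++ )
    {
        int overlap = popcount(rq_cpus[i] & affinity);

        if ( overlap > best_overlap )
        {
            best = i;
            best_overlap = overlap;
        }
    }
    return best;
}

int main(void)
{
    unsigned int rq_cpus[] = { 0x0f, 0xf0 };   /* rq 0: pCPUs 0-3, rq 1: pCPUs 4-7 */
    unsigned int affinity  = 0x7c;             /* affinity: pCPUs 2-6 */

    /* rq 0 overlaps on 2 pCPUs, rq 1 on 3, so runqueue 1 is picked. */
    printf("picked runqueue %d\n", pick_runqueue(rq_cpus, 2, affinity));
    return 0;
}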

So, does all this make sense?

Thanks again for your work and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
