[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [v7 PATCH 02/10] xen: sched: introduce soft-affinity and use it instead d->node-affinity

On 06/10/2014 01:44 AM, Dario Faggioli wrote:
Before this change, each vcpu had its own vcpu-affinity
(in v->cpu_affinity), representing the set of pcpus where
the vcpu is allowed to run. Since when NUMA-aware scheduling
was introduced the (credit1 only, for now) scheduler also
tries as much as it can to run all the vcpus of a domain
on one of the nodes that constitutes the domain's

The idea here is making the mechanism more general by:
   * allowing for this 'preference' for some pcpus/nodes to be
     expressed on a per-vcpu basis, instead than for the domain
     as a whole. That is to say, each vcpu should have its own
     set of preferred pcpus/nodes, instead than it being the
     very same for all the vcpus of the domain;
   * generalizing the idea of 'preferred pcpus' to not only NUMA
     awareness and support. That is to say, independently from
     it being or not (mostly) useful on NUMA systems, it should
     be possible to specify, for each vcpu, a set of pcpus where
     it prefers to run (in addition, and possibly unrelated to,
     the set of pcpus where it is allowed to run).

We will be calling this set of *preferred* pcpus the vcpu's
soft affinity, and this changes introduce it, and starts using it
for scheduling, replacing the indirect use of the domain's NUMA
node-affinity. This is more general, as soft affinity does not
have to be related to NUMA. Nevertheless, it allows to achieve the
same results of NUMA-aware scheduling, just by making soft affinity
equal to the domain's node affinity, for all the vCPUs (e.g.,
from the toolstack).

This also means renaming most of the NUMA-aware scheduling related
functions, in credit1, to something more generic, hinting toward
the concept of soft affinity rather than directly to NUMA awareness.

As a side effects, this simplifies the code quit a bit. In fact,
prior to this change, we needed to cache the translation of
d->node_affinity (which is a nodemask_t) to a cpumask_t, since that
is what scheduling decisions require (we used to keep it in
node_affinity_cpumask). This, and all the complicated logic
required to keep it updated, is not necessary any longer.

The high level description of NUMA placement and scheduling in
docs/misc/xl-numa-placement.markdown is being updated too, to match
the new architecture.

Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>

Probably should have taken this off; but in any case:

Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.