
Re: [Xen-devel] [PATCH RESEND 05/12] xen: numa-sched: make space for per-vcpu node-affinity



On Tue, 2013-11-05 at 17:16 +0000, George Dunlap wrote:
> Just to outline what the alternative would look like:  The hypervisor 
> would focus on the minimum mechanisms required to do something useful 
> for NUMA systems.  The domain NUMA affinity would be only used for 
> memory allocation.  vcpus would only have "hard" and "soft" affinities. 
>   The toolstack (libxl? xl?) would be responsible for stitching these 
> together into a useable interface for NUMA: e.g., it would have the 
> concept of "numa affinity" for vcpus (or indeed, virtual NUMA 
> topologies), and would do things like update the domain NUMA affinity 
> based on vcpu affinities.
> 
> This would mean the toolstack either assuming, when someone calls 
> vcpu_set_node_affinity, that soft_affinity == numa_affinity, or keeping 
> its own copy of numa_affinity for each vcpu around somewhere.
> 
And to elaborate a bit more on what I said last night, now that I have
the code in front of me, going for the above would actually mean the
following.

In domain.c we have domain_update_node_affinity(). What it does *before*
this series is calculate d->node_affinity based on all the vcpus'
cpu_affinity (i.e., pinning). What it does *after* this series is
calculate d->node_affinity based on the _vcpus'_ node_affinity. (*)
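
To make the before/after difference concrete, here is a toy model in C.
It is only a sketch under assumptions, not the actual domain.c code: the
toy_* names, the plain 32-bit masks and the 8-CPU / 2-node topology are
all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed toy topology: 8 CPUs, 4 per node, so CPUs 0-3 sit on node 0
 * and CPUs 4-7 on node 1. Nothing here comes from the real Xen tree. */
#define TOY_NR_CPUS       8
#define TOY_CPUS_PER_NODE 4

struct toy_vcpu {
    uint32_t cpu_affinity;   /* bit i set => vcpu may run on CPU i (pinning) */
    uint32_t node_affinity;  /* bit n set => vcpu prefers node n */
};

/* Translate a CPU mask into the set of nodes those CPUs belong to. */
uint32_t toy_cpumask_to_nodes(uint32_t cpus)
{
    uint32_t nodes = 0;

    for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (cpus & (1u << cpu))
            nodes |= 1u << (cpu / TOY_CPUS_PER_NODE);
    return nodes;
}

/* "Before": domain node affinity follows every vcpu's pinning. */
uint32_t toy_node_affinity_before(const struct toy_vcpu *v, size_t n)
{
    uint32_t nodes = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        nodes |= toy_cpumask_to_nodes(v[i].cpu_affinity);
    return nodes;
}

/* "After": domain node affinity follows every vcpu's node affinity. */
uint32_t toy_node_affinity_after(const struct toy_vcpu *v, size_t n)
{
    uint32_t nodes = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        nodes |= v[i].node_affinity;
    return nodes;
}
```

In this model a domain whose vcpus are pinned to node 0 but carry a node
affinity for node 1 gets a different result from the two functions,
which is exactly the change in behaviour being discussed.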

This function is currently called, basically, when a new vcpu is
allocated (alloc_vcpu()), when a domain changes cpupool
(sched_move_domain()), and when the cpupool the domain is in changes
(cpupool_assign_cpu_locked() or cpupool_unassign_cpu()). That means that
all the above operations _automatically_ affect d->node_affinity.

Now, we're talking about killing vc->cpu_affinity and not introducing
vc->node_affinity and, instead, introducing vc->cpu_hard_affinity and
vc->cpu_soft_affinity and, more importantly, not linking any of the above
to d->node_affinity. That means all the above operations _will_NOT_
automatically affect d->node_affinity any longer, at least from the
hypervisor (and, most likely, libxc) perspective. OTOH, I'm almost sure
that I can force libxl (and xl) to retain the exact same behaviour they
expose to the user today (just by adding an extra call when needed).
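
As a rough illustration of that "extra call", here is a sketch in C of
what the split could look like. Everything in it is hypothetical: the
sk_* names, the struct layout and the 8-CPU / 2-node topology are
assumptions of mine, and none of these functions exist in Xen, libxc or
libxl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_VCPUS     4
#define SK_NR_CPUS       8
#define SK_CPUS_PER_NODE 4   /* assumed: CPUs 0-3 on node 0, 4-7 on node 1 */

/* Hypothetical per-domain state as the hypervisor would keep it. */
struct sk_domain {
    uint32_t node_affinity;                /* memory allocation only */
    uint32_t soft_affinity[SK_MAX_VCPUS];  /* per-vcpu CPU mask */
    size_t   nr_vcpus;
};

/* Stand-in for the low-level call: touches soft affinity and nothing else. */
void sk_hv_set_vcpu_soft_affinity(struct sk_domain *d, size_t vcpu,
                                  uint32_t cpus)
{
    assert(vcpu < d->nr_vcpus);
    d->soft_affinity[vcpu] = cpus;
}

/* Stand-in for the low-level call that sets node affinity explicitly. */
void sk_hv_set_node_affinity(struct sk_domain *d, uint32_t nodes)
{
    d->node_affinity = nodes;
}

/* Toolstack-side helper: nodes spanned by all vcpus' soft affinities. */
uint32_t sk_nodes_of_domain(const struct sk_domain *d)
{
    uint32_t nodes = 0;

    for (size_t v = 0; v < d->nr_vcpus; v++)
        for (unsigned int cpu = 0; cpu < SK_NR_CPUS; cpu++)
            if (d->soft_affinity[v] & (1u << cpu))
                nodes |= 1u << (cpu / SK_CPUS_PER_NODE);
    return nodes;
}

/* Toolstack-level wrapper: the extra call keeps node affinity in sync,
 * so users of this layer see the old "automatic" behaviour. */
void sk_ts_set_vcpu_soft_affinity(struct sk_domain *d, size_t vcpu,
                                  uint32_t cpus)
{
    sk_hv_set_vcpu_soft_affinity(d, vcpu, cpus);
    sk_hv_set_node_affinity(d, sk_nodes_of_domain(d));
}
```

Callers going through sk_ts_set_vcpu_soft_affinity() get node affinity
updated for them, while callers using the sk_hv_* functions directly
have to update it themselves.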

So, although all this won't be an issue for xl and libxl consumers (or,
at least, that's my goal), it will change how the hypervisor behaves
in all those situations. This means that xl and libxl users will
see no change, while folks issuing hypercalls and/or libxc calls will.

Is that ok? I mean, I know there are no stability concerns for those
APIs, but still, is that an acceptable change?

Regards,
Dario

(*) yes, in both cases (before and after this series), it is already
possible that d->node_affinity is not automatically calculated, but
just sticks to something the toolstack provided. That will stay, so
it's pretty much irrelevant to this discussion... Actually, it won't
just "stay", it will become the sole and only case!

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
