[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-API] VCPUs-at-startup and VCPUs-max with NUMA node affinity

  • To: James Bulpin <James.Bulpin@xxxxxxxxxxxxx>, "xen-api@xxxxxxxxxxxxx" <xen-api@xxxxxxxxxxxxx>
  • From: Dave Scott <Dave.Scott@xxxxxxxxxxxxx>
  • Date: Wed, 30 May 2012 10:21:41 +0100
  • Accept-language: en-US
  • Acceptlanguage: en-US
  • Delivery-date: Wed, 30 May 2012 09:22:01 +0000
  • List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>
  • Thread-index: Ac081xDWKDPUgv0eQOe9upOioCwsEgAu3EBQACxGJaA=
  • Thread-topic: VCPUs-at-startup and VCPUs-max with NUMA node affinity


James wrote:
> I'm thinking about the interaction of xapi's vCPU management and the
> future Xen automatic NUMA placement
> (http://blog.xen.org/index.php/2012/05/16/numa-and-xen-part-ii-
> scheduling-and-placement/). If a VM has an equal or smaller number of
> vCPUs than a NUMA node has pCPUs then it makes sense for that VM to
> have NUMA node affinity. But then what happens if vCPUs are hotplugged
> to the VM and it now has more vCPUs than the node has pCPUs? I can see
> several options here:
>   1. The node is over-provisioned in that the VM's vCPUs contend with
> each other for the pCPUs - not good

I agree, this doesn't sound good to me either.

>   2. The CPU affinity is dropped allowing vCPUs to run on any node -
> the memory is still on the original node so now we've got a poor
> placement for vCPUs that happen to end up running on other nodes. This
> also leads to additional interconnect traffic and possible cache line
> ping-pong.

This also sounds pretty bad -- it would have been better to stripe the memory 
across all the banks in the first place!

>   3. The vCPUs that cannot fit on the node are given no affinity but
> those that can retain their node affinity - leads to some vCPUs being
> better performing than others due to memory (non-)locality. This also
> leads to some additional interconnect traffic and possible cache line
> ping-pong.
>   4. We never let this happen because we only allow node affinity to be
> set for the maximum vCPU count a VM may have during this boot (VCPUs-
> max; options 1 to 3 above use VCPUs-at-startup to decide whether to use
> node affinity).
> I'm tempted by #4 because it avoids having to make difficult and
> workload dependent decisions when changing vCPU counts. My guess is
> that many users will have VMs with VCPUs-at-startup==VCPUs-max so it
> becomes a non-issue anyway.

I agree, this looks like the best solution to me. Also, since we only support 
vCPU hotplug for PV guests, all HVM guests implicitly have 
VCPUs-at-startup = VCPUs-max, so that's definitely a fairly common scenario.
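The decision rule in option #4 can be sketched as a tiny predicate: pin to a node only if the VM's *maximum* possible vCPU count fits on that node, so a later hotplug can never overflow it. The function name and signature below are illustrative, not part of xapi:

```python
def should_set_node_affinity(vcpus_max, pcpus_per_node):
    """Option #4 from the thread (sketch, not actual xapi code).

    Node affinity is granted only when VCPUs-max (not VCPUs-at-startup)
    fits on a single NUMA node, so hotplugging up to VCPUs-max can never
    over-provision the node or force affinity to be dropped.
    """
    return vcpus_max <= pcpus_per_node


# A VM with VCPUs-at-startup=2, VCPUs-max=4 on 8-pCPU nodes gets affinity;
# one with VCPUs-max=16 does not, even if it boots with 2 vCPUs.
```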

> My only real concern is that if users
> regularly run VMs with small VCPUs-at-startup but with VCPUs-max being
> the number of pCPUs in the box, i.e. allowing them to hotplug up to the
> full resource of the box.
> And a related question: when xapi/xenopsd builds a domain does it have
> to tell Xen about VCPUs-max or just the number of vCPUs required right
> now?

IIRC the domain builder needs to know the VCPUs-max. VCPUs-at-startup is 
implemented by a protocol over xenstore where there's a directory:

cpu = ""
 0 = ""
  availability = "online"
 1 = ""
  availability = "online"

This tells the PV kernel whether it should disable/hot-unplug certain vCPUs. 
I'm not sure, but I imagine the guest receives the xenstore watch event, 
deregisters the vCPU with its scheduler and then issues a hypercall telling Xen 
to stop scheduling the vCPU too. It certainly has to be a co-operative thing, 
since if Xen just stopped scheduling a vCPU unilaterally that would probably 
have some bad effects on the guest :) It's slightly odd that the protocol 
allows per-vCPU control, when I'm not convinced that you can meaningfully tell 
the vCPUs apart.
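The toolstack side of that protocol is just a write to the per-vCPU availability key, which the guest watches. A minimal sketch, using a plain dict as a stand-in for a real xenstore connection (the helper name is an assumption, not a xapi API):

```python
def set_vcpu_availability(store, domid, vcpu, online):
    """Toolstack half of the PV vCPU hot(un)plug protocol (sketch).

    Writes /local/domain/<domid>/cpu/<n>/availability; the guest kernel
    watches this path and, on "offline", is expected to deregister the
    vCPU from its scheduler and tell Xen to stop scheduling it.
    `store` is a dict standing in for a xenstore handle (assumption).
    """
    path = "/local/domain/%d/cpu/%d/availability" % (domid, vcpu)
    store[path] = "online" if online else "offline"
    return path


# Example: offline vCPU 1 of domain 5.
store = {}
set_vcpu_availability(store, 5, 1, online=False)
```

Note that Xen itself only sees the effect once the guest co-operates; the xenstore write alone changes nothing in the scheduler.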


Xen-api mailing list


