Xen project Mailing List

Re: [Xen-API] VCPUs-at-startup and VCPUs-max with NUMA node affinity

To: James Bulpin <James.Bulpin@xxxxxxxxxxxxx>, "xen-api@xxxxxxxxxxxxx" <xen-api@xxxxxxxxxxxxx>

From: Dave Scott <Dave.Scott@xxxxxxxxxxxxx>

Date: Wed, 30 May 2012 10:21:41 +0100

Accept-language: en-US

Delivery-date: Wed, 30 May 2012 09:22:01 +0000

List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>

Thread-index: Ac081xDWKDPUgv0eQOe9upOioCwsEgAu3EBQACxGJaA=

Thread-topic: VCPUs-at-startup and VCPUs-max with NUMA node affinity

Hi,

James wrote:
> I'm thinking about the interaction of xapi's vCPU management and the
> future Xen automatic NUMA placement
> (http://blog.xen.org/index.php/2012/05/16/numa-and-xen-part-ii-scheduling-and-placement/).
> If a VM has an equal or smaller number of vCPUs than a NUMA node has
> pCPUs then it makes sense for that VM to have NUMA node affinity. But
> then what happens if vCPUs are hotplugged to the VM and it now has more
> vCPUs than the node has pCPUs? I can see several options here:
>
> 1. The node is over-provisioned in that the VM's vCPUs contend with
> each other for the pCPUs - not good

I agree, this doesn't sound good to me either.

> 2. The CPU affinity is dropped allowing vCPUs to run on any node -
> the memory is still on the original node so now we've got a poor
> placement for vCPUs that happen to end up running on other nodes. This
> also leads to additional interconnect traffic and possible cache line
> ping-pong.

This also sounds pretty bad -- it would have been better to stripe the
memory across all the banks in the first place!

> 3. The vCPUs that cannot fit on the node are given no affinity but
> those that can retain their node affinity - leads to some vCPUs being
> better performing than others due to memory (non-)locality. This also
> leads to some additional interconnect traffic and possible cache line
> ping-pong.
>
> 4. We never let this happen because we only allow node affinity to be
> set for the maximum vCPU count a VM may have during this boot
> (VCPUs-max; options 1 to 3 above use VCPUs-at-startup to decide whether
> to use node affinity).
>
> I'm tempted by #4 because it avoids having to make difficult and
> workload dependent decisions when changing vCPU counts. My guess is
> that many users will have VMs with VCPUs-at-startup==VCPUs-max so it
> becomes a non-issue anyway.

I agree, this looks like the best solution to me.
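
A minimal sketch of how option 4 differs from options 1 to 3. The `VM` record and both function names are hypothetical, made up for illustration; this is not xapi or Xen code, only the comparison described above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical record holding the two vCPU counts discussed here."""
    vcpus_at_startup: int
    vcpus_max: int


def node_affinity_by_max(vm: VM, node_pcpus: int) -> bool:
    # Option 4: give the VM node affinity only if it still fits on the
    # node at the largest vCPU count it can reach during this boot.
    return vm.vcpus_max <= node_pcpus


def node_affinity_by_startup(vm: VM, node_pcpus: int) -> bool:
    # Options 1 to 3: decide on the startup count, so a later hotplug
    # can leave the VM with more vCPUs than the node has pCPUs.
    return vm.vcpus_at_startup <= node_pcpus


if __name__ == "__main__":
    node_pcpus = 8  # assumed node size, for illustration only
    for vm in (VM(4, 4), VM(2, 16)):
        print(vm, node_affinity_by_max(vm, node_pcpus),
              node_affinity_by_startup(vm, node_pcpus))
```

For a VM with VCPUs-at-startup equal to VCPUs-max the two rules give the same answer, which is why the common case is a non-issue.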
Also since we only support vCPU hotplug for PV guests, all HVM guests
implicitly have VCPUs-at-startup=VCPUs-max, so that's definitely a
fairly common scenario.

> My only real concern is that if users
> regularly run VMs with small VCPUs-at-startup but with VCPUs-max being
> the number of pCPUs in the box, i.e. allowing them to hotplug up to the
> full resource of the box.
>
> And a related question: when xapi/xenopsd builds a domain does it have
> to tell Xen about VCPUs-max or just the number of vCPUs required right
> now?

IIRC the domain builder needs to know the VCPUs-max. VCPUs-at-startup is
implemented by a protocol over xenstore where there's a directory:

  cpu = ""
   0 = ""
    availability = "online"
   1 = ""
    availability = "online"

This tells the PV kernel whether it should disable/hotunplug certain
vCPUs. I'm not sure but I imagine the guest receives the xenstore watch
event, deregisters the vCPU with its scheduler and then issues a
hypercall telling Xen to stop scheduling the vCPU too. It certainly has
to be a co-operative thing, since if Xen just stopped scheduling a vCPU
that would probably have some bad effects on the guest :)

It's slightly odd that the protocol allows per-vCPU control, when I'm
not convinced that you can meaningfully tell them apart.

Cheers,
Dave

_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
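
A rough sketch of the xenstore directory described above, written as a function that builds the `cpu/<n>/availability` entries for a guest. The function name is hypothetical, the paths are shown relative to the domain's xenstore directory, and the "offline" value for vCPUs beyond VCPUs-at-startup is an assumption (the message only shows "online"). Nothing here talks to a real xenstore:

```python
from typing import Dict


def cpu_availability_entries(vcpus_max: int, vcpus_at_startup: int) -> Dict[str, str]:
    """Build one availability entry per possible vCPU (hypothetical helper).

    vCPUs 0 .. vcpus_at_startup-1 are marked "online"; the rest, up to
    vcpus_max, are marked "offline" (assumed value) so that a PV guest
    would leave them unplugged until the entry changes.
    """
    if not 0 < vcpus_at_startup <= vcpus_max:
        raise ValueError("need 0 < VCPUs-at-startup <= VCPUs-max")
    return {
        f"cpu/{i}/availability": "online" if i < vcpus_at_startup else "offline"
        for i in range(vcpus_max)
    }


if __name__ == "__main__":
    # A guest built with VCPUs-max=4 that starts with 2 vCPUs.
    for path, value in cpu_availability_entries(4, 2).items():
        print(f'{path} = "{value}"')
```

Hotplugging a vCPU would then amount to rewriting one entry and letting the guest react to the watch event, which matches the co-operative behaviour guessed at in the message.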
