
Re: [Xen-devel] [PATCH 05 of 10 [RFC]] xl: Explicit node affinity specification for guests via config file



On 11/04/12 14:17, Dario Faggioli wrote:
Let the user specify the NUMA node affinity of a guest via the
'nodes=' config switch. Explicitly listing the intended target host
nodes is required as of now.

A valid setting will directly impact the node_affinity mask
of the guest, i.e., it will change the behaviour of the low level
memory allocator. On the other hand, this commit does not affect
by any means how the guest's vCPUs are scheduled on the host's
pCPUs.
I would probably break this into three separate patches, starting with the hypervisor, then libxc, then libxl, especially since they tend to have different maintainers and committers.

TODO: * Better investigate and test interactions with cpupools.

Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -112,6 +112,15 @@ List of which cpus the guest is allowed
  (all vcpus will run on cpus 0,2,3,5), or `cpus=["2", "3"]` (all vcpus
  will run on cpus 2 and 3).

+=item B<nodes=[ NODE, NODE, ...]>
+
+List of the NUMA nodes of the host the guest should be considered
+`affine` with. Being affine to a (set of) NUMA node(s) mainly means
+the guest's memory is going to be allocated on those node(s).
+
+A list of nodes should be specified as follows: `nodes=["0", "3"]`
+for the guest to be affine with nodes 0 and 3.
+
Hmm, I think that using "affine" here is technically correct, and is what one would use if writing a research paper; but it's unusual to hear the word in more common English; it would be more common to hear someone describe a VM as "having affinity with". How about something like this:

"The list of NUMA nodes the domain is considered to have affinity with. Memory from the guest will be allocated from these nodes."

(Technically you're also not supposed to end a sentence with a preposition, but I think "...with which the domain is considered to have affinity" is just too awkward.)

Also, is there a way to specify that the affinity should be to all nodes, and/or be derived from the vcpu mask of the vcpus?
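For illustration, hypothetical syntax along those lines (not something this patch implements) might look like:

    nodes = "all"     # affinity with every node of the host
    nodes = "auto"    # derive the node affinity from the cpus= vcpu mask
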
  =item B<memory=MBYTES>

  Start the guest with MBYTES megabytes of RAM.
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -110,6 +110,44 @@ int xc_domain_shutdown(xc_interface *xch
  }


+int xc_domain_node_affinity(xc_interface *xch,
+                            uint32_t domid,
+                            xc_nodemap_t node_affinity)
+{
+    DECLARE_DOMCTL;
+    DECLARE_HYPERCALL_BUFFER(uint8_t, local);
+    int ret = -1;
+    int nodesize;
+
+    nodesize = xc_get_nodemap_size(xch);
+    if (!nodesize)
+    {
+        PERROR("Could not get number of nodes");
+        goto out;
+    }
+
+    local = xc_hypercall_buffer_alloc(xch, local, nodesize);
+    if ( local == NULL )
+    {
+        PERROR("Could not allocate memory for domain_node_affinity domctl hypercall");
+        goto out;
+    }
+
+    domctl.cmd = XEN_DOMCTL_node_affinity;
+    domctl.domain = (domid_t)domid;
+
+    memcpy(local, node_affinity, nodesize);
+    set_xen_guest_handle(domctl.u.node_affinity.nodemap.bitmap, local);
+    domctl.u.node_affinity.nodemap.nr_elems = nodesize * 8;
+
+    ret = do_domctl(xch, &domctl);
+
+    xc_hypercall_buffer_free(xch, local);
+
+ out:
+    return ret;
+}
+
  int xc_vcpu_setaffinity(xc_interface *xch,
                          uint32_t domid,
                          int vcpu,
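
For context, here is a minimal caller-side sketch of how this would be used (hypothetical example, not part of the patch; it assumes xc_nodemap_t is a byte-array bitmap, as xc_cpumap_t is):

    #include <stdlib.h>
    #include <xenctrl.h>

    /* Hypothetical usage: give domain 'domid' affinity with node 0 only. */
    static int set_node0_affinity(xc_interface *xch, uint32_t domid)
    {
        int sz = xc_get_nodemap_size(xch);   /* size of the node bitmap, in bytes */
        xc_nodemap_t map;
        int rc = -1;

        if ( sz <= 0 )
            return -1;
        map = calloc(sz, 1);
        if ( map == NULL )
            return -1;
        map[0] |= 1;                         /* set the bit for node 0 */
        rc = xc_domain_node_affinity(xch, domid, map);
        free(map);
        return rc;
    }
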
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -520,6 +520,19 @@ int xc_watchdog(xc_interface *xch,
                 uint32_t id,
                 uint32_t timeout);

+/**
+ * This function explicitly sets the affinity of a domain with the
+ * host NUMA nodes.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id in which vcpus are to be created.
+ * @parm node_affinity the map of the affine nodes.
+ * @return 0 on success, -1 on failure.
+ */
+int xc_domain_node_affinity(xc_interface *xch,
+                            uint32_t domind,
+                            xc_nodemap_t node_affinity);
+
Seems like it would be useful to have both a get and a set.
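E.g., a hypothetical getter (not in this patch), mirroring xc_vcpu_getaffinity(), might look like:

    /* Hypothetical: fill a caller-allocated nodemap with the domain's
     * current node affinity. */
    int xc_domain_node_getaffinity(xc_interface *xch,
                                   uint32_t domid,
                                   xc_nodemap_t node_affinity);
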
  int xc_vcpu_setaffinity(xc_interface *xch,
                          uint32_t domid,
                          int vcpu,
diff --git a/tools/libxl/gentest.py b/tools/libxl/gentest.py
--- a/tools/libxl/gentest.py
+++ b/tools/libxl/gentest.py
@@ -20,7 +20,7 @@ def randomize_case(s):
  def randomize_enum(e):
      return random.choice([v.name for v in e.values])

-handcoded = ["libxl_cpumap", "libxl_key_value_list",
+handcoded = ["libxl_cpumap", "libxl_nodemap", "libxl_key_value_list",
               "libxl_cpuid_policy_list", "libxl_file_reference",
               "libxl_string_list"]

@@ -119,6 +119,19 @@ static void libxl_cpumap_rand_init(libxl
      }
  }

+static void libxl_nodemap_rand_init(libxl_nodemap *nodemap)
+{
+    int i;
+    nodemap->size = rand() % 16;
+    nodemap->map = calloc(nodemap->size, sizeof(*nodemap->map));
+    libxl_for_each_node(i, *nodemap) {
+        if (rand() % 2)
+            libxl_nodemap_set(nodemap, i);
+        else
+            libxl_nodemap_reset(nodemap, i);
+    }
+}
+
  static void libxl_key_value_list_rand_init(libxl_key_value_list *pkvl)
  {
      int i, nr_kvp = rand() % 16;
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -3082,6 +3082,16 @@ int libxl_set_vcpuaffinity_all(libxl_ctx
      return rc;
  }

+int libxl_set_node_affinity(libxl_ctx *ctx, uint32_t domid,
+                            libxl_nodemap *nodemap)
+{
+    if (xc_domain_node_affinity(ctx->xch, domid, nodemap->map)) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting node affinity");
+        return ERROR_FAIL;
+    }
+    return 0;
+}
+
  int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap)
  {
      GC_INIT(ctx);
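
For reference, calling the new libxl_set_node_affinity() from a toolstack client might look something like this (hypothetical sketch; assumes an initialised libxl_ctx *ctx and a valid domid):

    /* Hypothetical sketch: restrict domid's memory allocations to node 1. */
    libxl_nodemap nodemap;

    if (!libxl_nodemap_alloc(ctx, &nodemap)) {
        libxl_nodemap_set_none(&nodemap);
        libxl_nodemap_set(&nodemap, 1);
        libxl_set_node_affinity(ctx, domid, &nodemap);   /* returns 0 on success */
        libxl_nodemap_dispose(&nodemap);
    }
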
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -727,6 +727,8 @@ int libxl_set_vcpuaffinity(libxl_ctx *ct
                             libxl_cpumap *cpumap);
  int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
                                 unsigned int max_vcpus, libxl_cpumap *cpumap);
+int libxl_set_node_affinity(libxl_ctx *ctx, uint32_t domid,
+                            libxl_nodemap *nodemap);
Hmm -- is there really no libxl_get_vcpuaffinity()? That seems to be a missing component...
  int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap);

  libxl_scheduler libxl_get_scheduler(libxl_ctx *ctx);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -121,6 +121,12 @@ int libxl__domain_build_info_setdefault(
          libxl_cpumap_set_any(&b_info->cpumap);
      }

+    if (!b_info->nodemap.size) {
+        if (libxl_nodemap_alloc(CTX, &b_info->nodemap))
+            return ERROR_NOMEM;
+        libxl_nodemap_set_none(&b_info->nodemap);
+    }
+
      if (b_info->max_memkb == LIBXL_MEMKB_DEFAULT)
          b_info->max_memkb = 32 * 1024;
      if (b_info->target_memkb == LIBXL_MEMKB_DEFAULT)
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -67,6 +67,7 @@ int libxl__build_pre(libxl__gc *gc, uint
      char *xs_domid, *con_domid;
      xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus);
      libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap);
+    libxl_set_node_affinity(ctx, domid, &info->nodemap);
      xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT);
      if (info->type == LIBXL_DOMAIN_TYPE_PV)
          xc_domain_set_memmap_limit(ctx->xch, domid,
diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c
--- a/tools/libxl/libxl_json.c
+++ b/tools/libxl/libxl_json.c
@@ -119,6 +119,26 @@ out:
      return s;
  }

+yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand,
+                                       libxl_nodemap *nodemap)
+{
+    yajl_gen_status s;
+    int i;
+
+    s = yajl_gen_array_open(hand);
+    if (s != yajl_gen_status_ok) goto out;
+
+    libxl_for_each_node(i, *nodemap) {
+        if (libxl_nodemap_test(nodemap, i)) {
+            s = yajl_gen_integer(hand, i);
+            if (s != yajl_gen_status_ok) goto out;
+        }
+    }
+    s = yajl_gen_array_close(hand);
+out:
+    return s;
+}
+
  yajl_gen_status libxl_key_value_list_gen_json(yajl_gen hand,
                                                libxl_key_value_list *pkvl)
  {
diff --git a/tools/libxl/libxl_json.h b/tools/libxl/libxl_json.h
--- a/tools/libxl/libxl_json.h
+++ b/tools/libxl/libxl_json.h
@@ -27,6 +27,7 @@ yajl_gen_status libxl_domid_gen_json(yaj
  yajl_gen_status libxl_uuid_gen_json(yajl_gen hand, libxl_uuid *p);
  yajl_gen_status libxl_mac_gen_json(yajl_gen hand, libxl_mac *p);
  yajl_gen_status libxl_cpumap_gen_json(yajl_gen hand, libxl_cpumap *p);
+yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand, libxl_nodemap *p);
  yajl_gen_status libxl_cpuid_policy_list_gen_json(yajl_gen hand,
                                                   libxl_cpuid_policy_list *p);
  yajl_gen_status libxl_string_list_gen_json(yajl_gen hand, libxl_string_list *p);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -11,6 +11,7 @@ libxl_domid = Builtin("domid", json_fn =
  libxl_uuid = Builtin("uuid", passby=PASS_BY_REFERENCE)
  libxl_mac = Builtin("mac", passby=PASS_BY_REFERENCE)
  libxl_cpumap = Builtin("cpumap", dispose_fn="libxl_cpumap_dispose", passby=PASS_BY_REFERENCE)
+libxl_nodemap = Builtin("nodemap", dispose_fn="libxl_nodemap_dispose", passby=PASS_BY_REFERENCE)
  libxl_cpuid_policy_list = Builtin("cpuid_policy_list", dispose_fn="libxl_cpuid_dispose", passby=PASS_BY_REFERENCE)

  libxl_string_list = Builtin("string_list", dispose_fn="libxl_string_list_dispose", passby=PASS_BY_REFERENCE)
@@ -233,6 +234,7 @@ libxl_domain_build_info = Struct("domain
      ("max_vcpus",       integer),
      ("cur_vcpus",       integer),
      ("cpumap",          libxl_cpumap),
+    ("nodemap",         libxl_nodemap),
      ("tsc_mode",        libxl_tsc_mode),
      ("max_memkb",       MemKB),
      ("target_memkb",    MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -515,7 +515,7 @@ static void parse_config_data(const char
      const char *buf;
      long l;
      XLU_Config *config;
-    XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids;
+    XLU_ConfigList *cpus, *nodes, *vbds, *nics, *pcis, *cvfbs, *cpuids;
      int pci_power_mgmt = 0;
      int pci_msitranslate = 1;
      int pci_permissive = 0;
@@ -628,6 +628,39 @@ static void parse_config_data(const char
          free(buf2);
      }

+    if (!xlu_cfg_get_list (config, "nodes", &nodes, 0, 0)) {
+        int i, n_nodes = 0;
+
+        if (libxl_nodemap_alloc(ctx, &b_info->nodemap)) {
+            fprintf(stderr, "Unable to allocate nodemap\n");
+            exit(1);
+        }
+
+        libxl_nodemap_set_none(&b_info->nodemap);
+        while ((buf = xlu_cfg_get_listitem(nodes, n_nodes)) != NULL) {
+            i = atoi(buf);
+            if (!libxl_nodemap_node_valid(&b_info->nodemap, i)) {
+                fprintf(stderr, "node %d illegal\n", i);
+                exit(1);
+            }
+            libxl_nodemap_set(&b_info->nodemap, i);
+            n_nodes++;
+        }
+
+        if (n_nodes == 0)
+            fprintf(stderr, "No valid node found: no affinity will be set\n");
+    }
+    else {
+        /*
+         * XXX What would a sane default be? I think doing nothing
+         *     (i.e., relying on cpu-affinity/cpupool ==>  the current
+         *     behavior) should be just fine.
+         *     That would mean we're saying to the user "if you want us
+         *     to take care of NUMA, please tell us, maybe just putting
+         *     'nodes=auto', but tell us... otherwise we do as usual".
+         */
I think that for this patch, doing nothing is fine (which means removing the whole else clause). But once we have the auto placement, I think that "nodes=auto" should be the default.
+    }
+
      if (!xlu_cfg_get_long (config, "memory", &l, 0)) {
          b_info->max_memkb = l * 1024;
          b_info->target_memkb = b_info->max_memkb;
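
If we ever did want an explicit "all nodes" fallback instead of doing nothing, it could be as small as the following (hypothetical sketch; it assumes a libxl_nodemap_set_any() helper analogous to libxl_cpumap_set_any()):

    else {
        /* Hypothetical: no "nodes=" in the config, give affinity with all nodes. */
        if (libxl_nodemap_alloc(ctx, &b_info->nodemap)) {
            fprintf(stderr, "Unable to allocate nodemap\n");
            exit(1);
        }
        libxl_nodemap_set_any(&b_info->nodemap);
    }
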
diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c
--- a/tools/python/xen/lowlevel/xl/xl.c
+++ b/tools/python/xen/lowlevel/xl/xl.c
@@ -243,6 +243,18 @@ int attrib__libxl_cpumap_set(PyObject *v
      return 0;
  }

+int attrib__libxl_nodemap_set(PyObject *v, libxl_nodemap *pptr)
+{
+    int i;
+    long node;
+
+    for (i = 0; i < PyList_Size(v); i++) {
+        node = PyInt_AsLong(PyList_GetItem(v, i));
+        libxl_nodemap_set(pptr, node);
+    }
+    return 0;
+}
+
  int attrib__libxl_file_reference_set(PyObject *v, libxl_file_reference *pptr)
  {
      return genwrap__string_set(v, &pptr->path);
diff --git a/xen/common/domain.c b/xen/common/domain.c
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -222,6 +222,7 @@ struct domain *domain_create(

      spin_lock_init(&d->node_affinity_lock);
      d->node_affinity = NODE_MASK_ALL;
+    d->has_node_affinity = 0;

      spin_lock_init(&d->shutdown_lock);
      d->shutdown_code = -1;
@@ -337,7 +338,7 @@ void domain_update_node_affinity(struct
  {
      cpumask_var_t cpumask;
      cpumask_var_t online_affinity;
-    const cpumask_t *online;
+    const cpumask_t *online = cpupool_online_cpumask(d->cpupool);
      nodemask_t nodemask = NODE_MASK_NONE;
      struct vcpu *v;
      unsigned int node;
@@ -350,9 +351,22 @@ void domain_update_node_affinity(struct
          return;
      }

-    online = cpupool_online_cpumask(d->cpupool);
+    spin_lock(&d->node_affinity_lock);

-    spin_lock(&d->node_affinity_lock);
+    /*
+     * If someone explicitly told us our NUMA affinity, avoid messing
+     * that up. Notice, however, that vcpu affinity is still what
+     * drives all scheduling decisions.
+     *
+     * XXX I'm quite sure this is fine wrt to the various v->cpu_affinity
+     *     (at least, that was the intended design!). Could it be useful
+     *     to cross-check d->node_affinity against `online`? The basic
+     *     idea here is "Hey, if you shoot yourself in the foot... You've
+     *     shot in the foot!", but, you know...
+     */
+    if ( d->has_node_affinity )
+        goto out;
+

      for_each_vcpu ( d, v )
      {
@@ -365,6 +379,8 @@ void domain_update_node_affinity(struct
              node_set(node, nodemask);

      d->node_affinity = nodemask;
+
+out:
      spin_unlock(&d->node_affinity_lock);

      free_cpumask_var(online_affinity);
@@ -372,6 +388,31 @@ void domain_update_node_affinity(struct
  }


+int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity)
+{
+    spin_lock(&d->node_affinity_lock);
+
+    /*
+     * Being/becoming affine to _no_ nodes is not going to work,
+     * let's take it as the `reset node affinity` command.
+     */
+    if ( nodes_empty(*affinity) )
+    {
+        d->has_node_affinity = 0;
+        spin_unlock(&d->node_affinity_lock);
+        domain_update_node_affinity(d);
+        return 0;
+    }
+
+    d->has_node_affinity = 1;
+    d->node_affinity = *affinity;
+
+    spin_unlock(&d->node_affinity_lock);
+
+    return 0;
+}
+
+
  struct domain *get_domain_by_id(domid_t dom)
  {
      struct domain *d;
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -621,6 +621,27 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
      }
      break;

+    case XEN_DOMCTL_node_affinity:
+    {
+        domid_t dom = op->domain;
+        struct domain *d = rcu_lock_domain_by_id(dom);
+        nodemask_t new_affinity;
+
+        ret = -ESRCH;
+        if ( d == NULL )
+            break;
+
+        /* XXX We need an xsm_* for this I guess, right? */
Yes. :-)
+
+        ret = xenctl_nodemap_to_nodemask(&new_affinity,
+                                         &op->u.node_affinity.nodemap);
+        if ( !ret )
+            ret = domain_set_node_affinity(d, &new_affinity);
+
+        rcu_unlock_domain(d);
+    }
+    break;
+
      case XEN_DOMCTL_setvcpuaffinity:
      case XEN_DOMCTL_getvcpuaffinity:
      {
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -217,6 +217,14 @@ static void cpuset_print(char *set, int
      *set++ = '\0';
  }

+static void nodeset_print(char *set, int size, const nodemask_t *mask)
+{
+    *set++ = '[';
+    set += nodelist_scnprintf(set, size-2, mask);
+    *set++ = ']';
+    *set++ = '\0';
+}
+
  static void periodic_timer_print(char *str, int size, uint64_t period)
  {
      if ( period == 0 )
@@ -272,6 +280,9 @@ static void dump_domains(unsigned char k

          dump_pageframe_info(d);

+        nodeset_print(tmpstr, sizeof(tmpstr), &d->node_affinity);
+        printk("NODE affinity for domain %d: %s\n", d->domain_id, tmpstr);
+
          printk("VCPU information and callbacks for domain %u:\n",
                 d->domain_id);
          for_each_vcpu ( d, v )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -278,6 +278,15 @@ typedef struct xen_domctl_getvcpuinfo xe
  DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvcpuinfo_t);


+/* Set the NUMA node(s) with which the guest is `affine`. */
+/* XEN_DOMCTL_node_affinity */
+struct xen_domctl_node_affinity {
+    struct xenctl_map nodemap;   /* IN */
+};
+typedef struct xen_domctl_node_affinity xen_domctl_node_affinity_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_node_affinity_t);
+
+
  /* Get/set which physical cpus a vcpu can execute on. */
  /* XEN_DOMCTL_setvcpuaffinity */
  /* XEN_DOMCTL_getvcpuaffinity */
@@ -914,6 +923,7 @@ struct xen_domctl {
  #define XEN_DOMCTL_set_access_required           64
  #define XEN_DOMCTL_audit_p2m                     65
  #define XEN_DOMCTL_set_virq_handler              66
+#define XEN_DOMCTL_node_affinity                 67
  #define XEN_DOMCTL_gdbsx_guestmemio            1000
  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -927,6 +937,7 @@ struct xen_domctl {
          struct xen_domctl_getpageframeinfo  getpageframeinfo;
          struct xen_domctl_getpageframeinfo2 getpageframeinfo2;
          struct xen_domctl_getpageframeinfo3 getpageframeinfo3;
+        struct xen_domctl_node_affinity     node_affinity;
          struct xen_domctl_vcpuaffinity      vcpuaffinity;
          struct xen_domctl_shadow_op         shadow_op;
          struct xen_domctl_max_mem           max_mem;
diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h
--- a/xen/include/xen/nodemask.h
+++ b/xen/include/xen/nodemask.h
@@ -8,8 +8,9 @@
   * See detailed comments in the file linux/bitmap.h describing the
   * data type on which these nodemasks are based.
   *
- * For details of nodemask_scnprintf() and nodemask_parse(),
- * see bitmap_scnprintf() and bitmap_parse() in lib/bitmap.c.
+ * For details of nodemask_scnprintf(), nodelist_scnpintf() and
+ * nodemask_parse(), see bitmap_scnprintf() and bitmap_parse()
+ * in lib/bitmap.c.
   *
   * The available nodemask operations are:
   *
@@ -48,6 +49,7 @@
   * unsigned long *nodes_addr(mask)     Array of unsigned long's in mask
   *
   * int nodemask_scnprintf(buf, len, mask) Format nodemask for printing
   * int nodelist_scnprintf(buf, len, mask) Format nodemask as a list for printing
   * int nodemask_parse(ubuf, ulen, mask)        Parse ascii string as nodemask
   *
   * for_each_node_mask(node, mask)      for-loop node over mask
@@ -280,6 +282,14 @@ static inline int __first_unset_node(con

  #define nodes_addr(src) ((src).bits)

+#define nodelist_scnprintf(buf, len, src) \
+                       __nodelist_scnprintf((buf), (len), (src), MAX_NUMNODES)
+static inline int __nodelist_scnprintf(char *buf, int len,
+                                       const nodemask_t *srcp, int nbits)
+{
+       return bitmap_scnlistprintf(buf, len, srcp->bits, nbits);
+}
+
  #if 0
  #define nodemask_scnprintf(buf, len, src) \
                         __nodemask_scnprintf((buf), (len), &(src), MAX_NUMNODES)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -346,8 +346,12 @@ struct domain
      /* Various mem_events */
      struct mem_event_per_domain *mem_event;

-    /* Currently computed from union of all vcpu cpu-affinity masks. */
+    /*
+     * Can be specified by the user. If that is not the case, it is
+     * computed from the union of all the vcpu cpu-affinity masks.
+     */
      nodemask_t node_affinity;
+    int has_node_affinity;
There's something that seems a bit clunky about this, but I can't really think of a better way. At the very least I'd change the name of the element here to something more descriptive; perhaps "auto_node_affinity" (which would invert the meaning)? Something like the sketch after this hunk.
      unsigned int last_alloc_node;
      spinlock_t node_affinity_lock;
  };
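
Roughly what I have in mind with auto_node_affinity (hypothetical sketch, not what the patch does):

    /*
     * Hypothetical inversion: node affinity is computed automatically by
     * default, and the flag is cleared when the user sets an explicit one.
     */
    nodemask_t node_affinity;
    bool_t     auto_node_affinity;

    /* domain_create():               d->auto_node_affinity = 1;          */
    /* domain_set_node_affinity():    d->auto_node_affinity = 0;          */
    /* domain_update_node_affinity(): if ( !d->auto_node_affinity )       */
    /*                                    goto out;                       */
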
@@ -416,6 +420,7 @@ static inline void get_knownalive_domain
      ASSERT(!(atomic_read(&d->refcnt) & DOMAIN_DESTROYED));
  }

+int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity);
  void domain_update_node_affinity(struct domain *d);

  struct domain *domain_create(

