
Re: [PATCH] x86: make "dom0_nodes=" work with credit2


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Dario Faggioli <dfaggioli@xxxxxxxx>
  • Date: Tue, 12 Apr 2022 16:11:47 +0000
  • Cc: "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, "ohering@xxxxxxx" <ohering@xxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 12 Apr 2022 16:11:59 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH] x86: make "dom0_nodes=" work with credit2

On Tue, 2022-04-12 at 15:48 +0000, Dario Faggioli wrote:
> On Fri, 2022-04-08 at 14:36 +0200, Jan Beulich wrote:
> 
> 
> And while doing that, I think we should consolidate touching the
> affinity only there, avoiding altering it twice. After all, we
> already know how it should look, so let's go for it.
> 
> I'll send a patch to that effect, to show what I mean by this.
> 
Here it is.

It's tested with a few combinations of dom0_nodes and dom0_vcpus_pin
being there or not, and it survived (and behaved as I would expect it
to) all of them :-)

I haven't tested the pv_shim case yet (and can't test it easily). I
think it's fine, but I'm adding Roger, to see if he can confirm that...
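
Just to make the intended end result easier to see at a glance, here is
a toy, standalone model of the decision tree that sched_init_vcpu()
ends up implementing with the patch (plain strings stand in for the
actual cpumasks, and the various conditions are just booleans; it's an
illustration only, not Xen code):

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for a (hard, soft) affinity pair. */
struct affinity { const char *hard; const char *soft; };

static struct affinity choose_affinity(bool pv_shim_vcpu0, bool is_idle,
                                       bool is_hwdom, bool vcpus_pin,
                                       bool relaxed)
{
    /* PV-shim vCPU0: pinned 1:1 (only CPU0 is online at this point). */
    if ( pv_shim_vcpu0 )
        return (struct affinity){ "cpu0", "cpu0" };
    /* Idle vCPUs, or dom0 with dom0_vcpus_pin: pin to the physical CPU. */
    if ( is_idle || (is_hwdom && vcpus_pin) )
        return (struct affinity){ "processor", "all" };
    /* dom0 with dom0_nodes=: dom0_cpus as hard or soft affinity. */
    if ( is_hwdom )
        return relaxed ? (struct affinity){ "all", "dom0_cpus" }
                       : (struct affinity){ "dom0_cpus", "all" };
    /* Anything else: no restriction. */
    return (struct affinity){ "all", "all" };
}

int main(void)
{
    /* dom0 vCPU, dom0_nodes= given, no dom0_vcpus_pin, strict affinity. */
    struct affinity a = choose_affinity(false, false, true, false, false);
    printf("hard=%s soft=%s\n", a.hard, a.soft); /* hard=dom0_cpus soft=all */
    return 0;
}

The point being that, with this, the unit already has its final affinity
before sched_insert_unit() runs, so Credit2 never sees the intermediate
one.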
---
From: Dario Faggioli <dfaggioli@xxxxxxxx>
Subject: [PATCH 1/2] xen/sched: setup dom0 vCPUs affinity only once

Right now, affinity for dom0 vCPUs is set up in two steps. This is a
problem because, at least in Credit2, unit_insert() sees and uses the
"intermediate" affinity, and places the vCPUs on CPUs where they cannot
run, resulting in the boot hanging if the "dom0_nodes" parameter is
used.

Fix this by setting up the affinity properly, once and for all, in
sched_init_vcpu(), called by vcpu_create().

Signed-off-by: Dario Faggioli <dfaggioli@xxxxxxxx>
---
* Changelog is RFC!
---
 xen/common/sched/core.c    | 59 +++++++++++++++++++++++++-------------
 xen/common/sched/credit2.c |  8 ++++--
 2 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 19ab678181..dc2ed890e0 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -572,11 +572,41 @@ int sched_init_vcpu(struct vcpu *v)
     }
 
     /*
-     * Initialize affinity settings. The idler, and potentially
-     * domain-0 VCPUs, are pinned onto their respective physical CPUs.
+     * Initialize affinity settings. By doing this before the unit is
+     * inserted in the scheduler runqueues (by the call to sched_insert_unit()
+     * at the end of this function), we are sure that it will be put on an
+     * appropriate CPU.
      */
-    if ( is_idle_domain(d) || (is_hardware_domain(d) && opt_dom0_vcpus_pin) )
+    if ( pv_shim && v->vcpu_id == 0 )
+    {
+        /*
+         * PV-shim: vcpus are pinned 1:1. Initially only 1 cpu is online,
+         * others will be dealt with when onlining them. This avoids pinning
+         * a vcpu to a not yet online cpu here.
+         */
+        sched_set_affinity(unit, cpumask_of(0), cpumask_of(0));
+    }
+    else if ( is_idle_domain(d) || (is_hardware_domain(d) && opt_dom0_vcpus_pin) )
+    {
+        /*
+         * The idler, and potentially domain-0 VCPUs, are pinned onto their
+         * respective physical CPUs.
+         */
         sched_set_affinity(unit, cpumask_of(processor), &cpumask_all);
+    }
+    else if ( is_hardware_domain(d) )
+    {
+        /*
+         * In absence of dom0_vcpus_pin, the hard and soft affinity of
+         * domain-0 is controlled by the dom0_nodes parameter. At this point
+         * it has been parsed and decoded, and we have the result of that
+         * in the dom0_cpus mask.
+         */
+        if ( !dom0_affinity_relaxed )
+            sched_set_affinity(unit, &dom0_cpus, &cpumask_all);
+        else
+            sched_set_affinity(unit, &cpumask_all, &dom0_cpus);
+    }
     else
         sched_set_affinity(unit, &cpumask_all, &cpumask_all);
 
@@ -3386,29 +3416,18 @@ void wait(void)
 void __init sched_setup_dom0_vcpus(struct domain *d)
 {
     unsigned int i;
-    struct sched_unit *unit;
 
     for ( i = 1; i < d->max_vcpus; i++ )
         vcpu_create(d, i);
 
     /*
-     * PV-shim: vcpus are pinned 1:1.
-     * Initially only 1 cpu is online, others will be dealt with when
-     * onlining them. This avoids pinning a vcpu to a not yet online cpu here.
+     * sched_init_vcpu(), called by vcpu_create(), will set up the hard and
+     * soft affinity of all the vCPUs, by calling sched_set_affinity() on each
+     * one of them. We can now make sure that the domain's node affinity is
+     * also updated accordingly, and we can do that here, once and for all
+     * (which is more efficient than calling domain_update_node_affinity()
+     * once per vCPU).
      */
-    if ( pv_shim )
-        sched_set_affinity(d->vcpu[0]->sched_unit,
-                           cpumask_of(0), cpumask_of(0));
-    else
-    {
-        for_each_sched_unit ( d, unit )
-        {
-            if ( !opt_dom0_vcpus_pin && !dom0_affinity_relaxed )
-                sched_set_affinity(unit, &dom0_cpus, NULL);
-            sched_set_affinity(unit, NULL, &dom0_cpus);
-        }
-    }
-
     domain_update_node_affinity(d);
 }
 #endif
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 0e3f89e537..ac5f8b8820 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -749,10 +749,12 @@ static int get_fallback_cpu(struct csched2_unit *svc)
 
         /*
          * This is cases 2 or 4 (depending on bs): v->processor isn't there
-         * any longer, check if we at least can stay in our current runq.
+         * any longer, check if we at least can stay in our current runq,
+         * if we have any (e.g., we don't yet, if we get here when a unit
+         * is inserted for the very first time).
          */
-        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
-                                       &svc->rqd->active)) )
+        if ( likely(svc->rqd && cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                                   &svc->rqd->active)) )
         {
             cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                         &svc->rqd->active);
-- 
2.35.1
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
