[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/1] xen/arm: fix sparse cpu_possible_map calculation on SMP boot


  • To: Hirokazu Takahashi <taka@xxxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: "Orzel, Michal" <michal.orzel@xxxxxxx>
  • Date: Tue, 30 Jun 2026 11:43:17 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=valinux.co.jp smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=SmC41VZDSSKkR/nF7wXt4h4HOKsQ2FjHxiXrPsZ72B8=; b=nQjXWvtXEwxhQgBqstFllXY/jm3o9mYFSilezYu1C+AirErmcdxmgb0p7GSPrAOLS8fFP7/C1QV+wYVe2JuE6KmyCMUAyHw7dQwbAaooNb1A4M74m91sXVudaC+T7XCFAuwy3lSET3EEOWdTPNcq2oJ6zqJuicpJ29LtLDgXLPh14lppi42ydITe1+sUbmBb9XUzH7U7eICezY1rZcBaWsApK/+MlfOm5MgI/QeSx4fN2uF/MIKwg0uVRcmFQ6Za4Yj5VW1nSQjv7tERfCFyaw+VPlsG8kXnas8qwkd57SMQM2Ea0G1r1G9VF/GmI+LYoRagLiusxU5YAnYZVH5Skg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=oyDVeaOZL76u4dCvcGSTOpow5Hgj0l1SIUOK6Xvh88iAJqI1/oPSI2U2/VEldQteIwfb9wXAwdblo3/plZZDpJu69i73SjGnjrjv9nDlkrEtgKPPhSm8sJzQWRB2PbDLVi1YYmEbWSY3/qiBnc8R8p458x3ncS+8ZWT2/c94jxBjlR9JLEOvp4Beo2/GcJcRJj7ld0Qvy4POFhnkBrIQJC/k42BS7GBzD1nI6e93KMPENyM7x5NwvNtup1kEwQx2HwrEOS2PMojEvNr8LfT49r00M/EdoPd6DDbwfH2/Mu1EyXTvTErDh88YpuI2NwmHJ0cZ2Nv3llDn3AsiOReQkg==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=selector1 header.d=amd.com header.i="@amd.com" header.h="From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck"
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Tue, 30 Jun 2026 09:43:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 29-Jun-26 00:51, Hirokazu Takahashi wrote:
> Currently, during ARM Xen's SMP initialization, if there is
> a Device Tree error (such as an invalid 'enable-method'),
> cpu_possible_map can end up being sparse.
> 
> The issue here is that nr_cpu_ids is calculated in a way that
> doesn't properly account for the maximum CPU ID when the map is
> sparse, causing a mismatch. For example, if cpu_possible_map is
> 0xff0f, nr_cpu_ids becomes 12, but the actual maximum CPU ID
> is 15. Xen's common code is built on the assumption that
> 'CPU ID < nr_cpu_ids', so this mismatch can break things.
> 
> To fix this, modify dt_smp_init_cpus() so that if the
> arch_cpu_init() call fails, we don't consume the CPU ID slot.
> 
> Changes in v2:
This should come after ---, not to be included in the final commit msg.

> Fix an issue where cpu_logical_map(0) is cleared when boot CPU
> initialization fails.
> 
> Signed-off-by: Hirokazu Takahashi <taka@xxxxxxxxxxxxx>
This should have a Fixes tag (I traced to 4557c2292854).

> ---
>  xen/arch/arm/smpboot.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 7f3cfa812e..0ab9619398 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -191,6 +191,14 @@ static void __init dt_smp_init_cpus(void)
>              continue;
>          }
>  
> +        if ( hwid != mpidr && cpuidx >= NR_CPUS )
This should stay where it was with just i >= NR_CPUS. By moving it here, above a
duplicate filter, you are causing a regression. When the CPU list is full and it
hits a duplicate, instead of ignoring the duplicate and moving on, it stops
reading the list entirely (the boot CPU can be past that). Also, it makes the
diff smaller.

> +        {
> +            printk(XENLOG_WARNING
> +                   "DT /cpu %u node exceeds the max cores %u, capping 
> them\n",
> +                   cpuidx, NR_CPUS);
> +            break;
> +        }
> +
>          /*
>           * Duplicate MPIDRs are a recipe for disaster. Scan all initialized
>           * entries and check for duplicates. If any found just skip the node.
> @@ -224,24 +232,19 @@ static void __init dt_smp_init_cpus(void)
>              bootcpu_valid = true;
>          }
>          else
> -            i = cpuidx++;
> -
> -        if ( cpuidx > NR_CPUS )
> -        {
> -            printk(XENLOG_WARNING
> -                   "DT /cpu %u node greater than max cores %u, capping 
> them\n",
> -                   cpuidx, NR_CPUS);
> -            cpuidx = NR_CPUS;
> -            break;
> -        }
> +            i = cpuidx;
>  
>          if ( (rc = arch_cpu_init(i, cpu)) < 0 )
>          {
>              printk("cpu%d init failed (hwid %"PRIregister"): %d\n", i, hwid, 
> rc);
> -            tmp_map[i] = MPIDR_INVALID;
>          }
>          else
> +        {
>              tmp_map[i] = hwid;
> +
> +            if ( i != 0 )
> +                cpuidx++;
> +        }
>      }
>  
>      if ( !bootcpu_valid )
> @@ -251,10 +254,8 @@ static void __init dt_smp_init_cpus(void)
>          return;
>      }
>  
> -    for ( i = 0; i < cpuidx; i++ )
> +    for ( i = 1; i < cpuidx; i++ )
Starting at index 1 is correct, since smp_prepare_boot_cpu() already
set cpu_possible_map bit 0 and cpu_logical_map(0). That reason lives in
a different function though, so please add a short comment here to make
clear index 0 is skipped on purpose.

>      {
> -        if ( tmp_map[i] == MPIDR_INVALID )
> -            continue;
>          cpumask_set_cpu(i, &cpu_possible_map);
>          cpu_logical_map(i) = tmp_map[i];
>      }

Other than that, this is a good fix, thanks.

~Michal




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.