|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 1/1] xen/arm: fix sparse cpu_possible_map calculation on SMP boot
On 29-Jun-26 00:51, Hirokazu Takahashi wrote:
> Currently, during ARM Xen's SMP initialization, if there is
> a Device Tree error (such as an invalid 'enable-method'),
> cpu_possible_map can end up being sparse.
>
> The issue here is that nr_cpu_ids is calculated in a way that
> doesn't properly account for the maximum CPU ID when the map is
> sparse, causing a mismatch. For example, if cpu_possible_map is
> 0xff0f, nr_cpu_ids becomes 12, but the actual maximum CPU ID
> is 15. Xen's common code is built on the assumption that
> 'CPU ID < nr_cpu_ids', so this mismatch can break things.
>
> To fix this, modify dt_smp_init_cpus() so that if the
> arch_cpu_init() call fails, we don't consume the CPU ID slot.
>
> Changes in v2:
This should come after ---, not to be included in the final commit msg.
> Fix an issue where cpu_logical_map(0) is cleared when boot CPU
> initialization fails.
>
> Signed-off-by: Hirokazu Takahashi <taka@xxxxxxxxxxxxx>
This should have a Fixes tag (I traced to 4557c2292854).
> ---
> xen/arch/arm/smpboot.c | 29 +++++++++++++++--------------
> 1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 7f3cfa812e..0ab9619398 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -191,6 +191,14 @@ static void __init dt_smp_init_cpus(void)
> continue;
> }
>
> + if ( hwid != mpidr && cpuidx >= NR_CPUS )
This should stay where it was with just i >= NR_CPUS. By moving it here, above a
duplicate filter, you are causing a regression. When the CPU list is full and it
hits a duplicate, instead of ignoring the duplicate and moving on, it stops
reading the list entirely (the boot CPU can be past that). Also, it makes the
diff smaller.
> + {
> + printk(XENLOG_WARNING
> + "DT /cpu %u node exceeds the max cores %u, capping
> them\n",
> + cpuidx, NR_CPUS);
> + break;
> + }
> +
> /*
> * Duplicate MPIDRs are a recipe for disaster. Scan all initialized
> * entries and check for duplicates. If any found just skip the node.
> @@ -224,24 +232,19 @@ static void __init dt_smp_init_cpus(void)
> bootcpu_valid = true;
> }
> else
> - i = cpuidx++;
> -
> - if ( cpuidx > NR_CPUS )
> - {
> - printk(XENLOG_WARNING
> - "DT /cpu %u node greater than max cores %u, capping
> them\n",
> - cpuidx, NR_CPUS);
> - cpuidx = NR_CPUS;
> - break;
> - }
> + i = cpuidx;
>
> if ( (rc = arch_cpu_init(i, cpu)) < 0 )
> {
> printk("cpu%d init failed (hwid %"PRIregister"): %d\n", i, hwid,
> rc);
> - tmp_map[i] = MPIDR_INVALID;
> }
> else
> + {
> tmp_map[i] = hwid;
> +
> + if ( i != 0 )
> + cpuidx++;
> + }
> }
>
> if ( !bootcpu_valid )
> @@ -251,10 +254,8 @@ static void __init dt_smp_init_cpus(void)
> return;
> }
>
> - for ( i = 0; i < cpuidx; i++ )
> + for ( i = 1; i < cpuidx; i++ )
Starting at index 1 is correct, since smp_prepare_boot_cpu() already
set cpu_possible_map bit 0 and cpu_logical_map(0). That reason lives in
a different function though, so please add a short comment here to make
clear index 0 is skipped on purpose.
> {
> - if ( tmp_map[i] == MPIDR_INVALID )
> - continue;
> cpumask_set_cpu(i, &cpu_possible_map);
> cpu_logical_map(i) = tmp_map[i];
> }
Other than that, this is a good fix, thanks.
~Michal
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |