Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup
Dear Thomas,


On 19.04.23 at 14:38, Thomas Gleixner wrote:
> On Wed, Apr 19 2023 at 11:38, Thomas Gleixner wrote:
>> On Tue, Apr 18 2023 at 22:10, Paul Menzel wrote:
>>> On 18.04.23 at 10:40, Thomas Gleixner wrote:
>>>> Can you please provide the output of cpuid?
>>>
>>> Of course. Here is the top, and the whole output is attached.
>>
>> Thanks for the data. Can you please apply the debug patch below and
>> provide the dmesg output? Just the line which is added by the patch is
>> enough. You can boot with cpuhp.parallel=off so you don't have to wait
>> for 10 seconds.
>
> Borislav found a machine which also refuses to boot. It turns out the
> debug patch was spot on:
>
>   [ 0.462724] .... node #0, CPUs: #1
>   [ 0.462731] smpboot: Kicking AP alive: 17
>   [ 0.465723] #2
>   [ 0.465732] smpboot: Kicking AP alive: 18
>   [ 0.467641] #3
>   [ 0.467641] smpboot: Kicking AP alive: 19
>
> So the kernel gets APICID 17, 18, 19 from ACPI but CPUID leaf 0x1
> ebx[31:24], which is the initial APICID, has:
>
>   CPU1 0x01
>   CPU2 0x02
>   CPU3 0x03
>
> Which means the APICID to Linux CPU number lookup based on CPUID 0x01
> fails for all of them and stops them dead in the low level startup code.

I am attaching the logs for completeness. Linux is built from your branch with the debug print on top. The firmware, coreboot based, is built from [1], but it also happened with non-parallel MP init. The code has better debug prints (attached) though as far as I can see. As Borislav is able to reproduce this too with some non-coreboot firmware, I assume it’s unrelated to coreboot.

```
[ 0.259247] smp: Bringing up secondary CPUs ...
[ 0.259446] x86: Booting SMP configuration:
[ 0.259448] .... node #0, CPUs: #1
[ 0.259453] smpboot: Kicking AP alive: 17
[ 10.260918] CPU1 failed to report alive state
[ 10.260998] smp: Brought up 1 node, 1 CPU
[ 10.261000] smpboot: Max logical packages: 2
[ 10.261001] smpboot: Total of 1 processors activated (7801.09 BogoMIPS)
```

> IOW, the BIOS assigns random numbers to the AP APICs for whatever
> raisins, which leaves the parallel startup low level code up a creek
> without a paddle, except for actually reading the APICID back from the
> APIC. *SHUDDER*
>
> I'm leaning towards disabling the CPUID leaf 0x01 based discovery and be
> done with it.


Kind regards,

Paul


[1]: https://review.coreboot.org/68169

Attachment: kodi-linux-6.3-rc3-smp-tglx.txt
Attachment: 20230419-coreboot-cbmem-log-cb-68169.txt
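
For anyone following the thread, the mismatch can be illustrated with a small sketch. This is not the kernel's startup code; the helper names and the table are hypothetical and exist only to show why a lookup keyed on CPUID leaf 0x1 ebx[31:24] fails when ACPI reports different APIC IDs. The AP values 17, 18, 19 come from the dmesg lines quoted above; the boot CPU's value is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* The initial APIC ID sits in bits 31:24 of EBX as returned by CPUID leaf 0x1. */
static uint32_t initial_apicid_from_ebx(uint32_t ebx)
{
    return (ebx >> 24) & 0xffu;
}

/*
 * Hypothetical table: index is the Linux CPU number, value is the APIC ID
 * as enumerated by ACPI. Entries 1..3 are taken from the dmesg output in
 * this thread; entry 0 (boot CPU) is an assumed placeholder.
 */
static const uint32_t acpi_apicid[] = { 16, 17, 18, 19 };

/* Return the CPU number whose ACPI APIC ID matches, or -1 if none does. */
static int cpu_from_apicid(uint32_t apicid)
{
    size_t n = sizeof(acpi_apicid) / sizeof(acpi_apicid[0]);

    for (size_t i = 0; i < n; i++) {
        if (acpi_apicid[i] == apicid)
            return (int)i;
    }
    return -1;
}
```

With this table, `cpu_from_apicid(17)` finds CPU 1, but an AP whose CPUID reports an initial APIC ID of 0x01 (for example `ebx = 0x01000000`) gets -1 from the same lookup, which corresponds to the AP never reporting alive.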