Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
On 11/23/21 3:50 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
>
>
>> -----Original Message-----
>> From: Dongli Zhang [mailto:dongli.zhang@xxxxxxxxxx]
>> Sent: Wednesday, November 24, 2021 5:22 AM
>> To: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>; Longpeng (Mike, Cloud
>> Infrastructure Service Product Dept.) <longpeng2@xxxxxxxxxx>
>> Cc: linux-kernel@xxxxxxxxxxxxxxx; Gonglei (Arei) <arei.gonglei@xxxxxxxxxx>;
>> x86@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx; Peter Zijlstra
>> <peterz@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; Valentin Schneider
>> <valentin.schneider@xxxxxxx>; Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>;
>> Juergen Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
>> Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
>> Borislav Petkov <bp@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>;
>> H. Peter Anvin <hpa@xxxxxxxxx>
>> Subject: Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be
>> brought up again.
>>
>> Tested-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
>>
>>
>> The bug fixed by commit 53fafdbb8b21 ("KVM: x86: switch KVMCLOCK base to
>> monotonic raw clock") may leave the cpu_hotplug_state at CPU_UP_PREPARE. As a
>> result, to online this CPU again (even after removal) is always failed.
>>
>> I have tested that this patch works well to workaround the issue, by
>> introducing either a mdelay(11000) or while(1); to start_secondary(). That
>> is, to online the same CPU again is successful even after initial
>> do_boot_cpu() failure.
>>
>> 1. add mdelay(11000) or while(1); to the start_secondary().
>>
>> 2. to online CPU is failed at do_boot_cpu().
>>
>
> Thanks for your testing :)
>
> Does the cpu4 spin in wait_for_master_cpu() in your case ?

I did two tests.

TEST 1.

I added "mdelay(11000);" as the first line in start_secondary().

Once the issue was encountered, the RIP of CPU=4 was ffffffff8c242021 (from
QEMU's "info registers -a") which was in the range of wait_for_master_cpu().

# cat /proc/kallsyms | grep ffffffff8c2420
ffffffff8c242010 t wait_for_master_cpu
ffffffff8c242030 T load_fixmap_gdt
ffffffff8c242060 T native_write_cr4
ffffffff8c2420c0 T cr4_init

TEST 2.

I added "while(true);" as the first line in start_secondary().

Once the issue was encountered, the RIP of CPU=4 was ffffffff91654c0a (from
QEMU's "info registers -a") which was in the range of start_secondary().

# cat /proc/kallsyms | grep ffffffff91654c0
ffffffff91654c00 t start_secondary

Dongli Zhang

>
>> 3. to online CPU again is failed without this patch.
>>
>> # echo 1 > /sys/devices/system/cpu/cpu4/online
>> -su: echo: write error: Input/output error
>>
>> 4. to online CPU again is successful with this patch.
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>> On 11/22/21 7:47 AM, Sebastian Andrzej Siewior wrote:
>>> From: "Longpeng(Mike)" <longpeng2@xxxxxxxxxx>
>>>
>>> A CPU will not show up in virtualized environment which includes an
>>> Enclave. The VM splits its resources into a primary VM and a Enclave
>>> VM. While the Enclave is active, the hypervisor will ignore all requests
>>> to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state.
>>> The kernel will wait up to ten seconds for CPU to show up
>>> (do_boot_cpu()) and then rollback the hotplug state back to
>>> CPUHP_OFFLINE leaving the CPU state in CPU_UP_PREPARE. The CPU state is
>>> set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage.
>>>
>>> After the Enclave VM terminates, the primary VM can bring up the CPU
>>> again.
>>>
>>> Allow to bring up the CPU if it is in the CPU_UP_PREPARE state.
>>>
>>> [bigeasy: Rewrite commit description.]
>>>
>>> Signed-off-by: Longpeng(Mike) <longpeng2@xxxxxxxxxx>
>>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
>>> Link: https://urldefense.com/v3/__https://lore.kernel.org/r/20210901051143.2752-1-longpeng2@huawei.com__;!!ACWV5N9M2RV99hQ!d4sCCXMQV7ekFwpd21vo1_9K-m5h4VZ-gE8Z62PLL58DT4VJ6StH57TR_KpBdbwhBE0$
>>> ---
>>>
>>> For XEN: this changes the behaviour as it allows to invoke
>>> cpu_initialize_context() again should it have have earlier. I *think*
>>> this is okay and would to bring up the CPU again should the memory
>>> allocation in cpu_initialize_context() fail.
>>>
>>>  kernel/smpboot.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
>>> index f6bc0bc8a2aab..34958d7fe2c1c 100644
>>> --- a/kernel/smpboot.c
>>> +++ b/kernel/smpboot.c
>>> @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
>>>  		 */
>>>  		return -EAGAIN;
>>>
>>> +	case CPU_UP_PREPARE:
>>> +		/*
>>> +		 * Timeout while waiting for the CPU to show up. Allow to try
>>> +		 * again later.
>>> +		 */
>>> +		return 0;
>>> +
>>>  	default:
>>>
>>>  		/* Should not happen. Famous last words. */
>>>
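
For anyone following the thread, the effect of the quoted hunk can be sketched as a small userspace model. This is not the kernel function: the enum, its values, the `patched` flag and the name model_check_up_prepare() are all made up for illustration, and only the cases relevant here are modelled. It assumes that, without the patch, a CPU left in CPU_UP_PREPARE falls through to the default case and gets -EIO, which would match the "Input/output error" seen in step 3.

```c
/* Userspace model of the state check touched by the patch; not kernel code. */
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's per-CPU hotplug states. */
enum model_cpu_state {
	MODEL_CPU_POST_DEAD,	/* CPU went offline cleanly */
	MODEL_CPU_BROKEN,	/* CPU never finished going offline */
	MODEL_CPU_UP_PREPARE,	/* earlier bring-up attempt timed out */
	MODEL_CPU_OTHER,	/* anything unexpected */
};

/*
 * Decide whether a CPU in `state` may be brought up.
 * `patched` selects the behaviour with (1) or without (0) the new case.
 * Returns 0 to proceed, or a negative errno value.
 */
static int model_check_up_prepare(enum model_cpu_state state, int patched)
{
	assert(patched == 0 || patched == 1);

	switch (state) {
	case MODEL_CPU_POST_DEAD:
		/* The CPU died properly, so it can simply be started again. */
		return 0;
	case MODEL_CPU_BROKEN:
		/* Corresponds to the -EAGAIN context line in the diff. */
		return -EAGAIN;
	case MODEL_CPU_UP_PREPARE:
		/* With the patch a retry is allowed; without it, -EIO. */
		return patched ? 0 : -EIO;
	default:
		/* "Should not happen. Famous last words." */
		return -EIO;
	}
}
```

In this model, model_check_up_prepare(MODEL_CPU_UP_PREPARE, 0) gives -EIO and model_check_up_prepare(MODEL_CPU_UP_PREPARE, 1) gives 0, which lines up with steps 3 and 4 of the test above.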