Xen project Mailing List

Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.

To: "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)" <longpeng2@xxxxxxxxxx>, Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>

From: Dongli Zhang <dongli.zhang@xxxxxxxxxx>

Date: Tue, 23 Nov 2021 21:24:32 -0800

Cc: "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, "Gonglei (Arei)" <arei.gonglei@xxxxxxxxxx>, "x86@xxxxxxxxxx" <x86@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Valentin Schneider <valentin.schneider@xxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>

Delivery-date: Wed, 24 Nov 2021 05:25:23 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11/23/21 3:50 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
>
>
>> -----Original Message-----
>> From: Dongli Zhang [mailto:dongli.zhang@xxxxxxxxxx]
>> Sent: Wednesday, November 24, 2021 5:22 AM
>> To: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>; Longpeng (Mike, Cloud
>> Infrastructure Service Product Dept.) <longpeng2@xxxxxxxxxx>
>> Cc: linux-kernel@xxxxxxxxxxxxxxx; Gonglei (Arei) <arei.gonglei@xxxxxxxxxx>;
>> x86@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx; Peter Zijlstra
>> <peterz@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; Valentin Schneider
>> <valentin.schneider@xxxxxxx>; Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>;
>> Juergen Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
>> Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
>> Borislav Petkov <bp@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>;
>> H. Peter Anvin <hpa@xxxxxxxxx>
>> Subject: Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be
>> brought up again.
>>
>> Tested-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
>>
>>
>> The bug fixed by commit 53fafdbb8b21 ("KVM: x86: switch KVMCLOCK base to
>> monotonic raw clock") may leave the cpu_hotplug_state at CPU_UP_PREPARE. As a
>> result, to online this CPU again (even after removal) is always failed.
>>
>> I have tested that this patch works well to workaround the issue, by
>> introducing either a mdelay(11000) or while(1); to start_secondary(). That
>> is, to online the same CPU again is successful even after initial
>> do_boot_cpu() failure.
>>
>> 1. add mdelay(11000) or while(1); to the start_secondary().
>>
>> 2. to online CPU is failed at do_boot_cpu().
>>
>
> Thanks for your testing :)
>
> Does the cpu4 spin in wait_for_master_cpu() in your case ?

I did two tests.

TEST 1.

I added "mdelay(11000);" as the first line in start_secondary().

Once the issue was encountered, the RIP of CPU=4 was ffffffff8c242021 (from
QEMU's "info registers -a") which was in the range of wait_for_master_cpu().

# cat /proc/kallsyms | grep ffffffff8c2420
ffffffff8c242010 t wait_for_master_cpu
ffffffff8c242030 T load_fixmap_gdt
ffffffff8c242060 T native_write_cr4
ffffffff8c2420c0 T cr4_init

TEST 2.

I added "while(true);" as the first line in start_secondary().

Once the issue was encountered, the RIP of CPU=4 was ffffffff91654c0a (from
QEMU's "info registers -a") which was in the range of start_secondary().

# cat /proc/kallsyms | grep ffffffff91654c0
ffffffff91654c00 t start_secondary

Dongli Zhang

>
>> 3. to online CPU again is failed without this patch.
>>
>> # echo 1 > /sys/devices/system/cpu/cpu4/online
>> -su: echo: write error: Input/output error
>>
>> 4. to online CPU again is successful with this patch.
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>> On 11/22/21 7:47 AM, Sebastian Andrzej Siewior wrote:
>>> From: "Longpeng(Mike)" <longpeng2@xxxxxxxxxx>
>>>
>>> A CPU will not show up in virtualized environment which includes an
>>> Enclave. The VM splits its resources into a primary VM and an Enclave
>>> VM. While the Enclave is active, the hypervisor will ignore all requests
>>> to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state.
>>> The kernel will wait up to ten seconds for CPU to show up
>>> (do_boot_cpu()) and then rollback the hotplug state back to
>>> CPUHP_OFFLINE leaving the CPU state in CPU_UP_PREPARE. The CPU state is
>>> set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage.
>>>
>>> After the Enclave VM terminates, the primary VM can bring up the CPU
>>> again.
>>>
>>> Allow to bring up the CPU if it is in the CPU_UP_PREPARE state.
>>>
>>> [bigeasy: Rewrite commit description.]
>>>
>>> Signed-off-by: Longpeng(Mike) <longpeng2@xxxxxxxxxx>
>>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
>>> Link: https://lore.kernel.org/r/20210901051143.2752-1-longpeng2@huawei.com
>>> ---
>>>
>>> For XEN: this changes the behaviour as it allows to invoke
>>> cpu_initialize_context() again should it have failed earlier. I *think*
>>> this is okay and would allow to bring up the CPU again should the memory
>>> allocation in cpu_initialize_context() fail.
>>>
>>>  kernel/smpboot.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
>>> index f6bc0bc8a2aab..34958d7fe2c1c 100644
>>> --- a/kernel/smpboot.c
>>> +++ b/kernel/smpboot.c
>>> @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
>>>  		 */
>>>  		return -EAGAIN;
>>>
>>> +	case CPU_UP_PREPARE:
>>> +		/*
>>> +		 * Timeout while waiting for the CPU to show up. Allow to try
>>> +		 * again later.
>>> +		 */
>>> +		return 0;
>>> +
>>>  	default:
>>>
>>>  		/* Should not happen.  Famous last words. */
>>>
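
For readers following the thread, the effect of the quoted hunk can be
sketched as a small userspace model. This is only an illustration under
stated assumptions, not the kernel code: the enum, the function name
model_check_up_prepare() and the "patched" flag are hypothetical, the states
not shown in the hunk are reduced to two representative ones, and the -EIO
result for the default case is inferred from the "Input/output error" seen
in step 3 above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-CPU hotplug states named in the thread. */
enum model_cpu_state {
	MODEL_CPU_POST_DEAD,	/* CPU went offline cleanly */
	MODEL_CPU_BROKEN,	/* CPU failed to die earlier */
	MODEL_CPU_UP_PREPARE,	/* bring-up was started but never completed */
};

/*
 * Simplified model of the state check touched by the patch.
 * "patched" selects whether the added CPU_UP_PREPARE case is present.
 * Returns 0 if the CPU may be brought up, a negative errno otherwise.
 */
static int model_check_up_prepare(enum model_cpu_state *state, bool patched)
{
	switch (*state) {
	case MODEL_CPU_POST_DEAD:
		/* Normal path: mark the CPU as being prepared again. */
		*state = MODEL_CPU_UP_PREPARE;
		return 0;

	case MODEL_CPU_BROKEN:
		return -EAGAIN;

	case MODEL_CPU_UP_PREPARE:
		/* With the patch: an earlier bring-up timed out, allow a retry. */
		if (patched)
			return 0;
		/* Without the patch this state is handled like "default". */
		return -EIO;

	default:
		/* Assumed error code, matching the write error in step 3. */
		return -EIO;
	}
}
```

In this model, a CPU left in MODEL_CPU_UP_PREPARE is rejected with -EIO when
"patched" is false and accepted when it is true, which corresponds to steps 3
and 4 of the test report.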
