Xen project Mailing List

Re: [PATCH] xen: fix is_xen_pmu()

To: Juergen Gross <jgross@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, x86@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx

From: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>

Date: Tue, 22 Mar 2022 16:00:43 -0400

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=o3sT2wAYugVVZLMZFoWqzPzj2wgcJkVbnTckjo+xtLY=; b=Ziw3DkcKMdnrNzy+Y0W9mdTLx1mClAiYZQRxmwTabubs2W+XU8ByBwR4Jk8vIrpDR3llarpW948arMA8vxatTOfUqI6ahqlyNr7KG57oT9O4sGZySsqyAQq56DgnmM5xUzneHlH2IYncTk4fApGYPYdH7bj1eWdJ1y6xSAKEibivW0TTPcnv93X2N2nflgaLP4YdG9SuuzOv2XF6P7LpMk2NFuHnmRNk9DS/FuX/qMMQCaLHD1JWDYpf0ft/KAjxLbayrD+eEGdkmuzd5gXNRnBnP2/Twft9yrwIiXU1XCNNx4uepCwJFSeqyq8sSVt2uszLFeNBBzSpCyNEIyV3fw==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=mYMwVmMUyjgEK77qp938tKeAg2uo25sdVLAWdfwgFHr9f41MaM77KdogR2jlXcJegy8vtD/JIeXOeFMVIHaWonAH/n+YpLfXGayVz3e2I1/YOuuore+GAh77IuRHiqh17dhYHzbsoFs6ZurlHhd6sI94vcizxYUHSLBeUXGE5wMrGYszTD9pu3Hfp422e4vWewdZ5gnaKrjRQpanHP0VEqMsoS5aMoypeCxyddC3AIKnycM9UPiob+g+BrRiHQCxmkUYxW+S6DCSulS/jz6Wfj+p9m+fQT6HdinW3rpTAl+u/VZGzMxemZAbRbsYFtmRleiGI0M7mSEJpmdbF52Mvg==

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 22 Mar 2022 20:02:53 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 3/22/22 11:50 AM, Juergen Gross wrote:

is_xen_pmu() is taking the cpu number as parameter, but it is not using
it. Instead it just tests whether the Xen PMU initialization on the
current cpu did succeed. As this test is done by checking a percpu
pointer, preemption needs to be disabled in order to avoid switching
the cpu while doing the test. While resuming from suspend() this seems
not to be the case:

[   88.082751] ACPI: PM: Low-level resume complete
[   88.087933] ACPI: EC: EC started
[   88.091464] ACPI: PM: Restoring platform NVS memory
[   88.097166] xen_acpi_processor: Uploading Xen processor PM info
[   88.103850] Enabling non-boot CPUs ...
[   88.108128] installing Xen timer for CPU 1
[   88.112763] BUG: using smp_processor_id() in preemptible [00000000] code: 
systemd-sleep/7138
[   88.122256] caller is is_xen_pmu+0x12/0x30
[   88.126937] CPU: 0 PID: 7138 Comm: systemd-sleep Tainted: G        W         
5.16.13-2.fc32.qubes.x86_64 #1
[   88.137939] Hardware name: Star Labs StarBook/StarBook, BIOS 7.97 03/21/2022
[   88.145930] Call Trace:
[   88.148757]  <TASK>
[   88.151193]  dump_stack_lvl+0x48/0x5e
[   88.155381]  check_preemption_disabled+0xde/0xe0
[   88.160641]  is_xen_pmu+0x12/0x30
[   88.164441]  xen_smp_intr_init_pv+0x75/0x100

There is actually another PMU-related problem on restore which was caused (or, rather, highlighted) by ff083a2d972f56bebfd82409ca62e5dfce950961: [ 116.861637] ------------[ cut here ]------------ [ 116.861651] WARNING: CPU: 1 PID: 31 at kernel/events/core.c:6614 perf_register_guest_info_callbacks+0x68/0x70 [ 116.861673] Modules linked in: [ 116.861682] CPU: 1 PID: 31 Comm: xenwatch Not tainted 5.17.0-rc7ostr #103 [ 116.861695] RIP: e030:perf_register_guest_info_callbacks+0x68/0x70 [ 116.861706] Code: c7 c7 40 e1 86 82 e8 d7 e7 ff ff 48 8b 53 10 48 85 d2 74 14 48 c7 c6 f0 0a c0 81 48 c7 c7 30 e1 86 82 5b e9 ba e7 ff ff 5b c3 <0f> 0b c3 0f 1f 44 00 00 0f 1f 44 00 00 48 8b 05 54 fd 0b 02 48 39 [ 116.861747] RSP: e02b:ffffc9004016fe18 EFLAGS: 00010286 [ 116.861758] RAX: ffffffff82432850 RBX: 0000000000000003 RCX: ffff888079c00000 [ 116.861768] RDX: ffff888079c00000 RSI: ffffc9004016fe30 RDI: ffffffff82432850 [ 116.861778] RBP: 0000000000000000 R08: 0000160000000000 R09: ffffea00000ed340 [ 116.861788] R10: 0000000000000758 R11: 0000000000000000 R12: ffff888003b4d000 [ 116.861797] R13: 0000000000000003 R14: ffffffff8162cf10 R15: 0000000000000000 [ 116.861819] FS: 0000000000000000(0000) GS:ffff888079c80000(0000) knlGS:0000000000000000 [ 116.861830] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 116.861839] CR2: 0000000000000000 CR3: 00000000062b6000 CR4: 0000000000040660 [ 116.861853] Call Trace: [ 116.861861] <TASK> [ 116.861866] xen_pmu_init+0x187/0x280 [ 116.861879] xen_arch_resume+0x30/0x50 [ 116.861888] do_suspend.cold+0x132/0x147 [ 116.861899] shutdown_handler+0x12e/0x140 [ 116.861910] xenwatch_thread+0x94/0x180 [ 116.861919] ? finish_wait+0x80/0x80 [ 116.861928] kthread+0xe7/0x110 [ 116.861938] ? kthread_complete_and_exit+0x20/0x20 [ 116.861948] ret_from_fork+0x22/0x30 [ 116.861959] </TASK> [ 116.861964] ---[ end trace 0000000000000000 ]--- I was going to send a patch but I think yours can be slightly modified to take care of this problem as well.

@@ -542,6 +539,7 @@ void xen_pmu_init(int cpu)
        per_cpu(xenpmu_shared, cpu).flags = 0;

if (cpu == 0) {

if (!is_xen_pmu)

+               is_xen_pmu = true;
                perf_register_guest_info_callbacks(&xen_guest_cbs);
                xen_pmu_arch_init();
        }
@@ -572,4 +570,7 @@ void xen_pmu_finish(int cpu)

free_pages((unsigned long)per_cpu(xenpmu_shared, cpu).xenpmu_data, 0);

        per_cpu(xenpmu_shared, cpu).xenpmu_data = NULL;
+
+       if (cpu == 0)
+               is_xen_pmu = false;

And drop this hunk. -boris

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.