Xen project Mailing List

Re: [RFC PATCH] x86/p2m-pt: do type recalculations with p2m read lock

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 3 Apr 2023 17:38:55 +0200

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Mon, 03 Apr 2023 15:39:23 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Apr 03, 2023 at 05:32:39PM +0200, Jan Beulich wrote:
> On 03.04.2023 12:14, Roger Pau Monne wrote:
> > Global p2m type recalculations (as triggered by logdirty) can create
> > so much contention on the p2m lock that simple guest operations like
> > VCPUOP_set_singleshot_timer on guests with a high amount of vCPUs (32)
> > will cease to work in a timely manner, up to the point that Linux
> > kernel versions that still use the VCPU_SSHOTTMR_future flag with the
> > singleshot timer will cease to work:
> >
> > [ 82.779470] CE: xen increased min_delta_ns to 1000000 nsec
> > [ 82.793075] CE: Reprogramming failure. Giving up
> > [ 82.779470] CE: Reprogramming failure. Giving up
> > [ 82.821864] CE: xen increased min_delta_ns to 506250 nsec
> > [ 82.821864] CE: xen increased min_delta_ns to 759375 nsec
> > [ 82.821864] CE: xen increased min_delta_ns to 1000000 nsec
> > [ 82.821864] CE: Reprogramming failure. Giving up
> > [ 82.856256] CE: Reprogramming failure. Giving up
> > [ 84.566279] CE: Reprogramming failure. Giving up
> > [ 84.649493] Freezing user space processes ...
> > [ 130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14}
> > (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
> > [ 130.604032] Task dump for CPU 14:
> > [ 130.604032] swapper/14 R running task 0 0 1
> > 0x00000000
> > [ 130.604032] Call Trace:
> > [ 130.604032] [<ffffffff90160f5d>] ?
> > rcu_eqs_enter_common.isra.30+0x3d/0xf0
> > [ 130.604032] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> > [ 130.604032] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> > [ 130.604032] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> > [ 130.604032] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> > [ 130.604032] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > [ 549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26}
> > (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
> > [ 549.655463] Task dump for CPU 26:
> > [ 549.655463] swapper/26 R running task 0 0 1
> > 0x00000000
> > [ 549.655463] Call Trace:
> > [ 549.655463] [<ffffffff90160f5d>] ?
> > rcu_eqs_enter_common.isra.30+0x3d/0xf0
> > [ 549.655463] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> > [ 549.655463] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> > [ 549.655463] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> > [ 549.655463] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> > [ 549.655463] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > [ 821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26}
> > (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
> > [ 821.888596] Task dump for CPU 26:
> > [ 821.888622] swapper/26 R running task 0 0 1
> > 0x00000000
> > [ 821.888677] Call Trace:
> > [ 821.888712] [<ffffffff90160f5d>] ?
> > rcu_eqs_enter_common.isra.30+0x3d/0xf0
> > [ 821.888771] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> > [ 821.888818] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> > [ 821.888865] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> > [ 821.888917] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> > [ 821.888966] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> >
> > This is obviously undesirable. One way to bodge the issue would be to
> > ignore VCPU_SSHOTTMR_future, but that's a deliberate breakage of the
> > hypercall ABI.
> >
> > Instead lower the contention in the lock by doing the recalculation
> > with the lock in read mode. This is safe because only the flags/type
> > are changed, there's no PTE mfn change in the AMD recalculation logic.
> > The Intel (EPT) case is likely more complicated, as superpage
> > splitting for diverging EMT values must be done with the p2m lock
> > taken in write mode.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > I'm unsure whether such modification is fully safe: I think changing
> > the flags/type should be fine: the PTE write is performed using
> > safwrite_p2m_entry() which must be atomic (as the guest is still
> > running and accessing the page tables). I'm slightly worried about
> > all PTE readers not using atomic accesses to do so (ie: pointer
> > returned by p2m_find_entry() should be read atomically), and code
> > assuming that a gfn type cannot change while holding the p2m lock in
> > read mode.
>
> Coming back to this: Yes, I think reads (at least the ones in do_recalc()
> which can now be done in parallel) will need to be tightened if this is a
> road we want to follow.

There are likely a lot of reads under the p2m read lock outside of
do_recalc() that will ideally need to be switched to use atomic
accesses also?

I'm open to suggestions for other ways to get this sorted. And that's a
guest with 'just' 32 vCPUs; as we go up, the contention on the p2m lock
during recalcs/misconfigs is going to increase massively.

Thanks, Roger.
