[PATCH] xen/arm64: flushtlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI
The ARM64_WORKAROUND_REPEAT_TLBI workaround is used to mitigate several
errata where broadcast TLBI;DSB sequences don't provide all the
architecturally required synchronization. The workaround performs more
work than necessary and can have significant overhead. Optimize the
workaround as explained below.
1. All relevant errata only affect the ordering and/or completion of
memory accesses which have been translated by an invalidated TLB
entry. The actual invalidation of TLB entries is unaffected.
2. The existing workaround is applied to both broadcast and local TLB
invalidation, whereas for all relevant errata it is only necessary to
apply a workaround for broadcast invalidation.
3. The existing workaround replaces every TLBI with a TLBI;DSB;TLBI
sequence, whereas for all relevant errata it is only necessary to
execute a single additional TLBI;DSB sequence after any number of
TLBIs are completed by a DSB.
For example, for a sequence of batched TLBIs:
TLBI <op1>[, <arg1>]
TLBI <op2>[, <arg2>]
TLBI <op3>[, <arg3>]
DSB ISH
... the existing workaround will expand this to:
TLBI <op1>[, <arg1>]
DSB ISH // additional
TLBI <op1>[, <arg1>] // additional
TLBI <op2>[, <arg2>]
DSB ISH // additional
TLBI <op2>[, <arg2>] // additional
TLBI <op3>[, <arg3>]
DSB ISH // additional
TLBI <op3>[, <arg3>] // additional
DSB ISH
... whereas it is sufficient to have:
TLBI <op1>[, <arg1>]
TLBI <op2>[, <arg2>]
TLBI <op3>[, <arg3>]
DSB ISH
TLBI <opX>[, <argX>] // additional
DSB ISH // additional
Using a single additional TLBI and DSB at the end of the sequence can
have significantly lower overhead, as each DSB which completes a TLBI
must synchronize with other PEs in the system, with potential
performance effects both locally and system-wide.
4. The existing workaround repeats each specific TLBI operation, whereas
for all relevant errata it is sufficient for the additional TLBI to
use *any* operation which will be broadcast, regardless of which
translation regime or stage of translation the operation applies to.
For example, for a single TLBI:
TLBI ALLE2IS
DSB ISH
... the existing workaround will expand this to:
TLBI ALLE2IS
DSB ISH
TLBI ALLE2IS // additional
DSB ISH // additional
... whereas it is sufficient to have:
TLBI ALLE2IS
DSB ISH
TLBI VALE1IS, XZR // additional
DSB ISH // additional
As the additional TLBI doesn't have to match a specific earlier TLBI,
the additional TLBI can be implemented in separate code, with no
memory of the earlier TLBIs. The additional TLBI can also use a
cheaper TLBI operation.
5. The existing workaround is applied to both Stage-1 and Stage-2 TLB
invalidation, whereas for all relevant errata it is only necessary to
apply a workaround for Stage-1 invalidation.
Architecturally, TLBI operations which invalidate only Stage-2
information (e.g. IPAS2E1IS) are not required to invalidate TLB
entries which combine information from Stage-1 and Stage-2
translation table entries, and consequently may not complete memory
accesses translated by those combined entries. In these cases,
completion of memory accesses is only guaranteed after subsequent
invalidation of Stage-1 information (e.g. VMALLE1IS).
Rework the workaround logic as follows:
- add TLB_HELPER_LOCAL() to be used for local TLB ops without a
workaround,
- modify the TLB_HELPER() workaround to use tlbi vale2is, xzr as the
second TLBI,
- drop TLB_HELPER_VA(). It's used only by __flush_xen_tlb_one_local,
which is local and does not need the workaround, and by
__flush_xen_tlb_one. In the latter case, since it's used in a loop,
we don't need the workaround in the middle. Add __tlb_repeat_sync()
with the workaround, to be used at the end, after the DSB and before
the final ISB,
- TLBI VALE2IS passing XZR is used as the additional TLBI. While there
is an identity mapping at that address, it's used very rarely, so the
performance impact is negligible. If things change in the future, we
can revisit this decision.
Signed-off-by: Michal Orzel <michal.orzel@xxxxxxx>
---
Linux counterpart (already merged):
https://lore.kernel.org/linux-arm-kernel/20260218164348.2022831-1-mark.rutland@xxxxxxx/
---
xen/arch/arm/include/asm/arm32/flushtlb.h | 3 +
xen/arch/arm/include/asm/arm64/flushtlb.h | 108 ++++++++++++++--------
xen/arch/arm/include/asm/flushtlb.h | 1 +
3 files changed, 71 insertions(+), 41 deletions(-)
diff --git a/xen/arch/arm/include/asm/arm32/flushtlb.h b/xen/arch/arm/include/asm/arm32/flushtlb.h
index 61c25a318998..5483be08fbbe 100644
--- a/xen/arch/arm/include/asm/arm32/flushtlb.h
+++ b/xen/arch/arm/include/asm/arm32/flushtlb.h
@@ -57,6 +57,9 @@ static inline void __flush_xen_tlb_one(vaddr_t va)
asm volatile(STORE_CP32(0, TLBIMVAHIS) : : "r" (va) : "memory");
}
+/* Only for ARM64_WORKAROUND_REPEAT_TLBI */
+static inline void __tlb_repeat_sync(void) {}
+
#endif /* __ASM_ARM_ARM32_FLUSHTLB_H__ */
/*
* Local variables:
diff --git a/xen/arch/arm/include/asm/arm64/flushtlb.h b/xen/arch/arm/include/asm/arm64/flushtlb.h
index 3b99c11b50d1..1606b26bf28a 100644
--- a/xen/arch/arm/include/asm/arm64/flushtlb.h
+++ b/xen/arch/arm/include/asm/arm64/flushtlb.h
@@ -12,9 +12,14 @@
* ARM64_WORKAROUND_REPEAT_TLBI:
* Modification of the translation table for a virtual address might lead to
* read-after-read ordering violation.
- * The workaround repeats TLBI+DSB ISH operation for all the TLB flush
- * operations. While this is strictly not necessary, we don't want to
- * take any risk.
+ * The workaround repeats TLBI+DSB ISH operation for broadcast TLB flush
+ * operations. The workaround is not needed for local operations.
+ *
+ * It is sufficient for the additional TLBI to use *any* operation which will
+ * be broadcast, regardless of which translation regime or stage of translation
+ * the operation applies to. TLBI VALE2IS is used, passing XZR. While there is
+ * an identity mapping there, it's only used during suspend/resume and CPU
+ * on/off, so the performance impact, if any, is negligible.
*
* For Xen page-tables the ISB will discard any instructions fetched
* from the old mappings.
@@ -26,69 +31,90 @@
* Note that for local TLB flush, using non-shareable (nsh) is sufficient
* (see D5-4929 in ARM DDI 0487H.a). Although, the memory barrier in
* for the workaround is left as inner-shareable to match with Linux
- * v6.1-rc8.
+ * v6.19.
*/
-#define TLB_HELPER(name, tlbop, sh) \
+#define TLB_HELPER_LOCAL(name, tlbop) \
static inline void name(void) \
{ \
asm_inline volatile ( \
- "dsb " # sh "st;" \
+ "dsb nshst;" \
"tlbi " # tlbop ";" \
- ALTERNATIVE( \
- "nop; nop;", \
- "dsb ish;" \
- "tlbi " # tlbop ";", \
- ARM64_WORKAROUND_REPEAT_TLBI, \
- CONFIG_ARM64_WORKAROUND_REPEAT_TLBI) \
- "dsb " # sh ";" \
+ "dsb nsh;" \
"isb;" \
: : : "memory"); \
}
-/*
- * FLush TLB by VA. This will likely be used in a loop, so the caller
- * is responsible to use the appropriate memory barriers before/after
- * the sequence.
- *
- * See above about the ARM64_WORKAROUND_REPEAT_TLBI sequence.
- */
-#define TLB_HELPER_VA(name, tlbop) \
-static inline void name(vaddr_t va) \
-{ \
- asm_inline volatile ( \
- "tlbi " # tlbop ", %0;" \
- ALTERNATIVE( \
- "nop; nop;", \
- "dsb ish;" \
- "tlbi " # tlbop ", %0;", \
- ARM64_WORKAROUND_REPEAT_TLBI, \
- CONFIG_ARM64_WORKAROUND_REPEAT_TLBI) \
- : : "r" (va >> PAGE_SHIFT) : "memory"); \
+#define TLB_HELPER(name, tlbop) \
+static inline void name(void) \
+{ \
+ asm_inline volatile ( \
+ "dsb ishst;" \
+ "tlbi " # tlbop ";" \
+ ALTERNATIVE( \
+ "nop; nop;", \
+ "dsb ish;" \
+ "tlbi vale2is, xzr;", \
+ ARM64_WORKAROUND_REPEAT_TLBI, \
+ CONFIG_ARM64_WORKAROUND_REPEAT_TLBI) \
+ "dsb ish;" \
+ "isb;" \
+ : : : "memory"); \
}
/* Flush local TLBs, current VMID only. */
-TLB_HELPER(flush_guest_tlb_local, vmalls12e1, nsh)
+TLB_HELPER_LOCAL(flush_guest_tlb_local, vmalls12e1)
/* Flush innershareable TLBs, current VMID only */
-TLB_HELPER(flush_guest_tlb, vmalls12e1is, ish)
+TLB_HELPER(flush_guest_tlb, vmalls12e1is)
/* Flush local TLBs, all VMIDs, non-hypervisor mode */
-TLB_HELPER(flush_all_guests_tlb_local, alle1, nsh)
+TLB_HELPER_LOCAL(flush_all_guests_tlb_local, alle1)
/* Flush innershareable TLBs, all VMIDs, non-hypervisor mode */
-TLB_HELPER(flush_all_guests_tlb, alle1is, ish)
+TLB_HELPER(flush_all_guests_tlb, alle1is)
/* Flush all hypervisor mappings from the TLB of the local processor. */
-TLB_HELPER(flush_xen_tlb_local, alle2, nsh)
+TLB_HELPER_LOCAL(flush_xen_tlb_local, alle2)
+
+#undef TLB_HELPER_LOCAL
+#undef TLB_HELPER
+
+/*
+ * Flush TLB by VA. This will likely be used in a loop, so the caller
+ * is responsible for using the appropriate memory barriers before/after
+ * the sequence.
+ */
/* Flush TLB of local processor for address va. */
-TLB_HELPER_VA(__flush_xen_tlb_one_local, vae2)
+static inline void __flush_xen_tlb_one_local(vaddr_t va)
+{
+ asm_inline volatile (
+ "tlbi vae2, %0" : : "r" (va >> PAGE_SHIFT) : "memory");
+}
/* Flush TLB of all processors in the inner-shareable domain for address va. */
-TLB_HELPER_VA(__flush_xen_tlb_one, vae2is)
+static inline void __flush_xen_tlb_one(vaddr_t va)
+{
+ asm_inline volatile (
+ "tlbi vae2is, %0" : : "r" (va >> PAGE_SHIFT) : "memory");
+}
-#undef TLB_HELPER
-#undef TLB_HELPER_VA
+/*
+ * ARM64_WORKAROUND_REPEAT_TLBI:
+ * For all relevant errata it is only necessary to execute a single
+ * additional TLBI;DSB sequence after any number of TLBIs are completed by DSB.
+ */
+static inline void __tlb_repeat_sync(void)
+{
+ asm_inline volatile (
+ ALTERNATIVE(
+ "nop; nop;",
+ "tlbi vale2is, xzr;"
+ "dsb ish;",
+ ARM64_WORKAROUND_REPEAT_TLBI,
+ CONFIG_ARM64_WORKAROUND_REPEAT_TLBI)
+ : : : "memory");
+}
#endif /* __ASM_ARM_ARM64_FLUSHTLB_H__ */
/*
diff --git a/xen/arch/arm/include/asm/flushtlb.h b/xen/arch/arm/include/asm/flushtlb.h
index e45fb6d97b02..c292c3c00d29 100644
--- a/xen/arch/arm/include/asm/flushtlb.h
+++ b/xen/arch/arm/include/asm/flushtlb.h
@@ -65,6 +65,7 @@ static inline void flush_xen_tlb_range_va(vaddr_t va,
va += PAGE_SIZE;
}
dsb(ish); /* Ensure the TLB invalidation has completed */
+ __tlb_repeat_sync();
isb();
}
--
2.43.0