
Re: [PATCH v6 00/14] Nesting support for lazy MMU mode


  • To: Kevin Brodsky <kevin.brodsky@xxxxxxx>
  • From: Yeoreum Yun <yeoreum.yun@xxxxxxx>
  • Date: Mon, 15 Dec 2025 16:52:38 +0000
  • Cc: linux-mm@xxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, Alexander Gordeev <agordeev@xxxxxxxxxxxxx>, Andreas Larsson <andreas@xxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Anshuman Khandual <anshuman.khandual@xxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Catalin Marinas <catalin.marinas@xxxxxxx>, Christophe Leroy <christophe.leroy@xxxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, David Hildenbrand <david@xxxxxxxxxx>, "David S. Miller" <davem@xxxxxxxxxxxxx>, David Woodhouse <dwmw2@xxxxxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Jann Horn <jannh@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>, Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>, Madhavan Srinivasan <maddy@xxxxxxxxxxxxx>, Michael Ellerman <mpe@xxxxxxxxxxxxxx>, Michal Hocko <mhocko@xxxxxxxx>, Mike Rapoport <rppt@xxxxxxxxxx>, Nicholas Piggin <npiggin@xxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, "Ritesh Harjani (IBM)" <ritesh.list@xxxxxxxxx>, Ryan Roberts <ryan.roberts@xxxxxxx>, Suren Baghdasaryan <surenb@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Venkat Rao Bagalkote <venkat88@xxxxxxxxxxxxx>, Vlastimil Babka <vbabka@xxxxxxx>, Will Deacon <will@xxxxxxxxxx>, linux-arm-kernel@xxxxxxxxxxxxxxxxxxx, linuxppc-dev@xxxxxxxxxxxxxxxx, sparclinux@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, x86@xxxxxxxxxx
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

> When the lazy MMU mode was introduced eons ago, it wasn't made clear
> whether such a sequence was legal:
>
>       arch_enter_lazy_mmu_mode()
>       ...
>               arch_enter_lazy_mmu_mode()
>               ...
>               arch_leave_lazy_mmu_mode()
>       ...
>       arch_leave_lazy_mmu_mode()
>
> It seems fair to say that nested calls to
> arch_{enter,leave}_lazy_mmu_mode() were not expected, and most
> architectures never explicitly supported them.
>
> Nesting does in fact occur in certain configurations, and avoiding it
> has proved difficult. This series therefore enables lazy_mmu sections to
> nest on all architectures.
>
> Nesting is handled using a counter in task_struct (patch 9), like other
> counter-based APIs such as pagefault_{disable,enable}(). This is fully
> handled in a new generic layer in <linux/pgtable.h>; the arch_* API
> remains unchanged. A new pair of calls, lazy_mmu_mode_{pause,resume}(),
> is also introduced to allow functions that are called with the lazy MMU
> mode enabled to temporarily pause it, regardless of nesting.
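
As a reader aid for others following along: from the description above and
the enable_count/pause_count fields mentioned in the v4..v5 changelog, I
understand the generic layer to behave roughly like the sketch below. This
is only my illustration, not the actual code in patches 7-9, and it glosses
over the "calls while paused are ignored" subtlety described in the
changelog.

	static inline void lazy_mmu_mode_enable(void)
	{
		/* lazy MMU is a per-task mode: ignore interrupt context (patch 8) */
		if (in_interrupt())
			return;
		/* only the outermost section invokes the arch hook */
		if (current->lazy_mmu_state.enable_count++ == 0)
			arch_enter_lazy_mmu_mode();
	}

	static inline void lazy_mmu_mode_disable(void)
	{
		if (in_interrupt())
			return;
		if (--current->lazy_mmu_state.enable_count == 0)
			arch_leave_lazy_mmu_mode();
	}

	static inline void lazy_mmu_mode_pause(void)
	{
		if (in_interrupt())
			return;
		/* leave the mode on the first pause, at any nesting depth */
		if (current->lazy_mmu_state.pause_count++ == 0 &&
		    current->lazy_mmu_state.enable_count > 0)
			arch_leave_lazy_mmu_mode();
	}

	static inline void lazy_mmu_mode_resume(void)
	{
		if (in_interrupt())
			return;
		if (--current->lazy_mmu_state.pause_count == 0 &&
		    current->lazy_mmu_state.enable_count > 0)
			arch_enter_lazy_mmu_mode();
	}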
>
> An arch now opts in to using the lazy MMU mode by selecting
> CONFIG_ARCH_HAS_LAZY_MMU_MODE; this is more appropriate now that we have a
> generic API, especially with state conditionally added to task_struct.
>
> ---
>
> Background: Ryan Roberts' series from March [1] attempted to prevent
> nesting from ever occurring, and mostly succeeded. Unfortunately, a
> corner case (DEBUG_PAGEALLOC) may still cause nesting to occur on arm64.
> Ryan proposed [2] to address that corner case at the generic level but
> this approach received pushback; [3] then attempted to solve the issue
> on arm64 only, but it was deemed too fragile.
>
> It feels generally difficult to guarantee that lazy_mmu sections don't
> nest, because callers of various standard mm functions have no way of
> knowing whether those functions use lazy_mmu internally.
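
To make that concrete for other readers -- an illustrative caller
(set_ptes_fn is invented for the example):

	arch_enter_lazy_mmu_mode();
	/* apply_to_page_range() enters lazy MMU mode internally -> nesting */
	apply_to_page_range(mm, addr, PAGE_SIZE, set_ptes_fn, NULL);
	arch_leave_lazy_mmu_mode();

The caller cannot avoid the nesting without knowing the callee's
implementation details.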
>
> The overall approach in v3/v4 is very close to what David Hildenbrand
> proposed on v2 [4].
>
> Unlike in v1/v2, no special provision is made for architectures to
> save/restore extra state when entering/leaving the mode. Based on the
> discussions so far, this does not seem to be required - an arch can
> store any relevant state in thread_struct during arch_enter() and
> restore it in arch_leave(). Nesting is not a concern as these functions
> are only called at the top level, not in nested sections.
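
If I read this right, an arch with extra state would do something along
these lines -- every name below is invented for illustration, this is not
code from the series:

	/* top-level enter: stash and override some arch batching state */
	static inline void arch_enter_lazy_mmu_mode(void)
	{
		current->thread.lazy_mmu_saved_mode = read_batching_mode();
		set_batching_mode(BATCH_EVERYTHING);
	}

	/* top-level leave: flush, then restore what enter() saved */
	static inline void arch_leave_lazy_mmu_mode(void)
	{
		flush_pending_updates();
		set_batching_mode(current->thread.lazy_mmu_saved_mode);
	}

Since these are only ever called at the top level, a single slot in
thread_struct suffices; no save/restore stack is needed.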
>
> The introduction of a generic layer, and tracking of the lazy MMU state
> in task_struct, also makes it possible to streamline the arch callbacks -
> this series removes 67 lines from arch/.
>
> Patch overview:
>
> * Patch 1: cleanup - avoids having to deal with the powerpc
>   context-switching code
>
> * Patch 2-4: prepare arch_flush_lazy_mmu_mode() to be called from the
>   generic layer (patch 9)
>
> * Patch 5: documentation clarification (not directly related to the
>   changes in this series)
>
> * Patch 6-7: new API + CONFIG_ARCH_HAS_LAZY_MMU_MODE
>
> * Patch 8: ensure correctness in interrupt context
>
> * Patch 9: nesting support
>
> * Patch 10-13: replace arch-specific tracking of lazy MMU mode with
>   generic API
>
> * Patch 14: basic tests to ensure that the state added in patch 9 is
>   tracked correctly
>
> This series has been tested by running the mm kselftests on arm64 with
> DEBUG_VM, DEBUG_PAGEALLOC, KFENCE and KASAN. Extensive testing on
> powerpc was also kindly provided by Venkat Rao Bagalkote [5]. It was
> build-tested on other architectures (with and without XEN_PV on x86).
>
> - Kevin
>
> [1] https://lore.kernel.org/all/20250303141542.3371656-1-ryan.roberts@xxxxxxx/
> [2] https://lore.kernel.org/all/20250530140446.2387131-1-ryan.roberts@xxxxxxx/
> [3] https://lore.kernel.org/all/20250606135654.178300-1-ryan.roberts@xxxxxxx/
> [4] https://lore.kernel.org/all/ef343405-c394-4763-a79f-21381f217b6c@xxxxxxxxxx/
> [5] https://lore.kernel.org/all/94889730-1AEF-458F-B623-04092C0D6819@xxxxxxxxxxxxx/
> ---
> Changelog
>
> v5..v6:
>
> - Rebased on v6.19-rc1
> - Overall: no functional change
> - Patch 5: new patch clarifying that generic code may not sleep while in lazy
>   MMU mode [Alexander Gordeev]
> - Patch 6: added description for the ARCH_HAS_LAZY_MMU_MODE option
>   [Anshuman Khandual]
> - Patch 9: rename in_lazy_mmu_mode() to is_lazy_mmu_mode_active() [Alexander]
> - Patch 14: new patch with basic KUnit tests [Anshuman]
> - Collected R-b/A-b/T-b tags
>
> v5: https://lore.kernel.org/all/20251124132228.622678-1-kevin.brodsky@xxxxxxx/
>
> v4..v5:
>
> - Rebased on mm-unstable
> - Patch 3: added missing radix_enabled() check in arch_flush()
>   [Ritesh Harjani]
> - Patch 6: declare arch_flush_lazy_mmu_mode() as static inline on x86
>   [Ryan Roberts]
> - Patch 7 (formerly 12): moved before patch 8 to ensure correctness in
>   interrupt context [Ryan]. The diffs in in_lazy_mmu_mode() and
>   queue_pte_barriers() are moved to patches 8 and 9 respectively.
> - Patch 8:
>   * Removed all restrictions regarding lazy_mmu_mode_{pause,resume}().
>     They may now be called even when lazy MMU isn't enabled, and
>     any call to lazy_mmu_mode_* may be made while paused (such calls
>     will be ignored). [David, Ryan]
>   * lazy_mmu_state.{nesting_level,active} are replaced with
>     {enable_count,pause_count} to track arbitrary nesting of both
>     enable/disable and pause/resume [Ryan]
>   * Added __task_lazy_mmu_mode_active() for use in patch 12 [David]
>   * Added documentation for all the functions [Ryan]
> - Patch 9: keep existing test + set TIF_LAZY_MMU_PENDING instead of
>   atomic RMW [David, Ryan]
> - Patch 12: use __task_lazy_mmu_mode_active() instead of accessing
>   lazy_mmu_state directly [David]
> - Collected R-b/A-b tags
>
> v4: https://lore.kernel.org/all/20251029100909.3381140-1-kevin.brodsky@xxxxxxx/
>
> v3..v4:
>
> - Patch 2: restored ordering of preempt_{disable,enable}() [Dave Hansen]
> - Patch 5 onwards: s/ARCH_LAZY_MMU/ARCH_HAS_LAZY_MMU_MODE/ [Mike Rapoport]
> - Patch 7: renamed lazy_mmu_state members, removed VM_BUG_ON(),
>   reordered writes to lazy_mmu_state members [David Hildenbrand]
> - Dropped patch 13 as it doesn't seem justified [David H]
> - Various improvements to commit messages [David H]
>
> v3: https://lore.kernel.org/all/20251015082727.2395128-1-kevin.brodsky@xxxxxxx/
>
> v2..v3:
>
> - Full rewrite; dropped all Acked-by/Reviewed-by.
> - Rebased on v6.18-rc1.
>
> v2: https://lore.kernel.org/all/20250908073931.4159362-1-kevin.brodsky@xxxxxxx/
>
> v1..v2:
> - Rebased on mm-unstable.
> - Patch 2: handled new calls to enter()/leave(), clarified how the "flush"
>   pattern (leave() followed by enter()) is handled.
> - Patch 5,6: removed unnecessary local variable [Alexander Gordeev's
>   suggestion].
> - Added Mike Rapoport's Acked-by.
>
> v1: https://lore.kernel.org/all/20250904125736.3918646-1-kevin.brodsky@xxxxxxx/
> ---
> Cc: Alexander Gordeev <agordeev@xxxxxxxxxxxxx>
> Cc: Andreas Larsson <andreas@xxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx>
> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
> Cc: David Woodhouse <dwmw2@xxxxxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Cc: Juergen Gross <jgross@xxxxxxxx>
> Cc: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> Cc: Madhavan Srinivasan <maddy@xxxxxxxxxxxxx>
> Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Mike Rapoport <rppt@xxxxxxxxxx>
> Cc: Nicholas Piggin <npiggin@xxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
> Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Venkat Rao Bagalkote <venkat88@xxxxxxxxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Yeoreum Yun <yeoreum.yun@xxxxxxx>
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: linuxppc-dev@xxxxxxxxxxxxxxxx
> Cc: sparclinux@xxxxxxxxxxxxxxx
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: x86@xxxxxxxxxx
> ---
> Alexander Gordeev (1):
>   powerpc/64s: Do not re-activate batched TLB flush
>
> Kevin Brodsky (13):
>   x86/xen: simplify flush_lazy_mmu()
>   powerpc/mm: implement arch_flush_lazy_mmu_mode()
>   sparc/mm: implement arch_flush_lazy_mmu_mode()
>   mm: clarify lazy_mmu sleeping constraints
>   mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODE
>   mm: introduce generic lazy_mmu helpers
>   mm: bail out of lazy_mmu_mode_* in interrupt context
>   mm: enable lazy_mmu sections to nest
>   arm64: mm: replace TIF_LAZY_MMU with is_lazy_mmu_mode_active()
>   powerpc/mm: replace batch->active with is_lazy_mmu_mode_active()
>   sparc/mm: replace batch->active with is_lazy_mmu_mode_active()
>   x86/xen: use lazy_mmu_state when context-switching
>   mm: Add basic tests for lazy_mmu
>
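
My understanding of the batch->active conversions (patches 10-12): they
essentially swap a per-arch flag for the generic query, along the lines of
(illustrative only, flush_batch() is invented):

	-	if (batch->active)
	+	if (is_lazy_mmu_mode_active())
			flush_batch(batch);

which is what lets the arch-private bookkeeping (TIF_LAZY_MMU on arm64,
batch->active on powerpc/sparc) go away.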
>  arch/arm64/Kconfig                            |   1 +
>  arch/arm64/include/asm/pgtable.h              |  41 +----
>  arch/arm64/include/asm/thread_info.h          |   3 +-
>  arch/arm64/mm/mmu.c                           |   8 +-
>  arch/arm64/mm/pageattr.c                      |   4 +-
>  .../include/asm/book3s/64/tlbflush-hash.h     |  20 +--
>  arch/powerpc/include/asm/thread_info.h        |   2 -
>  arch/powerpc/kernel/process.c                 |  25 ---
>  arch/powerpc/mm/book3s64/hash_tlb.c           |  10 +-
>  arch/powerpc/mm/book3s64/subpage_prot.c       |   4 +-
>  arch/powerpc/platforms/Kconfig.cputype        |   1 +
>  arch/sparc/Kconfig                            |   1 +
>  arch/sparc/include/asm/tlbflush_64.h          |   5 +-
>  arch/sparc/mm/tlb.c                           |  14 +-
>  arch/x86/Kconfig                              |   1 +
>  arch/x86/boot/compressed/misc.h               |   1 +
>  arch/x86/boot/startup/sme.c                   |   1 +
>  arch/x86/include/asm/paravirt.h               |   1 -
>  arch/x86/include/asm/pgtable.h                |   1 +
>  arch/x86/include/asm/thread_info.h            |   4 +-
>  arch/x86/xen/enlighten_pv.c                   |   3 +-
>  arch/x86/xen/mmu_pv.c                         |   6 +-
>  fs/proc/task_mmu.c                            |   4 +-
>  include/linux/mm_types_task.h                 |   5 +
>  include/linux/pgtable.h                       | 158 +++++++++++++++++-
>  include/linux/sched.h                         |  45 +++++
>  mm/Kconfig                                    |  19 +++
>  mm/Makefile                                   |   1 +
>  mm/kasan/shadow.c                             |   8 +-
>  mm/madvise.c                                  |  18 +-
>  mm/memory.c                                   |  16 +-
>  mm/migrate_device.c                           |   8 +-
>  mm/mprotect.c                                 |   4 +-
>  mm/mremap.c                                   |   4 +-
>  mm/tests/lazy_mmu_mode_kunit.c                |  71 ++++++++
>  mm/userfaultfd.c                              |   4 +-
>  mm/vmalloc.c                                  |  12 +-
>  mm/vmscan.c                                   |  12 +-
>  38 files changed, 380 insertions(+), 166 deletions(-)
>  create mode 100644 mm/tests/lazy_mmu_mode_kunit.c
>
>
> base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
> --
> 2.51.2

All of these look good to me.

Reviewed-by: Yeoreum Yun <yeoreum.yun@xxxxxxx>

--
Sincerely,
Yeoreum Yun
