Re: [PATCH v2 2/7] mm: introduce local state for lazy_mmu sections
On Tue, Sep 09, 2025 at 03:49:46PM +0200, Kevin Brodsky wrote:
> On 09/09/2025 13:54, David Hildenbrand wrote:
> > On 09.09.25 13:45, Alexander Gordeev wrote:
> >> On Tue, Sep 09, 2025 at 12:09:48PM +0200, David Hildenbrand wrote:
> >>> On 09.09.25 11:40, Alexander Gordeev wrote:
> >>>> On Tue, Sep 09, 2025 at 11:07:36AM +0200, David Hildenbrand wrote:
> >>>>> On 08.09.25 09:39, Kevin Brodsky wrote:
> >>>>>> arch_{enter,leave}_lazy_mmu_mode() currently have a stateless API
> >>>>>> (taking and returning no value). This is proving problematic in
> >>>>>> situations where leave() needs to restore some context back to its
> >>>>>> original state (before enter() was called). In particular, this
> >>>>>> makes it difficult to support the nesting of lazy_mmu sections -
> >>>>>> leave() does not know whether the matching enter() call occurred
> >>>>>> while lazy_mmu was already enabled, and whether to disable it or
> >>>>>> not.
> >>>>>>
> >>>>>> This patch gives all architectures the chance to store local state
> >>>>>> while inside a lazy_mmu section by making enter() return some value,
> >>>>>> storing it in a local variable, and having leave() take that value.
> >>>>>> That value is typed lazy_mmu_state_t - each architecture defining
> >>>>>> __HAVE_ARCH_ENTER_LAZY_MMU_MODE is free to define it as it sees fit.
> >>>>>> For now we define it as int everywhere, which is sufficient to
> >>>>>> support nesting.
> >>>> ...
> >>>>>> {
> >>>>>> +     lazy_mmu_state_t lazy_mmu_state;
> >>>>>>       ...
> >>>>>> -     arch_enter_lazy_mmu_mode();
> >>>>>> +     lazy_mmu_state = arch_enter_lazy_mmu_mode();
> >>>>>>       ...
> >>>>>> -     arch_leave_lazy_mmu_mode();
> >>>>>> +     arch_leave_lazy_mmu_mode(lazy_mmu_state);
> >>>>>>       ...
> >>>>>> }
> >>>>>>
> >>>>>> * In a few cases (e.g. xen_flush_lazy_mmu()), a function knows that
> >>>>>>   lazy_mmu is already enabled, and it temporarily disables it by
> >>>>>>   calling leave() and then enter() again. Here we want to ensure
> >>>>>>   that any operation between the leave() and enter() calls is
> >>>>>>   completed immediately; for that reason we pass LAZY_MMU_DEFAULT to
> >>>>>>   leave() to fully disable lazy_mmu. enter() will then re-enable it
> >>>>>>   - this achieves the expected behaviour, whether nesting occurred
> >>>>>>   before that function was called or not.
> >>>>>>
> >>>>>> Note: it is difficult to provide a default definition of
> >>>>>> lazy_mmu_state_t for architectures implementing lazy_mmu, because
> >>>>>> that definition would need to be available in
> >>>>>> arch/x86/include/asm/paravirt_types.h and adding a new generic
> >>>>>> #include there is very tricky due to the existing header soup.
> >>>>>
> >>>>> Yeah, I was wondering about exactly that.
> >>>>>
> >>>>> In particular because LAZY_MMU_DEFAULT etc resides somewhere
> >>>>> completely different.
> >>>>>
> >>>>> Which raises the question: is using a new type really of any
> >>>>> benefit here?
> >>>>>
> >>>>> Can't we just use an "enum lazy_mmu_state" and call it a day?
> >>>>
> >>>> I could envision something completely different for this type on s390,
> >>>> e.g. a pointer to a per-cpu structure. So I would really ask to stick
> >>>> with the current approach.
>
> This is indeed the motivation - let every arch do whatever it sees fit.
> lazy_mmu_state_t is basically an opaque type as far as generic code is
> concerned, which also means that this API change is the first and last
> one we need (famous last words, I know).
>
> I mentioned in the cover letter that the pkeys-based page table
> protection series [1] would have an immediate use for lazy_mmu_state_t.
> In that proposal, any helper writing to pgtables needs to modify the
> pkey register and then restore it. To reduce the overhead, lazy_mmu is
> used to set the pkey register only once in enter(), and then restore it
> in leave() [2]. This currently relies on storing the original pkey
> register value in thread_struct, which is suboptimal and most
> importantly doesn't work if lazy_mmu sections nest. With this series, we
> could instead store the pkey register value in lazy_mmu_state_t
> (enlarging it to 64 bits or more).
>
> I also considered going further and making lazy_mmu_state_t a pointer as
> Alexander suggested - more complex to manage, but also a lot more flexible.
>
> >>> Would that integrate well with LAZY_MMU_DEFAULT etc?
> >>
> >> Hmm... I thought the idea is to use LAZY_MMU_* by architectures that
> >> want to use it - at least that is how I read the description above.
> >>
> >> It is only kasan_populate|depopulate_vmalloc_pte() in generic code
> >> that do not follow this pattern, and it looks like a problem to me.
>
> This discussion also made me realise that this is problematic, as the
> LAZY_MMU_{DEFAULT,NESTED} macros were meant only for architectures'
> convenience, not for generic code (where lazy_mmu_state_t should ideally
> be an opaque type as mentioned above). It almost feels like the kasan
> case deserves a different API, because this is not how enter() and
> leave() are meant to be used. This would mean quite a bit of churn
> though, so maybe just introduce another arch-defined value to pass to
> leave() for such a situation - for instance,
> arch_leave_lazy_mmu_mode(LAZY_MMU_FLUSH)?

What about adjusting the semantics of apply_to_page_range() instead?

It currently assumes any caller is fine with apply_to_pte_range() entering
the lazy mode. By contrast, kasan_(de)populate_vmalloc_pte() are not fine
at all and must leave the lazy mode. That literally suggests the original
assumption is incorrect.

We could change int apply_to_pte_range(..., bool create, ...) to e.g.
apply_to_pte_range(..., unsigned int flags, ...) and introduce a flag
that simply skips entering the lazy mmu mode.

Thanks!
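To make the flags idea a bit more concrete, here is a rough userspace mock
of the control flow only, not a patch. Everything in it is an assumption
made for illustration: the flag names APPLY_CREATE and APPLY_NO_LAZY_MMU,
the depth counter standing in for the arch hooks, MOCK_PAGE_SIZE, and the
*_mock and helper function names do not exist in the kernel. The only
parts taken from this thread are the state-passing enter()/leave() pair
and the "int everywhere" lazy_mmu_state_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag names, invented for this sketch (not kernel API). */
#define APPLY_CREATE       (1u << 0) /* would replace today's 'bool create' */
#define APPLY_NO_LAZY_MMU  (1u << 1) /* caller must not run in lazy mmu mode */

#define MOCK_PAGE_SIZE 4096UL

/* Mock of the state-passing API from this series: int is enough here. */
typedef int lazy_mmu_state_t;

/* Mock only: number of lazy_mmu sections currently active. */
static int lazy_mmu_depth;

static lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
{
	/* Return the previous depth so that leave() can restore it. */
	return lazy_mmu_depth++;
}

static void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
{
	lazy_mmu_depth = state;
}

typedef int (*pte_fn_t)(unsigned long addr, void *data);

/* Mock of apply_to_pte_range() taking flags instead of 'bool create'. */
static int apply_to_pte_range_mock(unsigned long addr, unsigned long end,
				   pte_fn_t fn, void *data,
				   unsigned int flags)
{
	int lazy = !(flags & APPLY_NO_LAZY_MMU);
	lazy_mmu_state_t state = 0;
	int err = 0;

	assert(fn != NULL);

	if (lazy)
		state = arch_enter_lazy_mmu_mode();

	for (; addr < end; addr += MOCK_PAGE_SIZE) {
		err = fn(addr, data);
		if (err)
			break;
	}

	if (lazy)
		arch_leave_lazy_mmu_mode(state);

	return err;
}

/* Example callback: records the lazy_mmu depth it was called under. */
static int record_depth(unsigned long addr, void *data)
{
	(void)addr;
	*(int *)data = lazy_mmu_depth;
	return 0;
}

/* Helper for trying it out: which depth does a callback observe? */
static int depth_seen_by_callback(unsigned int flags)
{
	int seen = -1;

	apply_to_pte_range_mock(0, 2 * MOCK_PAGE_SIZE, record_depth, &seen,
				flags);
	return seen;
}
```

With such a flag, a caller like the kasan vmalloc (de)populate path would
pass APPLY_NO_LAZY_MMU and its callback would never run inside a lazy_mmu
section, so it would have nothing to leave() and re-enter(), and generic
code would not need to know about LAZY_MMU_* values at all. In the mock,
depth_seen_by_callback(APPLY_CREATE) observes an active section, while
depth_seen_by_callback(APPLY_CREATE | APPLY_NO_LAZY_MMU) does not.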