Re: [RFC PATCH v1 0/6] Lazy mmu mode fixes and improvements
On 31/05/2025 08:46, Mike Rapoport wrote:
> Hi Ryan,
> 
> On Fri, May 30, 2025 at 04:55:36PM +0100, Ryan Roberts wrote:
>> On 30/05/2025 15:47, Lorenzo Stoakes wrote:
>>> +cc Jann who is a specialist in all things page table-y and especially
>>> scary edge cases :)
>>>
>>> On Fri, May 30, 2025 at 03:04:38PM +0100, Ryan Roberts wrote:
>>>> Hi All,
>>>>
>>>> I recently added support for lazy mmu mode on arm64. The series is now
>>>> in Linus's tree, so it should be in v6.16-rc1. But during testing in
>>>> linux-next we found some ugly corners (unexpected nesting). I was able
>>>> to fix those issues by making the arm64 implementation more permissive
>>>> (like the other arches). But this is quite fragile IMHO. So I'd rather
>>>> fix the root cause and ensure that lazy mmu mode never nests, and,
>>>> more importantly, that code never makes pgtable modifications
>>>> expecting them to be immediate, not knowing that it's actually in lazy
>>>> mmu mode so the changes get deferred.
>>>
>>> When you say fragile, are you confident it _works_ but perhaps not
>>> quite as well as you want? Or are you concerned this might be broken
>>> upstream in any way?
>>
>> I'm confident that it _works_ for arm64 as it is, upstream. But if
>> Dev's series were to go in _without_ the lazy_mmu bracketing in some
>> manner, then it would be broken if the config includes
>> CONFIG_DEBUG_PAGEALLOC.
>>
>> There's a lot more explanation in the later patches as to how it can be
>> broken, but for arm64 the situation is currently like this, because our
>> implementation of __change_memory_common() uses apply_to_page_range(),
>> which implicitly enters an inner lazy mmu mode. We enter multiple
>> times, but we exit on the first call to exit. Everything works
>> correctly, but it's not optimal because C is no longer deferred:
>>
>>   arch_enter_lazy_mmu_mode()              << outer lazy mmu region
>>   <do some pte changes (A)>
>>   alloc_pages()
>>     debug_pagealloc_map_pages()
>>       __kernel_map_pages()
>>         __change_memory_common()
>>           arch_enter_lazy_mmu_mode()      << inner lazy mmu region
>>           <change kernel pte to make valid (B)>
>>           arch_leave_lazy_mmu_mode()      << exit; complete A + B
>>   clear_page()
>>   <do some more pte changes (C)>          << no longer in lazy mode
>>   arch_leave_lazy_mmu_mode()              << nop
>>
>> An alternative implementation would not add the nested lazy mmu mode,
>> so we end up with this:
>>
>>   arch_enter_lazy_mmu_mode()              << outer lazy mmu region
>>   <do some pte changes (A)>
>>   alloc_pages()
>>     debug_pagealloc_map_pages()
>>       __kernel_map_pages()
>>         __change_memory_common()
>>           <change kernel pte to make valid (B)>  << deferred due to lazy mmu
>>   clear_page()                            << BANG! B has not been actioned
>>   <do some more pte changes (C)>
>>   arch_leave_lazy_mmu_mode()
>>
>> This is clearly a much worse outcome. It's not happening today, but it
>> could in future. That's why I'm claiming it's fragile. It's much better
>> (IMHO) to disallow calling the page allocator when in lazy mmu mode.
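
[A minimal sketch to make the failure mode above concrete. This is not the
actual arm64 code: it assumes a thread-flag based implementation that tracks
no nesting depth, and flush_pending_pgtable_updates() is a hypothetical
stand-in for whatever deferred barrier/flush work a real implementation
would complete on exit.]

	static inline void arch_enter_lazy_mmu_mode(void)
	{
		/* Re-entering just sets the flag again; no depth is tracked. */
		set_thread_flag(TIF_LAZY_MMU);
	}

	static inline void arch_leave_lazy_mmu_mode(void)
	{
		/*
		 * The first leave clears the flag and completes any deferred
		 * updates. An outer region that is still logically in lazy
		 * mode carries on with the flag clear, so its later changes
		 * (C in the traces above) are applied immediately and the
		 * outer leave becomes a nop.
		 */
		if (test_and_clear_thread_flag(TIF_LAZY_MMU))
			flush_pending_pgtable_updates();
	}
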
> 
> First, I think it should be handled completely inside arch/arm64. Page
> allocation worked in lazy mmu mode on other architectures; there's no
> reason it should be changed because of the way arm64 implements lazy mmu.
> 
> Second, DEBUG_PAGEALLOC already implies that performance is bad: for it
> to be useful the kernel should be mapped with base pages, and there's a
> map/unmap for every page allocation, so optimizing a few pte changes (C
> in your example) won't matter much.
> 
> If there's a potential correctness issue with Dev's patches, it should
> be dealt with as part of those patches, with the necessary updates to
> how lazy mmu is implemented on arm64 and used in pageattr.c.
> 
> And it seems to me that adding something along the lines below to
> __kernel_map_pages() would solve the DEBUG_PAGEALLOC issue:
> 
> void __kernel_map_pages(struct page *page, int numpages, int enable)
> {
> 	unsigned long flags;
> 	bool lazy_mmu = false;
> 
> 	if (!can_set_direct_map())
> 		return;
> 
> 	/* Remember whether the caller is inside a lazy mmu region. */
> 	flags = read_thread_flags();
> 	if (flags & BIT(TIF_LAZY_MMU))
> 		lazy_mmu = true;
> 
> 	set_memory_valid((unsigned long)page_address(page), numpages, enable);
> 
> 	/*
> 	 * The inner lazy mmu exit under set_memory_valid() will have
> 	 * cleared the flag; restore it so the outer region stays lazy.
> 	 */
> 	if (lazy_mmu)
> 		set_thread_flag(TIF_LAZY_MMU);
> }

Hi Mike,

I've thought about this for a bit, and concluded that you are totally
right. This is a much smaller, arm64-contained patch. Sorry for the noise
here, and thanks for the suggestion.

Thanks,
Ryan

> 
>> Thanks,
>> Ryan
> 
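
[The save/restore idiom in the suggestion above generalizes to any arm64
path that may be entered with TIF_LAZY_MMU set and that internally enters
and leaves lazy mmu mode. A minimal sketch with hypothetical helper names,
not code from the series; test_thread_flag()/set_thread_flag() are the
stock kernel thread-flag helpers.]

	static inline bool lazy_mmu_save(void)
	{
		/* Remember whether the caller was in a lazy mmu region. */
		return test_thread_flag(TIF_LAZY_MMU);
	}

	static inline void lazy_mmu_restore(bool was_lazy)
	{
		/*
		 * An inner arch_leave_lazy_mmu_mode() clears the flag;
		 * put it back so the outer region stays lazy.
		 */
		if (was_lazy)
			set_thread_flag(TIF_LAZY_MMU);
	}

With this in place, the first trace above still completes A + B at the
inner leave, but the later changes (C) are deferred again and completed by
the outer arch_leave_lazy_mmu_mode().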