[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 0/5] Fix lazy mmu mode
On 14/04/2025 15:04, Alexander Gordeev wrote: > On Mon, Apr 14, 2025 at 02:22:53PM +0100, Ryan Roberts wrote: >> On 10/04/2025 17:07, Alexander Gordeev wrote: >>>> I'm planning to implement lazy mmu mode for arm64 to optimize vmalloc. As >>>> part >>>> of that, I will extend lazy mmu mode to cover kernel mappings in vmalloc >>>> table >>>> walkers. While lazy mmu mode is already used for kernel mappings in a few >>>> places, this will extend it's use significantly. >>>> >>>> Having reviewed the existing lazy mmu implementations in powerpc, sparc >>>> and x86, >>>> it looks like there are a bunch of bugs, some of which may be more likely >>>> to >>>> trigger once I extend the use of lazy mmu. >>> >>> Do you have any idea about generic code issues as result of not adhering to >>> the originally stated requirement: >>> >>> /* >>> ... >>> * the PTE updates which happen during this window. Note that using this >>> * interface requires that read hazards be removed from the code. A read >>> * hazard could result in the direct mode hypervisor case, since the >>> actual >>> * write to the page tables may not yet have taken place, so reads though >>> * a raw PTE pointer after it has been modified are not guaranteed to be >>> * up to date. >>> ... >>> */ >>> >>> I tried to follow few code paths and at least this one does not look so >>> good: >>> >>> copy_pte_range(..., src_pte, ...) >>> ret = copy_nonpresent_pte(..., src_pte, ...) >>> try_restore_exclusive_pte(..., src_pte, ...) // >>> is_device_exclusive_entry(entry) >>> restore_exclusive_pte(..., ptep, ...) >>> set_pte_at(..., ptep, ...) >>> set_pte(ptep, pte); // save in lazy >>> mmu mode >>> >>> // ret == -ENOENT >>> >>> ptent = ptep_get(src_pte); // lazy mmu >>> save is not observed >>> ret = copy_present_ptes(..., ptent, ...); // wrong ptent >>> used >>> >>> I am not aware whether the effort to "read hazards be removed from the code" >>> has ever been made and the generic code is safe in this regard. >>> >>> What is your take on this? >> >> Hmm, that looks like a bug to me, at least based on the stated requirements. >> Although this is not a "read through a raw PTE *pointer*", it is a >> ptep_get(). >> The arch code can override that so I guess it has an opportunity to flush. >> But I >> don't think any arches are currently doing that. >> >> Probably the simplest fix is to add arch_flush_lazy_mmu_mode() before the >> ptep_get()? > > Which would completely revert the very idea of the lazy mmu mode? > (As one would flush on every PTE page table iteration). Well yes, but this is a pretty rare path, I'm guessing? > >> It won't be a problem in practice for arm64, since the pgtables are always >> updated immediately. I just want to use these hooks to defer/batch barriers >> in >> certain cases. >> >> And this is a pre-existing issue for the arches that use lazy mmu with >> device-exclusive mappings, which my extending lazy mmu into vmalloc won't >> exacerbate. >> >> Would you be willing/able to submit a fix? > > Well, we have a dozen of lazy mmu cases and I would guess it is not the > only piece of code that seems affected. I was thinking about debug feature > that could help spotting all troubled locations. > > Then we could assess and decide if it is feasible to fix. Just turning the > code above into the PTE read-modify-update pattern is quite an exercise... > >> Thanks, >> Ryan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |