Re: [PATCH v2 2/7] mm: introduce local state for lazy_mmu sections
>>> Can't we just use an "enum lazy_mmu_state" and call it a day?
>>
>> I could envision something completely different for this type on s390,
>> e.g. a pointer to a per-cpu structure. So I would really ask to stick
>> with the current approach.
>
> This is indeed the motivation - let every arch do whatever it sees fit.
> lazy_mmu_state_t is basically an opaque type as far as generic code is
> concerned, which also means that this API change is the first and last
> one we need (famous last words, I know).

It makes the API more complicated, though. :)

> I mentioned in the cover letter that the pkeys-based page table
> protection series [1] would have an immediate use for lazy_mmu_state_t.
> In that proposal, any helper writing to pgtables needs to modify the
> pkey register and then restore it. To reduce the overhead, lazy_mmu is
> used to set the pkey register only once in enter(), and then restore it
> in leave() [2]. This currently relies on storing the original pkey
> register value in thread_struct, which is suboptimal and most

Can you elaborate why this is suboptimal? See below regarding the size
of task_struct.

> importantly doesn't work if lazy_mmu sections nest.

Can you elaborate why it would be problematic with nesting (if we would
have a count and can handle the transitions from 0->1 and 1->0)?

> With this series, we could instead store the pkey register value in
> lazy_mmu_state_t (enlarging it to 64 bits or more).

Yes.

> I also considered going further and making lazy_mmu_state_t a pointer
> as Alexander suggested - more complex to manage, but also a lot more
> flexible.

>>> Would that integrate well with LAZY_MMU_DEFAULT etc?
>>
>> Hmm... I thought the idea is for LAZY_MMU_* to be used by
>> architectures that want to use it - at least that is how I read the
>> description above. It is only kasan_populate|depopulate_vmalloc_pte()
>> in generic code that do not follow this pattern, and it looks like a
>> problem to me.
>
> This discussion also made me realise that this is problematic, as the
> LAZY_MMU_{DEFAULT,NESTED} macros were meant only for architectures'
> convenience, not for generic code (where lazy_mmu_state_t should
> ideally be an opaque type as mentioned above). It almost feels like the
> kasan case deserves a different API, because this is not how enter()
> and leave() are meant to be used. This would mean quite a bit of churn
> though, so maybe just introduce another arch-defined value to pass to
> leave() for such a situation - for instance,
> arch_leave_lazy_mmu_mode(LAZY_MMU_FLUSH)?

The discussion made me realize that it's a bit of a hack right now :)

If LAZY_MMU_DEFAULT etc. are not for common code, then please maintain
them for the individual archs as well, just like you do with the opaque
type.

Yes, that's why I am asking.

>>> What kind of information (pointer to a per-cpu structure) would you
>>> want to return, and would handling it similar to how
>>> pagefault_disable()/pagefault_enable() - e.g., using a variable in
>>> "current" to track the nesting level - avoid having s390x do that?
>
> The pagefault_disabled approach works fine for simple use-cases, but it
> doesn't scale well. The space allocated in task_struct/thread_struct to
> track that state is wasted (unused) most of the time.
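To illustrate the shape being discussed, here is a minimal sketch of an
arch-defined lazy_mmu_state_t carrying the saved pkey register, with the
state threaded through on the caller's stack. The helpers
read_pkey_reg() and write_pkey_reg() and the PKEY_REG_PGTABLE value are
hypothetical stand-ins, not code from either series:

	/* Sketch only: arch side. The arch picks the representation. */
	typedef unsigned long lazy_mmu_state_t;

	static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
	{
		/* Save the caller's pkey register, switch to the pgtable pkey. */
		lazy_mmu_state_t state = read_pkey_reg();	/* hypothetical */

		write_pkey_reg(PKEY_REG_PGTABLE);		/* hypothetical */
		return state;
	}

	static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
	{
		/* Restore whatever the enclosing context had. */
		write_pkey_reg(state);
	}

	/* Sketch only: caller side. State lives on the stack, so it nests. */
	static void update_pgtables(void)
	{
		lazy_mmu_state_t lazy_mmu_state = arch_enter_lazy_mmu_mode();

		/* ... batched pgtable writes ... */

		arch_leave_lazy_mmu_mode(lazy_mmu_state);
	}

Since each section keeps its own copy of the old register value in a
local variable, a nested leave() restores the intermediate value instead
of clobbering a single per-task slot.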
I'm not sure that's a concern. Fitting an int into existing holes should
work, and even another 64 bits (8 bytes)...

I just checked with pahole using the Fedora config on current
mm-unstable:

	/* size: 9792, cachelines: 153, members: 276 */
	/* sum members: 9619, holes: 20, sum holes: 125 */
	/* sum bitfield members: 85 bits, bit holes: 2, sum bit holes: 43 bits */
	/* padding: 32 */
	/* member types with holes: 4, total: 6, bit holes: 2, total: 2 */
	/* paddings: 6, sum paddings: 49 */
	/* forced alignments: 12, forced holes: 2, sum forced holes: 60 */

Due to some "arch_task_struct_size" we might actually allocate more
space. Staring at my live system:

$ sudo slabinfo
Name           Objects Objsize    Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
...
task_struct       1491   12376    24.8M      721/25/37    2 3   3  74

I am not sure even an additional 8 bytes would move the needle here.

> Worse, it does not truly enable states to be nested: it allows the
> outermost section to store some state, but nested sections cannot
> allocate extra space. This is really what the stack is for.

If it's really just 8 bytes, I don't really see the problem. So likely
there is more to it?
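To make the nesting question concrete, a sketch of the
pagefault_disable()-style alternative, using hypothetical
lazy_mmu_nesting/saved_pkey_reg fields in task_struct and the
hypothetical pkey helpers from the sketch above:

	static inline void lazy_mmu_enter_counted(void)
	{
		/* Only the 0->1 transition has a slot to save the old value. */
		if (current->lazy_mmu_nesting++ == 0) {
			current->saved_pkey_reg = read_pkey_reg();
			write_pkey_reg(PKEY_REG_PGTABLE);
		}
	}

	static inline void lazy_mmu_leave_counted(void)
	{
		/* Only the 1->0 transition restores; inner levels are no-ops. */
		if (--current->lazy_mmu_nesting == 0)
			write_pkey_reg(current->saved_pkey_reg);
	}

This is the 0->1/1->0 handling referred to above, and it does cover the
simple save-once/restore-once case; the objection in the quoted
paragraph concerns state that can differ per nesting level, which a
single per-task slot cannot hold but a stack-passed lazy_mmu_state_t
can.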
-- 
Cheers

David / dhildenb