Re: [PATCH v4 bpf-next 2/2] mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().



> > This interface and in general VM_SPARSE would be useful for
> > dynamically grown kernel stacks [1]. However, the might_sleep() here
> > would be a problem. We would need to be able to handle
> > vm_area_map_pages() from interrupt disabled context therefore no
> > sleeping. The caller would need to guarantee that the page tables are
> > pre-allocated before the mapping.
>
> Sounds like we'd need to differentiate two kinds of sparse regions.
> One that is really sparse where page tables are not populated (bpf use case)
> and another where only the pte level might be empty.
> Only the latter one will be usable for such auto-grow stacks.
>
> Months back I played with this idea:
> https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?&id=ce63949a879f2f26c1c1834303e6dfbfb79d1fbd
> that
> "Make vmap_pages_range() allocate page tables down to the last (PTE) level."
> Essentially pass NULL instead of 'pages' into vmap_pages_range()
> and it will populate all levels except the last.

Yes, this is what is needed. However, it can be a little simpler for
kernel stacks: the first page in the vm_area is mapped when the stack is
first allocated, and the VA range is aligned to 16K, so we are
guaranteed to have all page table levels down to the PTE level
pre-allocated during that initial mapping. Therefore, we do not need to
worry about allocating them later during page faults.
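
To make the alignment argument concrete, here is a rough user-space
sketch of the arithmetic. It assumes 4K pages and 512 PTEs per table
(x86-64 style, so one PTE table spans 2M); the macro and function names
are made up for illustration and are not kernel symbols. A 16K-aligned,
16K-sized range can never straddle a 2M boundary, so the PTE table
allocated for the first page also covers the other three:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry (not taken from the patch): 4 KiB pages, 512 PTEs per table. */
#define PG_SHIFT        12
#define PG_SIZE         (1ULL << PG_SHIFT)
#define PTES_PER_TABLE  512ULL
#define PTE_TABLE_SPAN  (PTES_PER_TABLE * PG_SIZE)  /* 2 MiB covered by one PTE table */
#define KSTACK_SIZE     (4 * PG_SIZE)               /* 16 KiB stack */

/* Which PTE table (equivalently, which PMD entry) maps this address. */
static uint64_t pte_table_index(uint64_t addr)
{
	return addr / PTE_TABLE_SPAN;
}

/*
 * True if the first and last byte of a 16K-aligned stack at 'base' are
 * mapped by the same PTE table. If they are, mapping any one page of the
 * stack allocates every page table level the other pages need.
 */
static bool stack_shares_pte_table(uint64_t base)
{
	assert(base % KSTACK_SIZE == 0);	/* the alignment the argument relies on */
	return pte_table_index(base) ==
	       pte_table_index(base + KSTACK_SIZE - 1);
}
```

Without the 16K alignment the range could cross a PTE table boundary,
and the later pages might need a table that was never allocated.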

> Then the page fault handler can service a fault in auto-growing stack
> area if it has a page stashed in some per-cpu free list.
> I suspect this is something you might need for
> "16k stack that is populated on fault",
> plus a free list of 3 pages per-cpu,
> and set_pte_at() in pf handler.

Yes, what you described is exactly what I am working on: using 3 pages
per CPU to handle kstack page faults. The only thing missing is the
ability to call a non-sleeping version of vm_area_map_pages().
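
A minimal user-space model of that scheme, only to illustrate the shape
of it. Everything here is hypothetical: the struct, the function names
and the fixed CPU count are invented for the sketch, the array of
pointers stands in for the PTE slots of one stack, and the plain store
stands in for set_pte_at(). The point is that the fault path neither
allocates nor sleeps; it only consumes a page stashed earlier from a
context that is allowed to sleep:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS  4	/* hypothetical CPU count for the model */
#define KSTACK_PAGES   4	/* 16K stack with 4K pages */
#define STASH_PAGES    3	/* stack pages minus the one mapped at allocation */

/* Per-CPU stash of pre-allocated pages (modelled as a plain array). */
struct page_stash {
	void *pages[STASH_PAGES];
	int   count;
};

static struct page_stash stash[MODEL_NR_CPUS];

/* Refill side: would run in a sleepable context. Returns false when full. */
static bool stash_refill(int cpu, void *page)
{
	struct page_stash *s = &stash[cpu];

	if (s->count == STASH_PAGES)
		return false;
	s->pages[s->count++] = page;
	return true;
}

/*
 * Fault side: must not sleep, so it never allocates. It pops a stashed
 * page and installs it in the faulting slot. Returns false if the stash
 * on this CPU is empty, i.e. the fault cannot be serviced.
 */
static bool kstack_fault(int cpu, void *ptes[KSTACK_PAGES], int idx)
{
	struct page_stash *s = &stash[cpu];

	assert(idx >= 0 && idx < KSTACK_PAGES);
	if (ptes[idx] != NULL)		/* slot already populated */
		return true;
	if (s->count == 0)
		return false;
	ptes[idx] = s->pages[--s->count];	/* stands in for set_pte_at() */
	return true;
}
```

In this model a stack starts with one slot populated, and at most three
faults are needed to populate the rest, which is why three stashed pages
per CPU are enough for a single 16K stack.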

Pasha



 

