Re: [PATCH v2 14/17] xen/riscv: implement p2m_next_level()
On 7/2/25 10:35 AM, Jan Beulich wrote:
> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -387,6 +387,17 @@ static inline bool p2me_is_valid(struct p2m_domain *p2m, pte_t pte)
>>      return p2m_type_radix_get(p2m, pte) != p2m_invalid;
>>  }
>>
>> +/*
>> + * pte_is_* helpers are checking the valid bit set in the
>> + * PTE but we have to check p2m_type instead (look at the comment above
>> + * p2me_is_valid())
>> + * Provide our own overlay to check the valid bit.
>> + */
>> +static inline bool p2me_is_mapping(struct p2m_domain *p2m, pte_t pte)
>> +{
>> +    return p2me_is_valid(p2m, pte) && (pte.pte & PTE_ACCESS_MASK);
>> +}
>
> Same question as on the earlier patch - does P2M type apply to intermediate
> page tables at all? (Conceptually it shouldn't.)

It doesn't matter whether it is an intermediate page table or a leaf PTE
pointing to a page — the PTE should be valid in either case. Considering
that in the current implementation it is possible to have PTE.v = 0 while
P2M.v = 1, it is better to check P2M.v instead of PTE.v.
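To make that concrete (illustration only, not part of the patch): per the
"Encoding of PTE R/W/X fields" table in the RISC-V privileged spec, an
entry with R = W = X = 0 points to the next level of the page table, and
any other combination is a leaf. The bit positions below are from the
spec; the exact macro definitions in Xen's page.h may differ:

    /* RISC-V PTE permission bits: V=0, R=1, W=2, X=3. */
    #define PTE_READABLE    (1UL << 1)  /* R */
    #define PTE_WRITABLE    (1UL << 2)  /* W */
    #define PTE_EXECUTABLE  (1UL << 3)  /* X */
    #define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)

    /*
     * R/W/X all clear -> non-leaf entry (pointer to the next page-table
     * level); any of them set -> leaf mapping. Hence p2me_is_mapping():
     * valid (by P2M type) and at least one access bit set.
     */
    static inline bool pte_is_table(pte_t pte)
    {
        return (pte.pte & PTE_ACCESS_MASK) == 0;
    }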
>> @@ -492,6 +503,70 @@ static pte_t p2m_entry_from_mfn(struct p2m_domain *p2m, mfn_t mfn, p2m_type_t t,
>>      return e;
>>  }
>>
>> +/* Generate table entry with correct attributes. */
>> +static pte_t page_to_p2m_table(struct p2m_domain *p2m, struct page_info *page)
>> +{
>> +    /*
>> +     * Since this function generates a table entry, according to "Encoding
>> +     * of PTE R/W/X fields," the entry's r, w, and x fields must be set to 0
>> +     * to point to the next level of the page table.
>> +     * Therefore, to ensure that an entry is a page table entry,
>> +     * `p2m_access_n2rwx` is passed to `mfn_to_p2m_entry()` as the access value,
>> +     * which overrides whatever was passed as `p2m_type_t` and guarantees that
>> +     * the entry is a page table entry by setting r = w = x = 0.
>> +     */
>> +    return p2m_entry_from_mfn(p2m, page_to_mfn(page), p2m_ram_rw, p2m_access_n2rwx);
>
> Similarly P2M access shouldn't apply to intermediate page tables. (Moot with
> that, but (ab)using p2m_access_n2rwx would also look wrong: You did read
> what it means, didn't you?)
>
>> +}
>> +
>> +static struct page_info *p2m_alloc_page(struct domain *d)
>> +{
>> +    struct page_info *pg;
>> +
>> +    /*
>> +     * For hardware domain, there should be no limit in the number of pages that
>> +     * can be allocated, so that the kernel may take advantage of the extended
>> +     * regions. Hence, allocate p2m pages for hardware domains from heap.
>> +     */
>> +    if ( is_hardware_domain(d) )
>> +    {
>> +        pg = alloc_domheap_page(d, MEMF_no_owner);
>> +        if ( pg == NULL )
>> +            printk(XENLOG_G_ERR "Failed to allocate P2M pages for hwdom.\n");
>> +    }
>
> The comment looks to have been taken verbatim from Arm. Whatever "extended
> regions" are, does the same concept even exist on RISC-V?

Initially, I missed that it's used only for Arm. Since it was mentioned
in ...

> Also, special casing Dom0 like this has benefits, but also comes with a
> pitfall: If the system's out of memory, allocations will fail. A
> pre-populated pool would avoid that (until exhausted, of course). If
> special-casing of Dom0 is needed, I wonder whether ...
>
>> +    else
>> +    {
>> +        spin_lock(&d->arch.paging.lock);
>> +        pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
>> +        spin_unlock(&d->arch.paging.lock);
>> +    }
>
> ... going this path but with a Dom0-only fallback to general allocation
> wouldn't be the better route.

IIUC, then it should be something like:

    static struct page_info *p2m_alloc_page(struct domain *d)
    {
        struct page_info *pg;

        spin_lock(&d->arch.paging.lock);
        pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
        spin_unlock(&d->arch.paging.lock);

        if ( !pg && is_hardware_domain(d) )
        {
            /* Need to allocate more memory from domheap */
            pg = alloc_domheap_page(d, MEMF_no_owner);
            if ( pg == NULL )
            {
                printk(XENLOG_ERR "Failed to allocate pages.\n");
                return pg;
            }
            ACCESS_ONCE(d->arch.paging.total_pages)++;
            page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
        }

        return pg;
    }

And basically use ... Do I understand your idea correctly?

(Probably this is the reply you're referring to:
https://lore.kernel.org/xen-devel/43e89225-5e69-49a6-a8c8-bda6d120d8ff@xxxxxxxx/;
at the moment I can't find a better one.)
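For completeness, the pool itself would then need to be filled somewhere,
presumably via the same domctl plumbing Arm uses for its paging pool. A
rough sketch of the filling side, reusing the fields from above
(p2m_pool_populate() is just an illustrative name here, not an existing
function):

    static int p2m_pool_populate(struct domain *d, unsigned long nr_pages)
    {
        while ( nr_pages-- )
        {
            /* MEMF_no_owner: the page is not assigned to the domain. */
            struct page_info *pg = alloc_domheap_page(d, MEMF_no_owner);

            if ( pg == NULL )
                return -ENOMEM;

            spin_lock(&d->arch.paging.lock);
            page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
            ACCESS_ONCE(d->arch.paging.total_pages)++;
            spin_unlock(&d->arch.paging.lock);
        }

        return 0;
    }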
>> +    return pg;
>> +}
>> +
>> +/* Allocate a new page table page and hook it in via the given entry. */
>> +static int p2m_create_table(struct p2m_domain *p2m, pte_t *entry)
>> +{
>> +    struct page_info *page;
>> +    pte_t *p;
>> +
>> +    ASSERT(!p2me_is_valid(p2m, *entry));
>> +
>> +    page = p2m_alloc_page(p2m->domain);
>> +    if ( page == NULL )
>> +        return -ENOMEM;
>> +
>> +    page_list_add(page, &p2m->pages);
>> +
>> +    p = __map_domain_page(page);
>> +    clear_page(p);
>> +
>> +    unmap_domain_page(p);
>
> clear_domain_page()? Or actually clear_and_clean_page()?

Agree, clear_and_clean_page() would be better here.

>> @@ -516,9 +591,33 @@ static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl,
>>                            unsigned int level, pte_t **table,
>>                            unsigned int offset)
>>  {
>> -    panic("%s: hasn't been implemented yet\n", __func__);
>> +    pte_t *entry;
>> +    int ret;
>> +    mfn_t mfn;
>> +
>> +    entry = *table + offset;
>> +
>> +    if ( !p2me_is_valid(p2m, *entry) )
>> +    {
>> +        if ( !alloc_tbl )
>> +            return GUEST_TABLE_MAP_NONE;
>> +
>> +        ret = p2m_create_table(p2m, entry);
>> +        if ( ret )
>> +            return GUEST_TABLE_MAP_NOMEM;
>> +    }
>> +
>> +    /* The function p2m_next_level() is never called at the last level */
>> +    ASSERT(level != 0);
>
> Logically you would perhaps better do this ahead of trying to allocate a
> page table. Calls here with level == 0 are invalid in all cases aiui, not
> just when you make it here.

It makes sense. I will move the ASSERT() to the start of p2m_next_level().

>> +    if ( p2me_is_mapping(p2m, *entry) )
>> +        return GUEST_TABLE_SUPER_PAGE;
>> +
>> +    mfn = mfn_from_pte(*entry);
>> +
>> +    unmap_domain_page(*table);
>> +    *table = map_domain_page(mfn);
>
> Just to mention it (may not need taking care of right away), there's an
> inefficiency here: In p2m_create_table() you map the page to clear it.
> Then you tear down that mapping, just to re-establish it here.

I will add:

    /*
     * TODO: There's an inefficiency here: in p2m_create_table() the page
     * is mapped to clear it, then that mapping is torn down again, only
     * to be re-established here.
     */
    *table = map_domain_page(mfn);
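Putting the agreed changes together, the function would then look roughly
like this (sketch only; the final return value is not visible in the
quoted hunk, so GUEST_TABLE_NORMAL is an assumed name):

    static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl,
                              unsigned int level, pte_t **table,
                              unsigned int offset)
    {
        pte_t *entry;
        mfn_t mfn;

        /* This function is never called at the last level. */
        ASSERT(level != 0);

        entry = *table + offset;

        if ( !p2me_is_valid(p2m, *entry) )
        {
            if ( !alloc_tbl )
                return GUEST_TABLE_MAP_NONE;

            if ( p2m_create_table(p2m, entry) )
                return GUEST_TABLE_MAP_NOMEM;
        }

        if ( p2me_is_mapping(p2m, *entry) )
            return GUEST_TABLE_SUPER_PAGE;

        mfn = mfn_from_pte(*entry);

        /* Move the mapping over to the next-level table. */
        unmap_domain_page(*table);
        *table = map_domain_page(mfn);

        return GUEST_TABLE_NORMAL;
    }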
Thanks.

~ Oleksii