[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 12/17] xen/riscv: Implement p2m_free_entry() and related helpers
On 7/14/25 9:15 AM, Jan Beulich wrote:
On 11.07.2025 17:56, Oleksii Kurochko wrote:On 7/1/25 4:23 PM, Jan Beulich wrote:On 10.06.2025 15:05, Oleksii Kurochko wrote:+/* + * In the case of the P2M, the valid bit is used for other purpose. Use + * the type to check whether an entry is valid. + */ static inline bool p2me_is_valid(struct p2m_domain *p2m, pte_t pte) { - panic("%s: isn't implemented for now\n", __func__); + return p2m_type_radix_get(p2m, pte) != p2m_invalid; +}No checking of the valid bit?As mentioned in the comment, only the P2M type should be checked, since the valid bit is used for other purposes we discussed earlier, for example, to track whether pages were accessed by a guest domain, or to support certain table invalidation optimizations (1) and (2). So, in this case, we only need to consider whether the entry is invalid from the P2M perspective. (1)https://github.com/xen-project/xen/blob/19772b67/xen/arch/arm/mmu/p2m.c#L1245 (2)https://github.com/xen-project/xen/blob/19772b67/xen/arch/arm/mmu/p2m.c#L1386And there can be e.g. entries with the valid bit set and the type being p2m_invalid? It shouldn't be so, at least, at the moment, I don't know such cases. IOW there's no short-circuiting possible in any of the possible cases, avoiding the radix tree lookup in at least some of the cases? Yes, I’ve implemented such optimization. I started using two free bits in the PTE for some “popular” types: static p2m_type_t p2m_get_type(struct p2m_domain *p2m, pte_t pte) { p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK); if ( type == p2m_device_tree_type ) { ... ptr = radix_tree_lookup(&p2m->p2m_types, gfn_x(gfn)); ... return radix_tree_ptr_to_int(ptr); } return type; } /* * In the case of the P2M, the valid bit is used for other purpose. Use * the type to check whether an entry is valid. */ static inline bool p2m_is_valid(struct p2m_domain *p2m, pte_t pte) { return p2m_get_type(p2m, pte) != p2m_invalid; } But thanks to your reply, I realized that in the case of @@ -404,11 +426,127 @@ static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl, return GUEST_TABLE_MAP_NONE; } +static void p2m_put_foreign_page(struct page_info *pg) +{ + /* + * It's safe to do the put_page here because page_alloc will + * flush the TLBs if the page is reallocated before the end of + * this loop. + */ + put_page(pg);Is the comment really true? The page allocator will flush the normal TLBs, but not the stage-2 ones. Yet those are what you care about here, aiui.In alloc_heap_pages(): ... if ( need_tlbflush ) filtered_flush_tlb_mask(tlbflush_timestamp); ... filtered_flush_tlb_mask() calls arch_flush_tlb_mask(). and arch_flush_tlb_mask(), at least, on Arm (I haven't checked x86) is implented as: void arch_flush_tlb_mask(const cpumask_t *mask) { /* No need to IPI other processors on ARM, the processor takes care of it. */ flush_all_guests_tlb(); } So it flushes stage-2 TLB.Hmm, okay. And I take it you have the same plan on RISC-V? Yes, there is such a plan. What I'd like to ask for, though, is that the comment (also) mentions where that (guest) flushing actually happens. That's not in page_alloc.c, and it also wasn't originally intended for guest TLBs to also be flushed from there (as x86 is where the flush avoidance machinery originates, which Arm and now also RISC-V don't really use). Sure, it makes sense to update the comment. +/* Put any references on the single 4K page referenced by mfn. */ +static void p2m_put_4k_page(mfn_t mfn, p2m_type_t type) +{ + /* TODO: Handle other p2m types */ + + /* Detect the xenheap page and mark the stored GFN as invalid. */ + if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) ) + page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);Is this a valid thing to do? How do you make sure the respective uses (in gnttab's shared and status page arrays) are / were also removed?As grant table frame GFN is stored directly in struct page_info instead of keeping it in standalone status/shared arrays, thereby there is no need for status/shared arrays.I fear I don't follow. Looking at Arm's header (which I understand you derive from), I see #define gnttab_shared_page(t, i) virt_to_page((t)->shared_raw[i]) #define gnttab_status_page(t, i) virt_to_page((t)->status[i]) Are you intending to do things differently? I missed these arrays... Arm had different arrays: - (gt)->arch.shared_gfn = xmalloc_array(gfn_t, ngf_); \ - (gt)->arch.status_gfn = xmalloc_array(gfn_t, nsf_); \ I think I don't know the answer to your question, as I'm not deeply familiar with grant tables and would need to do some additional investigation. And just to be sure I understand your question correctly: are you asking
whether I marked a page as
+/* Put any references on the superpage referenced by mfn. */ +static void p2m_put_2m_superpage(mfn_t mfn, p2m_type_t type) +{ + struct page_info *pg; + unsigned int i; + + ASSERT(mfn_valid(mfn)); + + pg = mfn_to_page(mfn); + + for ( i = 0; i < XEN_PT_ENTRIES; i++, pg++ ) + p2m_put_foreign_page(pg); +} + +/* Put any references on the page referenced by pte. */ +static void p2m_put_page(struct p2m_domain *p2m, const pte_t pte, + unsigned int level) +{ + mfn_t mfn = pte_get_mfn(pte); + p2m_type_t p2m_type = p2m_type_radix_get(p2m, pte);This gives you the type of the 1st page. What guarantees that all other pages in a superpage are of the exact same type?Doesn't superpage mean that all the 4KB pages within that superpage have the same type and contiguous in memory?If the mapping is a super-page one - yes. Yet I see nothing super-page-ish here. Probably, I just misunderstood your reply, but there is a check below:
if ( level == 2 )
return p2m_put_l2_superpage(mfn, pte.p2m.type);
And I expect that if
+ if ( level == 1 ) + return p2m_put_2m_superpage(mfn, p2m_type); + else if ( level == 0 ) + return p2m_put_4k_page(mfn, p2m_type);Use switch() right away?It could be, I think that no big difference at the moment, at least. But I am okay to rework it.If you don't want to use switch() here, then my other style nit would need giving: Please avoid "else" in situations like this.+static void p2m_free_page(struct domain *d, struct page_info *pg) +{ + if ( is_hardware_domain(d) ) + free_domheap_page(pg);Why's the hardware domain different here? It should have a pool just like all other domains have.Hardware domain (dom0) should be no limit in the number of pages that can be allocated, so allocate p2m pages for hardware domain is done from heap. An idea of p2m pool is to provide a way how to put clear limit and amount to the p2m allocation.Well, we had been there on another thread, and I outlined how I think Dom0 may want handling. I think that I don't remember. Could you please remind me what was that thread? Probably, do you mean this reply: https://lore.kernel.org/xen-devel/cover.1749555949.git.oleksii.kurochko@xxxxxxxxx/T/#m4789842aaae1653b91d3368f66cadb0ef87fb17e ? But this is not really about Dom0 case. /* Free pte sub-tree behind an entry */ static void p2m_free_entry(struct p2m_domain *p2m, pte_t entry, unsigned int level) { - panic("%s: hasn't been implemented yet\n", __func__); + unsigned int i; + pte_t *table; + mfn_t mfn; + struct page_info *pg; + + /* Nothing to do if the entry is invalid. */ + if ( !p2me_is_valid(p2m, entry) ) + return;Does this actually apply to intermediate page tables (which you handle later in the function), when that's (only) a P2M type check?Yes, any PTE should have V bit set to 1, so from P2M perspective it also should be, at least, not equal to p2m_invalid.I don't follow. Where would that type be set? The radix tree being GFN- indexed, you would need to "invent" a GFN for every intermediate page table, just to be able to (legitimately) invoke the type retrieval function. Maybe, it is incorrect, but in this patch series the type is set when
Maybe you mean to leverage that (now, i.e. post-v2) you encode some of the types directly in the PTE, and p2m_invalid may be one of them. But that wasn't the case in the v2 submission, and hence the code looked wrong to me. Which in turn suggests that at least some better commentary is going to be needed, maybe even some BUILD_BUG_ON(). ... p2m_invalid type will be encoded directly in the PTE in the next patch version. ~ Oleksii
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |