
[xen staging] xen/riscv: Implement superpage splitting for p2m mappings



commit 72a516613f459f56ffb7e80395886c1f6707ef74
Author:     Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
AuthorDate: Tue Dec 16 17:55:24 2025 +0100
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Thu Dec 18 14:25:37 2025 +0100

    xen/riscv: Implement superpage splitting for p2m mappings
    
    Add support for breaking down large memory mappings ("superpages") in the
    RISC-V p2m code so that smaller, finer-grained mappings can be inserted
    at lower levels of the page table hierarchy.
    
    To implement this, the following is done:
    - Introduce p2m_split_superpage(): recursively shatters a superpage into
      smaller page table entries down to the target level, preserving the
      original permissions and attributes (a simplified model of this step
      is sketched below).
    - Update p2m_set_entry() to invoke superpage splitting when inserting
      entries at lower levels within a superpage-mapped region.
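
    To make the copy-and-step structure of the split concrete, here is a
    minimal, self-contained model of the innermost step. It is a sketch,
    not the Xen code: ENTRIES_PER_TABLE, LEVEL_ORDER, PTE_PPN_SHIFT and
    the toy pte_t below are simplified stand-ins for Xen's definitions.

        #include <stdint.h>

        #define ENTRIES_PER_TABLE 512         /* Sv39/Sv48: 9 bits per level */
        #define LEVEL_ORDER(lvl)  ((lvl) * 9) /* orders 0, 9, 18 for 4K/2M/1G */
        #define PTE_PPN_SHIFT     10          /* PPN starts at PTE bit 10 */

        typedef struct { uint64_t pte; } pte_t;

        static uint64_t pte_get_mfn(pte_t p) { return p.pte >> PTE_PPN_SHIFT; }

        static void pte_set_mfn(pte_t *p, uint64_t mfn)
        {
            p->pte &= (1ULL << PTE_PPN_SHIFT) - 1; /* keep attribute bits */
            p->pte |= mfn << PTE_PPN_SHIFT;
        }

        /* Fill a fresh table so it maps exactly what the superpage mapped. */
        static void split_into(pte_t table[ENTRIES_PER_TABLE], pte_t super,
                               unsigned int next_level)
        {
            uint64_t mfn = pte_get_mfn(super);
            unsigned int i;

            for ( i = 0; i < ENTRIES_PER_TABLE; i++ )
            {
                pte_t pte = super; /* inherit V/R/W/X/U/... attribute bits */

                pte_set_mfn(&pte, mfn + ((uint64_t)i << LEVEL_ORDER(next_level)));
                table[i] = pte;
            }
        }

    Each child entry keeps the superpage's attribute bits and only has its
    PPN advanced by one sub-page, which is what lets the split preserve the
    original permissions.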
    
    This implementation is based on the Arm code, with modifications to the
    part that follows the BBM (break-before-make) approach; some parts are
    simplified, since the RISC-V spec states:
      It is permitted for multiple address-translation cache entries to
      co-exist for the same address. This represents the fact that in a
      conventional TLB hierarchy, it is possible for multiple entries to
      match a single address if, for example, a page is upgraded to a
      superpage without first clearing the original non-leaf PTE's valid
      bit and executing an SFENCE.VMA with rs1=x0, or if multiple TLBs
      exist in parallel at a given level of the hierarchy. In this case,
      just as if an SFENCE.VMA is not executed between a write to the
      memory-management tables and a subsequent implicit read of the same
      address: it is unpredictable whether the old non-leaf PTE or the new
      leaf PTE is used, but the behavior is otherwise well defined.
    In contrast to the Arm architecture, where BBM is mandatory and failing
    to use it in some cases can lead to CPU instability, RISC-V guarantees
    stability, and the behavior remains safe, though unpredictable in terms
    of which translation will be used.
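
    As an illustration of the ordering this permits (a hedged sketch, not
    code from the patch; replace_pte_riscv is a hypothetical helper), the
    split can write the replacement PTE in place and then flush, without
    Arm's intermediate invalidate-and-flush step:

        /* Arm BBM: invalidate PTE -> TLB flush -> write new PTE.
         * RISC-V: writing directly over the old PTE is safe; it is merely
         * unpredictable which translation is used until the flush. */
        static inline void replace_pte_riscv(uint64_t *entry, uint64_t new_pte)
        {
            /* 1. Write the new (e.g. split) PTE over the old one. */
            __atomic_store_n(entry, new_pte, __ATOMIC_RELAXED);

            /* 2. Flush stale translations on this hart (rs1=x0, rs2=x0
             *    covers all addresses and all address-space IDs). */
            asm volatile ( "sfence.vma" ::: "memory" );
        }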
    
    Additionally, the page table walk logic has been adjusted, as Arm uses
    the opposite level numbering compared to RISC-V.
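
    To illustrate the numbering difference (a sketch under Sv39 assumptions,
    not the actual Xen walker): Arm numbers the root level 0 and walks by
    incrementing the level, while Sv39 numbers the root level 2 and walks by
    decrementing it, which is why the walk in the patch below counts down.

        #include <stdint.h>
        #include <stdio.h>

        #define PAGETABLE_ORDER 9 /* 9 GFN bits index each Sv39 level */

        static unsigned int offset_at(uint64_t gfn, unsigned int level)
        {
            return (gfn >> (level * PAGETABLE_ORDER)) &
                   ((1u << PAGETABLE_ORDER) - 1);
        }

        int main(void)
        {
            uint64_t gfn = 0x12345; /* arbitrary guest frame number */
            unsigned int level;

            /* RISC-V walks from the high-numbered root down to level 0. */
            for ( level = 2; ; level-- )
            {
                printf("level %u: offset %u\n", level, offset_at(gfn, level));
                if ( !level )
                    break;
            }

            return 0;
        }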
    
    Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
    Acked-by: Jan Beulich <jbeulich@xxxxxxxx>
---
 xen/arch/riscv/include/asm/page.h |   5 ++
 xen/arch/riscv/p2m.c              | 116 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
index b7cd61df8d..b465a90325 100644
--- a/xen/arch/riscv/include/asm/page.h
+++ b/xen/arch/riscv/include/asm/page.h
@@ -190,6 +190,11 @@ static inline bool pte_is_mapping(pte_t p)
     return (p.pte & PTE_VALID) && (p.pte & PTE_ACCESS_MASK);
 }
 
+static inline bool pte_is_superpage(pte_t p, unsigned int level)
+{
+    return (level > 0) && pte_is_mapping(p);
+}
+
 static inline int clean_and_invalidate_dcache_va_range(const void *p,
                                                        unsigned long size)
 {
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index 9e1a660316..8d572f838f 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -742,7 +742,88 @@ static void p2m_free_subtree(struct p2m_domain *p2m,
     p2m_free_page(p2m, pg);
 }
 
-/* Insert an entry in the p2m */
+static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
+                                unsigned int level, unsigned int target,
+                                const unsigned int *offsets)
+{
+    struct page_info *page;
+    unsigned long i;
+    pte_t pte, *table;
+    bool rv = true;
+
+    /* Convenience aliases */
+    mfn_t mfn = pte_get_mfn(*entry);
+    unsigned int next_level = level - 1;
+    unsigned int level_order = P2M_LEVEL_ORDER(next_level);
+
+    /*
+     * This should only be called with target != level and with an
+     * entry that is a superpage.
+     */
+    ASSERT(level > target);
+    ASSERT(pte_is_superpage(*entry, level));
+
+    page = p2m_alloc_page(p2m);
+    if ( !page )
+    {
+        /*
+         * The caller is in charge of freeing the sub-tree.
+         * As we didn't manage to allocate anything, just tell the
+         * caller there is nothing to free by invalidating the PTE.
+         */
+        memset(entry, 0, sizeof(*entry));
+        return false;
+    }
+
+    table = __map_domain_page(page);
+
+    for ( i = 0; i < P2M_PAGETABLE_ENTRIES(p2m, next_level); i++ )
+    {
+        pte_t *new_entry = table + i;
+
+        /*
+         * Use the content of the superpage entry and override only
+         * the necessary fields, so the correct attributes are kept.
+         */
+        pte = *entry;
+        pte_set_mfn(&pte, mfn_add(mfn, i << level_order));
+
+        write_pte(new_entry, pte);
+    }
+
+    /*
+     * Shatter the superpage down to the level at which we want to
+     * make the changes.
+     * This is done outside the loop to avoid checking the offset
+     * for every entry to know whether the entry should be shattered.
+     */
+    if ( next_level != target )
+        rv = p2m_split_superpage(p2m, table + offsets[next_level],
+                                 next_level, target, offsets);
+
+    if ( p2m->clean_dcache )
+        clean_dcache_va_range(table, PAGE_SIZE);
+
+    /*
+     * TODO: an inefficiency here: the caller almost certainly wants to map
+     *       the same page again, to update the one entry that caused the
+     *       request to shatter the page.
+     */
+    unmap_domain_page(table);
+
+    /*
+     * Even if we failed, we should still install the newly allocated
+     * PTE entry: the current implementation frees the sub-tree
+     * starting from this entry when p2m_split_superpage() hasn't
+     * been fully finished.
+     * The caller will then be in charge of freeing the sub-tree.
+     */
+    p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_dcache);
+
+    return rv;
+}
+
+/* Insert an entry in the p2m. */
 static int p2m_set_entry(struct p2m_domain *p2m,
                          gfn_t gfn,
                          unsigned long page_order,
@@ -807,7 +888,38 @@ static int p2m_set_entry(struct p2m_domain *p2m,
      */
     if ( level > target )
     {
-        panic("Shattering isn't implemented\n");
+        /* We need to split the original page. */
+        pte_t split_pte = *entry;
+
+        ASSERT(pte_is_superpage(*entry, level));
+
+        if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
+        {
+            /* Free the allocated sub-tree */
+            p2m_free_subtree(p2m, split_pte, level);
+
+            rc = -ENOMEM;
+            goto out;
+        }
+
+        p2m_write_pte(entry, split_pte, p2m->clean_dcache);
+
+        p2m->need_flush = true;
+
+        /* Then descend to the level where the real changes will be made. */
+        for ( ; level > target; level-- )
+        {
+            rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
+
+            /*
+             * The entry should be found, and it should be either a
+             * table or, if level 0 is not targeted, a superpage.
+             */
+            ASSERT(rc == P2M_TABLE_NORMAL ||
+                   (rc == P2M_TABLE_SUPER_PAGE && target > 0));
+        }
+
+        entry = table + offsets[level];
     }
 
     /*
--
generated by git-patchbot for /home/xen/git/xen.git#staging