[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v5 6/7] xen/riscv: page table handling
On Tue, 2024-08-27 at 17:00 +0200, Jan Beulich wrote: > On 21.08.2024 18:06, Oleksii Kurochko wrote: > > Implement map_pages_to_xen() which requires several > > functions to manage page tables and entries: > > - pt_update() > > - pt_mapping_level() > > - pt_update_entry() > > - pt_next_level() > > - pt_check_entry() > > > > To support these operations, add functions for creating, > > mapping, and unmapping Xen tables: > > - create_table() > > - map_table() > > - unmap_table() > > > > Introduce internal macros starting with PTE_* for convenience. > > These macros closely resemble PTE bits, with the exception of > > PTE_SMALL, which indicates that 4KB is needed. > > What macros are you talking about here? Is this partially stale, as > only PTE_SMALL and PTE_POPULATE (and a couple of masks) are being > added? I am speaking about macros connected to masks: #define PTE_R_MASK(x) ((x) & PTE_READABLE) #define PTE_W_MASK(x) ((x) & PTE_WRITABLE) #define PTE_X_MASK(x) ((x) & PTE_EXECUTABLE) #define PTE_RWX_MASK(x) ((x) & (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)) > > > --- a/xen/arch/riscv/include/asm/flushtlb.h > > +++ b/xen/arch/riscv/include/asm/flushtlb.h > > @@ -5,12 +5,24 @@ > > #include <xen/bug.h> > > #include <xen/cpumask.h> > > > > +#include <asm/sbi.h> > > + > > /* Flush TLB of local processor for address va. */ > > static inline void flush_tlb_one_local(vaddr_t va) > > { > > asm volatile ( "sfence.vma %0" :: "r" (va) : "memory" ); > > } > > > > +/* > > + * Flush a range of VA's hypervisor mappings from the TLB of all > > + * processors in the inner-shareable domain. > > + */ > > Isn't inner-sharable an Arm term? Don't you simply mean "all" here? Yes, this is Arm term. It should used "all" instead. Thanks. > > > @@ -68,6 +111,20 @@ static inline bool pte_is_valid(pte_t p) > > return p.pte & PTE_VALID; > > } > > > > +inline bool pte_is_table(const pte_t p) > > +{ > > + return ((p.pte & (PTE_VALID | > > + PTE_READABLE | > > + PTE_WRITABLE | > > + PTE_EXECUTABLE)) == PTE_VALID); > > +} > > In how far is the READABLE check valid here? You (imo correctly) ... > > > +static inline bool pte_is_mapping(const pte_t p) > > +{ > > + return (p.pte & PTE_VALID) && > > + (p.pte & (PTE_WRITABLE | PTE_EXECUTABLE)); > > +} > > ... don't consider this bit here. pte_is_mapping() seems to me is correct as according to the RISC-V privileged spec: 4. Otherwise, the PTE is valid. If pte.r = 1 or pte.x = 1, go to step 5. Otherwise, this PTE is a pointer to the next level of the page table. 5. A leaf PTE has been found. ... and regarding pte_is_table() READABLE check is valid as we have to check only that pte.r = pte.x = 0. WRITABLE check should be dropped. Or just use define pte_is_table() as: inline bool pte_is_table(const pte_t p) { return !pte_is_mapping(p); } > > > --- a/xen/arch/riscv/include/asm/riscv_encoding.h > > +++ b/xen/arch/riscv/include/asm/riscv_encoding.h > > @@ -164,6 +164,7 @@ > > #define SSTATUS_SD SSTATUS64_SD > > #define SATP_MODE SATP64_MODE > > #define SATP_MODE_SHIFT SATP64_MODE_SHIFT > > +#define SATP_PPN_MASK _UL(0x00000FFFFFFFFFFF) > > > > #define HGATP_PPN HGATP64_PPN > > #define HGATP_VMID_SHIFT HGATP64_VMID_SHIFT > > This looks odd, padding-wise, but that's because hard tabs are being > used here. Is that intentional? I use tabs here because riscv_encoding.h was copied from Linux kernel which uses hard tabs and definitions above use 3 tabs so I used 3 hard tabs too. > > > --- /dev/null > > +++ b/xen/arch/riscv/pt.c > > @@ -0,0 +1,420 @@ > > +#include <xen/bug.h> > > +#include <xen/domain_page.h> > > +#include <xen/errno.h> > > +#include <xen/mm.h> > > +#include <xen/mm-frame.h> > > +#include <xen/pmap.h> > > +#include <xen/spinlock.h> > > + > > +#include <asm/flushtlb.h> > > +#include <asm/page.h> > > + > > +static inline const mfn_t get_root_page(void) > > +{ > > + paddr_t root_maddr = (csr_read(CSR_SATP) & SATP_PPN_MASK) << > > PAGE_SHIFT; > > + > > + return maddr_to_mfn(root_maddr); > > +} > > + > > +/* Sanity check of the entry. */ > > +static bool pt_check_entry(pte_t entry, mfn_t mfn, unsigned int > > flags) > > +{ > > + /* > > + * See the comment about the possible combination of (mfn, > > flags) in > > + * the comment above pt_update(). > > + */ > > + > > + /* Sanity check when modifying an entry. */ > > + if ( (flags & PTE_VALID) && mfn_eq(mfn, INVALID_MFN) ) > > + { > > + /* We don't allow modifying an invalid entry. */ > > + if ( !pte_is_valid(entry) ) > > + { > > + printk("Modifying invalid entry is not allowed.\n"); > > Perhaps all of these printk()s should be dprintk()? It could be dprintk() but at the same time I don't see any issue if it will be printed once. > And not have a full > stop? By "full stop," do you mean something like panic() or BUG_ON()? The error is propagated up to the caller, which then calls panic(). Anexample of this is: if ( (offset + size) > MB(2) ) { rc = map_pages_to_xen(BOOT_FDT_VIRT_START + MB(2), maddr_to_mfn(base_paddr + MB(2)), MB(2) >> PAGE_SHIFT, PAGE_HYPERVISOR_RO); if ( rc ) panic("Unable to map the device-tree\n"); } If it would be better for some reason to call panic() or BUG_ON() as soon as pt_check_entry() returns false, I can do it that way as well. > > > + return false; > > + } > > + > > + /* We don't allow modifying a table entry */ > > + if ( pte_is_table(entry) ) > > + { > > + printk("Modifying a table entry is not allowed.\n"); > > + return false; > > + } > > + } > > + /* Sanity check when inserting a mapping */ > > + else if ( flags & PTE_VALID ) > > + { > > + /* We should be here with a valid MFN. */ > > + ASSERT(!mfn_eq(mfn, INVALID_MFN)); > > This is odd to have here, considering the if() further up. Agree, ASSERT() could be drop. > > > + /* > > + * We don't allow replacing any valid entry. > > + * > > + * Note that the function pt_update() relies on this > > + * assumption and will skip the TLB flush (when Svvptc > > + * extension will be ratified). The function will need > > + * to be updated if the check is relaxed. > > + */ > > + if ( pte_is_valid(entry) ) > > + { > > + if ( pte_is_mapping(entry) ) > > + printk("Changing MFN for a valid entry is not > > allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n", > > + mfn_x(mfn_from_pte(entry)), mfn_x(mfn)); > > + else > > + printk("Trying to replace a table with a > > mapping.\n"); > > + return false; > > + } > > + } > > + /* Sanity check when removing a mapping. */ > > + else if ( (flags & (PTE_VALID | PTE_POPULATE)) == 0 ) > > The PTE_VALID part of the check is pointless considering the earlier > if(). I guess you may want to have it for doc purposes ... Yes, it just helps to read the code and understand "confusing" if's() above. > > Since further up you're using "else if ( flags & PTE_VALID )" imo > here you want to use "else if ( !(flags & ...) )". > > > + { > > + /* We should be here with an invalid MFN. */ > > + ASSERT(mfn_eq(mfn, INVALID_MFN)); > > + > > + /* We don't allow removing a table */ > > + if ( pte_is_table(entry) ) > > + { > > + printk("Removing a table is not allowed.\n"); > > + return false; > > + } > > Is this restriction temporary? Yes. > > > + } > > + /* Sanity check when populating the page-table. No check so > > far. */ > > + else > > + { > > + ASSERT(flags & PTE_POPULATE); > > This again is redundant with earlier if() conditions. > > > +#define XEN_TABLE_MAP_FAILED 0 > > +#define XEN_TABLE_SUPER_PAGE 1 > > +#define XEN_TABLE_NORMAL 2 > > + > > +/* > > + * Take the currently mapped table, find the corresponding entry, > > + * and map the next table, if available. > > + * > > + * The alloc_tbl parameters indicates whether intermediate tables > > should > > + * be allocated when not present. > > + * > > + * Return values: > > + * XEN_TABLE_MAP_FAILED: Either alloc_only was set and the entry > > + * was empty, or allocating a new page failed. > > + * XEN_TABLE_NORMAL: next level or leaf mapped normally > > + * XEN_TABLE_SUPER_PAGE: The next entry points to a superpage. > > + */ > > +static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned > > int offset) > > Having the boolean first is unusual, but well - it's your choice. > > > +{ > > + pte_t *entry; > > + int ret; > > + mfn_t mfn; > > + > > + entry = *table + offset; > > + > > + if ( !pte_is_valid(*entry) ) > > + { > > + if ( alloc_tbl ) > > + return XEN_TABLE_MAP_FAILED; > > Is this condition meant to be inverted? if alloc_tbl = true we shouldn't allocatetable as: * The intermediate page table shouldn't be allocated when MFN isn't * valid and we are not populating page table. ... */ bool alloc_tbl = mfn_eq(mfn, INVALID_MFN) && !(flags & PTE_POPULATE); So if mfn = INVALID_MFN and flags.PTE_POPULATE=0 it means that this table shouldn't be allocated and thereby pt_next_level() should return XEN_TABLE_MAP_FAILED. Or to invert if ( alloc_tbl )it will be needed to invert defintion of alloc_tbl: bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE); > > > + ret = create_table(entry); > > + if ( ret ) > > + return XEN_TABLE_MAP_FAILED; > > You don't really use "ret". Why not omit the local variable, even > more so that it has too wide scope? I'll omit that, it is really useless. > > > +/* Update an entry at the level @target. */ > > +static int pt_update_entry(mfn_t root, unsigned long virt, > > + mfn_t mfn, unsigned int target, > > + unsigned int flags) > > +{ > > + int rc; > > + unsigned int level = HYP_PT_ROOT_LEVEL; > > + pte_t *table; > > + /* > > + * The intermediate page table shouldn't be allocated when MFN > > isn't > > + * valid and we are not populating page table. > > + * This means we either modify permissions or remove an entry, > > or > > + * inserting brand new entry. > > + * > > + * See the comment above pt_update() for an additional > > explanation about > > + * combinations of (mfn, flags). > > + */ > > + bool alloc_tbl = mfn_eq(mfn, INVALID_MFN) && !(flags & > > PTE_POPULATE); > > Is this meant to be inverted, too (to actually match variable name > and > comment)? Oh, you mentioned that too. I wrote the similar above. I think it would be better to invert if we want to use alloc_tbl variable name. > > > + break; > > + } > > + > > + if ( level != target ) > > + { > > + printk("%s: Shattering superpage is not supported\n", > > __func__); > > + rc = -EOPNOTSUPP; > > + goto out; > > + } > > + > > + entry = table + offsets[level]; > > + > > + rc = -EINVAL; > > + if ( !pt_check_entry(*entry, mfn, flags) ) > > + goto out; > > + > > + /* We are removing the page */ > > + if ( !(flags & PTE_VALID) ) > > + memset(&pte, 0x00, sizeof(pte)); > > + else > > + { > > + /* We are inserting a mapping => Create new pte. */ > > + if ( !mfn_eq(mfn, INVALID_MFN) ) > > + pte = pte_from_mfn(mfn, PTE_VALID); > > + else /* We are updating the permission => Copy the current > > pte. */ > > + pte = *entry; > > + > > + /* update permission according to the flags */ > > + pte.pte |= PTE_RWX_MASK(flags) | PTE_ACCESSED | PTE_DIRTY; > > When updating an entry, don't you also need to clear (some of) the > flags? I am not sure why some flags should be cleared. Here we are taking only necessary for pte flags such as R, W, X or other bits in flags are ignored. > > > +/* Return the level where mapping should be done */ > > +static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned > > long nr, > > + unsigned int flags) > > +{ > > + unsigned int level = 0; > > + unsigned long mask; > > + unsigned int i; > > + > > + /* Use blocking mapping unless the caller requests 4K mapping > > */ > > + if ( unlikely(flags & PTE_SMALL) ) > > + return level; > > + > > + /* > > + * Don't take into account the MFN when removing mapping (i.e > > + * MFN_INVALID) to calculate the correct target order. > > + * > > + * `vfn` and `mfn` must be both superpage aligned. > > + * They are or-ed together and then checked against the size > > of > > + * each level. > > + * > > + * `left` is not included and checked separately to allow > > + * superpage mapping even if it is not properly aligned (the > > + * user may have asked to map 2MB + 4k). > > What is this about? There's nothing named "left" here. It refer to "remaining" pages or "leftover" space after trying to align a mapping to a superpage boundary. > > > + */ > > + mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0; > > + mask |= vfn; > > + > > + for ( i = HYP_PT_ROOT_LEVEL; i != 0; i-- ) > > + { > > + if ( !(mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)) && > > + (nr >= BIT(XEN_PT_LEVEL_ORDER(i), UL)) ) > > + { > > + level = i; > > + break; > > + } > > + } > > + > > + return level; > > +} > > + > > +static DEFINE_SPINLOCK(xen_pt_lock); > > Another largely meaningless xen_ prefix? Thanks. I'll drop it. > > > +/* > > + * If `mfn` equals `INVALID_MFN`, it indicates that the following > > page table > > + * update operation might be related to either populating the > > table ( > > + * PTE_POPULATE will be set additionaly), destroying a mapping, or > > modifying > > + * an existing mapping. > > And the latter two are distinguished by? PTE_VALID? inserting -> (PTE_VALID=1 + (mfn=something valid)) destroying-> ( PTE_VALID=0 ) > > > + * If `mfn` is valid and flags has PTE_VALID bit set then it means > > that > > + * inserting will be done. > > + */ > > What about mfn != INVALID_MFN and PTE_VALID clear? PTE_VALID=0 will be always considered as destroying and no matter what is mfn value as in this case the removing is done in the way where mfn isn't used: memset(&pte, 0x00, sizeof(pte)); > Also note that "`mfn` is > valid" isn't the same as "mfn != INVALID_MFN". You want to be precise > here, > to avoid confusion later on. (I say that knowing that we're still > fighting > especially shadow paging code on x86 not having those properly > separated.) If it is needed to be precise and mfn is valid isn't the same as "mfn != INVALID_MFN" only for the case of shadow paging? > > > > > + unsigned long left = nr_mfns; > > + > > + const mfn_t root = get_root_page(); > > + > > + /* > > + * It is bad idea to have mapping both writeable and > > + * executable. > > + * When modifying/creating mapping (i.e PTE_VALID is set), > > + * prevent any update if this happen. > > + */ > > + if ( (flags & PTE_VALID) && PTE_W_MASK(flags) && > > PTE_X_MASK(flags) ) > > Seeing them in use, I wonder about the naming of those PTE_?_MASK() > macros. Along with the lhs, why not simply (flags & PTE_...)? Hmm, good point. They can be really dropped with simplification of the mentioned if(...). Thanks. ~ Oleksii
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |