
Re: [PATCH v6 8/9] xen/riscv: page table handling



> > +/*
> > + * The PTE format does not contain the following bits within itself;
> > + * they are created artificially to inform the Xen page table
> > + * handling algorithm. These bits should not be explicitly written
> > + * to the PTE entry.
> > + */
> > +#define PTE_SMALL       BIT(10, UL)
> > +#define PTE_POPULATE    BIT(11, UL)
> > +
> > +#define PTE_XWV_BITS    (PTE_WRITABLE | PTE_EXECUTABLE | PTE_VALID)
> > +#define PTE_XWV_MASK(x) ((x) & PTE_XWV_BITS)
> > +#define PTE_RWX_MASK(x) ((x) & (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE))
> 
> I think I commented on *_MASK macros before: They are conventionally
> constants (see e.g. PAGETABLE_ORDER_MASK that you have further down),
> not operations on an input. It's not really clear to me what the
> "mask" in this name is meant to signify as to what the macros are
> doing. I seem to vaguely recall that you indicated you'd drop all
> such helpers, in favor of using respective constants directly at use
> sites.
Regarding *_MASK: I wrote about PTE_{R,W,X}_MASK (which were dropped
because they were used only once). By MASK here I mean that only some
bits are going to be taken. For example, PTE_XWV_MASK() means that
only the eXecute, Write and Valid bits will be taken. Probably EXTR
(extract) would be a better suffix to use than MASK.

> 
> As a less significant (because of being a matter of personal taste to
> a fair degree) aspect: XWV is a pretty random sequence of characters.
> I for one wouldn't remember what order they need to be used in, and
> hence would always need to look this up.
I used those letters as they are the ones used by the RISC-V spec.

> 
> Taken together, what about
> 
> #define PTE_LEAF_MASK   (PTE_WRITABLE | PTE_EXECUTABLE | PTE_VALID)
> #define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)
> 
> ?
Looks good to me. I will use them. Thanks for the naming and
clarification.
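For reference, a minimal self-contained sketch (bit positions per the
RISC-V privileged spec; the constant-mask names are the ones proposed
above, and flags_are_wx() is a hypothetical use site, not Xen code)
showing how the constant form reads compared to the operation-style
macros:

```c
#include <assert.h>
#include <stdbool.h>

/* Sv39/Sv48 PTE permission bits (low bits of the PTE). */
#define PTE_VALID       (1UL << 0)
#define PTE_READABLE    (1UL << 1)
#define PTE_WRITABLE    (1UL << 2)
#define PTE_EXECUTABLE  (1UL << 3)

/*
 * Proposed constant masks, replacing the PTE_XWV_MASK()/PTE_RWX_MASK()
 * operation-style macros.
 */
#define PTE_LEAF_MASK   (PTE_WRITABLE | PTE_EXECUTABLE | PTE_VALID)
#define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)

/* Use site: reject flags that are valid, writable and executable at once. */
static bool flags_are_wx(unsigned long flags)
{
    return (flags & PTE_LEAF_MASK) == PTE_LEAF_MASK;
}
```

The masking then happens directly at the use site, so no wrapper macro
with an ambiguous name is needed.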

> 
> > @@ -68,6 +109,37 @@ static inline bool pte_is_valid(pte_t p)
> >      return p.pte & PTE_VALID;
> >  }
> >  
> > +/*
> > + * From the RISC-V spec:
> > + *    Table 4.5 summarizes the encoding of the permission bits.
> > + *      X W R Meaning
> > + *      0 0 0 Pointer to next level of page table.
> > + *      0 0 1 Read-only page.
> > + *      0 1 0 Reserved for future use.
> > + *      0 1 1 Read-write page.
> > + *      1 0 0 Execute-only page.
> > + *      1 0 1 Read-execute page.
> > + *      1 1 0 Reserved for future use.
> > + *      1 1 1 Read-write-execute page.
> > + */
> > +inline bool pte_is_table(const pte_t p)
> > +{
> > +    return ((p.pte & (PTE_VALID |
> > +                      PTE_READABLE |
> > +                      PTE_WRITABLE |
> > +                      PTE_EXECUTABLE)) == PTE_VALID);
> > +}
> > +
> > +static inline bool pte_is_mapping(const pte_t p)
> > +{
> > +    /* W = 1 || (X=1 && W=1) -> Reserved for future use */
> > +    ASSERT((PTE_RWX_MASK(p.pte) != PTE_WRITABLE) ||
> > +           (PTE_RWX_MASK(p.pte) != (PTE_WRITABLE | PTE_EXECUTABLE)));
> 
> I'm afraid I'm pretty unhappy with comments not matching the commented
> code: The comment mentions only set bits, but not clear ones.
I assumed it would be clear from the table above that the other bits
should be 0, but I will update the comment to be more precise.

>  Further
> you're missing a check of the V bit - with that clear, the other bits
> can be set whichever way. Taken together (and the spec also says it
> this way): If V=1 and W=1 then R also needs to be 1.
My intention was to check it the way it is described in Table 4.5. For
example:
  0 1 0 Reserved for future use.
So I wanted to check that X=R=0 and W=1; I just confused myself because
ASSERT(p) checks p, not !p. I will update the ASSERT() properly.

> 
> Also - isn't this check equally relevant in pte_is_table()?
Missed that; the check should be in pte_is_table() too.
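Putting both review points together, here is a hedged, standalone C
sketch (plain unsigned long instead of pte_t, assert() instead of
Xen's ASSERT(); not the actual Xen implementation) of what the
corrected predicates could look like, including the V-bit check and
the reserved-encoding assertion (per the spec, V=1 with W=1 requires
R=1):

```c
#include <assert.h>
#include <stdbool.h>

#define PTE_VALID       (1UL << 0)
#define PTE_READABLE    (1UL << 1)
#define PTE_WRITABLE    (1UL << 2)
#define PTE_EXECUTABLE  (1UL << 3)
#define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)

/* Table 4.5: W=1 with R=0 (whatever X is) is reserved for future use. */
static bool pte_is_reserved(unsigned long pte)
{
    return (pte & PTE_WRITABLE) && !(pte & PTE_READABLE);
}

static bool pte_is_table(unsigned long pte)
{
    /* Valid entries using a reserved encoding must never reach us. */
    assert(!((pte & PTE_VALID) && pte_is_reserved(pte)));

    /* V=1 with R=W=X=0 encodes a pointer to the next level. */
    return (pte & (PTE_VALID | PTE_ACCESS_MASK)) == PTE_VALID;
}

static bool pte_is_mapping(unsigned long pte)
{
    assert(!((pte & PTE_VALID) && pte_is_reserved(pte)));

    /* A leaf mapping is valid and has at least one of R/W/X set. */
    return (pte & PTE_VALID) && (pte & PTE_ACCESS_MASK);
}
```

Note the checks are gated on V=1: with V=0 the remaining bits may hold
anything, so neither predicate should fire the assertion for an
invalid entry.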

> 
> > --- a/xen/arch/riscv/include/asm/riscv_encoding.h
> > +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
> > @@ -164,6 +164,7 @@
> >  #define SSTATUS_SD                 SSTATUS64_SD
> >  #define SATP_MODE                  SATP64_MODE
> >  #define SATP_MODE_SHIFT                    SATP64_MODE_SHIFT
> > +#define SATP_PPN_MASK                      _UL(0x00000FFFFFFFFFFF)
> 
> Why not SATP64_PPN on the rhs? And why no similar #define in the #else
> block that follows, using SATP32_PPN?
> 
> > --- /dev/null
> > +++ b/xen/arch/riscv/pt.c
> > @@ -0,0 +1,423 @@
> > +#include <xen/bug.h>
> > +#include <xen/domain_page.h>
> > +#include <xen/errno.h>
> > +#include <xen/lib.h>
> > +#include <xen/mm.h>
> > +#include <xen/mm-frame.h>
> 
> I don#t think you need this when you already have xen/mm.h.
> 
> > +#include <xen/pfn.h>
> > +#include <xen/pmap.h>
> > +#include <xen/spinlock.h>
> > +
> > +#include <asm/flushtlb.h>
> > +#include <asm/page.h>
> > +
> > +static inline const mfn_t get_root_page(void)
> > +{
> > +    paddr_t root_maddr = (csr_read(CSR_SATP) & SATP_PPN_MASK) << PAGE_SHIFT;
> 
> Won't this lose bits in RV32 mode? IOW wouldn't you better avoid
> open-coding pfn_to_paddr() here?
Considering that the PPN field in RV32 mode is only 22 bits, it will
indeed lose bits. In any case, I agree it would be better to use
pfn_to_paddr().
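The truncation hazard is easy to demonstrate in isolation: shifting a
32-bit value left by PAGE_SHIFT before widening drops the top bits of
the PPN, which widening first (as pfn_to_paddr() does) avoids. A
minimal sketch, with the helper names and behavior emulated here
rather than taken from the Xen tree:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
typedef uint64_t paddr_t;

/* Open-coded shift in 32-bit arithmetic: the top PAGE_SHIFT bits wrap away. */
static paddr_t bad_shift(uint32_t ppn)
{
    return (uint32_t)(ppn << PAGE_SHIFT);
}

/* pfn_to_paddr()-style: widen to the physical-address type first. */
static paddr_t good_shift(uint32_t ppn)
{
    return (paddr_t)ppn << PAGE_SHIFT;
}
```

With a 22-bit PPN (the RV32 satp case), any PPN bit at position 20 or
above is lost by the narrow shift but preserved by the wide one.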

> 
> > +static int pt_update_entry(mfn_t root, unsigned long virt,
> > +                           mfn_t mfn, unsigned int target,
> > +                           unsigned int flags)
> > +{
> > +    int rc;
> > +    unsigned int level = HYP_PT_ROOT_LEVEL;
> > +    pte_t *table;
> > +    /*
> > +     * The intermediate page table shouldn't be allocated when MFN isn't
> > +     * valid and we are not populating the page table.
> > +     * This means we either modify permissions, remove an entry, or
> > +     * insert a brand new entry.
> > +     *
> > +     * See the comment above pt_update() for an additional explanation
> > +     * about combinations of (mfn, flags).
> > +     */
> > +    bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
> > +    pte_t pte, *entry;
> > +
> > +    /* convenience aliases */
> > +    DECLARE_OFFSETS(offsets, virt);
> > +
> > +    table = map_table(root);
> > +    for ( ; level > target; level-- )
> > +    {
> > +        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
> > +        if ( rc == XEN_TABLE_MAP_FAILED )
> > +        {
> > +            rc = 0;
> > +
> > +            /*
> > +             * We are here because pt_next_level has failed to map
> > +             * the intermediate page table (e.g. the table does not exist
> > +             * and the pt shouldn't be allocated). It is a valid case when
> > +             * removing a mapping as it may not exist in the page table.
> > +             * In this case, just ignore it.
> > +             */
> > +            if ( flags & PTE_VALID )
> > +            {
> > +                printk("%s: Unable to map level %u\n", __func__, level);
> > +                rc = -ENOENT;
> > +            }
> 
> Both comment and error code assume the !populate case. What about the
> case where the allocation failed? That's "couldn't be allocated" and
> would better return back -ENOMEM (as create_table() correctly returns
> in that case).
The condition should be updated here:
            if ( flags & (PTE_VALID | PTE_POPULATE) )
            {
                ...
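As a sanity check of the amended condition, a tiny standalone sketch
(the helper name walk_failed_rc() and the surrounding structure are
illustrative, not taken from pt.c) of when a failed table walk is
benign versus an error:

```c
#include <assert.h>
#include <errno.h>

#define PTE_VALID    (1UL << 0)
#define PTE_POPULATE (1UL << 11)

/*
 * When pt_next_level() cannot map the next table, the failure is only
 * benign for a removal (neither PTE_VALID nor PTE_POPULATE in flags):
 * the mapping may simply not exist. For an insertion, modification or
 * populate request it is a real error.
 */
static int walk_failed_rc(unsigned int flags)
{
    return (flags & (PTE_VALID | PTE_POPULATE)) ? -ENOENT : 0;
}
```

Whether the populate-allocation failure should then be reported as
-ENOENT or propagated as -ENOMEM (as the reviewer suggests) is a
separate refinement on top of this condition.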

> 
> > +static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
> > +                            unsigned int flags)
> > +                            unsigned int flags)
> > +{
> > +    unsigned int level = 0;
> > +    unsigned long mask;
> > +    unsigned int i;
> > +
> > +    /* Use blocking mapping unless the caller requests 4K mapping */
> > +    if ( unlikely(flags & PTE_SMALL) )
> > +        return level;
> 
> Maybe "block mapping" in the comment? "Blocking" typically has quite
> different a meaning. I'm uncertain about that terminology anyway, as
> the spec doesn't use it.
You are right that the spec doesn't define a term for mappings bigger
than 4K, so I just re-used the terminology from Arm here. Probably it
would be better just to reword:
/* Use a mapping larger than 4K unless the caller specifically requests
   a 4K mapping */
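For illustration, here is a standalone sketch of the level-selection
logic under Sv39-style assumptions (9 address bits per level, two
levels above the 4K leaf level; PTE_SMALL stands in as a plain flag
and mfn is a plain integer). This mirrors the idea of the patch, not
its exact code:

```c
#include <assert.h>

#define LEVEL_ORDER(l)    ((l) * 9)   /* 4K pages per level-l mapping: 512^l */
#define HYP_PT_ROOT_LEVEL 2
#define PTE_SMALL         (1U << 10)

/*
 * Pick the largest level whose mapping size divides both the virtual
 * and machine frame numbers and does not exceed the requested range.
 */
static unsigned int pt_mapping_level(unsigned long vfn, unsigned long mfn,
                                     unsigned long nr, unsigned int flags)
{
    unsigned int level = 0;
    unsigned int i;

    /* Use a mapping larger than 4K unless the caller requests 4K. */
    if ( flags & PTE_SMALL )
        return level;

    for ( i = 1; i <= HYP_PT_ROOT_LEVEL; i++ )
    {
        unsigned long mask = (1UL << LEVEL_ORDER(i)) - 1;

        if ( ((vfn | mfn) & mask) || nr < (mask + 1) )
            break;
        level = i;
    }

    return level;
}
```

So a 2M-equivalent range (512 aligned 4K pages at this geometry) maps
at level 1, while any misalignment or a PTE_SMALL request falls back
to level 0.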

> 
> > + * If `mfn` != INVALID_MFN and flags has PTE_VALID bit set then it means
> > + * that inserting will be done.
> > + */
> > +static int pt_update(unsigned long virt,
> > +                     mfn_t mfn,
> > +                     unsigned long nr_mfns,
> > +                     unsigned int flags)
> > +{
> > +    int rc = 0;
> > +    unsigned long vfn = PFN_DOWN(virt);
> > +    unsigned long left = nr_mfns;
> > +
> > +    const mfn_t root = get_root_page();
> > +
> > +    /*
> > +     * It is bad idea to have mapping both writeable and
> > +     * executable.
> > +     * When modifying/creating mapping (i.e PTE_VALID is set),
> > +     * prevent any update if this happen.
> > +     */
> > +    if ( PTE_XWV_MASK(flags) == PTE_XWV_BITS )
> > +    {
> > +        printk("Mappings should not be both Writeable and Executable.\n");
> 
> I'm pretty sure I asked before that you omit full stops from log
> messages.
Yes, you asked, and I think it was even in this very place. I just
overlooked it among the many comments on the previous patch version.
Sorry for that.

> 
> > +int map_pages_to_xen(unsigned long virt,
> > +                     mfn_t mfn,
> > +                     unsigned long nr_mfns,
> > +                     unsigned int flags)
> > +{
> > +    /*
> > +     * Ensure that flags has PTE_VALID bit as map_pages_to_xen() is
> > +     * supposed to create a mapping.
> > +     *
> > +     * Ensure that we have a valid MFN before proceeding.
> > +     *
> > +     * If the MFN is invalid, pt_update() might misinterpret the
> > +     * operation, treating it as either a population, a mapping
> > +     * destruction, or a mapping modification.
> > +     */
> > +    ASSERT(!mfn_eq(mfn, INVALID_MFN) || (flags & PTE_VALID));
> 
> Judging from the comment, do you mean && instead of || ?
Yes, it should be &&.

Thanks.

~ Oleksii





 

