[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 1/3] xen/riscv: introduce setup_mm()



On Thu, 2024-11-07 at 10:19 +0100, Jan Beulich wrote:
> On 06.11.2024 13:44, oleksii.kurochko@xxxxxxxxx wrote:
> > On Tue, 2024-11-05 at 16:20 +0100, Jan Beulich wrote:
> > > On 01.11.2024 14:16, Oleksii Kurochko wrote:> @@ -423,3 +429,138
> > > @@
> > > void * __init early_fdt_map(paddr_t fdt_paddr)
> > > >  
> > > >      return fdt_virt;
> > > >  }
> > > > +
> > > > +vaddr_t __ro_after_init directmap_virt_start =
> > > > DIRECTMAP_VIRT_START;
> > > > +
> > > > +#ifndef CONFIG_RISCV_32
> > > > +
> > > > +/* Map a frame table to cover physical addresses ps through pe
> > > > */
> > > > +static void __init setup_frametable_mappings(paddr_t ps,
> > > > paddr_t
> > > > pe)
> > > > +{
> > > > +    paddr_t aligned_ps = ROUNDUP(ps, PAGE_SIZE);
> > > > +    paddr_t aligned_pe = ROUNDDOWN(pe, PAGE_SIZE);
> > > > +    unsigned long nr_mfns = PFN_DOWN(aligned_pe - aligned_ps);
> > > > +    unsigned long frametable_size = nr_mfns *
> > > > sizeof(*frame_table);
> > > > +    mfn_t base_mfn;
> > > > +
> > > > +    if ( frametable_size > FRAMETABLE_SIZE )
> > > > +        panic("The frametable cannot cover [%#"PRIpaddr",
> > > > %#"PRIpaddr")\n",
> > > > +              ps, pe);
> > > > +
> > > > +    /*
> > > > +     * align base_mfn and frametable_size to MB(2) to have
> > > > superpage mapping
> > > > +     * in map_pages_to_xen()
> > > > +     */
> > > > +    frametable_size = ROUNDUP(frametable_size, MB(2));
> > > > +    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT,
> > > > PFN_DOWN(MB(2)));
> > > > +
> > > > +    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
> > > > +                          PFN_DOWN(frametable_size),
> > > > +                          PAGE_HYPERVISOR_RW) )
> > > > +        panic("frametable mappings failed: %#lx -> %#lx\n",
> > > > +              FRAMETABLE_VIRT_START, mfn_x(base_mfn));
> > > > +
> > > > +    memset(&frame_table[0], -1, frametable_size);
> > > > +    memset(&frame_table[PFN_DOWN(aligned_ps)],
> > > > +           0, nr_mfns * sizeof(*frame_table));
> > > 
> > > Interesting - now you write out a huge amount of -1s, just to
> > > then
> > > overwrite
> > > most of them with zeroes. I'm not going to insist that you change
> > > this yet
> > > another time, but the performance hit from this is going to bite
> > > you/us as
> > > soon as Xen is run on bigger-memory systems.
> > I agree that validating or invalidating frames in a single pass
> > would
> > be preferable to nearly two passes. I’m considering whether there’s
> > a
> > way to ensure that frame_table is set to -1 at compile time.
> 
> How would that work, if the entire frame table is allocated
> dynamically?
Yeah, there is no way to work in dynamically allocated way. Then let's
stick to ...
> 
> > It seems
> > the best approach (and only one?) might be to initialize
> > frame_table in
> > one pass as follows:
> > 1) [0, aligned_ps) = -1
> > 2) [aligned_ps, nr_mfns * sizeof(*frame_table)) = 0
> > 3) [nr_mfns * sizeof(*frame_table), frame_table_size) = -1
> > Does this approach seem optimal?
> 
...
> That's what I would have expected, yes.

> 
> > > Plus, unless I'm mistaken, the function continues to rely on ps
> > > == 0
> > > as
> > > input. Just that the dependency is now better hidden.
> > > Specifically if
> > > you
> > > calculate nr_mfns from the difference, and then use that for
> > > allocation,
> > > then you need to offset the start of the mapping you create
> > > accordingly.
> > I'm not quite understanding why the method of calculating nr_mfns
> > affects how the frame_table is mapped. Isn’t it only necessary to
> > calculate the correct size of the frame_table that needs to be
> > allocated?
> 
> Assume there's 4G of memory actually starting at 16G. Afaict you'll
> allocate a frame table for those 4G, but you'll map it right at
> FRAMETABLE_VIRT_START. Hence it'll cover the first 4G of physical
> addresses, but _none_ of the actual memory you've got.
I need to clarify some basics about the frame table.

Does Xen expect that frame_table[0] = 0 (PA), frame_table[1] = 4k (PA),
..., frame_table[x] = RAM_START_PA, frame_table[x+1] = RAM_START_PA +
4k, and so on?

My understanding is that it could be done as follows: frame_table[0] =
RAM_START_PA, frame_table[1] = RAM_START_PA + 4k, and so on, taking
into account mfn_to_page() and page_to_mfn() logic. (And yes, in that
case, the current implementations of mfn_to_page() and page_to_mfn()
aren't correct and should be updated as suggested here:
https://lore.kernel.org/xen-devel/cover.1730465154.git.oleksii.kurochko@xxxxxxxxx/T/#me2fc410f3d4758b71a9974d0be18a36f50d683b1as
as these implementations are based on that ps == 0). With this setup,
mapping FRAMETABLE_VIRT_START to base_mfn should be correct, shouldn’t
it?

With the current implementations of mfn_to_page() and page_to_mfn(), we
either need to allocate a larger frame_table to cover the [0, ram_end)
range (in which case mapping FRAMETABLE_VIRT_START to base_mfn would
work), or change the mapping to frame_table=( FRAMETABLE_VIRT_START +
PFN_DOWN(ram_start) ) -> base_mfn. Or to not loose virtual addrees
space of FRAMETABLE ( so the mapping will still be
FRAMETABLE_VIRT_START -> base_mfn ) to do the similar to directmap
mapping ( or what the changes suggested in the link above). Is my
understanding correct?

> 
> > >  At
> > > which point you may need to apply extra care to cover the case
> > > where
> > > sizeof(*frame_table) is not a power of two, and hence e.g. the
> > > first
> > > valid
> > > page might have a struct instance straddling a page boundary.
> > The first valid page is aligned_ps ( which is aligned on a page
> > boundary ) so assuming that sizeof(*frame_table) < PAGE_SIZE we
> > can't
> > straddle a page boundary, can we?
> 
> But sizeof(*frame_table) < PAGE_SIZE means nothing as to the
> alignment
> of an individual struct instance in memory. Iirc sizeof(*frame_table)
> is 48 for RISC-V, so the common alignment across struct instances
> isn't
> going to be better than 8, and there _will_ be instances straddling
> page boundaries.
If we speak about the alignment of an individual struct instance in
memory, what is the issue with that, except that it could be less
efficient when accessing this memory? This inefficiency could
potentially be addressed by adding some padding to the struct page_info
but then we will have bigger frame table size.
Another issue I can see is that the size of the frame table could be
calculated incorrectly. It may require an additional page to cover the
case when the frame table size isn't aligned to PAGE_SIZE, but this is
accounted for by rounding up the frame table size to 2MB
(frametable_size = ROUNDUP(frametable_size, MB(2));) before allocating
the frame table (base_mfn = alloc_boot_pages(frametable_size >>
PAGE_SHIFT, PFN_DOWN(MB(2)));).

Something else should be considered?

> 
> > > > +    /*
> > > > +     * The base address may not be aligned to the second level
> > > > +     * size in case of Sv39 (e.g. 1GB when using 4KB pages).
> > > > +     * This would prevent superpage mappings for all the
> > > > regions
> > > > +     * because the virtual address and machine address should
> > > > +     * both be suitably aligned.
> > > > +     *
> > > > +     * Prevent that by offsetting the start of the directmap
> > > > virtual
> > > > +     * address.
> > > > +     */
> > > > +    if ( map_pages_to_xen(DIRECTMAP_VIRT_START + (base_addr &
> > > > ~high_bits_mask),
> > > 
> > > I'm afraid this is correct only for the first invocation of the
> > > function.
> > > For any further invocation you'd likely (attempt to) replace
> > > previously
> > > established mappings. I think that here you need to use
> > > directmap_virt_start
> > > instead.
> > Banks are sorted by bank start address ( common/device-
> > tree/bootfdt.c:623 ):
> >        /*
> >         * On Arm64 setup_directmap_mappings() expects to be called
> > with
> >    the lowest
> >         * bank in memory first. There is no requirement that the DT
> > will
> >    provide
> >         * the banks sorted in ascending order. So sort them
> > through.
> >         */
> >        sort(mem->bank, mem->nr_banks, sizeof(struct membank),
> >             cmp_memory_node, swap_memory_node);
> > ( btw, comment is needed to be updated ... )
> > 
> > Thereby no replacement should happen if I don't miss something.
> 
> I don't see how banks being sorted matters here. On the 2nd
> invocation
> you'll start mapping pages again from DIRECTMAP_VIRT_START plus an at
> most 1G (for SV39) offset. If both banks have 2G, the resulting
> mappings
> will necessarily overlap, if I'm not mistaken.
Yes, you are write, overlapping is really possible.

Then __mfn_to_virt(base_mfn) to calculate VA of a bank start should be
used:
   @@ -473,24 +483,24 @@ static void __init
   setup_directmap_mappings(unsigned long base_mfn,
        {
            directmap_mfn_start = _mfn(base_mfn);
    
   -        directmap_virt_start -= (base_addr & high_bits_mask);
   +       /*
   +        * The base address may not be aligned to the second level
   +        * size in case of Sv39 (e.g. 1GB when using 4KB pages).
   +        * This would prevent superpage mappings for all the regions
   +        * because the virtual address and machine address should
   +        * both be suitably aligned.
   +        *
   +        * Prevent that by offsetting the start of the directmap
   virtual
   +        * address.
   +        */
   +        directmap_virt_start -= (base_addr & high_bits_mask) +
   (base_addr & ~high_bits_mask);
        }
    
        if ( base_mfn < mfn_x(directmap_mfn_start) )
            panic("cannot add directmap mapping at %#lx below heap
   start %#lx\n",
                  base_mfn, mfn_x(directmap_mfn_start));
    
   -    /*
   -     * The base address may not be aligned to the second level
   -     * size in case of Sv39 (e.g. 1GB when using 4KB pages).
   -     * This would prevent superpage mappings for all the regions
   -     * because the virtual address and machine address should
   -     * both be suitably aligned.
   -     *
   -     * Prevent that by offsetting the start of the directmap
   virtual
   -     * address.
   -     */
   -    if ( map_pages_to_xen(DIRECTMAP_VIRT_START + (base_addr &
   ~high_bits_mask),
   +    if ( map_pages_to_xen((vaddr_t)mfn_to_virt(base_mfn),
                              _mfn(base_mfn), nr_mfns,
                              PAGE_HYPERVISOR_RW) )

Thanks!

~ Oleksii



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.