[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 1/3] xen/riscv: introduce setup_mm()


  • To: oleksii.kurochko@xxxxxxxxx
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 7 Nov 2024 10:19:18 +0100
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 07 Nov 2024 09:19:31 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06.11.2024 13:44, oleksii.kurochko@xxxxxxxxx wrote:
> On Tue, 2024-11-05 at 16:20 +0100, Jan Beulich wrote:
>> On 01.11.2024 14:16, Oleksii Kurochko wrote:> @@ -423,3 +429,138 @@
>> void * __init early_fdt_map(paddr_t fdt_paddr)
>>>  
>>>      return fdt_virt;
>>>  }
>>> +
>>> +vaddr_t __ro_after_init directmap_virt_start =
>>> DIRECTMAP_VIRT_START;
>>> +
>>> +#ifndef CONFIG_RISCV_32
>>> +
>>> +/* Map a frame table to cover physical addresses ps through pe */
>>> +static void __init setup_frametable_mappings(paddr_t ps, paddr_t
>>> pe)
>>> +{
>>> +    paddr_t aligned_ps = ROUNDUP(ps, PAGE_SIZE);
>>> +    paddr_t aligned_pe = ROUNDDOWN(pe, PAGE_SIZE);
>>> +    unsigned long nr_mfns = PFN_DOWN(aligned_pe - aligned_ps);
>>> +    unsigned long frametable_size = nr_mfns *
>>> sizeof(*frame_table);
>>> +    mfn_t base_mfn;
>>> +
>>> +    if ( frametable_size > FRAMETABLE_SIZE )
>>> +        panic("The frametable cannot cover [%#"PRIpaddr",
>>> %#"PRIpaddr")\n",
>>> +              ps, pe);
>>> +
>>> +    /*
>>> +     * align base_mfn and frametable_size to MB(2) to have
>>> superpage mapping
>>> +     * in map_pages_to_xen()
>>> +     */
>>> +    frametable_size = ROUNDUP(frametable_size, MB(2));
>>> +    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT,
>>> PFN_DOWN(MB(2)));
>>> +
>>> +    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
>>> +                          PFN_DOWN(frametable_size),
>>> +                          PAGE_HYPERVISOR_RW) )
>>> +        panic("frametable mappings failed: %#lx -> %#lx\n",
>>> +              FRAMETABLE_VIRT_START, mfn_x(base_mfn));
>>> +
>>> +    memset(&frame_table[0], -1, frametable_size);
>>> +    memset(&frame_table[PFN_DOWN(aligned_ps)],
>>> +           0, nr_mfns * sizeof(*frame_table));
>>
>> Interesting - now you write out a huge amount of -1s, just to then
>> overwrite
>> most of them with zeroes. I'm not going to insist that you change
>> this yet
>> another time, but the performance hit from this is going to bite
>> you/us as
>> soon as Xen is run on bigger-memory systems.
> I agree that validating or invalidating frames in a single pass would
> be preferable to nearly two passes. I’m considering whether there’s a
> way to ensure that frame_table is set to -1 at compile time.

How would that work, if the entire frame table is allocated dynamically?

> It seems
> the best approach (and only one?) might be to initialize frame_table in
> one pass as follows:
> 1) [0, aligned_ps) = -1
> 2) [aligned_ps, nr_mfns * sizeof(*frame_table)) = 0
> 3) [nr_mfns * sizeof(*frame_table), frame_table_size) = -1
> Does this approach seem optimal?

That's what I would have expected, yes.

>> Plus, unless I'm mistaken, the function continues to rely on ps == 0
>> as
>> input. Just that the dependency is now better hidden. Specifically if
>> you
>> calculate nr_mfns from the difference, and then use that for
>> allocation,
>> then you need to offset the start of the mapping you create
>> accordingly.
> I'm not quite understanding why the method of calculating nr_mfns
> affects how the frame_table is mapped. Isn’t it only necessary to
> calculate the correct size of the frame_table that needs to be
> allocated?

Assume there's 4G of memory actually starting at 16G. Afaict you'll
allocate a frame table for those 4G, but you'll map it right at
FRAMETABLE_VIRT_START. Hence it'll cover the first 4G of physical
addresses, but _none_ of the actual memory you've got.

> I assume by the offset you mean something similar to what was done for
> directmap mapping?

Kind of, yes.

>>  At
>> which point you may need to apply extra care to cover the case where
>> sizeof(*frame_table) is not a power of two, and hence e.g. the first
>> valid
>> page might have a struct instance straddling a page boundary.
> The first valid page is aligned_ps ( which is aligned on a page
> boundary ) so assuming that sizeof(*frame_table) < PAGE_SIZE we can't
> straddle a page boundary, can we?

But sizeof(*frame_table) < PAGE_SIZE means nothing as to the alignment
of an individual struct instance in memory. Iirc sizeof(*frame_table)
is 48 for RISC-V, so the common alignment across struct instances isn't
going to be better than 8, and there _will_ be instances straddling
page boundaries.

>>> +    /*
>>> +     * The base address may not be aligned to the second level
>>> +     * size in case of Sv39 (e.g. 1GB when using 4KB pages).
>>> +     * This would prevent superpage mappings for all the regions
>>> +     * because the virtual address and machine address should
>>> +     * both be suitably aligned.
>>> +     *
>>> +     * Prevent that by offsetting the start of the directmap
>>> virtual
>>> +     * address.
>>> +     */
>>> +    if ( map_pages_to_xen(DIRECTMAP_VIRT_START + (base_addr &
>>> ~high_bits_mask),
>>
>> I'm afraid this is correct only for the first invocation of the
>> function.
>> For any further invocation you'd likely (attempt to) replace
>> previously
>> established mappings. I think that here you need to use
>> directmap_virt_start
>> instead.
> Banks are sorted by bank start address ( common/device-
> tree/bootfdt.c:623 ):
>        /*
>         * On Arm64 setup_directmap_mappings() expects to be called with
>    the lowest
>         * bank in memory first. There is no requirement that the DT will
>    provide
>         * the banks sorted in ascending order. So sort them through.
>         */
>        sort(mem->bank, mem->nr_banks, sizeof(struct membank),
>             cmp_memory_node, swap_memory_node);
> ( btw, comment is needed to be updated ... )
> 
> Thereby no replacement should happen if I don't miss something.

I don't see how banks being sorted matters here. On the 2nd invocation
you'll start mapping pages again from DIRECTMAP_VIRT_START plus an at
most 1G (for SV39) offset. If both banks have 2G, the resulting mappings
will necessarily overlap, if I'm not mistaken.

>>> +    for ( i = 0; i < banks->nr_banks; i++ )
>>> +    {
>>> +        const struct membank *bank = &banks->bank[i];
>>> +        paddr_t bank_start = ROUNDUP(bank->start, PAGE_SIZE);
>>> +        paddr_t bank_end = ROUNDDOWN(bank->start + bank->size,
>>> PAGE_SIZE);
>>> +        unsigned long bank_size = bank_end - bank_start;
>>> +
>>> +        ram_size += bank_size;
>>> +        ram_start = min(ram_start, bank_start);
>>> +        ram_end = max(ram_end, bank_end);
>>> +
>>> +        setup_directmap_mappings(PFN_DOWN(bank_start),
>>> PFN_DOWN(bank_size));
>>> +    }
>>
>> You maintain ram_start in the loop, just to then ...
>>
>>> +    total_pages = PFN_DOWN(ram_size);
>>> +
>>> +    setup_frametable_mappings(0, ram_end);
>>> +    max_page = PFN_DOWN(ram_end);
>>> +}
>>
>> ... not use it at all - why?
> ram_start was needed for the case when setup_frametable_mappings() used
> it as the first argument. Now it isn't true anymore so should be
> dropped.

As per above it may actually be necessary (or at least desirable) to pass
it into there again, if nothing else then to know how much of the initial
part of the (mapped) frame table to invalidate.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.