[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v11 03/12] xen/arm: permit non direct-mapped Dom0 construction



Hi Michal,

On Fri, Dec 13, 2024 at 10:46 AM Michal Orzel <michal.orzel@xxxxxxx> wrote:
>
> Hi Carlo, Andrea,
>
> On 12/12/2024 19:22, Andrea Bastoni wrote:
> >
> >
> > On 12/12/2024 18:48, Carlo Nonato wrote:
> >> Hi,
> >>
> >> On Mon, Dec 9, 2024 at 8:17 PM Julien Grall <julien@xxxxxxx> wrote:
> >>>
> >>> Hi Michal,
> >>>
> >>> On 07/12/2024 15:04, Michal Orzel wrote:
> >>>>
> >>>>
> >>>> On 06/12/2024 19:37, Julien Grall wrote:
> >>>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Sorry for the late answer.
> >>>>>
> >>>>> On 05/12/2024 09:40, Michal Orzel wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 02/12/2024 17:59, Carlo Nonato wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Cache coloring requires Dom0 not to be direct-mapped because of its 
> >>>>>>> non
> >>>>>>> contiguous mapping nature, so allocate_memory() is needed in this 
> >>>>>>> case.
> >>>>>>> 8d2c3ab18cc1 ("arm/dom0less: put dom0less feature code in a separate 
> >>>>>>> module")
> >>>>>>> moved allocate_memory() in dom0less_build.c. In order to use it
> >>>>>>> in Dom0 construction bring it back to domain_build.c and declare it in
> >>>>>>> domain_build.h.
> >>>>>>>
> >>>>>>> Take the opportunity to adapt the implementation of allocate_memory() 
> >>>>>>> so
> >>>>>>> that it uses the host layout when called on the hwdom, via
> >>>>>>> find_unallocated_memory().
> >>>>>>>
> >>>>>>> Signed-off-by: Carlo Nonato <carlo.nonato@xxxxxxxxxxxxxxx>
> >>>>>>> ---
> >>>>>>> v11:
> >>>>>>> - GUEST_RAM_BANKS instead of hardcoding the number of banks in 
> >>>>>>> allocate_memory()
> >>>>>>> - hwdom_ext_regions -> hwdom_free_mem in allocate_memory()
> >>>>>>> - added a comment in allocate_memory() when skipping small banks
> >>>>>>> v10:
> >>>>>>> - fixed a compilation bug that happened when dom0less support was 
> >>>>>>> disabled
> >>>>>>> v9:
> >>>>>>> - no changes
> >>>>>>> v8:
> >>>>>>> - patch adapted to new changes to allocate_memory()
> >>>>>>> v7:
> >>>>>>> - allocate_memory() now uses the host layout when called on the hwdom
> >>>>>>> v6:
> >>>>>>> - new patch
> >>>>>>> ---
> >>>>>>>    xen/arch/arm/dom0less-build.c           | 44 -----------
> >>>>>>>    xen/arch/arm/domain_build.c             | 97 
> >>>>>>> ++++++++++++++++++++++++-
> >>>>>>>    xen/arch/arm/include/asm/domain_build.h |  1 +
> >>>>>>>    3 files changed, 94 insertions(+), 48 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/xen/arch/arm/dom0less-build.c 
> >>>>>>> b/xen/arch/arm/dom0less-build.c
> >>>>>>> index d93a85434e..67b1503647 100644
> >>>>>>> --- a/xen/arch/arm/dom0less-build.c
> >>>>>>> +++ b/xen/arch/arm/dom0less-build.c
> >>>>>>> @@ -49,50 +49,6 @@ bool __init is_dom0less_mode(void)
> >>>>>>>        return ( !dom0found && domUfound );
> >>>>>>>    }
> >>>>>>>
> >>>>>>> -static void __init allocate_memory(struct domain *d, struct 
> >>>>>>> kernel_info *kinfo)
> >>>>>>> -{
> >>>>>>> -    struct membanks *mem = kernel_info_get_mem(kinfo);
> >>>>>>> -    unsigned int i;
> >>>>>>> -    paddr_t bank_size;
> >>>>>>> -
> >>>>>>> -    printk(XENLOG_INFO "Allocating mappings totalling %ldMB for 
> >>>>>>> %pd:\n",
> >>>>>>> -           /* Don't want format this as PRIpaddr (16 digit hex) */
> >>>>>>> -           (unsigned long)(kinfo->unassigned_mem >> 20), d);
> >>>>>>> -
> >>>>>>> -    mem->nr_banks = 0;
> >>>>>>> -    bank_size = MIN(GUEST_RAM0_SIZE, kinfo->unassigned_mem);
> >>>>>>> -    if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(GUEST_RAM0_BASE),
> >>>>>>> -                               bank_size) )
> >>>>>>> -        goto fail;
> >>>>>>> -
> >>>>>>> -    bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);
> >>>>>>> -    if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
> >>>>>>> -                               bank_size) )
> >>>>>>> -        goto fail;
> >>>>>>> -
> >>>>>>> -    if ( kinfo->unassigned_mem )
> >>>>>>> -        goto fail;
> >>>>>>> -
> >>>>>>> -    for( i = 0; i < mem->nr_banks; i++ )
> >>>>>>> -    {
> >>>>>>> -        printk(XENLOG_INFO "%pd BANK[%d] %#"PRIpaddr"-%#"PRIpaddr" 
> >>>>>>> (%ldMB)\n",
> >>>>>>> -               d,
> >>>>>>> -               i,
> >>>>>>> -               mem->bank[i].start,
> >>>>>>> -               mem->bank[i].start + mem->bank[i].size,
> >>>>>>> -               /* Don't want format this as PRIpaddr (16 digit hex) 
> >>>>>>> */
> >>>>>>> -               (unsigned long)(mem->bank[i].size >> 20));
> >>>>>>> -    }
> >>>>>>> -
> >>>>>>> -    return;
> >>>>>>> -
> >>>>>>> -fail:
> >>>>>>> -    panic("Failed to allocate requested domain memory."
> >>>>>>> -          /* Don't want format this as PRIpaddr (16 digit hex) */
> >>>>>>> -          " %ldKB unallocated. Fix the VMs configurations.\n",
> >>>>>>> -          (unsigned long)kinfo->unassigned_mem >> 10);
> >>>>>>> -}
> >>>>>>> -
> >>>>>>>    #ifdef CONFIG_VGICV2
> >>>>>>>    static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
> >>>>>>>    {
> >>>>>>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> >>>>>>> index 2c30792de8..2b8cba9b2f 100644
> >>>>>>> --- a/xen/arch/arm/domain_build.c
> >>>>>>> +++ b/xen/arch/arm/domain_build.c
> >>>>>>> @@ -416,7 +416,6 @@ static void __init allocate_memory_11(struct 
> >>>>>>> domain *d,
> >>>>>>>        }
> >>>>>>>    }
> >>>>>>>
> >>>>>>> -#ifdef CONFIG_DOM0LESS_BOOT
> >>>>>>>    bool __init allocate_domheap_memory(struct domain *d, paddr_t 
> >>>>>>> tot_size,
> >>>>>>>                                        alloc_domheap_mem_cb cb, void 
> >>>>>>> *extra)
> >>>>>>>    {
> >>>>>>> @@ -508,7 +507,6 @@ bool __init allocate_bank_memory(struct 
> >>>>>>> kernel_info *kinfo, gfn_t sgfn,
> >>>>>>>
> >>>>>>>        return true;
> >>>>>>>    }
> >>>>>>> -#endif
> >>>>>>>
> >>>>>>>    /*
> >>>>>>>     * When PCI passthrough is available we want to keep the
> >>>>>>> @@ -1003,6 +1001,94 @@ out:
> >>>>>>>        return res;
> >>>>>>>    }
> >>>>>>>
> >>>>>>> +void __init allocate_memory(struct domain *d, struct kernel_info 
> >>>>>>> *kinfo)
> >>>>>>> +{
> >>>>>>> +    struct membanks *mem = kernel_info_get_mem(kinfo);
> >>>>>>> +    unsigned int i, nr_banks = GUEST_RAM_BANKS;
> >>>>>>> +    paddr_t bank_start, bank_size;
> >>>>>> Limit the scope
> >>>>>>
> >>>>>>> +    struct membanks *hwdom_free_mem = NULL;
> >>>>>>> +    const uint64_t bankbase[] = GUEST_RAM_BANK_BASES;
> >>>>>>> +    const uint64_t banksize[] = GUEST_RAM_BANK_SIZES;
> >>>>>> Limit the scope
> >>>>>>
> >>>>>>> +
> >>>>>>> +    printk(XENLOG_INFO "Allocating mappings totalling %ldMB for 
> >>>>>>> %pd:\n",
> >>>>>>> +           /* Don't want format this as PRIpaddr (16 digit hex) */
> >>>>>>> +           (unsigned long)(kinfo->unassigned_mem >> 20), d);
> >>>>>>> +
> >>>>>>> +    mem->nr_banks = 0;
> >>>>>>> +    /*
> >>>>>>> +     * Use host memory layout for hwdom. Only case for this is when 
> >>>>>>> LLC coloring
> >>>>>>> +     * is enabled.
> >>>>>>> +     */
> >>>>>>> +    if ( is_hardware_domain(d) )
> >>>>>>> +    {
> >>>>>>> +        ASSERT(llc_coloring_enabled);
> >>>>>> This patch does not build because of declaration not being visible. 
> >>>>>> You must include <xen/llc-coloring.h>.
> >>>>>
> >>>>> Piggying back on this comment. AFAICT, the code below would work also in
> >>>>> the non cache coloring case. So what's the assert is for?
> >>>>>
> >>>>>>
> >>>>>>> +
> >>>>>>> +        hwdom_free_mem = xzalloc_flex_struct(struct membanks, bank,
> >>>>>>> +                                             NR_MEM_BANKS);
> >>>>>>> +        if ( !hwdom_free_mem )
> >>>>>>> +            goto fail;
> >>>>>>> +
> >>>>>>> +        hwdom_free_mem->max_banks = NR_MEM_BANKS;
> >>>>>>> +
> >>>>>>> +        if ( find_unallocated_memory(kinfo, hwdom_free_mem) )
> >>>>>> My remarks for the use of find_unallocated_memory() 1:1 have not been 
> >>>>>> addressed. You did not even
> >>>>>> change the comments inside the function. The problem is that the 
> >>>>>> function is specifically designed
> >>>>>> for finding extended regions and assumes being called at certain point 
> >>>>>> i.e. dom0 RAM allocated, gnttab
> >>>>>> region allocated, etc.
> >>>>>
> >>>>> So I agree that the function should be updated if we plan to use it for
> >>>>> other purpose.
> >>>>>
> >>>>> My opinion is that we should attempt to make the function generic so
> >>>>> that in your
> >>>>>> case you can choose which regions to exclude, define even your own 
> >>>>>> function to grab free regions (at the moment
> >>>>>> add_ext_regions grabs banks >= 64M but you still discards banks >= 
> >>>>>> 128M, so it's a bit wasteful.
> >>>>>>
> >>>>>> My very short attempt to make the function as generic as possible in 
> >>>>>> the first iteration:
> >>>>>> https://paste.debian.net/1338334/
> >>>>>
> >>>>> This looks better, but I wonder why we need still need to exclude the
> >>>>> static regions? Wouldn't it be sufficient to exclude just reserved 
> >>>>> regions?
> >>>> Static shared memory banks are not part of reserved memory (i.e. 
> >>>> bootinfo.reserved_mem) if that's what you're asking.
> >>>> They are stored in bootinfo.shmem, hence we need to take them into 
> >>>> account when searching for unused address space.
> >>>
> >>> Oh I missed the fact you now pass "mem_banks" as a parameter. I thought
> >>> they would still get excluded for cache coloring case.
> >>>
> >>>>
> >>>> If you and Carlo are ok with my proposed solution for making the 
> >>>> function generic, I can send a patch as a prerequisite
> >>>> patch for Carlo series.
> >>>
> >>> I am fine with the approach.
> >>>
> >>> Cheers,
> >>>
> >>> --
> >>> Julien Grall
> >>>
> >>
> >>> @@ -2152,7 +2238,10 @@ static int __init construct_dom0(struct domain *d)
> >>>      /* type must be set before allocate_memory */
> >>>      d->arch.type = kinfo.type;
> >>>  #endif
> >>> -    allocate_memory_11(d, &kinfo);
> >>> +    if ( is_domain_direct_mapped(d) )
> >>> +        allocate_memory_11(d, &kinfo);
> >>> +    else
> >>> +        allocate_memory(d, &kinfo);
> >>>      find_gnttab_region(d, &kinfo);
> >>
> >> Since find_gnttab_region() is called after allocate_memory(), 
> >> kinfo->gnttab_*
> >> fields aren't initialized and the call to find_unallocated_memory() with
> >> gnttab as the region to exclude, fails ending in a crash since memory for
> >> dom0 can't be allocated.
> >>
> >> Can the solution be to call find_gnttab_region() before the above if?
> >
> > The function is called find, but currently it only initializes 
> > kinfo->gnttab_start
> > and kinfo->gnttab_size and we tested that moving it before allocate_memory* 
> > doesn't
> > cause fallouts.
> >
> > If moving before allocate_memory*, would it be better to rename it e.g., 
> > init_gnttab_region()?
> >
> > Thanks,
> > Andrea
> >
> >> Or should I just call it before allocate_memory() in one case, but still 
> >> after
> >> allocate_memory_11() in the other?
> >>
> >> Thanks.
> >
>
> AFAICT there is nothing stopping us from moving find_gnttab_region() before 
> allocate_*. This function initializes
> gnttab region with PA of Xen. In normal case, because Xen is added as 
> bootmodule, it will never be mapped in dom0 memory map
> and the placement does not matter. In LLC case, it will point to relocated 
> address of Xen and it needs to be known before
> calling find_unallocated_memory. Don't rename it, leave as is, just move 
> before allocate_*.
>
> @Carlo:
> My prerequisite patch has been merged, so you're good to respin a series 
> (unless you wait for some feedback in which case do let me know).
> To prevent too many respins, you're going to call find_unallocated_memory for 
> LLC passing resmem and gnttab to be excluded. If you're going
> to reuse add_ext_regions, you need to rename it and fix comments to make it 
> more generic. As for the size, the decision is yours. One solution
> would be to modify add_ext_regions to take min bank size as parameter (64MB 
> for extended regions, X for LLC dom0). In your code, you write that
> the first bank must contain dom0, dtb, ramdisk and you chose 128MB. However, 
> looking at the code, you seem to discard banks < 128 for all the banks,
> not only for the first one. This is the part that I don't have a ready 
> solution. Maybe you could define your own add_free_region function and sort
> the banks, so that you take the largest possible bank first for dom0. This 
> could simplify things.

For the moment I added a __add_ext_regions() helper that also takes a skip_size
parameter. This is called by add_ext_regions() and by a new
add_hwdom_free_regions() callback used in allocate_memory().
I still use 128MB for all the banks. Do you think this is acceptable, maybe
with a FIXME comment cause we should skip only the first bank, or not?

> You can also ask others for opinion.
>
> We are approaching Dec 20th deadline, and I want this series to be in as it's 
> been on the list for too many years. I'm willing to accept a sub-optimal 
> solution
> (so far will be used only for LLC, and LLC as experimental feature will be 
> the only victim of not optimal algorithm) for now, and we can think of a 
> better one
> after the release. But still, even the sub-optimal solution must make sense.
>
> ~Michal
>

Thanks.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.