
Re: [Xen-devel] [PATCH] create multiple banks for dom0 in 1:1 mapping if necessary



On Fri, 2014-01-10 at 04:12 +0000, Karim Raslan wrote:
> Create multiple banks to hold dom0 memory in case of 1:1 mapping
> if we failed to find 1 large contiguous memory to hold the whole
> thing.

Thanks. With my ARM maintainer hat on I'd love for this to go in for
4.4, but with my acting release manager hat on I have to be honest:
this is too big a change for 4.4 at this stage, which is a pity :-(


> 
> Signed-off-by: Karim Raslan <karim.allah.ahmed@xxxxxxxxx>
> ---
>  xen/arch/arm/domain_build.c |   74 ++++++++++++++++++++++++++-----------
>  xen/arch/arm/kernel.c       |   86 ++++++++++++++++++++++++++++++++++---------
>  2 files changed, 121 insertions(+), 39 deletions(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index faff88e..bb44cdd 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -69,39 +69,71 @@ static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
>  {
>      paddr_t start;
>      paddr_t size;
> +    unsigned int cur_order, cur_bank, nr_banks = 1, index = 0;
>      struct page_info *pg = NULL;
> -    unsigned int order = get_order_from_bytes(dom0_mem);
> +    unsigned int order = get_order_from_bytes(kinfo->unassigned_mem);
>      int res;
>      paddr_t spfn;
>      unsigned int bits;
>  
> -    for ( bits = PAGE_SHIFT + 1; bits < PADDR_BITS; bits++ )
> +#define MIN_BANK_ORDER 10

2^10 is < PAGE_SIZE (PAGE_SHIFT is 12). 2^12 is the smallest
allocation which can be made, but I think in practice the lowest
useful bank size is going to be somewhat larger than that.

NR_MEM_BANKS is 8 so we also need to consider that.

A 64M dom0 would mean 8M per bank, which seems like a reasonable minimum
bank size. That would be a MIN_BANK_ORDER of 23. Please include a
comment explaining where this number comes from.

The other way to look at this would be to calculate it dynamically as
get_order_from_bytes(dom0_mem / NR_MEM_BANKS).

> +
> +    kinfo->mem.nr_banks = 0;
> +
> +    /*
> +     * We always first try to allocate all dom0 memory in 1 bank.
> +     * However, if we failed to allocate physically contiguous memory
> +     * from the allocator, then we try to create more than one bank.
> +     */
> +    for ( cur_order = order; cur_order > MIN_BANK_ORDER; cur_order--)

I think this can be just 
        for( order = get_order_from_bytes(...) ; order > MIN_BANK_ORDER ; order-- )
(maybe order >= ?) or
        while (order > MIN_BANK_ORDER )
        {
                ...
                order--;
        }
I think the first works better.

This does away with the need for cur_order vs order. I think order is
mostly unused after this patch; also, keeping the name order rather
than introducing cur_order would hopefully reduce the diff and
therefore the "scariness" of the patch wrt 4.4 (although that may not
be sufficient).

>      {
> -        pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
> -        if ( pg != NULL )
> -            break;
> +        for( cur_bank = 1; cur_bank <= nr_banks; cur_bank++ )

Is cur_bank redundant with index? Also kinfo->mem.nr_banks tells you how
many banks are filled in.

> +        {
> +            for ( bits = PAGE_SHIFT + 1; bits < PADDR_BITS; bits++ )
> +                     {

There something a bit odd going on with the whitespace here and in the
rest of this loop. Perhaps some hard tabs snuck in?

> +                             pg = alloc_domheap_pages(d, cur_order, MEMF_bits(bits));
> +                             if ( pg != NULL )
> +                                     break;
> +                     }
> +
> +                     if ( !pg )
> +                             break;
> +
> +                     spfn = page_to_mfn(pg);
> +                     start = pfn_to_paddr(spfn);
> +                     size = pfn_to_paddr((1 << cur_order));
> +
> +                 kinfo->mem.bank[index].start = start;
> +                 kinfo->mem.bank[index].size = size;
> +                 index++;
> +                 kinfo->mem.nr_banks++;
> +     }
> +
> +     if( pg )
> +             break;
> +
> +     nr_banks = (nr_banks - cur_bank + 1) << 1;

<<1 ? 

Is this not just kinfo->mem.nr_banks?

The basic structure here is:

for ( cur_order = order; cur_order > MIN_BANK_ORDER; cur_order--)
        for( cur_bank = 1; cur_bank <= nr_banks; cur_bank++ )
                for ( bits = PAGE_SHIFT + 1; bits < PADDR_BITS; bits++ )
                
Shouldn't the iteration over bank be the outer one?

The banks might be different sizes, right?

Also, with either approach, depending on where memory is available the
banks may not be contiguous and/or may not be in increasing address
order (in fact, because Xen prefers to allocate higher memory first it
seems likely that they will be in reverse order).

I don't know if either of those things matter. What does ePAPR have to
say on the matter?

I'd expect that the ordering might matter from the point of view of
putting the kernel in the first bank -- since that may no longer be the
lowest address.

You don't seem to reference kinfo->unassigned_mem anywhere after the
initial order calculation -- I think you need to subtract memory from it
on each iteration, or else I'm not sure you will actually get the right
amount allocated in all cases.

>      }
>  
> -    if ( !pg )
> -        panic("Failed to allocate contiguous memory for dom0");
> +     if ( !pg )
> +             panic("Failed to allocate contiguous memory for dom0");
>  
> -    spfn = page_to_mfn(pg);
> -    start = pfn_to_paddr(spfn);
> -    size = pfn_to_paddr((1 << order));
> +     for ( index = 0; index < kinfo->mem.nr_banks; index++ )
> +     {
> +         start = kinfo->mem.bank[index].start;
> +         size = kinfo->mem.bank[index].size;
> +         spfn = paddr_to_pfn(start);
> +         order = get_order_from_bytes(size);
>  
> -    // 1:1 mapping
> -    printk("Populate P2M %#"PRIx64"->%#"PRIx64" (1:1 mapping for dom0)\n",
> -           start, start + size);
> -    res = guest_physmap_add_page(d, spfn, spfn, order);
> -
> -    if ( res )
> -        panic("Unable to add pages in DOM0: %d", res);
> +         printk("Populate P2M %#"PRIx64"->%#"PRIx64" (1:1 mapping for dom0 - order : %i)\n",
> +                 start, start + size, order);
> +         res = guest_physmap_add_page(d, spfn, spfn, order);

Can this not be done as it is allocated rather than in a second pass?

>  
> -    kinfo->mem.bank[0].start = start;
> -    kinfo->mem.bank[0].size = size;
> -    kinfo->mem.nr_banks = 1;
> +         if ( res )
> +             panic("Unable to add pages in DOM0: %d", res);
>  
> -    kinfo->unassigned_mem -= size;
> +         kinfo->unassigned_mem -= size;
> +     }
>  }
>  
>  static void allocate_memory(struct domain *d, struct kernel_info *kinfo)
> diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> index 6a5772b..658c3de 100644
> --- a/xen/arch/arm/kernel.c
> +++ b/xen/arch/arm/kernel.c
> @@ -79,15 +79,43 @@ static void place_modules(struct kernel_info *info,
>      const paddr_t total = initrd_len + dtb_len;
>  
>      /* Convenient */

If you are going to remove all of the following convenience variables
then this comment is no longer correct.

(Convenient here means a shorter local name for something complex)

> -    const paddr_t mem_start = info->mem.bank[0].start;
> -    const paddr_t mem_size = info->mem.bank[0].size;
> -    const paddr_t mem_end = mem_start + mem_size;
> -    const paddr_t kernel_size = kernel_end - kernel_start;
> +    unsigned int i, min_i = -1;
> +    bool_t same_bank = false;
> +
> +    paddr_t mem_start, mem_end, mem_size;
> +    paddr_t kernel_size;
>  
>      paddr_t addr;
>  
> -    if ( total + kernel_size > mem_size )
> -        panic("Not enough memory in the first bank for the dtb+initrd");
> +    kernel_size = kernel_end - kernel_start;
> +
> +    for ( i = 0; i < info->mem.nr_banks; i++ )
> +    {
> +        mem_start = info->mem.bank[i].start;
> +        mem_size = info->mem.bank[i].size;
> +        mem_end = mem_start + mem_size - 1;
> +
> +        if ( (kernel_end > mem_start) && (kernel_end <= mem_end) )
> +            same_bank = true;
> +        else
> +            same_bank = false;
> +
> +        if ( same_bank && ((total + kernel_size) < mem_size) )
> +            min_i = i;
> +        else if ( (!same_bank) && (total < mem_size) )
> +            min_i = i;
> +        else
> +            continue;

What is all this doing?

> +
> +        break;
> +    }
> +
> +    if ( min_i == -1 )
> +        panic("Not enough memory for the dtb+initrd");
> +
> +    mem_start = info->mem.bank[min_i].start;
> +    mem_size = info->mem.bank[min_i].size;
> +    mem_end = mem_start + mem_size;
>  
>      /*
>       * DTB must be loaded such that it does not conflict with the
> @@ -104,17 +132,25 @@ static void place_modules(struct kernel_info *info,
>       * just after the kernel, if there is room, otherwise just before.
>       */
>  
> -    if ( kernel_end < MIN(mem_start + MB(128), mem_end - total) )
> -        addr = MIN(mem_start + MB(128), mem_end - total);
> -    else if ( mem_end - ROUNDUP(kernel_end, MB(2)) >= total )
> -        addr = ROUNDUP(kernel_end, MB(2));
> -    else if ( kernel_start - mem_start >= total )
> -        addr = kernel_start - total;
> -    else
> +    if ( same_bank )
>      {
> -        panic("Unable to find suitable location for dtb+initrd");
> -        return;
> -    }
> +        if ( kernel_end < MIN(mem_start + MB(128), mem_end - total) )
> +            addr = MIN(mem_start + MB(128), mem_end - total);
> +        if ( mem_end - ROUNDUP(kernel_end, MB(2)) >= total )
> +            addr = ROUNDUP(kernel_end, MB(2));
> +        else if ( kernel_start - mem_start >= total )
> +            addr = kernel_start - total;
> +        else
> +        {
> +            /*
> +             * We should never hit this condition because we've already
> +             * done the check while choosing the bank.
> +             */
> +            panic("Unable to find suitable location for dtb+initrd");
> +            return;
> +        }
> +    } else
> +        addr = ROUNDUP(mem_end - total, MB(2));
>  
>      info->dtb_paddr = addr;
>      info->initrd_paddr = info->dtb_paddr + dtb_len;
> @@ -264,10 +300,24 @@ static int kernel_try_zimage32_prepare(struct kernel_info *info,
>       */
>      if (start == 0)
>      {
> +        unsigned int i, min_i = 0, min_start = -1;
>          paddr_t load_end;
>  
> -        load_end = info->mem.bank[0].start + info->mem.bank[0].size;
> -        load_end = MIN(info->mem.bank[0].start + MB(128), load_end);
> +        /*
> +         * Load kernel at the lowest possible bank
> +         * ( check commit 6c21cb36e263de2db8716b477157a5b6cd531e1e for reason behind that )

That commit says nothing about loading in the lowest possible bank,
though. If there is some relevant factor which is worth commenting on
please do so directly.

Actually now that the kernel is fixed upstream we don't need the
behaviour of that commit at all. Although there are still restrictions
based on load address vs start of RAM (See booting.txt in the kernel
docs)

Ian.

> +         */
> +        for ( i = 0; i < info->mem.nr_banks; i++ )
> +        {
> +            if( (unsigned int)info->mem.bank[i].start < min_start )
> +            {
> +                min_start = info->mem.bank[i].start;
> +                min_i = i;
> +            }
> +        }
> +
> +        load_end = info->mem.bank[min_i].start + info->mem.bank[min_i].size;
> +        load_end = MIN(info->mem.bank[min_i].start + MB(128), load_end);
>  
>          info->zimage.load_addr = load_end - end;
>          /* Align to 2MB */




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

