[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 3/3] x86/PVH: Support relocatable dom0 kernels


  • To: Jason Andryuk <jason.andryuk@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 20 Mar 2024 15:39:14 +0100
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 20 Mar 2024 14:39:26 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 19.03.2024 21:58, Jason Andryuk wrote:
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -537,6 +537,97 @@ static paddr_t __init find_memory(
>      return INVALID_PADDR;
>  }
>  
> +static bool __init check_load_address(
> +    const struct domain *d, const struct elf_binary *elf)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base;

As before I think it is illegitimate to cast a pointer to paddr_t. The
variable type wants to remain such, but the cast wants to be to
unsigned long or uintptr_t (or else at least a comment wants adding).

> +    paddr_t kernel_end = kernel_start + elf->dest_size;
> +    unsigned int i;
> +
> +    /* Relies on a sorted memory map. */
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start = d->arch.e820[i].addr;
> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;

Nit: Shorter if you use "start" here (also again in the other function).

> +        if ( start >= kernel_end )
> +            return false;
> +
> +        if ( d->arch.e820[i].type == E820_RAM  &&

Nit: Excess blank before &&.

> +             start <= kernel_start &&
> +             end >= kernel_end )
> +            return true;

As to the comment ahead of the function: This further relies on adjacent
entries of the same type having got merged.

> +    }
> +
> +    return false;
> +}
> +
> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
> +static paddr_t __init find_kernel_memory(
> +    const struct domain *d, struct elf_binary *elf,
> +    const struct elf_dom_parms *parms)
> +{
> +    paddr_t kernel_size = elf->dest_size;
> +    unsigned int i;
> +
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start = d->arch.e820[i].addr;
> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
> +        paddr_t kstart, kend;
> +
> +        if ( d->arch.e820[i].type != E820_RAM ||
> +             d->arch.e820[i].size < kernel_size )
> +            continue;
> +
> +        kstart = ROUNDUP(start, parms->phys_align);
> +        kstart = max_t(paddr_t, kstart, parms->phys_min);

I'd generally try to avoid max_t(), but I cannot think of any good way
of expressing this without using it.

> +        kend = kstart + kernel_size;
> +
> +        if ( kend > parms->phys_max )

So despite its default phys_max is exclusive aiui. Otherwise with
kend being exclusive too, this would look to be off by one.

> +            return 0;
> +
> +        if ( kend <= end )
> +            return kstart;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Check the kernel load address, and adjust if necessary and possible. */
> +static bool __init check_and_adjust_load_address(
> +    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms 
> *parms)
> +{
> +    paddr_t reloc_base;
> +
> +    if ( check_load_address(d, elf) )
> +        return true;
> +
> +    if ( !parms->phys_reloc )
> +    {
> +        printk("%pd kernel: Address conflict and not relocatable\n", d);
> +        return false;

This better wouldn't result in -ENOMEM in the caller. Then again reasons
are logged, so the specific error code perhaos doesn't matter that much.

> +    }
> +
> +    reloc_base = find_kernel_memory(d, elf, parms);
> +    if ( reloc_base == 0 )

With ! used above please be consistent and do so here, too.

> +    {
> +        printk("%pd kernel: Failed find a load address\n", d);
> +        return false;
> +    }
> +
> +    if ( opt_dom0_verbose )
> +        printk("%pd kernel: Moving [%p, %p] -> [%"PRIpaddr", 
> %"PRIpaddr"]\n", d,
> +               elf->dest_base, elf->dest_base + elf->dest_size - 1,
> +               reloc_base, reloc_base + elf->dest_size - 1);
> +
> +    parms->phys_entry = reloc_base +
> +                            (parms->phys_entry - (paddr_t)elf->dest_base);
> +    elf->dest_base = (char *)reloc_base;

Just as a remark, no request to change anything: We're not dealing with
strings here. Hence char * isn't quite right, just that "dest_base" is
of that type (for whatever reason).

> @@ -161,6 +164,10 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary 
> *elf,
>      {
>          elf_msg(elf, "ELF: note: %s\n", note_desc[type].name);
>      }
> +    else if ( note_desc[type].type == ELFNOTE_CUSTOM )
> +    {
> +        elf_msg(elf, "ELF: note: %s", note_desc[type].name);
> +    }

This could do with a brief comment, to make clear the \n isn't accidentally
missing here.

> @@ -225,6 +232,28 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary 
> *elf,
>      case XEN_ELFNOTE_PHYS32_ENTRY:
>          parms->phys_entry = val;
>          break;
> +
> +    case XEN_ELFNOTE_PHYS32_RELOC:
> +        parms->phys_reloc = true;
> +
> +        if ( descsz >= 4 )
> +        {
> +            parms->phys_max = elf_note_numeric_array(elf, note, 4, 0);
> +            elf_msg(elf, " = max: %#"PRIx32, parms->phys_max);
> +        }
> +        if ( descsz >= 8 )
> +        {
> +            parms->phys_min = elf_note_numeric_array(elf, note, 4, 1);
> +            elf_msg(elf, " min: %#"PRIx32, parms->phys_min);
> +        }
> +        if ( descsz >= 12 )
> +        {
> +            parms->phys_align = elf_note_numeric_array(elf, note, 4, 2);
> +            elf_msg(elf, " align: %#"PRIx32, parms->phys_align);
> +        }
> +
> +        elf_msg(elf, "\n");

Imo the newline wants adding outside of the switch(), for everything
set to ELFNOTE_CUSTOM. In fact with that I think ELFNOTE_CUSTOM and
ELFNOTE_NAME could be folded (with what's NAME right now simply
printing nothing in the switch here). Which suggests that I may
better not commit the (slightly adjusted) earlier patch, yet.

> @@ -536,6 +565,10 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
>      parms->p2m_base = UNSET_ADDR;
>      parms->elf_paddr_offset = UNSET_ADDR;
>      parms->phys_entry = UNSET_ADDR32;
> +    parms->phys_min = 0;
> +    parms->phys_max = 0xffffffff;
> +    parms->phys_align = 0x200000;

I think this wants to be MB(2) (requiring a pre-patch to centralize MB()
in the tool stack to tools/include/xen-tools/common-macros.h). And I
further think this needs to be an arch-specific constant, even if right
now the note is expected to be present only for x86. Which then also
needs saying ...

> --- a/xen/include/public/elfnote.h
> +++ b/xen/include/public/elfnote.h
> @@ -194,10 +194,26 @@
>   */
>  #define XEN_ELFNOTE_PHYS32_ENTRY 18
>  
> +/*
> + * Physical loading constraints for PVH kernels
> + *
> + * Used to place constraints on the guest physical loading addresses and
> + * alignment for a PVH kernel.
> + *
> + * The presence of this note indicates the kernel supports relocating itself.
> + *
> + * The note may include up to three 32bit values in the following order:
> + *  - a maximum address for the entire image to be loaded below (default
> + *      0xffffffff)
> + *  - a minimum address for the start of the image (default 0)
> + *  - a required start alignment (default 0x200000)
> + */
> +#define XEN_ELFNOTE_PHYS32_RELOC 19

... in the comment here. The reasoning behind not (intermediately) falling
back to the alignment in the ELF headers also wants adding to the patch
description (or a code comment, albeit there's no really good place to put
it, imo).

Additionally I think the pieces of the comment want re-ordering. The primary
purpose is indicating relocatability; when not relocating, the constraints
aren't applied in any way.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.