Re: [Xen-devel] [PATCH v5 15/16] x86: make Xen early boot code relocatable
>>> On 20.08.16 at 00:43, <daniel.kiper@xxxxxxxxxx> wrote:
> Every multiboot protocol (regardless of version) compatible image must
> specify its load address (in the ELF or multiboot header). A multiboot
> protocol compatible loader has to load the image at the specified address.
> However, there is no guarantee that the requested memory region (in case
> of Xen it starts at 1 MiB and ends at 17 MiB) where the image should
> initially be loaded is RAM and that it is free (legacy BIOS platforms are
> merciful to Xen, but I found at least one EFI platform on which the Xen
> load address conflicts with EFI boot services; it is a Dell PowerEdge R820
> with the latest firmware). To cope with that problem we must make the Xen
> early boot code relocatable and help the boot loader to relocate the
> image properly by suggesting allowed address ranges instead of requesting
> specific load addresses as it does right now. This patch does the former.
> It does not add the multiboot2 protocol interface, which is done in the
> "x86: add multiboot2 protocol support for relocatable images" patch.
>
> This patch changes the following things:
> - the default load address is changed from 1 MiB to 2 MiB; I did that
> because the initial page tables use 2 MiB huge pages, so the required
> updates to them are quite easy; it means that e.g. we avoid special
> cases for the start and end of the required memory region if it lives
> at an address not aligned to 2 MiB,
Should this be a separate change then, to separate possible
regressions due to that from such due to any other of the changes
here? And then I don't see why, when making the image relocatable
anyway, the link-time load address actually still matters.
> - the %esi and %r15d registers are used as storage for the Xen image
> load base address (%r15d only briefly, because %rsi is used for the EFI
> SystemTable address in 64-bit code); both registers are (%esi is mostly)
> unused in the early boot code and preserved across C function calls,
In a context where you (also) talk about 64-bit code, %esi can't
be called preserved unilaterally. You should make clear that this is
for 32-bit function calls.
> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -12,13 +12,16 @@
> .text
> .code32
>
> -#define sym_phys(sym) ((sym) - __XEN_VIRT_START)
> +#define sym_offset(sym) ((sym) - __XEN_VIRT_START)
In a patch already large and hence hard to review, did you really
need to do this rename?
> @@ -126,26 +130,26 @@ vga_text_buffer:
> .section .init.text, "ax", @progbits
>
> bad_cpu:
> - mov $(sym_phys(.Lbad_cpu_msg)),%esi # Error message
> + lea esi_offset(.Lbad_cpu_msg),%esi # Error message
Why not just
add $sym_offset(.Lbad_cpu_msg),%esi # Error message
? Together with not doing said rename, this would be more obviously
correct.
> @@ -409,36 +436,93 @@ trampoline_bios_setup:
> cmovb %edx,%ecx /* and use the smaller */
>
> trampoline_setup:
> + /*
> + * Called on legacy BIOS and EFI platforms.
> + *
> + * Compute 0-15 bits of BOOT_FS segment descriptor base address.
> + */
> + mov %esi,%edx
> + shl $16,%edx
> + or %edx,BOOT_FS+esi_offset(trampoline_gdt)
mov %si,BOOT_FS+esi_offset(trampoline_gdt)
> + /* Compute 16-23 bits of BOOT_FS segment descriptor base address. */
> + mov %esi,%edx
> + shr $16,%edx
> + and $0x000000ff,%edx
> + or %edx,BOOT_FS+4+esi_offset(trampoline_gdt)
mov %dl,BOOT_FS+4+esi_offset(trampoline_gdt)
> + /* Compute 24-31 bits of BOOT_FS segment descriptor base address. */
> + mov %esi,%edx
> + and $0xff000000,%edx
> + or %edx,BOOT_FS+4+esi_offset(trampoline_gdt)
mov %dh,BOOT_FS+7+esi_offset(trampoline_gdt)
(with various of the intermediate instructions dropped)
That'll reduce your code size concern for the GDT setup quite a bit.
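For reference, the three-piece base-address split being patched into trampoline_gdt can be sketched in C. This is a minimal illustration of the x86 segment descriptor byte layout, not Xen code; the helper names are made up:

```c
#include <assert.h>
#include <stdint.h>

/*
 * An x86 segment descriptor scatters a 32-bit base address across
 * three fields, matching the pieces the patch computes:
 *   bits  0-15 -> descriptor bytes 2-3
 *   bits 16-23 -> descriptor byte 4
 *   bits 24-31 -> descriptor byte 7
 */
static void set_desc_base(uint8_t desc[8], uint32_t base)
{
    desc[2] = base & 0xff;          /* base bits 0-7   */
    desc[3] = (base >> 8) & 0xff;   /* base bits 8-15  */
    desc[4] = (base >> 16) & 0xff;  /* base bits 16-23 */
    desc[7] = (base >> 24) & 0xff;  /* base bits 24-31 */
}

static uint32_t get_desc_base(const uint8_t desc[8])
{
    return desc[2] | (desc[3] << 8) | (desc[4] << 16) |
           ((uint32_t)desc[7] << 24);
}
```

Viewed this way, the suggested `mov %si,...` / `mov %dl,...` / `mov %dh,...` sequence simply stores each of the three fields directly instead of or-ing shifted copies into a zero base.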
> + /*
> + * Initialise %fs and later use it to access Xen data if possible.
> + * According to Intel 64 and IA-32 Architectures Software Developer’s
> + * Manual it is safe to do that without reloading GDTR before.
> + *
> + * Please check Intel 64 and IA-32 Architectures Software Developer’s
> + * Manual, Volume 2 (2A, 2B & 2C): Instruction Set Reference,
> + * LGDT and MOV instructions description and
> + * Intel 64 and IA-32 Architectures Software Developer’s
> + * Manual Volume 3 (3A, 3B & 3C): System Programming Guide,
> + * section 3.4.3, Segment Registers for more details.
> + *
> + * AIUI, only the GDT address and limit are loaded into GDTR when
> + * lgdt is executed. A segment descriptor is loaded directly from
> + * memory into the segment register (hidden part) only when the
> + * relevant load instruction is used (e.g. mov %edx,%fs). The GDT
> + * contents could be cached by the CPU, but nothing suggests that
> + * CPU caching interferes in one way or another with segment
> + * descriptor loads. So it looks like every change to the active
> + * GDT is immediately visible to the relevant segment descriptor
> + * load instruction.
> + *
> + * I was not able to find anything which invalidates the above,
> + * so everything suggests that we do not need an extra lgdt here.
> + */
All of this comment except for the first sentence is just stating basic
architecture facts. Please drop all that.
> + mov $BOOT_FS,%edx
> + mov %edx,%fs
> +
> /* Reserve 64kb for the trampoline. */
> sub $0x1000,%ecx
>
> /* From arch/x86/smpboot.c: start_eip had better be page-aligned! */
> xor %cl, %cl
> shl $4, %ecx
> - mov %ecx,sym_phys(trampoline_phys)
> + mov %ecx,fs_offset(trampoline_phys)
Seeing the first instance, together with the earlier comment about
not renaming sym_phys(), I think this would end up more consistent
with using sym_fs() (and then obviously sym_esi()) as names.
> @@ -461,62 +545,88 @@ trampoline_setup:
>
> /* Stash TSC to calculate a good approximation of time-since-boot */
> rdtsc
> - mov %eax,sym_phys(boot_tsc_stamp)
> - mov %edx,sym_phys(boot_tsc_stamp+4)
> + mov %eax,fs_offset(boot_tsc_stamp)
> + mov %edx,fs_offset(boot_tsc_stamp)+4
> +
> + /* Update frame addresses in page tables. */
> + mov $((__page_tables_end-__page_tables_start)/8),%ecx
> +1: testl $_PAGE_PRESENT,fs_offset(__page_tables_start)-8(,%ecx,8)
> + jz 2f
> + add %esi,fs_offset(__page_tables_start)-8(,%ecx,8)
> +2: loop 1b
This loop includes the mapping of the low Mb, which I don't think it
should modify. Or wait - you move __page_tables_start, which by
itself is fragile (but looks to be acceptable if accompanied by a
comment explaining why it doesn't cover l1_identmap).
Also, just to double check - all these page table setup adjustments
don't require reflecting in xen.efi's page table setup code?
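For clarity, the fixup loop being discussed is, in C terms, roughly the following. This is a hedged sketch, not Xen code; only the flag value matches the real x86 _PAGE_PRESENT bit:

```c
#include <assert.h>
#include <stdint.h>

#define _PAGE_PRESENT 0x1ULL  /* bit 0 of an x86 page-table entry */

/*
 * Walk the boot page-table region as an array of 64-bit entries and
 * add the image load offset to every present entry, leaving
 * non-present slots untouched -- the C equivalent of the
 * testl/jz/add loop over __page_tables_start..__page_tables_end.
 */
static void relocate_page_tables(uint64_t *start, uint64_t *end,
                                 uint64_t load_offset)
{
    for ( uint64_t *e = start; e < end; e++ )
        if ( *e & _PAGE_PRESENT )
            *e += load_offset;
}
```

Seen like this, the review concern is about which entries fall inside [start, end): any mapping that must stay fixed (such as the low-1Mb l1_identmap) has to sit outside the region the loop walks.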
> /* Apply relocations to bootstrap trampoline. */
> - mov sym_phys(trampoline_phys),%edx
> - mov $sym_phys(__trampoline_rel_start),%edi
> + mov fs_offset(trampoline_phys),%edx
> + mov $sym_offset(__trampoline_rel_start),%edi
> + mov $sym_offset(__trampoline_rel_stop),%ebx
> 1:
> - mov (%edi),%eax
> - add %edx,(%edi,%eax)
> + mov %fs:(%edi),%eax
> + add %edx,%fs:(%edi,%eax)
> add $4,%edi
> - cmp $sym_phys(__trampoline_rel_stop),%edi
> + cmp %ebx,%edi
> jb 1b
And again it looks like the switch to using %ebx here is not needed.
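The relocation table format handled by this loop can be sketched in C. This is an illustrative model, not Xen code: each 32-bit table entry holds an offset, relative to the entry itself, of a word that needs the trampoline base added (cf. the `.long ... - .` entries in .trampoline_rel):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Apply self-relative relocations: for each 4-byte entry between
 * rel_start and rel_stop, the word at (entry address + entry value)
 * gets tramp_base added -- mirroring
 *   mov %fs:(%edi),%eax ; add %edx,%fs:(%edi,%eax).
 * All offsets here are into one flat image buffer.
 */
static void apply_trampoline_rels(uint8_t *img, uint32_t rel_start,
                                  uint32_t rel_stop, uint32_t tramp_base)
{
    for ( uint32_t off = rel_start; off < rel_stop; off += 4 )
    {
        uint32_t site = *(uint32_t *)(img + off);        /* self-relative */
        *(uint32_t *)(img + off + site) += tramp_base;   /* patch target */
    }
}
```

The loop needs only a current pointer and a fixed end bound, which is why the extra %ebx copy of the stop address looks avoidable.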
> /* Patch in the trampoline segment. */
> shr $4,%edx
> - mov $sym_phys(__trampoline_seg_start),%edi
> + mov $sym_offset(__trampoline_seg_start),%edi
> + mov $sym_offset(__trampoline_seg_stop),%ebx
> 1:
> - mov (%edi),%eax
> - mov %dx,(%edi,%eax)
> + mov %fs:(%edi),%eax
> + mov %dx,%fs:(%edi,%eax)
> add $4,%edi
> - cmp $sym_phys(__trampoline_seg_stop),%edi
> + cmp %ebx,%edi
> jb 1b
Same here then, obviously.
> --- a/xen/arch/x86/boot/trampoline.S
> +++ b/xen/arch/x86/boot/trampoline.S
> @@ -54,12 +54,20 @@ trampoline_gdt:
> /* 0x0028: real-mode data @ BOOT_TRAMPOLINE */
> .long 0x0000ffff
> .long 0x00009200
> + /*
> + * 0x0030: ring 0 Xen data, 16 MiB size, base
> + * address is computed during runtime.
> + */
> + .quad 0x00c0920000001000
This does not look like it covers 16Mb, it's more like 16Mb+4k-1.
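To make the limit arithmetic concrete, here is a small C sketch (a hypothetical helper, not Xen code) of how many bytes a descriptor limit spans once the granularity bit is set:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective byte span of a segment descriptor limit. With G=1 the
 * 20-bit limit is in 4 KiB units and the low 12 address bits are
 * treated as all-ones, so limit 0x1000 spans 16 MiB + 4 KiB, while
 * exactly 16 MiB needs limit 0xfff.
 */
static uint64_t desc_span(uint32_t limit, int g)
{
    return g ? ((uint64_t)limit << 12) + 0x1000 : (uint64_t)limit + 1;
}
```

This is the point of the remark: the quoted descriptor 0x00c0920000001000 encodes limit 0x1000 with G=1, overshooting the advertised 16 MiB by one page.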
> .pushsection .trampoline_rel, "a"
> .long trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
> .long trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
> .popsection
>
> +GLOBAL(xen_img_load_base_addr)
> + .long 0
I've yet to understand the purpose of this symbol, but in any case:
If the trampoline code doesn't reference it, why does it get placed
here?
> @@ -861,15 +860,17 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> highmem_start &= ~((1UL << L3_PAGETABLE_SHIFT) - 1);
> #endif
>
> + /* Do not relocate Xen image if boot loader did work for us. */
> + if ( xen_img_load_base_addr )
> + xen_phys_start = xen_img_load_base_addr;
Okay, with this change the question really is: Why do you need the
new symbol? I.e. why can't you just use xen_phys_start, just like I
managed to re-use it when I introduced boot from EFI?
> for ( i = boot_e820.nr_map-1; i >= 0; i-- )
> {
> uint64_t s, e, mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
> uint64_t end, limit = ARRAY_SIZE(l2_identmap) << L2_PAGETABLE_SHIFT;
>
> - /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
I can see that you want to get rid of BOOTSTRAP_MAP_BASE, but
please don't delete the comment as a whole.
> s = (boot_e820.map[i].addr + mask) & ~mask;
> e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> - s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
> if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
> continue;
This means you now map memory between 2Mb and 16Mb here. Is
this necessary?
> @@ -901,7 +902,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> l4_pgentry_t *pl4e;
> l3_pgentry_t *pl3e;
> l2_pgentry_t *pl2e;
> - uint64_t load_start;
> int i, j, k;
>
> /* Select relocation address. */
> @@ -915,9 +915,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> * with a barrier(). After this we must *not* modify static/global
> * data until after we have switched to the relocated pagetables!
> */
> - load_start = (unsigned long)_start - XEN_VIRT_START;
> barrier();
> - move_memory(e + load_start, load_start, _end - _start, 1);
> + move_memory(e + XEN_IMG_OFFSET, XEN_IMG_OFFSET, _end - _start, 1);
Assuming _start - XEN_VIRT_START == XEN_IMG_OFFSET, is this
change necessary? Or is it rather just simplification, which again
should be split from an already complex patch.
> @@ -957,15 +956,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> * Undo the temporary-hooking of the l1_identmap. __2M_text_start
> * is contained in this PTE.
> */
> - BUG_ON(l2_table_offset((unsigned long)_erodata) ==
> - l2_table_offset((unsigned long)_stext));
At least when using_2M_mapping() this surely ought to hold?
> @@ -1019,6 +1017,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> : "memory" );
>
> bootstrap_map(NULL);
> +
> + printk("New Xen image base address: 0x%08lx\n", xen_phys_start);
I don't see the motivation for adding such a message in this patch,
but if it stays, then please use %#lx.
> @@ -1082,6 +1082,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>
> if ( !xen_phys_start )
> panic("Not enough memory to relocate Xen.");
> +
> reserve_e820_ram(&boot_e820, __pa(&_start), __pa(&_end));
Another stray change.
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -55,7 +55,7 @@ SECTIONS
> __2M_text_start = .; /* Start of 2M superpages, mapped RX. */
> #endif
>
> - . = __XEN_VIRT_START + MB(1);
> + . = __XEN_VIRT_START + XEN_IMG_OFFSET;
> _start = .;
> .text : {
> _stext = .; /* Text and read-only data */
> @@ -257,12 +257,14 @@ SECTIONS
> .reloc : {
> *(.reloc)
> } :text
> - /* Trick the linker into setting the image size to exactly 16Mb. */
> . = ALIGN(__section_alignment__);
> +#endif
> +
> + /* Trick the linker into setting the image size to exactly 16Mb. */
> .pad : {
> . = ALIGN(MB(16));
> + __end_of_image__ = .;
I see that you use this symbol in xen/arch/x86/Makefile, but I again
don't follow why this logic needs to change.
> --- a/xen/include/asm-x86/page.h
> +++ b/xen/include/asm-x86/page.h
> @@ -288,7 +288,7 @@ extern root_pgentry_t idle_pg_table[ROOT_PAGETABLE_ENTRIES];
> extern l2_pgentry_t *compat_idle_pg_table_l2;
> extern unsigned int m2p_compat_vstart;
> extern l2_pgentry_t l2_xenmap[L2_PAGETABLE_ENTRIES],
> - l2_bootmap[L2_PAGETABLE_ENTRIES];
> + l2_bootmap[4*L2_PAGETABLE_ENTRIES];
Why do you need 4 of them? I can see why one might not suffice,
but two surely should?
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel