Re: [Xen-devel] [PATCH v5 15/16] x86: make Xen early boot code relocatable
>>> On 20.08.16 at 00:43, <daniel.kiper@xxxxxxxxxx> wrote:
> Every multiboot protocol (regardless of version) compatible image must
> specify its load address (in the ELF or multiboot header). A multiboot
> protocol compatible loader has to load the image at the specified address.
> However, there is no guarantee that the requested memory region (in case
> of Xen it starts at 1 MiB and ends at 17 MiB) where the image should
> initially be loaded is RAM and that it is free (legacy BIOS platforms are
> merciful to Xen, but I found at least one EFI platform on which the Xen
> load address conflicts with EFI boot services; it is a Dell PowerEdge R820
> with the latest firmware). To cope with that problem we must make the Xen
> early boot code relocatable and help the boot loader to relocate the
> image properly by suggesting allowed address ranges instead of requesting
> specific load addresses as it does right now. This patch does the former.
> It does not add the multiboot2 protocol interface, which is done in the
> "x86: add multiboot2 protocol support for relocatable images" patch.
>
> This patch changes the following things:
> - the default load address is changed from 1 MiB to 2 MiB; I did that
> because the initial page tables use 2 MiB huge pages, so the required
> updates to them are quite easy; it means that e.g. we avoid special
> cases for the start and end of the required memory region if it lives
> at an address not aligned to 2 MiB,
Should this be a separate change then, to separate possible
regressions due to that from such due to any other of the changes
here? And then I don't see why, when making the image relocatable
anyway, the link-time load address actually still matters.
> - the %esi and %r15d registers are used as storage for the Xen image
> load base address (%r15d only briefly, because %rsi is used for the EFI
> SystemTable address in 64-bit code); both registers are (%esi is mostly)
> unused in the early boot code and preserved across C function calls,
In a context where you (also) talk about 64-bit code, %esi can't
be called preserved unilaterally. You should make clear that this is
for 32-bit function calls.
> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -12,13 +12,16 @@
> .text
> .code32
>
> -#define sym_phys(sym) ((sym) - __XEN_VIRT_START)
> +#define sym_offset(sym) ((sym) - __XEN_VIRT_START)
In a patch already large and hence hard to review, did you really
need to do this rename?
> @@ -126,26 +130,26 @@ vga_text_buffer:
> .section .init.text, "ax", @progbits
>
> bad_cpu:
> - mov $(sym_phys(.Lbad_cpu_msg)),%esi # Error message
> + lea esi_offset(.Lbad_cpu_msg),%esi # Error message
Why not just
add $sym_offset(.Lbad_cpu_msg),%esi # Error message
? Together with not doing said rename, this would be more obviously
correct.
> @@ -409,36 +436,93 @@ trampoline_bios_setup:
> cmovb %edx,%ecx /* and use the smaller */
>
> trampoline_setup:
> + /*
> + * Called on legacy BIOS and EFI platforms.
> + *
> + * Compute 0-15 bits of BOOT_FS segment descriptor base address.
> + */
> + mov %esi,%edx
> + shl $16,%edx
> + or %edx,BOOT_FS+esi_offset(trampoline_gdt)
mov %si,BOOT_FS+esi_offset(trampoline_gdt)
> + /* Compute 16-23 bits of BOOT_FS segment descriptor base address. */
> + mov %esi,%edx
> + shr $16,%edx
> + and $0x000000ff,%edx
> + or %edx,BOOT_FS+4+esi_offset(trampoline_gdt)
mov %dl,BOOT_FS+4+esi_offset(trampoline_gdt)
> + /* Compute 24-31 bits of BOOT_FS segment descriptor base address. */
> + mov %esi,%edx
> + and $0xff000000,%edx
> + or %edx,BOOT_FS+4+esi_offset(trampoline_gdt)
mov %dh,BOOT_FS+7+esi_offset(trampoline_gdt)
(with various of the intermediate instructions dropped)
That'll reduce your code size concern for the GDT setup quite a bit.
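For reference, the three-piece base-address split being patched into trampoline_gdt can be sketched in C. This is a minimal illustration of the x86 segment descriptor byte layout, not Xen code; the helper names are made up:

```c
#include <assert.h>
#include <stdint.h>

/*
 * An x86 segment descriptor scatters a 32-bit base address across
 * three fields, matching the pieces the patch computes:
 *   bits  0-15 -> descriptor bytes 2-3
 *   bits 16-23 -> descriptor byte 4
 *   bits 24-31 -> descriptor byte 7
 */
static void set_desc_base(uint8_t desc[8], uint32_t base)
{
    desc[2] = base & 0xff;          /* base bits 0-7   */
    desc[3] = (base >> 8) & 0xff;   /* base bits 8-15  */
    desc[4] = (base >> 16) & 0xff;  /* base bits 16-23 */
    desc[7] = (base >> 24) & 0xff;  /* base bits 24-31 */
}

static uint32_t get_desc_base(const uint8_t desc[8])
{
    return desc[2] | (desc[3] << 8) | (desc[4] << 16) |
           ((uint32_t)desc[7] << 24);
}
```

Viewed this way, the suggested `mov %si,...` / `mov %dl,...` / `mov %dh,...` sequence simply stores each of the three fields directly instead of or-ing shifted copies into a zero base.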
> + /*
> + * Initialise %fs and later use it to access Xen data if possible.
> + * According to Intel 64 and IA-32 Architectures Software Developer’s
> + * Manual it is safe to do that without reloading GDTR before.
> + *
> + * Please check Intel 64 and IA-32 Architectures Software Developer’s
> + * Manual, Volume 2 (2A, 2B & 2C): Instruction Set Reference,
> + * LGDT and MOV instructions description and
> + * Intel 64 and IA-32 Architectures Software Developer’s
> + * Manual Volume 3 (3A, 3B & 3C): System Programming Guide,
> + * section 3.4.3, Segment Registers for more details.
> + *
> + * AIUI, only the GDT address and limit are loaded into GDTR when
> + * lgdt is executed. A segment descriptor is loaded directly from
> + * memory into the segment register (hidden part) only when the
> + * relevant load instruction is used (e.g. mov %edx,%fs). The GDT
> + * contents could be cached by the CPU, but nothing suggests that
> + * CPU caching interferes in one way or another with segment
> + * descriptor loads. So it looks like every change to the active
> + * GDT is immediately visible to the relevant segment descriptor
> + * load instruction.
> + *
> + * I was not able to find anything which invalidates the above,
> + * so everything suggests that we do not need an extra lgdt here.
> + */
All of this comment except for the first sentence is just stating basic
architecture facts. Please drop all that.
> + mov $BOOT_FS,%edx
> + mov %edx,%fs
> +
> /* Reserve 64kb for the trampoline. */
> sub $0x1000,%ecx
>
> /* From arch/x86/smpboot.c: start_eip had better be page-aligned! */
> xor %cl, %cl
> shl $4, %ecx
> - mov %ecx,sym_phys(trampoline_phys)
> + mov %ecx,fs_offset(trampoline_phys)
Seeing the first instance, together with the earlier comment about
not renaming sym_phys(), I think this would end up more consistent
with using sym_fs() (and then obviously sym_esi()) as names.
> @@ -461,62 +545,88 @@ trampoline_setup:
>
> /* Stash TSC to calculate a good approximation of time-since-boot */
> rdtsc
> - mov %eax,sym_phys(boot_tsc_stamp)
> - mov %edx,sym_phys(boot_tsc_stamp+4)
> + mov %eax,fs_offset(boot_tsc_stamp)
> + mov %edx,fs_offset(boot_tsc_stamp)+4
> +
> + /* Update frame addresses in page tables. */
> + mov $((__page_tables_end-__page_tables_start)/8),%ecx
> +1: testl $_PAGE_PRESENT,fs_offset(__page_tables_start)-8(,%ecx,8)
> + jz 2f
> + add %esi,fs_offset(__page_tables_start)-8(,%ecx,8)
> +2: loop 1b
This loop includes the mapping of the low Mb, which I don't think it
should modify. Or wait - you move __page_tables_start, which by
itself is fragile (but looks to be acceptable if accompanied by a
comment explaining why it doesn't cover l1_identmap).
Also, just to double check - all these page table setup adjustments
don't require reflecting in xen.efi's page table setup code?
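For clarity, the fixup loop being discussed is, in C terms, roughly the following. This is a hedged sketch, not Xen code; only the flag value matches the real x86 _PAGE_PRESENT bit:

```c
#include <assert.h>
#include <stdint.h>

#define _PAGE_PRESENT 0x1ULL  /* bit 0 of an x86 page-table entry */

/*
 * Walk the boot page-table region as an array of 64-bit entries and
 * add the image load offset to every present entry, leaving
 * non-present slots untouched -- the C equivalent of the
 * testl/jz/add loop over __page_tables_start..__page_tables_end.
 */
static void relocate_page_tables(uint64_t *start, uint64_t *end,
                                 uint64_t load_offset)
{
    for ( uint64_t *e = start; e < end; e++ )
        if ( *e & _PAGE_PRESENT )
            *e += load_offset;
}
```

Seen like this, the review concern is about which entries fall inside [start, end): any mapping that must stay fixed (such as the low-1Mb l1_identmap) has to sit outside the region the loop walks.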
> /* Apply relocations to bootstrap trampoline. */
> - mov sym_phys(trampoline_phys),%edx
> - mov $sym_phys(__trampoline_rel_start),%edi
> + mov fs_offset(trampoline_phys),%edx
> + mov $sym_offset(__trampoline_rel_start),%edi
> + mov $sym_offset(__trampoline_rel_stop),%ebx
> 1:
> - mov (%edi),%eax
> - add %edx,(%edi,%eax)
> + mov %fs:(%edi),%eax
> + add %edx,%fs:(%edi,%eax)
> add $4,%edi
> - cmp $sym_phys(__trampoline_rel_stop),%edi
> + cmp %ebx,%edi
> jb 1b
And again it looks like the switch to using %ebx here is not needed.
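The relocation table format handled by this loop can be sketched in C. This is an illustrative model, not Xen code: each 32-bit table entry holds an offset, relative to the entry itself, of a word that needs the trampoline base added (cf. the `.long ... - .` entries in .trampoline_rel):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Apply self-relative relocations: for each 4-byte entry between
 * rel_start and rel_stop, the word at (entry address + entry value)
 * gets tramp_base added -- mirroring
 *   mov %fs:(%edi),%eax ; add %edx,%fs:(%edi,%eax).
 * All offsets here are into one flat image buffer.
 */
static void apply_trampoline_rels(uint8_t *img, uint32_t rel_start,
                                  uint32_t rel_stop, uint32_t tramp_base)
{
    for ( uint32_t off = rel_start; off < rel_stop; off += 4 )
    {
        uint32_t site = *(uint32_t *)(img + off);        /* self-relative */
        *(uint32_t *)(img + off + site) += tramp_base;   /* patch target */
    }
}
```

The loop needs only a current pointer and a fixed end bound, which is why the extra %ebx copy of the stop address looks avoidable.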
> /* Patch in the trampoline segment. */
> shr $4,%edx
> - mov $sym_phys(__trampoline_seg_start),%edi
> + mov $sym_offset(__trampoline_seg_start),%edi
> + mov $sym_offset(__trampoline_seg_stop),%ebx
> 1:
> - mov (%edi),%eax
> - mov %dx,(%edi,%eax)
> + mov %fs:(%edi),%eax
> + mov %dx,%fs:(%edi,%eax)
> add $4,%edi
> - cmp $sym_phys(__trampoline_seg_stop),%edi
> + cmp %ebx,%edi
> jb 1b
Same here then, obviously.
> --- a/xen/arch/x86/boot/trampoline.S
> +++ b/xen/arch/x86/boot/trampoline.S
> @@ -54,12 +54,20 @@ trampoline_gdt:
> /* 0x0028: real-mode data @ BOOT_TRAMPOLINE */
> .long 0x0000ffff
> .long 0x00009200
> + /*
> + * 0x0030: ring 0 Xen data, 16 MiB size, base
> + * address is computed during runtime.
> + */
> + .quad 0x00c0920000001000
This does not look like it covers 16Mb, it's more like 16Mb+4k-1.
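To make the limit arithmetic concrete, here is a small C sketch (a hypothetical helper, not Xen code) of how many bytes a descriptor limit spans once the granularity bit is set:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective byte span of a segment descriptor limit. With G=1 the
 * 20-bit limit is in 4 KiB units and the low 12 address bits are
 * treated as all-ones, so limit 0x1000 spans 16 MiB + 4 KiB, while
 * exactly 16 MiB needs limit 0xfff.
 */
static uint64_t desc_span(uint32_t limit, int g)
{
    return g ? ((uint64_t)limit << 12) + 0x1000 : (uint64_t)limit + 1;
}
```

This is the point of the remark: the quoted descriptor 0x00c0920000001000 encodes limit 0x1000 with G=1, overshooting the advertised 16 MiB by one page.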
> .pushsection .trampoline_rel, "a"
> .long trampoline_gdt + BOOT_PSEUDORM_CS + 2 - .
> .long trampoline_gdt + BOOT_PSEUDORM_DS + 2 - .
> .popsection
>
> +GLOBAL(xen_img_load_base_addr)
> + .long 0
I've yet to understand the purpose of this symbol, but in any case:
If the trampoline code doesn't reference it, why does it get placed
here?
> @@ -861,15 +860,17 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> highmem_start &= ~((1UL << L3_PAGETABLE_SHIFT) - 1);
> #endif
>
> + /* Do not relocate Xen image if boot loader did work for us. */
> + if ( xen_img_load_base_addr )
> + xen_phys_start = xen_img_load_base_addr;
Okay, with this change the question really is: Why do you need the
new symbol? I.e. why can't you just use xen_phys_start, just like I
managed to re-use it when I introduced boot from EFI?
> for ( i = boot_e820.nr_map-1; i >= 0; i-- )
> {
> uint64_t s, e, mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
> uint64_t end, limit = ARRAY_SIZE(l2_identmap) << L2_PAGETABLE_SHIFT;
>
> - /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
I can see that you want to get rid of BOOTSTRAP_MAP_BASE, but
please don't delete the comment as a whole.
> s = (boot_e820.map[i].addr + mask) & ~mask;
> e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> - s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
> if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
> continue;
This means you now map memory between 2Mb and 16Mb here. Is
this necessary?
> @@ -901,7 +902,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> l4_pgentry_t *pl4e;
> l3_pgentry_t *pl3e;
> l2_pgentry_t *pl2e;
> - uint64_t load_start;
> int i, j, k;
>
> /* Select relocation address. */
> @@ -915,9 +915,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> * with a barrier(). After this we must *not* modify static/global
> * data until after we have switched to the relocated pagetables!
> */
> - load_start = (unsigned long)_start - XEN_VIRT_START;
> barrier();
> - move_memory(e + load_start, load_start, _end - _start, 1);
> + move_memory(e + XEN_IMG_OFFSET, XEN_IMG_OFFSET, _end - _start, 1);
Assuming _start - XEN_VIRT_START == XEN_IMG_OFFSET, is this
change necessary? Or is it rather just simplification, which again
should be split from an already complex patch.
> @@ -957,15 +956,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> * Undo the temporary-hooking of the l1_identmap. __2M_text_start
> * is contained in this PTE.
> */
> - BUG_ON(l2_table_offset((unsigned long)_erodata) ==
> - l2_table_offset((unsigned long)_stext));
At least when using_2M_mapping() this surely ought to hold?
> @@ -1019,6 +1017,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> : "memory" );
>
> bootstrap_map(NULL);
> +
> + printk("New Xen image base address: 0x%08lx\n", xen_phys_start);
I don't see the motivation for adding such a message in this patch,
but if it stays, then please use %#lx.
> @@ -1082,6 +1082,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>
> if ( !xen_phys_start )
> panic("Not enough memory to relocate Xen.");
> +
> reserve_e820_ram(&boot_e820, __pa(&_start), __pa(&_end));
Another stray change.
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -55,7 +55,7 @@ SECTIONS
> __2M_text_start = .; /* Start of 2M superpages, mapped RX. */
> #endif
>
> - . = __XEN_VIRT_START + MB(1);
> + . = __XEN_VIRT_START + XEN_IMG_OFFSET;
> _start = .;
> .text : {
> _stext = .; /* Text and read-only data */
> @@ -257,12 +257,14 @@ SECTIONS
> .reloc : {
> *(.reloc)
> } :text
> - /* Trick the linker into setting the image size to exactly 16Mb. */
> . = ALIGN(__section_alignment__);
> +#endif
> +
> + /* Trick the linker into setting the image size to exactly 16Mb. */
> .pad : {
> . = ALIGN(MB(16));
> + __end_of_image__ = .;
I see that you use this symbol in xen/arch/x86/Makefile, but I again
don't follow why this logic needs to change.
> --- a/xen/include/asm-x86/page.h
> +++ b/xen/include/asm-x86/page.h
> @@ -288,7 +288,7 @@ extern root_pgentry_t idle_pg_table[ROOT_PAGETABLE_ENTRIES];
> extern l2_pgentry_t *compat_idle_pg_table_l2;
> extern unsigned int m2p_compat_vstart;
> extern l2_pgentry_t l2_xenmap[L2_PAGETABLE_ENTRIES],
> - l2_bootmap[L2_PAGETABLE_ENTRIES];
> + l2_bootmap[4*L2_PAGETABLE_ENTRIES];
Why do you need 4 of them? I can see why one might not suffice,
but two surely should?
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel