[Xen-ia64-devel] [PATCH] fix initialization order of buddy allocator
Hi,

Machines with multiple NUMA nodes may panic during boot.
The attached patch (against C/S 15145) changes the initialization order
of the buddy allocator and fixes this problem.
I tested booting dom0 and domVTi, and running a kernel make on guests.

Any comments and feedback would be appreciated.

I describe the issue and its cause below, but first I have a few questions:

1. I moved acpi_table_init(), acpi_numa_init(), and smp_build_cpu_map()
   from late_setup_arch() to early_setup_arch().
   It works and, as far as I can tell from reading the source code,
   it has no bad side effects. What do you think?

2. Must the xenheap area (from xen_pstart to xenheap_phys_end) be in
   node0 by design?
   (As far as I know, if xenheap is not in node0, the initialization of
   xenheap itself needs xenheap memory, recursively.)

[Issue detail]
I have been testing Xen/IA64 on NEC's IPF server (AsAmA2).

  CPU:    Itanium2 (Montecito) 16cpus/32cores
  Memory: 128GB (16GB/node)
  OS:     SLES10

It worked fine up to at least C/S 14077, but after I upgraded to
C/S 14828, dom0 panicked at boot time with messages like those attached
at the end of this mail.

I traced the problem and found that the panic was caused by an access to
avail[4][23] while avail[4] was 0 (that is, it had not been allocated).

I read the source code, inserted debug code, and found that the root
cause is the order in which the buddy allocator is initialized.
In the current order, node_memblk[] and xenheap are not yet initialized
when end_boot_allocator() is called.
But init_heap_pages() (called by end_boot_allocator() and other
functions) calls phys_to_nid(), which needs node_memblk[], and
xmalloc_array(), which needs xenheap.
So node_memblk[] and xenheap should be initialized before
end_boot_allocator() is called.

I haven't confirmed it, but it seems that C/S 14106 (xen memory
allocator: Dynamically allocate per-numa-node metadata) exposed this
latent bug.
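To make the ordering dependency concrete, here is a minimal stand-alone
sketch. It is not Xen code: the enum, the struct, the two order arrays and
first_bad_step() are hypothetical names invented for illustration. It only
models the two prerequisites described above (node_memblk[] for
phys_to_nid(), and xenheap for xmalloc_array()) as flags, and reports the
first step that runs before its prerequisites are ready.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of three boot steps; these are not Xen identifiers. */
enum boot_step {
    STEP_NUMA_INIT,          /* stands in for the ACPI/NUMA setup that fills node_memblk[] */
    STEP_INIT_XENHEAP,       /* stands in for init_xenheap_pages() */
    STEP_END_BOOT_ALLOCATOR  /* stands in for end_boot_allocator() */
};

/* Flags for the two pieces of state that the mail says must exist first. */
struct boot_state {
    bool node_memblk_ready;  /* assumed to be needed by phys_to_nid() */
    bool xenheap_ready;      /* assumed to be needed by xmalloc_array() */
};

/*
 * Walk the steps in the given order. Return the index of the first step
 * whose prerequisites are missing, or -1 if the order is consistent.
 */
static int first_bad_step(const enum boot_step *order, size_t n)
{
    struct boot_state s = { false, false };
    size_t i;

    assert(order != NULL);
    for (i = 0; i < n; i++) {
        switch (order[i]) {
        case STEP_NUMA_INIT:
            s.node_memblk_ready = true;
            break;
        case STEP_INIT_XENHEAP:
            s.xenheap_ready = true;
            break;
        case STEP_END_BOOT_ALLOCATOR:
            /* Model of init_heap_pages() needing both pieces of state. */
            if (!s.node_memblk_ready || !s.xenheap_ready)
                return (int)i;
            break;
        }
    }
    return -1;
}

/* Order before the patch, as described in the mail. */
static const enum boot_step old_order[] = {
    STEP_END_BOOT_ALLOCATOR, STEP_INIT_XENHEAP, STEP_NUMA_INIT
};

/* Order after the patch. */
static const enum boot_step new_order[] = {
    STEP_NUMA_INIT, STEP_INIT_XENHEAP, STEP_END_BOOT_ALLOCATOR
};
```

In this model, first_bad_step(old_order, 3) flags the very first step,
while first_bad_step(new_order, 3) finds no violation. The real
dependencies in Xen are of course more involved than two flags.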
[panic message]
   :
netconsole: not configured, aborting
Linux video capture interface: v2.00
Xen virtual console successfully installed as ttyS0
Event-channel device installed.
(XEN) *** xen_handle_domain_access: exception table lookup failed, iip=0xf000000004032e10, addr=0xb0, spinning...
(XEN) $$$$$ PANIC in domain 0 (k6=0xf000000007b00000): *** xen_handle_domain_access: exception table lookup failed, iip=0xf000000004032e10, addr=0xb0, spinning...
(XEN) d 0xf000000007b28080 domid 0
(XEN) vcpu 0xf000000007b00000 vcpu 1
(XEN)
(XEN) CPU 1
(XEN) psr : 0000101008226018 ifs : 800000000000060e ip  : [<f000000004032e10>]
(XEN) ip is at free_heap_pages+0x2d0/0x6c0
(XEN) unat: 0000000000000000 pfs : 000000000000060e rsc : 0000000000000003
(XEN) rnat: 0000000000000538 bsps: 0000000000000000 pr  : 000000000002a599
(XEN) ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0  : f000000004032dd0 b6  : f0000000040abe30 b7  : f000000004002e20
(XEN) f6  : 000000000000000000000 f7  : 1003e0000000000000000
(XEN) f8  : 1003e0000000000002000 f9  : 100058000000000000000
(XEN) f10 : 1003e0000000000002000 f11 : 1003e0000000000000001
(XEN) r1  : f00000000438d720 r2  : 0000000000000000 r3  : f00000201ea6bde9
(XEN) r8  : 0000000000000004 r9  : ffffffffffffffff r10 : 0000000000000000
(XEN) r11 : 0000000000020959 r12 : f000000007b07d30 r13 : f000000007b00000
(XEN) r14 : f00000000419af70 r15 : 00000000000000b0 r16 : 0000000000000001
(XEN) r17 : f000000004251578 r18 : 0000000000000022 r19 : 0000000000000023
(XEN) r20 : 0000000000000001 r21 : f000000004128218 r22 : f000000004190c58
(XEN) r23 : f30000000c4adb64 r24 : f000000004128208 r25 : 0000000006497b93
(XEN) r26 : 0000000000000016 r27 : 0000000000000000 r28 : 0000000000000000
(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f000000004196bc8
(XEN)
(XEN) Call Trace:
(XEN)  [<f0000000040b2a70>] show_stack+0x80/0xa0
(XEN)                                 sp=f000000007b077e0 bsp=f000000007b015e8
(XEN)  [<f000000004089500>] panic_domain+0x120/0x170
(XEN)                                 sp=f000000007b079b0 bsp=f000000007b01580
(XEN)  [<f00000000407e1f0>] ia64_do_page_fault+0x640/0x650
(XEN)                                 sp=f000000007b07af0 bsp=f000000007b014f0
(XEN)  [<f0000000040ab880>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f000000007b07b30 bsp=f000000007b014f0
(XEN)  [<f000000004032e10>] free_heap_pages+0x2d0/0x6c0
(XEN)                                 sp=f000000007b07d30 bsp=f000000007b01480
(XEN)  [<f000000004034180>] free_domheap_pages+0x430/0x880
(XEN)                                 sp=f000000007b07d30 bsp=f000000007b01440
(XEN)  [<f00000000402f220>] guest_remove_page+0x390/0x580
(XEN)                                 sp=f000000007b07d30 bsp=f000000007b013e0

Thanks,
Daisuke Nishimura.

diff -r 2b14a1f22eec xen/arch/ia64/linux-xen/setup.c
--- a/xen/arch/ia64/linux-xen/setup.c	Fri May 25 09:43:21 2007 -0600
+++ b/xen/arch/ia64/linux-xen/setup.c	Mon May 28 13:26:25 2007 +0900
@@ -506,13 +506,6 @@ setup_arch (char **cmdline_p)
 	if (early_console_setup(*cmdline_p) == 0)
 		mark_bsp_online();
 
-#ifdef XEN
-}
-
-void __init
-late_setup_arch (char **cmdline_p)
-{
-#endif
 #ifdef CONFIG_ACPI_BOOT
 	/* Initialize the ACPI boot-time table parser */
 	acpi_table_init();
@@ -525,6 +518,13 @@ late_setup_arch (char **cmdline_p)
 # endif
 #endif /* CONFIG_APCI_BOOT */
 
+#ifdef XEN
+}
+
+void __init
+late_setup_arch (char **cmdline_p)
+{
+#endif
 #ifndef XEN
 	find_memory();
 #endif
diff -r 2b14a1f22eec xen/arch/ia64/xen/xensetup.c
--- a/xen/arch/ia64/xen/xensetup.c	Fri May 25 09:43:21 2007 -0600
+++ b/xen/arch/ia64/xen/xensetup.c	Mon May 28 13:26:25 2007 +0900
@@ -433,12 +433,12 @@ void __init start_kernel(void)
 
     alloc_dom0();
 
-    end_boot_allocator();
-
     init_xenheap_pages(__pa(xen_heap_start), xenheap_phys_end);
     printk("Xen heap: %luMB (%lukB)\n",
 	(xenheap_phys_end-__pa(xen_heap_start)) >> 20,
 	(xenheap_phys_end-__pa(xen_heap_start)) >> 10);
+
+    end_boot_allocator();
 
     late_setup_arch(&cmdline);
 
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel