
[Xen-ia64-devel] [PATCH]fix initialization order of buddy allocator



Hi,

Machines with multiple NUMA nodes may panic on boot.
The attached patch (against C/S15145), which changes
the initialization order of the buddy allocator, fixes
this problem.
I tested booting dom0/domVTi and building a kernel (make) on the guests.
Any comments and feedback would be appreciated.

I describe the issue and its cause below, but first I have
a few questions:

1. I moved acpi_table_init(), acpi_numa_init(), and
  smp_build_cpu_map() from late_setup_arch() to early_setup_arch().
  This works and, as far as I can tell from reading the source,
  it appears to have no ill effects.
  What do you think?
2. By design, must the xenheap area (from xen_pstart to
  xenheap_phys_end) reside in node0?
  (As far as I know, if the xenheap is not in node0, initializing
  the xenheap recursively requires xenheap memory.)
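
To make the dependency in question 2 concrete, here is a toy model in C.
It is only a sketch under my own assumptions, not Xen code: node0's
allocator metadata is assumed to be statically allocated, metadata for
every other node is assumed to come from the xenheap, and a page is
assumed not to be addable to the xenheap until its node's metadata
exists. All model_* names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical model of the xenheap bootstrap question. Assumptions (not
 * taken from the Xen sources): node 0's allocator metadata is static,
 * every other node's metadata must be allocated from the xenheap, and a
 * page cannot be added to the xenheap until its node's metadata exists.
 */
#define NR_NODES 4

static int meta_ready[NR_NODES];   /* 1 once a node's metadata exists */
static int xenheap_free_pages;     /* pages currently free in the xenheap */

/* Stand-in for xmalloc(): take one page from the xenheap, or fail if empty. */
static int model_xmalloc_page(void)
{
    if (xenheap_free_pages == 0)
        return 0;
    xenheap_free_pages--;
    return 1;
}

/* Add one xenheap page that lives on node `nid`; returns 1 on success. */
static int model_add_xenheap_page(int nid)
{
    if (!meta_ready[nid]) {
        /* This node's metadata must itself come from the xenheap. */
        if (!model_xmalloc_page())
            return 0;
        meta_ready[nid] = 1;
    }
    xenheap_free_pages++;
    return 1;
}

/* Bring up an initially empty xenheap of `npages` pages on node `nid`. */
static int model_init_xenheap(int nid, int npages)
{
    for (int n = 0; n < NR_NODES; n++)
        meta_ready[n] = (n == 0);   /* only node 0 starts with metadata */
    xenheap_free_pages = 0;
    for (int i = 0; i < npages; i++)
        if (!model_add_xenheap_page(nid))
            return 0;
    return 1;
}
```

Under these assumptions, model_init_xenheap(0, 8) succeeds, while
model_init_xenheap(2, 8) fails on the very first page: the node's
metadata would have to be allocated from a xenheap that is still empty.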


[Issue detail]
I have been testing Xen/IA64 on NEC's IPF server (AsAmA2).

CPU: Itanium2 (Montecito), 16 CPUs / 32 cores
Memory: 128GB (16GB/node)
OS: SLES10

It worked fine up to at least C/S14077, but after I
upgraded to C/S14828, dom0 panicked at boot time with
messages like those attached at the end of this mail.

I traced the problem and found that the panic was caused
by an access to avail[4][23] while avail[4] was 0 (that is,
it had not been allocated).

By reading the source and adding debug code, I found that
the root cause of this problem is the wrong initialization
order of the buddy allocator.

With the current order, neither node_memblk[] nor the xenheap
is initialized when end_boot_allocator() is called.
However, init_heap_pages() (called by end_boot_allocator() and
other functions) calls phys_to_nid(), which needs node_memblk[],
and xmalloc_array(), which needs the xenheap.
So node_memblk[] and the xenheap must be initialized before
end_boot_allocator() is called.
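
The following toy model (hypothetical, not Xen code) sketches why the
old order can leave avail[] unallocated for nodes other than node0. For
illustration it assumes that phys_to_nid() falls back to node 0 while
node_memblk[] is still empty, that xmalloc_array() fails before the
xenheap is initialized, and that node0's metadata is static. The model_*
names and the two-node, four-pages-per-node layout are made up.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy model of the boot-time ordering problem. Hypothetical, not Xen code:
 * the model_* names, the 2-node x 4-page layout, and the fallback to
 * node 0 are assumptions made for illustration.
 */
#define NR_NODES       2
#define PAGES_PER_NODE 4
#define NR_PAGES       (NR_NODES * PAGES_PER_NODE)
#define NR_ZONES       4

static int numa_ready;                  /* set by "acpi_numa_init()" */
static int xenheap_ready;               /* set by "init_xenheap_pages()" */
static unsigned long avail0[NR_ZONES];  /* assumption: node0 metadata is static */
static unsigned long *avail[NR_NODES];  /* other nodes: allocated on first use */

/* Stand-in for phys_to_nid(): assumed to fall back to node 0 without NUMA info. */
static int model_pfn_to_nid(int pfn)
{
    return numa_ready ? pfn / PAGES_PER_NODE : 0;
}

/* Stand-in for xmalloc_array(): assumed to fail before the xenheap exists. */
static unsigned long *model_xmalloc_zones(void)
{
    return xenheap_ready ? calloc(NR_ZONES, sizeof(unsigned long)) : NULL;
}

/* Stand-in for end_boot_allocator() -> init_heap_pages(): allocate each
 * page's node metadata the first time that node is seen. */
static void model_end_boot_allocator(void)
{
    for (int pfn = 0; pfn < NR_PAGES; pfn++) {
        int nid = model_pfn_to_nid(pfn);
        if (avail[nid] == NULL)
            avail[nid] = model_xmalloc_zones();
    }
}

/* Runs the boot sequence in the old or patched order. Returns 1 if a later
 * free_heap_pages() on any page would find its node's avail[] allocated. */
static int model_boot(int patched_order)
{
    for (int n = 1; n < NR_NODES; n++) {  /* reset state between runs */
        free(avail[n]);
        avail[n] = NULL;
    }
    avail[0] = avail0;
    numa_ready = xenheap_ready = 0;

    if (patched_order) {
        numa_ready = 1;               /* acpi_numa_init() in early_setup_arch() */
        xenheap_ready = 1;            /* init_xenheap_pages() */
        model_end_boot_allocator();   /* end_boot_allocator() last */
    } else {
        model_end_boot_allocator();   /* end_boot_allocator() first */
        xenheap_ready = 1;            /* init_xenheap_pages() */
        numa_ready = 1;               /* acpi_numa_init() in late_setup_arch() */
    }

    for (int pfn = 0; pfn < NR_PAGES; pfn++)
        if (avail[model_pfn_to_nid(pfn)] == NULL)
            return 0;                 /* would dereference a NULL avail[nid] */
    return 1;
}
```

In this model, model_boot(0) (old order) returns 0 because every page is
attributed to node 0 during end_boot_allocator(), so avail[1] is never
allocated; this mirrors the NULL avail[4] seen on the real machine.
model_boot(1) (patched order) returns 1.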

I haven't confirmed it, but it seems that C/S14106 (xen
memory allocator: Dynamically allocate per-numa-node
metadata) exposed this latent bug.


[panic message]
  :
netconsole: not configured, aborting
Linux video capture interface: v2.00
Xen virtual console successfully installed as ttyS0
Event-channel device installed.
(XEN) *** xen_handle_domain_access: exception table lookup failed, iip=0xf000000004032e10, addr=0xb0, spinning...
(XEN) $$$$$ PANIC in domain 0 (k6=0xf000000007b00000): *** xen_handle_domain_access: exception table lookup failed, iip=0xf000000004032e10, addr=0xb0, spinning...
(XEN) d 0xf000000007b28080 domid 0
(XEN) vcpu 0xf000000007b00000 vcpu 1
(XEN)
(XEN) CPU 1
(XEN) psr : 0000101008226018 ifs : 800000000000060e ip  : [<f000000004032e10>]
(XEN) ip is at free_heap_pages+0x2d0/0x6c0
(XEN) unat: 0000000000000000 pfs : 000000000000060e rsc : 0000000000000003
(XEN) rnat: 0000000000000538 bsps: 0000000000000000 pr  : 000000000002a599
(XEN) ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0  : f000000004032dd0 b6  : f0000000040abe30 b7  : f000000004002e20
(XEN) f6  : 000000000000000000000 f7  : 1003e0000000000000000
(XEN) f8  : 1003e0000000000002000 f9  : 100058000000000000000
(XEN) f10 : 1003e0000000000002000 f11 : 1003e0000000000000001
(XEN) r1  : f00000000438d720 r2  : 0000000000000000 r3  : f00000201ea6bde9
(XEN) r8  : 0000000000000004 r9  : ffffffffffffffff r10 : 0000000000000000
(XEN) r11 : 0000000000020959 r12 : f000000007b07d30 r13 : f000000007b00000
(XEN) r14 : f00000000419af70 r15 : 00000000000000b0 r16 : 0000000000000001
(XEN) r17 : f000000004251578 r18 : 0000000000000022 r19 : 0000000000000023
(XEN) r20 : 0000000000000001 r21 : f000000004128218 r22 : f000000004190c58
(XEN) r23 : f30000000c4adb64 r24 : f000000004128208 r25 : 0000000006497b93
(XEN) r26 : 0000000000000016 r27 : 0000000000000000 r28 : 0000000000000000
(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f000000004196bc8
(XEN)
(XEN) Call Trace:
(XEN)  [<f0000000040b2a70>] show_stack+0x80/0xa0
(XEN)                                 sp=f000000007b077e0 bsp=f000000007b015e8
(XEN)  [<f000000004089500>] panic_domain+0x120/0x170
(XEN)                                 sp=f000000007b079b0 bsp=f000000007b01580
(XEN)  [<f00000000407e1f0>] ia64_do_page_fault+0x640/0x650
(XEN)                                 sp=f000000007b07af0 bsp=f000000007b014f0
(XEN)  [<f0000000040ab880>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f000000007b07b30 bsp=f000000007b014f0
(XEN)  [<f000000004032e10>] free_heap_pages+0x2d0/0x6c0
(XEN)                                 sp=f000000007b07d30 bsp=f000000007b01480
(XEN)  [<f000000004034180>] free_domheap_pages+0x430/0x880
(XEN)                                 sp=f000000007b07d30 bsp=f000000007b01440
(XEN)  [<f00000000402f220>] guest_remove_page+0x390/0x580
(XEN)                                 sp=f000000007b07d30 bsp=f000000007b013e0


Thanks,
Daisuke Nishimura.


diff -r 2b14a1f22eec xen/arch/ia64/linux-xen/setup.c
--- a/xen/arch/ia64/linux-xen/setup.c   Fri May 25 09:43:21 2007 -0600
+++ b/xen/arch/ia64/linux-xen/setup.c   Mon May 28 13:26:25 2007 +0900
@@ -506,13 +506,6 @@ setup_arch (char **cmdline_p)
        if (early_console_setup(*cmdline_p) == 0)
                mark_bsp_online();
 
-#ifdef XEN
-}
-
-void __init
-late_setup_arch (char **cmdline_p)
-{
-#endif
 #ifdef CONFIG_ACPI_BOOT
        /* Initialize the ACPI boot-time table parser */
        acpi_table_init();
@@ -525,6 +518,13 @@ late_setup_arch (char **cmdline_p)
 # endif
 #endif /* CONFIG_APCI_BOOT */
 
+#ifdef XEN
+}
+
+void __init
+late_setup_arch (char **cmdline_p)
+{
+#endif
 #ifndef XEN
        find_memory();
 #endif
diff -r 2b14a1f22eec xen/arch/ia64/xen/xensetup.c
--- a/xen/arch/ia64/xen/xensetup.c      Fri May 25 09:43:21 2007 -0600
+++ b/xen/arch/ia64/xen/xensetup.c      Mon May 28 13:26:25 2007 +0900
@@ -433,12 +433,12 @@ void __init start_kernel(void)
 
     alloc_dom0();
 
-    end_boot_allocator();
-
     init_xenheap_pages(__pa(xen_heap_start), xenheap_phys_end);
     printk("Xen heap: %luMB (%lukB)\n",
        (xenheap_phys_end-__pa(xen_heap_start)) >> 20,
        (xenheap_phys_end-__pa(xen_heap_start)) >> 10);
+
+    end_boot_allocator();
 
     late_setup_arch(&cmdline);
 
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel

 

