 
	
Re: [Xen-devel] acpidump crashes on some machines
On 06/20/2012 04:51 PM, Konrad Rzeszutek Wilk wrote:
On Wed, Jun 20, 2012 at 02:37:55PM +0200, Andre Przywara wrote:

Konrad,

thanks for looking at the problem. Replies inline...

We have some problems with acpidump running on Xen Dom0. On 64 bit Dom0 it will trigger the OOM killer, on 32 bit Dom0s it will cause a kernel crash. The hypervisor does not matter, I tried 4.1.3-rc2 as well as various unstable versions including 25467, also 32-bit versions of 4.1. The Dom0 kernels were always PVOPS versions, the problem starts with 3.2-rc1~194 and is still in 3.5.0-rc3. Also you need to restrict the Dom0 memory with dom0_mem=

The crash says (on a 3.4.3 32bit Dom0 kernel):

uruk:~ # ./acpidump32
[ 158.843444] ------------[ cut here ]------------
[ 158.843460] kernel BUG at mm/rmap.c:1027!
[ 158.843466] invalid opcode: 0000 [#1] SMP
[ 158.843472] Modules linked in:
[ 158.843478]
[ 158.843483] Pid: 4874, comm: acpidump32 Tainted: G W 3.4.0+ #105 empty empty/S3993
[ 158.843493] EIP: 0061:[<c10b0e27>] EFLAGS: 00010246 CPU: 3
[ 158.843505] EIP is at __page_set_anon_rmap+0x12/0x45
[ 158.843511] EAX: d6022dc0 EBX: dfecb6e0 ECX: b76faf64 EDX: b76faf64
[ 158.843516] ESI: 00000000 EDI: b76faf64 EBP: d6091e8c ESP: d6091e84
[ 158.843522] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[ 158.843529] CR0: 8005003b CR2: b76faf64 CR3: 17633000 CR4: 00000660
[ 158.843535] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 158.843581] DR6: ffff0ff0 DR7: 00000400
[ 158.843586] Process acpidump32 (pid: 4874, ti=d6090000 task=d60b34f0 task.ti=d6090000)
[ 158.843591] Stack:
[ 158.843594] dfecb6e0 00000001 d6091ea8 c10b15c4 00000000 d6022dc0 d61fbdd8 d6022dc0
[ 158.843610] 00000000 d6091efc c10aacbe 00000000 99948025 80000001 d8aa1f80 80000001
[ 158.843631] dfefc800 00000000 d8aa1f80 00000000 166b7025 d7f407d0 b76faf64 99948025
[ 158.843649] Call Trace:
[ 158.843656] [<c10b15c4>] do_page_add_anon_rmap+0x5b/0x64
[ 158.843664] [<c10aacbe>] handle_pte_fault+0x81d/0xa06
[ 158.843674] [<c10ab0ff>] handle_mm_fault+0x1fa/0x209
[ 158.843683] [<c159e4e8>] ? spurious_fault+0x104/0x104
[ 158.843688] [<c159e881>] do_page_fault+0x399/0x3b4
[ 158.843696] [<c10c639d>] ? filp_close+0x55/0x5f
[ 158.843701] [<c10c6408>] ? sys_close+0x61/0xa0
[ 158.843706] [<c159e4e8>] ? spurious_fault+0x104/0x104
[ 158.843714] [<c159c452>] error_code+0x5a/0x60
[ 158.843720] [<c159e4e8>] ? spurious_fault+0x104/0x104
[ 158.843724] Code: e8 45 91 00 00 89 c2 eb 09 2b 50 04 c1 ea 0c 03 50 4c 89 53 08 5b 5e 5d c3 55 89 e5 56 53 89 c3 89 d0 89 ca 8b 70 44 85 f6 75 02 <0f> 0b f6 43 04 01 75 27 83 7d 08 00 75 02 8b 36 46 89 73 04 f6
[ 158.843824] EIP: [<c10b0e27>] __page_set_anon_rmap+0x12/0x45 SS:ESP 0069:d6091e84
[ 158.843848] ---[ end trace 4eaa2a86a8e2da24 ]---
[ 158.843854] note: acpidump32[4874] exited with preempt_count 1

On 64bit the OOM goes around, finally killing the login shell:

uruk:~ # ./acpidump_inst
acpi_map_memory(917504, 131072);
opened /dev/mem (fd=3)
calling mmap(NULL, 131072, PROT_READ, MAP_PRIVATE, fd, e0000);
mmap returned 0xf7571000, function returns 0xf7571000
acpi_map_table(cfef0f64, "XSDT");
acpi_map_memory(3488550756, 36);
opened /dev/mem (fd=3)
calling mmap(NULL, 3976, PROT_READ, MAP_PRIVATE, fd, cfef0000);
mmap returned 0xf76fd000, function returns 0xf76fdf64
having mapped table header
reading signature:

Welcome to SUSE Linux Enterprise Server 11 SP1 (i586) - Kernel 3.5.0-rc3+ (hvc0).

uruk login:
-----------

This dump shows that the bug happens the moment acpidump accesses the mmapped ACPI table at @cfef0000 (the lower map at e0000 works).

What is the e0000 one? I don't see in your E820 the region being reserved?

E0000 is the below 1 MB BIOS area with the ACPI RSDP root pointer. E000:0000 in old DOS speak. The ACPI spec says that the pointer to the tables is hidden somewhere between 896K and 1MB at 16 byte granularity. acpidump scans this area for the ACPI magic number. So mapping /dev/mem is not fully broken, as this part at least works.
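As a rough illustration of the scan described above (a sketch, not acpidump's actual code): the RSDP starts with the 8-byte signature "RSD PTR " on a 16-byte boundary, and per the ACPI spec its first 20 bytes sum to zero modulo 256. The helper name find_rsdp and the buffer-based interface are assumptions made for this example; a real tool would run it over its mapping of the area between 896K and 1MB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIG      "RSD PTR "   /* 8-byte ACPI root pointer signature */
#define RSDP_SIG_LEN  8
#define RSDP_ALIGN    16           /* the RSDP sits on a 16-byte boundary */
#define RSDP_V1_LEN   20           /* bytes covered by the ACPI 1.0 checksum */

/* Sum of the bytes modulo 256; a valid RSDP sums to zero. */
static uint8_t byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum;
}

/*
 * Hypothetical helper: walk buf (a copy or mapping of the BIOS area)
 * in 16-byte steps and return the offset of the first RSDP whose
 * signature and checksum are valid, or -1 if there is none.
 */
static long find_rsdp(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    for (size_t off = 0; off + RSDP_V1_LEN <= len; off += RSDP_ALIGN) {
        if (memcmp(buf + off, RSDP_SIG, RSDP_SIG_LEN) != 0)
            continue;
        if (byte_sum(buf + off, RSDP_V1_LEN) == 0)
            return (long)off;
    }
    return -1;
}
```

With a buffer holding the 131072 bytes mapped at e0000, find_rsdp(buf, 131072) would return the offset of the root pointer inside that window, or -1 if nothing matches.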
This is extra unfortunate as in SLES11 acpidump will be called by the kbd init script (querying the BIOS NumLock setting!)

Ah. Is the acpidump somewhere easily available to compile? Should I get it from here: http://www.lesswatts.org/projects/acpi/utilities.php

Right, it is in the pmtools-20071116.tar.gz archive. Just say make in the acpidump directory.
Tell that to those 2.6.32 or even 2.6.18 users...
Hmm, I couldn't trigger such messages. Do I need some magic config to enable them? So far I have (among others):

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VIRTUAL=y
CONFIG_DEBUG_MEMORY_INIT=y

But it does not show on every machine here, so the machine E820 could actually be a differentiator. This particular box was a dual socket Barcelona server with 12GB of memory.

This whole PV memory management goes beyond my knowledge, so I'd like to ask for help on this issue. If you need more information (I attached the boot log, which shows the two E820 tables), please ask. I can also quickly do some experiments if needed.

This is a strange one - the P2M code should fetch the MFN (so it should give you cfef0) whenever anybody asks for that. Let's double-check that. Can you try this little module?

Right, it chokes. Mapping memory below 1MB works:

# insmod testxenmap.ko pfn=0xf8
# rmmod testxenmap
# dmesg
...
[ 60.369526] va is 0xffff8800000f8000
[ 60.369533] acpi:00000000: 80 dc 0f 00 00 ff 00 00 00 00 00 00 00 00 00 00 ................
[ 60.369536] acpi:00000010: 52 53 44 20 50 54 52 20 4a 50 54 4c 54 44 20 02 RSD PTR JPTLTD .
[ 60.369538] acpi:00000020: 20 0f ef cf 24 00 00 00 64 0f ef cf 00 00 00 00  ...$...d.......
....

You see the magic "RSD PTR " string here, at 0x20 the 32bit address of the actual tables (0xcfef0f20), which we try next:

# insmod testxenmap.ko pfn=0xcfef0
insmod: error inserting 'testxenmap.ko': -1 Invalid parameters
# dmesg
....
[ 351.964914] ------------[ cut here ]------------
[ 351.964924] WARNING: at /src/linux-2.6/xentest/testxenmap.c:24 acpitest_init+0x5e/0x1000 [testxenmap]()
[ 351.964926] Hardware name: empty
[ 351.964928] We get cfef0 instead of ffffffffffffffff!
[ 351.964933] Modules linked in: testxenmap(O+) [last unloaded: testxenmap]
[ 351.964936] Pid: 4937, comm: insmod Tainted: G W O 3.5.0-rc3+ #106
[ 351.964938] Call Trace:
[ 351.964944] [<ffffffffa000a05e>] ? acpitest_init+0x5e/0x1000 [testxenmap]
[ 351.964953] [<ffffffff81050747>] warn_slowpath_common+0x80/0x98
[ 351.964956] [<ffffffffa000a000>] ? 0xffffffffa0009fff
[ 351.964959] [<ffffffff810507f3>] warn_slowpath_fmt+0x41/0x43
[ 351.964963] [<ffffffffa000a05e>] acpitest_init+0x5e/0x1000 [testxenmap]
[ 351.964966] [<ffffffffa000a000>] ? 0xffffffffa0009fff
[ 351.964971] [<ffffffff8100215a>] do_one_initcall+0x7a/0x134
[ 351.964976] [<ffffffff81094512>] sys_init_module+0xbf/0x24b
[ 351.964982] [<ffffffff816bb826>] cstar_dispatch+0x7/0x21
[ 351.964985] ---[ end trace 4eaa2a86a8e2da24 ]---
[ 351.964987] raw p2m (cfef0) gives us: ffffffffffffffff

Starting the kernel without dom0_mem (where acpidump works flawlessly) also makes the module crash, although only at the point dumping the buffer (so this could be a different issue):

# insmod testxenmap.ko pfn=0xcfef0
[ 243.071693] va is 0xffff8800cfef0000
[ 243.071710] BUG: unable to handle kernel paging request at ffff8800cfef0000
[ 243.071733] IP: [<ffffffff81275a22>] hex_dump_to_buffer+0x19c/0x282
[ 243.071742] PGD 1c0c067 PUD f5b067 PMD fdb067 PTE 0
[ 243.071748] Oops: 0000 [#1] SMP
[ 243.071753] CPU 5
[ 243.071757] Modules linked in: testxenmap(O+) [last unloaded: testxenmap]
[ 243.071762]
[ 243.071768] Pid: 4825, comm: insmod Tainted: G W O 3.5.0-rc3+ #106 empty empty/S3993
[ 243.071777] RIP: e030:[<ffffffff81275a22>] [<ffffffff81275a22>] hex_dump_to_buffer+0x19c/0x282
[ 243.071783] RSP: e02b:ffff880312e2fd58 EFLAGS: 00010203
...

Hope that helps and thanks!
Andre.
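For reference, the RSDP bytes in the pfn=0xf8 dump above can be decoded by hand. The sketch below assumes the ACPI 2.0 RSDP layout (32-bit RSDT address at offset 16, length at offset 20, 64-bit XSDT address at offset 24) and 4 KB pages; rsdp_rsdt, rsdp_xsdt, page_window and struct map_window are hypothetical names used only for this example. It also shows how a table address splits into a page-aligned base plus an in-page offset, which appears to be what the mmap call in the acpidump trace does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Field offsets within the ACPI 2.0 RSDP structure. */
enum { RSDP_RSDT_ADDR = 16, RSDP_LENGTH = 20, RSDP_XSDT_ADDR = 24 };

/* Read little-endian integers from a byte buffer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t le64(const uint8_t *p)
{
    return (uint64_t)le32(p) | (uint64_t)le32(p + 4) << 32;
}

/* rsdp points at the "RSD PTR " signature. */
static uint32_t rsdp_rsdt(const uint8_t *rsdp)
{
    assert(rsdp != NULL);
    return le32(rsdp + RSDP_RSDT_ADDR);
}

static uint64_t rsdp_xsdt(const uint8_t *rsdp)
{
    assert(rsdp != NULL);
    return le64(rsdp + RSDP_XSDT_ADDR);
}

/* Hypothetical description of a page-aligned window onto a table. */
struct map_window {
    uint64_t pfn;     /* page frame number of the first page */
    uint64_t base;    /* page-aligned start address */
    size_t   offset;  /* offset of the table within that page */
    size_t   length;  /* bytes to map: offset + requested size */
};

/* Split a physical address into the page to map and the offset inside it. */
static struct map_window page_window(uint64_t phys, size_t size)
{
    struct map_window w;

    w.offset = (size_t)(phys & (PAGE_SIZE - 1));
    w.base   = phys - w.offset;
    w.pfn    = phys >> PAGE_SHIFT;
    w.length = w.offset + size;
    return w;
}
```

Fed the 32 bytes starting at acpi:00000010, this gives an RSDT address of 0xcfef0f20 and an XSDT address of 0xcfef0f64. page_window(0xcfef0f64, 36) then yields pfn 0xcfef0, base 0xcfef0000, offset 0xf64 and a length of 3976, which matches the mmap(NULL, 3976, ..., cfef0000) call and the 0xf76fdf64 return value in the acpidump trace.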
[not compile tested]

ACK ;-)

 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/pagemap.h>
 #include <linux/init.h>
 #include <xen/xen.h>
+#include <xen/page.h>

+unsigned long pfn = 0xcfef0;
+module_param(pfn, ulong, 0644);
+MODULE_PARM_DESC(pfn, "pfn to test");
+

-	unsigned int pfn = 0xcfef0;
-	unsigned int mfn;
+	unsigned long mfn;

-	WARN_ON(pfn != mfn, "We get %lx instead of %lx!\n", pfn, mfn);
+	WARN(pfn != mfn, "We get %lx instead of %lx!\n", pfn, mfn);

-	printk(KERN_INFO "va is 0x%lx\n", data);
+	printk(KERN_INFO "va is 0x%p\n", data);

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
 
 