Re: [Xen-devel] acpidump crashes on some machines
On Wed, Jun 20, 2012 at 02:37:55PM +0200, Andre Przywara wrote:
> Hi,
>
> we have some problems with acpidump running on Xen Dom0. On 64 bit
> Dom0 it will trigger the OOM killer, on 32 bit Dom0s it will cause a
> kernel crash.
> The hypervisor does not matter, I tried 4.1.3-rc2 as well as various
> unstable versions including 25467, also 32-bit versions of 4.1.
> The Dom0 kernels were always PVOPS versions, the problems starts
> with 3.2-rc1~194 and is still in 3.5.0-rc3.
> Also you need to restrict the Dom0 memory with dom0_mem=
> The crash says (on a 3.4.3 32bit Dom0 kernel):
> uruk:~ # ./acpidump32
> [ 158.843444] ------------[ cut here ]------------
> [ 158.843460] kernel BUG at mm/rmap.c:1027!
> [ 158.843466] invalid opcode: 0000 [#1] SMP
> [ 158.843472] Modules linked in:
> [ 158.843478]
> [ 158.843483] Pid: 4874, comm: acpidump32 Tainted: G W
> 3.4.0+ #105 empty empty/S3993
> [ 158.843493] EIP: 0061:[<c10b0e27>] EFLAGS: 00010246 CPU: 3
> [ 158.843505] EIP is at __page_set_anon_rmap+0x12/0x45
> [ 158.843511] EAX: d6022dc0 EBX: dfecb6e0 ECX: b76faf64 EDX: b76faf64
> [ 158.843516] ESI: 00000000 EDI: b76faf64 EBP: d6091e8c ESP: d6091e84
> [ 158.843522] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
> [ 158.843529] CR0: 8005003b CR2: b76faf64 CR3: 17633000 CR4: 00000660
> [ 158.843535] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 158.843581] DR6: ffff0ff0 DR7: 00000400
> [ 158.843586] Process acpidump32 (pid: 4874, ti=d6090000
> task=d60b34f0 task.ti=d6090000)
> [ 158.843591] Stack:
> [ 158.843594] dfecb6e0 00000001 d6091ea8 c10b15c4 00000000
> d6022dc0 d61fbdd8 d6022dc0
> [ 158.843610] 00000000 d6091efc c10aacbe 00000000 99948025
> 80000001 d8aa1f80 80000001
> [ 158.843631] dfefc800 00000000 d8aa1f80 00000000 166b7025
> d7f407d0 b76faf64 99948025
> [ 158.843649] Call Trace:
> [ 158.843656] [<c10b15c4>] do_page_add_anon_rmap+0x5b/0x64
> [ 158.843664] [<c10aacbe>] handle_pte_fault+0x81d/0xa06
> [ 158.843674] [<c10ab0ff>] handle_mm_fault+0x1fa/0x209
> [ 158.843683] [<c159e4e8>] ? spurious_fault+0x104/0x104
> [ 158.843688] [<c159e881>] do_page_fault+0x399/0x3b4
> [ 158.843696] [<c10c639d>] ? filp_close+0x55/0x5f
> [ 158.843701] [<c10c6408>] ? sys_close+0x61/0xa0
> [ 158.843706] [<c159e4e8>] ? spurious_fault+0x104/0x104
> [ 158.843714] [<c159c452>] error_code+0x5a/0x60
> [ 158.843720] [<c159e4e8>] ? spurious_fault+0x104/0x104
> [ 158.843724] Code: e8 45 91 00 00 89 c2 eb 09 2b 50 04 c1 ea 0c 03
> 50 4c 89 53 08 5b 5e 5d c3 55 89 e5 56 53 89 c3 89 d0 89 ca 8b 70 44
> 85 f6 75 02 <0f> 0b f6 43 04 01 75 27 83 7d 08 00 75 02 8b 36 46 89
> 73 04 f6
> [ 158.843824] EIP: [<c10b0e27>] __page_set_anon_rmap+0x12/0x45
> SS:ESP 0069:d6091e84
> [ 158.843848] ---[ end trace 4eaa2a86a8e2da24 ]---
> [ 158.843854] note: acpidump32[4874] exited with preempt_count 1
>
>
> On 64bit the OOM goes around, finally killing the login shell:
> uruk:~ # ./acpidump_inst
> acpi_map_memory(917504, 131072);
> opened /dev/mem (fd=3)
> calling mmap(NULL, 131072, PROT_READ, MAP_PRIVATE, fd, e0000);
> mmap returned 0xf7571000, function returns 0xf7571000
> acpi_map_table(cfef0f64, "XSDT");
> acpi_map_memory(3488550756, 36);
> opened /dev/mem (fd=3)
> calling mmap(NULL, 3976, PROT_READ, MAP_PRIVATE, fd, cfef0000);
> mmap returned 0xf76fd000, function returns 0xf76fdf64
> having mapped table header
> reading signature:
>
> Welcome to SUSE Linux Enterprise Server 11 SP1 (i586) - Kernel
> 3.5.0-rc3+ (hvc0).
>
> uruk login:
> -----------
> This dump shows that the bug happens the moment acpidump accesses
> the mmapped ACPI table at @cfef0000 (the lower map at e0000 works).

What is the e0000 one? I don't see in your E820 the region being reserved?

>
> This is extra unfortunate as in SLES11 acpidump will be called by
> the kbd init script (querying the BIOS NumLock setting!)

Ah. Is the acpidump somewhere easily available to compile?
Should I get it from here: http://www.lesswatts.org/projects/acpi/utilities.php

>
> I bisected the Dom0 kernel to find this one (v3.2-rc~194):
> commit 5eef150c1d7e41baaefd00dd56c153debcd86aee
> Merge: 315eb8a f3f436e
> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date: Tue Oct 25 09:17:07 2011 +0200
>
> Merge branch 'stable/e820-3.2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
>
> * 'stable/e820-3.2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:

Oh boy. v3.2 .. that is eons ago! :-)

> xen: release all pages within 1-1 p2m mappings
> xen: allow extra memory to be in multiple regions
> xen: allow balloon driver to use more than one memory region
> xen/balloon: simplify test for the end of usable RAM
> xen/balloon: account for pages released during memory setup
>
>
> I tried to find something obvious, but to no avail. At least the new
> E820 looks sane, nothing that would prevent the mapping of the
> requested regions. Reverting this commit will not work easily on
> newer kernels, also is probably not desirable.

The one thing that comes to my mind is the 1-1 mapping having some
issues. Can you boot the kernel with 'debug loglevel=8'? That should
print something like this during bootup:

Setting pfn cfef0->cfef7 to 1-1

or such.

>
> But it does not show on every machine here, so the machine E820
> could actually be a differentiator. This particular box was a dual
> socket Barcelona server with 12GB of memory.
>
> This whole PV memory management goes beyond my knowledge, so I'd
> like to ask for help on this issue.
> If you need more information (I attached the boot log, which shows
> the two E820 tables), please ask. I can also quickly do some
> experiments if needed.

This is a strange one - the P2M code should fetch the MFN (so it should
give you cfef0) whenever anybody asks for that. Let's double-check that.

Can you try this little module?
[not compile tested]

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/pagemap.h>
#include <linux/init.h>
#include <xen/xen.h>
#include <asm/xen/page.h>	/* pfn_to_mfn(), get_phys_to_machine(), mfn_to_virt() */

#define ACPITEST "0.1"

MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>");
MODULE_DESCRIPTION("acpitest");
MODULE_LICENSE("GPL");
MODULE_VERSION(ACPITEST);

static int __init acpitest_init(void)
{
	/* unsigned long so the types match the %lx format specifiers */
	unsigned long pfn = 0xcfef0;
	unsigned long mfn;
	void *data;

	/* For a 1-1 mapped region the P2M lookup should return the PFN itself. */
	mfn = pfn_to_mfn(pfn);
	/* WARN() takes a format string; WARN_ON() only takes the condition. */
	WARN(pfn != mfn, "We get %lx instead of %lx!\n", mfn, pfn);
	if (pfn != mfn) {
		printk(KERN_INFO "raw p2m (%lx) gives us: %lx\n",
		       pfn, get_phys_to_machine(pfn));
		return -EINVAL;
	}

	data = mfn_to_virt(mfn);
	printk(KERN_INFO "va is %p\n", data);
	print_hex_dump_bytes("acpi:", DUMP_PREFIX_OFFSET, data, PAGE_SIZE);
	return 0;
}

static void __exit acpitest_exit(void)
{
}

module_init(acpitest_init);
module_exit(acpitest_exit);
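
For reference, the mmap() arguments in the acpidump_inst trace quoted above are consistent with simple page alignment of the requested physical address. The userspace sketch below only reproduces that arithmetic; it is not acpidump's actual source. The 4 KiB page size, the struct map_request type and the align_request() helper are assumptions made up for illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on the x86 Dom0 in the report. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical helper type: the values an mmap() call on /dev/mem would need. */
struct map_request {
    uint64_t file_offset;  /* page-aligned offset passed to mmap() */
    uint64_t map_length;   /* number of bytes to map from file_offset */
    uint64_t page_offset;  /* where the requested bytes start inside the mapping */
};

/*
 * Round the physical address down to a page boundary and grow the
 * length by the number of bytes that were rounded away, so the mapping
 * still covers the whole requested range.
 */
static struct map_request align_request(uint64_t phys_addr, uint64_t length)
{
    struct map_request req;

    req.page_offset = phys_addr % SKETCH_PAGE_SIZE;
    req.file_offset = phys_addr - req.page_offset;
    req.map_length  = length + req.page_offset;
    return req;
}
```

With the values from the trace, align_request(3488550756, 36) yields a file offset of 0xcfef0000, a length of 3976 and an in-page offset of 0xf64, which matches "mmap(NULL, 3976, ..., cfef0000)" and the returned pointer 0xf76fd000 + 0xf64 = 0xf76fdf64. The first request, (917504, 131072), is already page aligned, so it maps 131072 bytes at offset 0xe0000 unchanged.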