[Xen-devel] [BUG] Kernel BUG in page_alloc.c (mismatched start and end zone) using xl generated e820 map
Hi,

We're hitting the kernel BUG below in one of our VMs running on Xen 4.4 and Linux kernel 3.13.0. We use the xl toolstack with PCI pass-through to pass network cards and a disk controller to the guest. It happens on a variety of our hardware, but not on all servers, and it seems to be related to the e820 map passed by xl.

The problem occurs when we put the server under heavy load: the 'dd' command shown above the stack trace seems to be sufficient to cause the problem if run a few times.

We didn't see the problem with previous versions of Xen (we were using 4.2.2), but at that time we were using xend, and as I understand it, in that case the RAM map provided to the guest is fabricated rather than based upon the real hardware map.

root@server1:/home/user0# DD_PERF="$(dd if=/dev/zero of=/data/zeros bs=1M \
    count=4096 2>&1 | tail -n 1 | cut -d ',' -f '2 3' ; rm -f /data/zeros)"
[ 814.365651] ------------[ cut here ]------------
[ 814.365668] kernel BUG at /build/ci/git/build/Kernel/kernel-trusty-domu/work/ubuntu-precise/mm/page_alloc.c:955!
[ 814.365675] invalid opcode: 0000 [#1] SMP
[ 814.365681] Modules linked in: drbd lru_cache libcrc32c xen_blkback xen_netback xt_addrtype xt_multiport xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_tcpudp xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_NFLOG nfnetlink_log nfnetlink ipt_ULOG ip6table_filter ip6_tables iptable_filter ip_tables x_tables x86_pkg_temp_thermal dm_multipath coretemp crct10dif_pclmul scsi_dh crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd tmem xenfs xen_privcmd zfs(POF) zunicode(POF) zcommon(POF) znvpair(POF) spl(OF) zavl(POF) dm_mirror dm_region_hash dm_log raid0 multipath linear dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid10 xor igb i2c_algo_bit dca ahci raid6_pq libahci ptp pps_core aufs
[ 814.365772] CPU: 0 PID: 9772 Comm: dd Tainted: PF O 3.13.0-34-trusty-domu #60~precise1
[ 814.365779] task: ffff88005d022fc0 ti: ffff880007a22000 task.ti: ffff880007a22000
[ 814.365786] RIP: e030:[<ffffffff81145f84>] [<ffffffff81145f84>] move_freepages+0x104/0x110
[ 814.365799] RSP: e02b:ffff880007a23698 EFLAGS: 00010006
[ 814.365803] RAX: ffff88010a24f000 RBX: 0000000000000000 RCX: 0000000000000001
[ 814.365808] RDX: ffffea000428ffc0 RSI: ffffea0004288000 RDI: ffff88010a24ff00
[ 814.365812] RBP: ffff880007a236a0 R08: ffff88010a24ff00 R09: 0000000000000000
[ 814.365817] R10: 0000000000000000 R11: ffffea00042880a0 R12: ffffea0004288080
[ 814.365821] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000014
[ 814.365833] FS: 00007fed6b790740(0000) GS:ffff880109800000(0000) knlGS:ffff88001f800000
[ 814.365838] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 814.365843] CR2: 00007f8ebd683ab0 CR3: 00000000581af000 CR4: 0000000000002660
[ 814.365848] Stack:
[ 814.365851] ffffffff81146003 ffff880007a23718 ffffffff811478eb 0000000000017614
[ 814.365859] ffffffff81009ebd ffff88010a24ff88 ffffffff00000000 ffffea00042880a0
[ 814.365866] ffff88010a24ff00 0000000200000000 0000000000000000 0000000000000002
[ 814.365874] Call Trace:
[ 814.365880] [<ffffffff81146003>] ? move_freepages_block+0x73/0x80
[ 814.365887] [<ffffffff811478eb>] __rmqueue+0x39b/0x4a0
[ 814.365896] [<ffffffff81009ebd>] ? xen_force_evtchn_callback+0xd/0x10
[ 814.365902] [<ffffffff81149e5c>] get_page_from_freelist+0x68c/0x930
[ 814.365909] [<ffffffff8114a27b>] __alloc_pages_nodemask+0x17b/0xb60
[ 814.365915] [<ffffffff8100a742>] ? check_events+0x12/0x20
[ 814.365923] [<ffffffff811e17fe>] ? __find_get_block+0xbe/0x230
[ 814.365932] [<ffffffff8115ecc9>] ? zone_statistics+0x89/0xa0
[ 814.365939] [<ffffffff81188983>] alloc_pages_current+0xa3/0x160
[ 814.365946] [<ffffffff811913a5>] new_slab+0x295/0x320
[ 814.365954] [<ffffffff8169a9b7>] __slab_alloc+0x2a8/0x459
[ 814.365960] [<ffffffff811e0d11>] ? alloc_buffer_head+0x21/0x70
[ 814.365968] [<ffffffff81277f0d>] ? jbd2_journal_dirty_metadata+0xcd/0x2d0
[ 814.365975] [<ffffffff81193213>] kmem_cache_alloc+0x183/0x1d0
[ 814.365982] [<ffffffff811e0d11>] alloc_buffer_head+0x21/0x70
[ 814.365990] [<ffffffff811a3406>] ? __mem_cgroup_commit_charge+0x156/0x3d0
[ 814.365996] [<ffffffff811e100a>] alloc_page_buffers+0x3a/0xc0
[ 814.366002] [<ffffffff811e1f2e>] create_empty_buffers+0x1e/0xd0
[ 814.366009] [<ffffffff811e2027>] create_page_buffers+0x47/0x50
[ 814.366016] [<ffffffff811e3081>] __block_write_begin+0x71/0x430
[ 814.366022] [<ffffffff81276723>] ? jbd2__journal_start+0xf3/0x1e0
[ 814.366030] [<ffffffff81230430>] ? __ext4_get_inode_loc+0x3e0/0x3e0
[ 814.366037] [<ffffffff81235dbc>] ? ext4_da_write_begin+0xec/0x2e0
[ 814.366044] [<ffffffff8125dfe9>] ? __ext4_journal_start_sb+0x69/0xe0
[ 814.366050] [<ffffffff81235dfe>] ext4_da_write_begin+0x12e/0x2e0
[ 814.366057] [<ffffffff8123684a>] ? ext4_da_write_end+0xba/0x250
[ 814.366065] [<ffffffff81140d68>] generic_file_buffered_write+0xf8/0x250
[ 814.366073] [<ffffffff81142421>] __generic_file_aio_write+0x1c1/0x3d0
[ 814.366078] [<ffffffff81142688>] generic_file_aio_write+0x58/0xa0
[ 814.366084] [<ffffffff8122be59>] ext4_file_write+0x99/0x400
[ 814.366092] [<ffffffff81097f74>] ? arch_vtime_task_switch+0x94/0xa0
[ 814.366101] [<ffffffff816b044e>] ? xen_hypervisor_callback+0x1e/0x30
[ 814.366108] [<ffffffff81009ef0>] ? xen_clocksource_read+0x20/0x30
[ 814.366115] [<ffffffff811ae43a>] do_sync_write+0x5a/0x90
[ 814.366120] [<ffffffff811aebc4>] vfs_write+0xb4/0x1f0
[ 814.366126] [<ffffffff811af5f9>] SyS_write+0x49/0xa0
[ 814.366132] [<ffffffff816aebff>] tracesys+0xe1/0xe6
[ 814.366136] Code: de 41 d3 e6 4c 89 66 20 4d 89 48 08 4d 63 c6 4c 89 56 10 44 01 f0 49 c1 e0 06 4c 01 c6 48 39 f2 73 96 5b 41 5c 41 5d 41 5e 5d c3 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 b8 00 00
[ 814.366191] RIP [<ffffffff81145f84>] move_freepages+0x104/0x110
[ 814.366197] RSP <ffff880007a23698>
[ 814.366205] ---[ end trace cbb29943cef93713 ]---

We've annotated the code in page_alloc.c with some debug output, shown below together with the log output it produces when the BUG is hit. It seems to happen when move_freepages is called with a block of pages at the top of RAM that spans the end of usable RAM.

----- Code from page_alloc.c with debug output

#ifndef CONFIG_HOLES_IN_ZONE
	/*
	 * page_zone is not safe to call in this context when
	 * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
	 * anyway as we check zone boundaries in move_freepages_block().
	 * Remove at a later date when no bug reports exist related to
	 * grouping pages by mobility
	 */
	struct zone *zs, *ze;
	if (page_zone(start_page) != page_zone(end_page)) {
		zs = page_zone(start_page);
		ze = page_zone(end_page);
		printk(KERN_ERR "Input Zone = %s\n", zone->name);
		printk(KERN_ERR "Input Zone Start PFN = %lx\n", zone->zone_start_pfn);
		printk(KERN_ERR "Input Zone End PFN = %lx\n", zone_end_pfn(zone));
		printk(KERN_ERR "Start Zone = %s\n", zs->name);
		printk(KERN_ERR "Start PFN = %lx\n", page_to_pfn(start_page));
		printk(KERN_ERR "End Zone = %s\n", ze->name);
		printk(KERN_ERR "End PFN = %lx\n", page_to_pfn(end_page));
	}
	/* BUG_ON(page_zone(start_page) != page_zone(end_page)); */

----- Debug output when the BUG is hit

May 29 23:04:14 server1 kernel: [ 1212.185507] Input Zone Start PFN = 100000
May 29 23:04:14 server1 kernel: [ 1212.185511] Input Zone End PFN = 118000
May 29 23:04:14 server1 kernel: [ 1212.185514] Start Zone = Normal
May 29 23:04:14 server1 kernel: [ 1212.185516] Start PFN =10a200
May 29 23:04:14 server1 kernel: [ 1212.185519] End Zone = DMA
May 29 23:04:14 server1 kernel: [ 1212.185522] End PFN = 10a3ff

Output from dmesg is included below, showing the e820 map provided by xl. If we tweak the e820 sanitize code in libxl_x86.c to align the end of usable RAM with a 2MB (512 page) boundary, everything seems fine, but I'm not sure this is a good solution.

I hope someone can help us understand the problem and find a better solution.
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 3.13.0-34-trusty-domu (root@zdev-ci-1) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #60~precise1 SMP Fri May 29 00:48:02 BST 2015 (Ubuntu 3.13.0-34.60~precise1-trusty-domu 3.13.11.4)
[ 0.000000] Command line: root=/dev/zvol/diskvm/67ec09dd-a0ed-4c51-8b75-cc08efea62fa/bin/1 ro xencons=tty console=tty1 console=hvc0 iommu=soft libata.fua=1 boot=zfs-z rpool=diskvm bootvol=67ec09dd-a0ed-4c51-8b75-cc08efea62fa/bin/1
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] ACPI in unprivileged domain disabled
[ 0.000000] Freeing 75dac-80000 pfn range: 41556 pages freed
[ 0.000000] Released 41556 pages of unused memory
[ 0.000000] Set 565844 page(s) to 1-1 mapping
[ 0.000000] Populating 100000-10a254 pfn range: 41556 pages added
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[ 0.000000] Xen: [mem 0x0000000000100000-0x0000000075dabfff] usable
[ 0.000000] Xen: [mem 0x0000000075dac000-0x0000000075dbdfff] ACPI data
[ 0.000000] Xen: [mem 0x0000000075dde000-0x000000008fffffff] reserved
[ 0.000000] Xen: [mem 0x00000000beffe000-0x00000000beffefff] reserved
[ 0.000000] Xen: [mem 0x00000000fec00000-0x00000000feefffff] reserved
[ 0.000000] Xen: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
[ 0.000000] Xen: [mem 0x0000000100000000-0x000000010a253fff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820: last_pfn = 0x10a254 max_arch_pfn = 0x400000000
[ 0.000000] e820: last_pfn = 0x75dac max_arch_pfn = 0x400000000
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] init_memory_mapping: [mem 0x10a000000-0x10a1fffff]
[ 0.000000] init_memory_mapping: [mem 0x108000000-0x109ffffff]
[ 0.000000] init_memory_mapping: [mem 0x100000000-0x107ffffff]
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x75dabfff]
[ 0.000000] init_memory_mapping: [mem 0x10a200000-0x10a253fff]
[ 0.000000] RAMDISK: [mem 0x023dd000-0x05416fff]
[ 0.000000] NUMA turned off
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000010a253fff]
[ 0.000000] Initmem setup node 0 [mem 0x00000000-0x10a253fff]
[ 0.000000] NODE_DATA [mem 0x10a24f000-0x10a253fff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal [mem 0x100000000-0x10a253fff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009ffff]
[ 0.000000] node 0: [mem 0x00100000-0x75dabfff]
[ 0.000000] node 0: [mem 0x100000000-0x10a253fff]
[ 0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] e820: [mem 0xbefff000-0xfebfffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on Xen
[ 0.000000] Xen version: 4.4.3-pre (preserve-AD)
[ 0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 29 pages/cpu @ffff880109800000 s86080 r8192 d24512 u1048576
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 515977
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: root=/dev/zvol/diskvm/67ec09dd-a0ed-4c51-8b75-cc08efea62fa/bin/1 ro xencons=tty console=tty1 console=hvc0 iommu=soft libata.fua=1 boot=zfs-z rpool=diskvm bootvol=67ec09dd-a0ed-4c51-8b75-cc08efea62fa/bin/1
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] software IO TLB [mem 0x103400000-0x107400000] (64MB) mapped at [ffff880103400000-ffff8801073fffff]
[ 0.000000] Memory: 1921448K/2096764K available (6860K kernel code, 1077K rwdata, 3200K rodata, 1288K init, 1416K bss, 175316K reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=2.
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1.
[ 0.000000] NR_IRQS:16640 nr_irqs:288 16

Best wishes,
Simon