
[Xen-users] Dom0 OOM, page allocation failure


  • To: xen-users@xxxxxxxxxxxxx
  • From: mailing lists <thelists@xxxxxxxxx>
  • Date: Tue, 26 Feb 2013 12:14:01 -0500
  • Delivery-date: Tue, 26 Feb 2013 17:15:22 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

Hello,

I'm running into what appear on the surface to be OOM issues in Dom0, but I'm not seeing any other evidence of memory exhaustion.  This typically happens during periods of high I/O; it has occurred during the initial RAID sync and while running mkfs.ext4 (as a test, with no intention of keeping ext4 on this array).  I've found some older posts citing very similar circumstances, but they all seem to have been resolved by an updated kernel in the 2.6 tree.  One post on a Linode board mentioned the issue being resolved only for 32-bit Dom0s.  Otherwise, I'm fairly stuck.  Below are the hardware and software details of my configuration, followed by the dmesg lines from the start of the issue up to now.

Also, I have two very similar systems running exactly the same software (but on different hardware), and the problem only occurs on the machine described below.  The biggest difference between the two is that the one that fails (Box A) has 10 x 2TB drives, while the one that does not has only 6 x 750GB drives.  Finally, without Xen loaded (booting directly into kernel 3.7.8+), I do not see the errors.

Any insight would be appreciated!

The setup:
Hardware:
  Processor: Intel Xeon E5-1650
  RAM: 56GB total (memtests clean)
  System drive: 32GB SSD
  -- 4GB swap on the SSD
  Storage drives: 10 x 2TB drives in a software RAID-6 array (mdadm --create /dev/md0 -l6 -n10 -x0 /dev/sd[bcdefghijk])

Software:
  OS: CentOS 6.3, 64-bit
  Kernel: 3.7.8
  Xen: 4.2.1
  


Grub boot line:
  title CentOS (3.7.8+)
          root (hd0,0)
          kernel /xen-4.2.1.gz dummy=dummy dom0_mem=4096M noreboot
          module /vmlinuz-3.7.8+ dummy=dummy nopat root=/dev/mapper/vg_xxxx-lv_root nomodeset rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_xxxx/lv_swap   SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_xxxx/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb showopts console=tty0
          module /initrd-3.7.8+.img

Memory status immediately following issue:
  free -m
               total       used       free     shared    buffers     cached
  Mem:          3115       2420        695          0         34         29
  -/+ buffers/cache:       2356        758
  Swap:         4095         24       4071
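
As a rough cross-check of the 3115 MB total reported by free, the page counts from the Mem-Info dump in the error logs below ("pages RAM" and "pages reserved") can be converted to MiB. This is only a sketch: it assumes 4 KiB pages, and the variable names are illustrative.

```python
# Assumption: 4 KiB pages (the usual x86-64 page size).
PAGE_KB = 4

# Values copied from the Mem-Info dump in the error logs.
pages_ram = 14680048       # "pages RAM"
pages_reserved = 13882525  # "pages reserved"

# Pages left for Dom0 after subtracting the reserved ones.
usable_pages = pages_ram - pages_reserved
usable_mib = usable_pages * PAGE_KB // 1024

print(f"usable pages: {usable_pages}")
print(f"usable memory: {usable_mib} MiB")
```

The result lines up with the "Mem: total" column above, which suggests Dom0 really does see roughly 3.1 GB rather than the full 4096M requested on the Xen command line.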


Error logs:
kernel: md127_raid6: page allocation failure: order:0, mode:0x200000
kernel: Pid: 623, comm: md127_raid6 Not tainted 3.7.8+ #7
kernel: Call Trace:
kernel: [<ffffffff81117a93>] warn_alloc_failed+0xf3/0x140
kernel: [<ffffffff8111aa90>] __alloc_pages_slowpath+0x4b0/0x7b0
kernel: [<ffffffff8111aa90>] ? __alloc_pages_slowpath+0x4b0/0x7b0
kernel: [<ffffffff8111afaa>] __alloc_pages_nodemask+0x21a/0x230
kernel: [<ffffffff8111afaa>] ? __alloc_pages_nodemask+0x21a/0x230
kernel: [<ffffffff8115c9d4>] kmem_getpages+0x64/0x190
kernel: [<ffffffff8115d787>] fallback_alloc+0x197/0x260
kernel: [<ffffffff8115d52a>] ____cache_alloc_node+0x9a/0x160
kernel: [<ffffffff8115defb>] kmem_cache_alloc+0x12b/0x230
kernel: [<ffffffff81150907>] ? dma_pool_alloc+0xc7/0xf0
kernel: [<ffffffffa018bea4>] ioat2_alloc_ring_ent+0x64/0xc0 [ioatdma]
kernel: [<ffffffffa018c043>] reshape_ring+0x143/0x350 [ioatdma]
kernel: [<ffffffffa018c337>] ioat2_check_space_lock+0xe7/0x220 [ioatdma]
kernel: [<ffffffffa018c4d1>] ioat2_dma_prep_memcpy_lock+0x61/0x270 [ioatdma]
kernel: [<ffffffffa002720b>] async_memcpy+0x20b/0x2c4 [async_memcpy]
kernel: [<ffffffffa00e34dd>] async_copy_data+0x9d/0x150 [raid456]
kernel: [<ffffffffa00e3728>] ops_run_biodrain+0x198/0x1d0 [raid456]
kernel: [<ffffffffa00e4688>] __raid_run_ops+0x4e8/0x660 [raid456]
kernel: [<ffffffffa00e81a9>] ? ops_run_io+0x29/0x740 [raid456]
kernel: [<ffffffffa00e1685>] ? handle_stripe_dirtying+0x335/0x450 [raid456]
kernel: [<ffffffffa00e91c3>] handle_stripe+0x903/0xec0 [raid456]
kernel: [<ffffffff81044298>] ? pvclock_clocksource_read+0x58/0xd0
kernel: [<ffffffffa00e9afa>] handle_active_stripes+0x19a/0x280 [raid456]
kernel: [<ffffffffa00e9dd8>] raid5d+0x1f8/0x350 [raid456]
kernel: [<ffffffff8100a350>] ? xen_clocksource_read+0x20/0x30
kernel: [<ffffffff8156d23e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
kernel: [<ffffffff8144e5f7>] md_thread+0x117/0x150
kernel: [<ffffffff810789e0>] ? wake_up_bit+0x40/0x40
kernel: [<ffffffff8144e4e0>] ? md_rdev_init+0x110/0x110
kernel: [<ffffffff8107828e>] kthread+0xce/0xe0
kernel: [<ffffffff8100382e>] ? xen_end_context_switch+0x1e/0x30
kernel: [<ffffffff810781c0>] ? kthread_freezable_should_stop+0x70/0x70
kernel: [<ffffffff81575cec>] ret_from_fork+0x7c/0xb0
kernel: [<ffffffff810781c0>] ? kthread_freezable_should_stop+0x70/0x70
kernel: Mem-Info:
kernel: Node 0 DMA per-cpu:
kernel: CPU    0: hi:    0, btch:   1 usd:   0
kernel: CPU    1: hi:    0, btch:   1 usd:   0
kernel: CPU    2: hi:    0, btch:   1 usd:   0
kernel: CPU    3: hi:    0, btch:   1 usd:   0
kernel: CPU    4: hi:    0, btch:   1 usd:   0
kernel: CPU    5: hi:    0, btch:   1 usd:   0
kernel: CPU    6: hi:    0, btch:   1 usd:   0
kernel: CPU    7: hi:    0, btch:   1 usd:   0
kernel: Node 0 DMA32 per-cpu:
kernel: CPU    0: hi:  186, btch:  31 usd:   0
kernel: CPU    1: hi:  186, btch:  31 usd:   0
kernel: CPU    2: hi:  186, btch:  31 usd:  13
kernel: CPU    3: hi:  186, btch:  31 usd:   0
kernel: CPU    4: hi:  186, btch:  31 usd:  30
kernel: CPU    5: hi:  186, btch:  31 usd:   0
kernel: CPU    6: hi:  186, btch:  31 usd:   0
kernel: CPU    7: hi:  186, btch:  31 usd:   0
kernel: Node 0 Normal per-cpu:
kernel: CPU    0: hi:  186, btch:  31 usd:   0
kernel: CPU    1: hi:  186, btch:  31 usd:   0
kernel: CPU    2: hi:  186, btch:  31 usd:   0
kernel: CPU    3: hi:  186, btch:  31 usd:   0
kernel: CPU    4: hi:  186, btch:  31 usd:  30
kernel: CPU    5: hi:  186, btch:  31 usd:   0
kernel: CPU    6: hi:  186, btch:  31 usd:   0
kernel: CPU    7: hi:  186, btch:  31 usd:   0
kernel: active_anon:26829 inactive_anon:8320 isolated_anon:0
kernel: active_file:17836 inactive_file:422075 isolated_file:0
kernel: unevictable:1 dirty:3843 writeback:84236 unstable:0
kernel: free:64174 slab_reclaimable:14968 slab_unreclaimable:35363
kernel: mapped:3896 shmem:77 pagetables:2034 bounce:0
kernel: free_cma:0
kernel: Node 0 DMA free:15836kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15612kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
kernel: lowmem_reserve[]: 0 1948 56488 56488
kernel: Node 0 DMA32 free:218864kB min:1048kB low:1308kB high:1572kB active_anon:97264kB inactive_anon:32512kB active_file:29592kB inactive_file:1492496kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1995492kB mlocked:0kB dirty:13532kB writeback:142000kB mapped:40kB shmem:0kB slab_reclaimable:22544kB slab_unreclaimable:22616kB kernel_stack:16kB pagetables:1128kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 0 54540 54540
kernel: Node 0 Normal free:21996kB min:29364kB low:36704kB high:44044kB active_anon:10052kB inactive_anon:768kB active_file:41752kB inactive_file:195804kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:55848960kB mlocked:4kB dirty:1840kB writeback:194944kB mapped:15544kB shmem:308kB slab_reclaimable:37328kB slab_unreclaimable:118836kB kernel_stack:2040kB pagetables:7008kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 0 0 0
kernel: Node 0 DMA: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15836kB
kernel: Node 0 DMA32: 2474*4kB 1928*8kB 1171*16kB 597*32kB 415*64kB 181*128kB 100*256kB 61*512kB 30*1024kB 9*2048kB 0*4096kB = 218872kB
kernel: Node 0 Normal: 3589*4kB 608*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 23220kB
kernel: 444756 total pagecache pages
kernel: 4704 pages in swap cache
kernel: Swap cache stats: add 130487, delete 125783, find 902/1215
kernel: Free swap  = 3684144kB
kernel: Total swap = 4194300kB
kernel: 14680048 pages RAM
kernel: 13882525 pages reserved
kernel: 1271965 pages shared
kernel: 282031 pages non-shared
kernel: SLAB: Unable to allocate memory on node 0 (gfp=0x0)
kernel:  cache: ioat2, object size: 128, order: 0
kernel:  node 0: slabs: 6672/6672, objs: 200160/200160, free: 0
kernel: md127_raid6: page allocation failure: order:0, mode:0x0
kernel: Pid: 623, comm: md127_raid6 Not tainted 3.7.8+ #7
kernel: Call Trace:
kernel: [<ffffffff81117a93>] warn_alloc_failed+0xf3/0x140
kernel: [<ffffffff81084ea3>] ? __wake_up+0x53/0x70
kernel: [<ffffffff8111aa90>] __alloc_pages_slowpath+0x4b0/0x7b0
kernel: [<ffffffff8111afaa>] __alloc_pages_nodemask+0x21a/0x230
kernel: [<ffffffff811571b6>] alloc_pages_current+0xb6/0x120
kernel: [<ffffffff8111778e>] __get_free_pages+0xe/0x50
kernel: [<ffffffff8130c4e1>] xen_swiotlb_alloc_coherent+0x51/0x180
kernel: [<ffffffff8115dc53>] ? kmem_cache_alloc_trace+0xb3/0x230
kernel: [<ffffffff81150735>] pool_alloc_page+0xc5/0x1d0
kernel: [<ffffffff811508ba>] dma_pool_alloc+0x7a/0xf0
kernel: [<ffffffffa018be7a>] ioat2_alloc_ring_ent+0x3a/0xc0 [ioatdma]
kernel: [<ffffffffa018c043>] reshape_ring+0x143/0x350 [ioatdma]
kernel: [<ffffffffa018c337>] ioat2_check_space_lock+0xe7/0x220 [ioatdma]
kernel: [<ffffffff8156dcfa>] ? error_exit+0x2a/0x60
kernel: [<ffffffffa018c4d1>] ioat2_dma_prep_memcpy_lock+0x61/0x270 [ioatdma]
kernel: [<ffffffffa002720b>] async_memcpy+0x20b/0x2c4 [async_memcpy]
kernel: [<ffffffffa00e34dd>] async_copy_data+0x9d/0x150 [raid456]
kernel: [<ffffffffa00e3728>] ops_run_biodrain+0x198/0x1d0 [raid456]
kernel: [<ffffffffa00e4688>] __raid_run_ops+0x4e8/0x660 [raid456]
kernel: [<ffffffffa00e81a9>] ? ops_run_io+0x29/0x740 [raid456]
kernel: [<ffffffffa00e11a8>] ? schedule_reconstruction+0x68/0x210 [raid456]
kernel: [<ffffffff81044298>] ? pvclock_clocksource_read+0x58/0xd0
kernel: [<ffffffffa00e1685>] ? handle_stripe_dirtying+0x335/0x450 [raid456]
kernel: [<ffffffffa00e91c3>] handle_stripe+0x903/0xec0 [raid456]
kernel: [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
kernel: [<ffffffff81044298>] ? pvclock_clocksource_read+0x58/0xd0
kernel: [<ffffffffa00e9afa>] handle_active_stripes+0x19a/0x280 [raid456]
kernel: [<ffffffffa00e9dd8>] raid5d+0x1f8/0x350 [raid456]
kernel: [<ffffffff8100a350>] ? xen_clocksource_read+0x20/0x30
kernel: [<ffffffff8156d23e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
kernel: [<ffffffff8144e5f7>] md_thread+0x117/0x150
kernel: [<ffffffff810789e0>] ? wake_up_bit+0x40/0x40
kernel: [<ffffffff8144e4e0>] ? md_rdev_init+0x110/0x110
kernel: [<ffffffff8107828e>] kthread+0xce/0xe0
kernel: [<ffffffff8100382e>] ? xen_end_context_switch+0x1e/0x30
kernel: [<ffffffff810781c0>] ? kthread_freezable_should_stop+0x70/0x70
kernel: [<ffffffff81575cec>] ret_from_fork+0x7c/0xb0
kernel: [<ffffffff810781c0>] ? kthread_freezable_should_stop+0x70/0x70
kernel: Mem-Info:
kernel: Node 0 DMA per-cpu:
kernel: CPU    0: hi:    0, btch:   1 usd:   0
kernel: CPU    1: hi:    0, btch:   1 usd:   0
kernel: CPU    2: hi:    0, btch:   1 usd:   0
kernel: CPU    3: hi:    0, btch:   1 usd:   0
kernel: CPU    4: hi:    0, btch:   1 usd:   0
kernel: CPU    5: hi:    0, btch:   1 usd:   0
kernel: CPU    6: hi:    0, btch:   1 usd:   0
kernel: CPU    7: hi:    0, btch:   1 usd:   0
kernel: Node 0 DMA32 per-cpu:
kernel: CPU    0: hi:  186, btch:  31 usd:  65
kernel: CPU    1: hi:  186, btch:  31 usd:   0
kernel: CPU    2: hi:  186, btch:  31 usd: 170
kernel: CPU    3: hi:  186, btch:  31 usd:   1
kernel: CPU    4: hi:  186, btch:  31 usd:  65
kernel: CPU    5: hi:  186, btch:  31 usd:  82
kernel: CPU    6: hi:  186, btch:  31 usd:  34
kernel: CPU    7: hi:  186, btch:  31 usd:  54
kernel: Node 0 Normal per-cpu:
kernel: CPU    0: hi:  186, btch:  31 usd:  30
kernel: CPU    1: hi:  186, btch:  31 usd:   0
kernel: CPU    2: hi:  186, btch:  31 usd: 160
kernel: CPU    3: hi:  186, btch:  31 usd:  22
kernel: CPU    4: hi:  186, btch:  31 usd:  55
kernel: CPU    5: hi:  186, btch:  31 usd:  57
kernel: CPU    6: hi:  186, btch:  31 usd:  15
kernel: CPU    7: hi:  186, btch:  31 usd:  15
kernel: active_anon:6981 inactive_anon:2555 isolated_anon:0
kernel: active_file:17876 inactive_file:94621 isolated_file:0
kernel: unevictable:1 dirty:38 writeback:94542 unstable:0
kernel: free:64120 slab_reclaimable:7901 slab_unreclaimable:279561
kernel: mapped:3873 shmem:57 pagetables:2028 bounce:0
kernel: free_cma:0
kernel: Node 0 DMA free:15836kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15612kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
kernel: lowmem_reserve[]: 0 1948 56488 56488
kernel: Node 0 DMA32 free:218900kB min:1048kB low:1308kB high:1572kB active_anon:27660kB inactive_anon:9832kB active_file:29652kB inactive_file:243504kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1995492kB mlocked:0kB dirty:152kB writeback:243344kB mapped:212kB shmem:172kB slab_reclaimable:9528kB slab_unreclaimable:939408kB kernel_stack:32kB pagetables:1148kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:50103 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 0 54540 54540
kernel: Node 0 Normal free:21744kB min:29364kB low:36704kB high:44044kB active_anon:264kB inactive_anon:388kB active_file:41852kB inactive_file:134980kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:55848960kB mlocked:4kB dirty:0kB writeback:134824kB mapped:15280kB shmem:56kB slab_reclaimable:22076kB slab_unreclaimable:178836kB kernel_stack:2032kB pagetables:6964kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:289419 all_unreclaimable? yes
kernel: lowmem_reserve[]: 0 0 0 0
kernel: Node 0 DMA: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15836kB
kernel: Node 0 DMA32: 3280*4kB 2835*8kB 1386*16kB 551*32kB 315*64kB 183*128kB 105*256kB 64*512kB 37*1024kB 1*2048kB 0*4096kB = 218776kB
kernel: Node 0 Normal: 4468*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 21744kB
kernel: 114447 total pagecache pages
kernel: 1877 pages in swap cache
kernel: Swap cache stats: add 158328, delete 156451, find 3052/3981
kernel: Free swap  = 3593000kB
kernel: Total swap = 4194300kB
kernel: 14680048 pages RAM
kernel: 13882525 pages reserved
kernel: 943705 pages shared
kernel: 608751 pages non-shared
init: tty (/dev/tty2) main process ended, respawning
init: tty (/dev/tty2) main process ended, respawning
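
For anyone reading the two Mem-Info dumps above, a small script can pull out the per-zone free and min figures and show how much slab_unreclaimable grew between the first and second failure. This is a minimal sketch, not a diagnosis: the helper names (zone_status, slab_growth_mib) are hypothetical, the sample lines are shortened prefixes of the log lines above, and 4 KiB pages are assumed.

```python
import re

# Assumption: 4 KiB pages (the usual x86-64 page size).
PAGE_KB = 4

# Matches the start of a per-zone line such as
# "Node 0 Normal free:21996kB min:29364kB low:36704kB high:44044kB ..."
ZONE_RE = re.compile(
    r"Node \d+ (\w+) free:(\d+)kB min:(\d+)kB low:(\d+)kB high:(\d+)kB"
)

def zone_status(line):
    """Return (zone, free_kb, min_kb, below_min) for a zone line, else None."""
    m = ZONE_RE.search(line)
    if m is None:
        return None
    zone = m.group(1)
    free_kb, min_kb = int(m.group(2)), int(m.group(3))
    return zone, free_kb, min_kb, free_kb < min_kb

def slab_growth_mib(before_pages, after_pages):
    """Convert a change in slab page count to MiB."""
    return (after_pages - before_pages) * PAGE_KB / 1024

# Shortened copies of zone lines from the first dump.
sample = [
    "kernel: Node 0 DMA32 free:218864kB min:1048kB low:1308kB high:1572kB",
    "kernel: Node 0 Normal free:21996kB min:29364kB low:36704kB high:44044kB",
]
for line in sample:
    print(zone_status(line))

# slab_unreclaimable (in pages) from the first and second dump.
print(round(slab_growth_mib(35363, 279561), 1), "MiB of slab growth")
```

With the figures from the logs, the Normal zone sits below its min watermark in both dumps while DMA32 still has free pages, and unreclaimable slab grows by several hundred MiB between the two reports.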

Thanks,
Bill


 

