[Xen-devel] OOM problems
On machines running many HVM (stubdom-based) domains, I often see errors like this:

[77176.524094] qemu-dm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[77176.524102] Pid: 7478, comm: qemu-dm Not tainted 2.6.32.25-g80f7e08 #2
[77176.524109] Call Trace:
[77176.524123]  [<ffffffff810897fd>] ? T.413+0xcd/0x290
[77176.524129]  [<ffffffff81089ad3>] ? __out_of_memory+0x113/0x180
[77176.524133]  [<ffffffff81089b9e>] ? out_of_memory+0x5e/0xc0
[77176.524140]  [<ffffffff8108d1cb>] ? __alloc_pages_nodemask+0x69b/0x6b0
[77176.524144]  [<ffffffff8108d1f2>] ? __get_free_pages+0x12/0x60
[77176.524152]  [<ffffffff810c94e7>] ? __pollwait+0xb7/0x110
[77176.524161]  [<ffffffff81262b93>] ? n_tty_poll+0x183/0x1d0
[77176.524165]  [<ffffffff8125ea42>] ? tty_poll+0x92/0xa0
[77176.524169]  [<ffffffff810c8a92>] ? do_select+0x362/0x670
[77176.524173]  [<ffffffff810c9430>] ? __pollwait+0x0/0x110
[77176.524178]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524183]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524188]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524193]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524197]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524202]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524207]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524212]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524217]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524222]  [<ffffffff810c8fb5>] ? core_sys_select+0x215/0x350
[77176.524231]  [<ffffffff810100af>] ? xen_restore_fl_direct_end+0x0/0x1
[77176.524236]  [<ffffffff8100c48d>] ? xen_mc_flush+0x8d/0x1b0
[77176.524243]  [<ffffffff81014ffb>] ? xen_hypervisor_callback+0x1b/0x20
[77176.524251]  [<ffffffff814b0f5a>] ? error_exit+0x2a/0x60
[77176.524255]  [<ffffffff8101485d>] ? retint_restore_args+0x5/0x6
[77176.524263]  [<ffffffff8102fd3d>] ? pvclock_clocksource_read+0x4d/0xb0
[77176.524268]  [<ffffffff8102fd3d>] ? pvclock_clocksource_read+0x4d/0xb0
[77176.524276]  [<ffffffff810663d1>] ? ktime_get_ts+0x61/0xd0
[77176.524281]  [<ffffffff810c9354>] ? sys_select+0x44/0x120
[77176.524286]  [<ffffffff81013f02>] ? system_call_fastpath+0x16/0x1b
[77176.524290] Mem-Info:
[77176.524293] DMA per-cpu:
[77176.524296] CPU  0: hi:    0, btch:   1 usd:   0
[77176.524300] CPU  1: hi:    0, btch:   1 usd:   0
[77176.524303] CPU  2: hi:    0, btch:   1 usd:   0
[77176.524306] CPU  3: hi:    0, btch:   1 usd:   0
[77176.524310] CPU  4: hi:    0, btch:   1 usd:   0
[77176.524313] CPU  5: hi:    0, btch:   1 usd:   0
[77176.524316] CPU  6: hi:    0, btch:   1 usd:   0
[77176.524318] CPU  7: hi:    0, btch:   1 usd:   0
[77176.524322] CPU  8: hi:    0, btch:   1 usd:   0
[77176.524324] CPU  9: hi:    0, btch:   1 usd:   0
[77176.524327] CPU 10: hi:    0, btch:   1 usd:   0
[77176.524330] CPU 11: hi:    0, btch:   1 usd:   0
[77176.524333] CPU 12: hi:    0, btch:   1 usd:   0
[77176.524336] CPU 13: hi:    0, btch:   1 usd:   0
[77176.524339] CPU 14: hi:    0, btch:   1 usd:   0
[77176.524342] CPU 15: hi:    0, btch:   1 usd:   0
[77176.524345] CPU 16: hi:    0, btch:   1 usd:   0
[77176.524348] CPU 17: hi:    0, btch:   1 usd:   0
[77176.524351] CPU 18: hi:    0, btch:   1 usd:   0
[77176.524354] CPU 19: hi:    0, btch:   1 usd:   0
[77176.524358] CPU 20: hi:    0, btch:   1 usd:   0
[77176.524364] CPU 21: hi:    0, btch:   1 usd:   0
[77176.524367] CPU 22: hi:    0, btch:   1 usd:   0
[77176.524370] CPU 23: hi:    0, btch:   1 usd:   0
[77176.524372] DMA32 per-cpu:
[77176.524374] CPU  0: hi:  186, btch:  31 usd:  81
[77176.524377] CPU  1: hi:  186, btch:  31 usd:  66
[77176.524380] CPU  2: hi:  186, btch:  31 usd:  49
[77176.524385] CPU  3: hi:  186, btch:  31 usd:  67
[77176.524387] CPU  4: hi:  186, btch:  31 usd:  93
[77176.524390] CPU  5: hi:  186, btch:  31 usd:  73
[77176.524393] CPU  6: hi:  186, btch:  31 usd:  50
[77176.524396] CPU  7: hi:  186, btch:  31 usd:  79
[77176.524399] CPU  8: hi:  186, btch:  31 usd:  21
[77176.524402] CPU  9: hi:  186, btch:  31 usd:  38
[77176.524406] CPU 10: hi:  186, btch:  31 usd:   0
[77176.524409] CPU 11: hi:  186, btch:  31 usd:  75
[77176.524412] CPU 12: hi:  186, btch:  31 usd:   1
[77176.524414] CPU 13: hi:  186, btch:  31 usd:   4
[77176.524417] CPU 14: hi:  186, btch:  31 usd:   9
[77176.524420] CPU 15: hi:  186, btch:  31 usd:   0
[77176.524423] CPU 16: hi:  186, btch:  31 usd:  56
[77176.524426] CPU 17: hi:  186, btch:  31 usd:  35
[77176.524429] CPU 18: hi:  186, btch:  31 usd:  32
[77176.524432] CPU 19: hi:  186, btch:  31 usd:  39
[77176.524435] CPU 20: hi:  186, btch:  31 usd:  24
[77176.524438] CPU 21: hi:  186, btch:  31 usd:   0
[77176.524441] CPU 22: hi:  186, btch:  31 usd:  35
[77176.524444] CPU 23: hi:  186, btch:  31 usd:  51
[77176.524447] Normal per-cpu:
[77176.524449] CPU  0: hi:  186, btch:  31 usd:  29
[77176.524453] CPU  1: hi:  186, btch:  31 usd:   1
[77176.524456] CPU  2: hi:  186, btch:  31 usd:  30
[77176.524459] CPU  3: hi:  186, btch:  31 usd:  30
[77176.524463] CPU  4: hi:  186, btch:  31 usd:  30
[77176.524466] CPU  5: hi:  186, btch:  31 usd:  31
[77176.524469] CPU  6: hi:  186, btch:  31 usd:   0
[77176.524471] CPU  7: hi:  186, btch:  31 usd:   0
[77176.524474] CPU  8: hi:  186, btch:  31 usd:  30
[77176.524477] CPU  9: hi:  186, btch:  31 usd:  28
[77176.524480] CPU 10: hi:  186, btch:  31 usd:   0
[77176.524483] CPU 11: hi:  186, btch:  31 usd:  30
[77176.524486] CPU 12: hi:  186, btch:  31 usd:   0
[77176.524489] CPU 13: hi:  186, btch:  31 usd:   0
[77176.524492] CPU 14: hi:  186, btch:  31 usd:   0
[77176.524495] CPU 15: hi:  186, btch:  31 usd:   0
[77176.524498] CPU 16: hi:  186, btch:  31 usd:   0
[77176.524501] CPU 17: hi:  186, btch:  31 usd:   0
[77176.524504] CPU 18: hi:  186, btch:  31 usd:   0
[77176.524507] CPU 19: hi:  186, btch:  31 usd:   0
[77176.524510] CPU 20: hi:  186, btch:  31 usd:   0
[77176.524513] CPU 21: hi:  186, btch:  31 usd:   0
[77176.524516] CPU 22: hi:  186, btch:  31 usd:   0
[77176.524518] CPU 23: hi:  186, btch:  31 usd:   0
[77176.524524] active_anon:5675 inactive_anon:4676 isolated_anon:0
[77176.524526]  active_file:146373 inactive_file:153543 isolated_file:480
[77176.524527]  unevictable:0 dirty:167539 writeback:322 unstable:0
[77176.524528]  free:5017 slab_reclaimable:15640 slab_unreclaimable:8972
[77176.524529]  mapped:1114 shmem:7 pagetables:1908 bounce:0
[77176.524536] DMA free:9820kB min:32kB low:40kB high:48kB active_anon:4kB inactive_anon:0kB active_file:616kB inactive_file:2212kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:12740kB mlocked:0kB dirty:2292kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:72kB slab_unreclaimable:108kB kernel_stack:0kB pagetables:12kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:3040 all_unreclaimable? no
[77176.524541] lowmem_reserve[]: 0 1428 2452 2452
[77176.524551] DMA32 free:7768kB min:3680kB low:4600kB high:5520kB active_anon:22696kB inactive_anon:18704kB active_file:584580kB inactive_file:608508kB unevictable:0kB isolated(anon):0kB isolated(file):1920kB present:1462496kB mlocked:0kB dirty:664128kB writeback:1276kB mapped:4456kB shmem:28kB slab_reclaimable:62076kB slab_unreclaimable:32292kB kernel_stack:5120kB pagetables:7620kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1971808 all_unreclaimable? yes
[77176.524556] lowmem_reserve[]: 0 0 1024 1024
[77176.524564] Normal free:2480kB min:2636kB low:3292kB high:3952kB active_anon:0kB inactive_anon:0kB active_file:296kB inactive_file:3452kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1048700kB mlocked:0kB dirty:3736kB writeback:12kB mapped:0kB shmem:0kB slab_reclaimable:412kB slab_unreclaimable:3488kB kernel_stack:80kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:8192 all_unreclaimable? yes
[77176.524569] lowmem_reserve[]: 0 0 0 0
[77176.524574] DMA: 4*4kB 25*8kB 11*16kB 7*32kB 8*64kB 8*128kB 8*256kB 3*512kB 0*1024kB 0*2048kB 1*4096kB = 9832kB
[77176.524587] DMA32: 742*4kB 118*8kB 3*16kB 3*32kB 2*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 7768kB
[77176.524600] Normal: 1*4kB 1*8kB 2*16kB 13*32kB 14*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1612kB
[77176.524613] 302308 total pagecache pages
[77176.524615] 1619 pages in swap cache
[77176.524617] Swap cache stats: add 40686, delete 39067, find 24687/26036
[77176.524619] Free swap  = 10141956kB
[77176.524621] Total swap = 10239992kB
[77176.577607] 793456 pages RAM
[77176.577611] 436254 pages reserved
[77176.577613] 308627 pages shared
[77176.577615] 49249 pages non-shared
[77176.577620] Out of memory: kill process 5755 (python2.6) score 110492 or a child
[77176.577623] Killed process 5757 (python2.6)

Depending on what gets nuked by the OOM-killer, I am frequently left with an unusable system that needs to be rebooted. The machine always has plenty of memory available (1.5 GB devoted to dom0, of which >1 GB is always just in "cached" state). For instance, right now, on this same machine:

# free
             total       used       free     shared    buffers     cached
Mem:       1536512    1493112      43400          0      10284    1144904
-/+ buffers/cache:     337924    1198588
Swap:     10239992      74444   10165548

I have seen this OOM problem on a wide range of Xen versions, stretching as far back as I can remember, including the most recent 4.1-unstable and 2.6.32 pvops kernel (from yesterday, tested in the hope that they would fix this). I haven't found a way to reliably reproduce it yet, but I suspect that the problem relates to reasonably heavy disk or network activity -- during this last one, I see that a domain was briefly doing ~200 Mbps of downloads.
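
In case it helps anyone reproduce or diagnose this: the zone report above shows a large pile of dirty pages in DMA32 while the Normal zone has dropped below its min watermark, so something like the loop below could catch the buildup as it happens. This is only a rough sketch using standard /proc files (nothing Xen-specific), and the log path is just an example -- I have not yet confirmed that this captures the trigger:

  #!/bin/sh
  # Rough sketch: record dom0 memory-pressure indicators once a second so
  # the state just before an OOM kill is preserved somewhere persistent.
  # MemFree/Cached/Dirty/Writeback come from /proc/meminfo; the per-zone
  # free-page buckets come from /proc/buddyinfo.
  while true; do
      date
      grep -E '^(MemFree|Cached|Dirty|Writeback):' /proc/meminfo
      cat /proc/buddyinfo
      echo '---'
      sleep 1
  done >> /var/log/oom-watch.log 2>&1

If Dirty climbs while the low-order buckets in the Normal zone empty out just before the kill, that would at least confirm the "buffers filling too fast" theory below.
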
Anyone have any ideas on what this could be? Is RAM getting spontaneously filled because a buffer somewhere grows too quickly, or something like that? What can I try here?

-John

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel