Re: [Xen-users] Xen system hang or freeze
Some thoughts:

0. Do you have the default behavior where the guests' independent wallclocks are disabled?

1. I have observed visible performance differences from a VM when %steal goes above 1%. It sounds like you have 8 cores. How many VMs do you have? What are their weights and caps?

2. The system default of collecting sar every ten minutes is pretty unhelpful for problem diagnosis. I routinely adjust this interval to five seconds, which, at the expense of a lot of disk space, gives a historical dataset that is useful for forensics.

On Apr 21, 2009, at 10:10 AM, Nick Anderson wrote:

On Tue, Apr 21, 2009 at 08:30:32AM -0400, Peter Booth wrote:

It would be interesting to know whether sar data was captured during this time. From this you could track whether there was any process creation or destruction occurring.

I just had another lockup this weekend.

Sar (from the host)

12:35:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
12:45:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
12:55:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
01:05:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
01:15:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
Average:          all      0.00      0.00      0.00      0.00      0.01     99.98

01:25:53 PM       LINUX RESTART

01:35:02 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:45:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
01:55:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
02:05:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99

sar -b

11:55:01 AM     12.22      0.90     11.32     12.90    257.89
12:05:01 PM     13.97      0.49     13.48      7.68    331.48
12:15:01 PM     18.88      7.30     11.59    161.74    260.17
12:25:01 PM     14.34      1.10     13.23     16.53    438.73
12:35:01 PM      9.01      0.43      8.58      6.96    208.50
12:45:01 PM      8.47      0.35      8.12      5.23    186.03
12:55:01 PM     10.00      1.09      8.91     19.22    245.17
01:05:01 PM     11.89      1.82     10.06     27.76    279.90
01:15:01 PM     10.06      0.34      9.72      5.23    214.62
Average:        17.55      6.12     11.43    385.87    369.74

01:25:53 PM       LINUX RESTART

01:35:02 PM       tps      rtps      wtps   bread/s   bwrtn/s
01:45:01 PM     19.01      7.19     11.83    113.49    273.91
01:55:01 PM     12.23      2.44      9.79     37.42    239.82
02:05:01 PM     16.89      2.79     14.10     47.93    422.02
02:15:01 PM     17.09      1.92     15.17     26.93    495.01
02:25:01 PM     13.91      3.42     10.49    164.83    282.82
02:35:01 PM     12.47      2.05     10.42     28.45    256.32
02:45:01 PM     13.67      1.81     11.87     31.78    340.39

sar -c

12:45:01 PM      0.02
12:55:01 PM      0.02
01:05:01 PM      0.02
01:15:01 PM      0.02
Average:         0.03

01:25:53 PM       LINUX RESTART

01:35:02 PM    proc/s
01:45:01 PM      0.02
01:55:01 PM      0.02

sar -q

12:55:01 PM         0       147      0.00      0.00      0.00
01:05:01 PM         0       147      0.07      0.03      0.01
01:15:01 PM         0       147      0.00      0.00      0.00
Average:            0       147      0.00      0.00      0.00

01:25:53 PM       LINUX RESTART

01:35:02 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
01:45:01 PM         0       147      0.00      0.00      0.00
01:55:01 PM         0       147      0.00      0.00      0.00

sar -r

01:05:01 PM   7312568   1878856     20.44    175416     66532   1044184         0      0.00         0
01:15:01 PM   7311948   1879476     20.45    175416     66544   1044184         0      0.00         0
Average:      7328126   1863298     20.27    175403     67011   1044184         0      0.00         0

01:25:53 PM       LINUX RESTART

01:35:02 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
01:45:01 PM   8620940    570484      6.21     64136     36012   1044184         0      0.00         0
01:55:01 PM   8619824    571600      6.22     64972     36028   1044184         0      0.00         0
02:05:01 PM   8618204    573220      6.24     65800     36040   1044184         0      0.00         0

===============================================================

Now perhaps I have missed something, but to me that all looks just fine. I should set up something to log ps. But in my guests I see steal pushed through the roof, and it's like that for days ahead of time. I've noticed the steal during the lockups before, but either I neglected to look back several days or forgot what I saw. I didn't recall steal being at 100% as far back as my logs go.

12:55:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:05:01 PM       all      0.00      0.00      0.00      0.00    100.00      0.00
01:15:01 PM       all      0.00      0.00      0.00      0.00    100.00      0.00
Average:          all      0.00      0.00      0.00      0.00    100.00      0.00

01:27:49 PM       LINUX RESTART

01:35:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:45:01 PM       all      4.04      0.00      1.80      0.64      0.02     93.50
01:55:01 PM       all      4.10      0.00      1.76      0.31      0.02     93.80
02:05:01 PM       all      5.45      0.00      2.47      0.23      0.02     91.83
02:15:01 PM       all      7.03      0.00      3.22      0.22      0.02     89.51
02:25:01 PM       all      4.82      0.00      2.31      0.18      0.01      92.6

Might also be worth adding a cron entry to append the output of lsof to a file every N minutes (perhaps with logrotate enabled) to see if you can capture what changed in the running system when this "lockup" occurred?

Also worth collecting ps output every minute.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users

-- 
Nick Anderson <nick@xxxxxxxxxxxx>
http://www.cmdln.org
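
Since the guest's %steal sat at 100% for days before the lockup without anyone noticing, it may help to scan saved sar CPU reports for intervals where %steal crosses a threshold. The following is only a rough sketch, not something from the thread: the function name find_high_steal is made up for illustration, the 1% default comes from the observation above about visible performance differences, and the parser assumes plain-text sar CPU output whose header row contains a %steal column.

```python
import re

# Matches a sar timestamp at the start of a line, with or without AM/PM.
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}(?: [AP]M)?)\s+(.*)$")


def find_high_steal(sar_text, threshold=1.0):
    """Return (timestamp, steal) pairs whose %steal is above threshold.

    Hypothetical helper: assumes text output of the sar CPU report,
    where a header row names the columns and includes "%steal".
    """
    steal_col = None
    hits = []
    for line in sar_text.splitlines():
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue  # blank lines and "Average:" rows carry no timestamp
        stamp, rest = match.groups()
        fields = rest.split()
        if "%steal" in fields:
            steal_col = fields.index("%steal")  # header row: locate the column
            continue
        if steal_col is None or len(fields) <= steal_col:
            continue  # rows before any header, or "LINUX RESTART" markers
        try:
            steal = float(fields[steal_col])
        except ValueError:
            continue  # not a numeric data row
        if steal > threshold:
            hits.append((stamp, steal))
    return hits


# Excerpt of the guest report quoted in this thread.
GUEST_SAMPLE = """\
12:55:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:05:01 PM       all      0.00      0.00      0.00      0.00    100.00      0.00
01:15:01 PM       all      0.00      0.00      0.00      0.00    100.00      0.00
Average:          all      0.00      0.00      0.00      0.00    100.00      0.00
01:27:49 PM       LINUX RESTART
01:35:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:45:01 PM       all      4.04      0.00      1.80      0.64      0.02     93.50
"""

if __name__ == "__main__":
    for stamp, steal in find_high_steal(GUEST_SAMPLE):
        print(f"{stamp}  %steal={steal:.2f}")
```

Run against the guest excerpt, this flags the two pre-restart samples and ignores the 0.02 reading after the restart. With a shorter collection interval, as suggested in point 2, the same check would show more precisely when steal started climbing.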