Re: [Xen-users] VM getting hang
On Sat, Jan 24, 2009 at 11:29 PM, gopikrishnan <gopikrishnan@xxxxxxxxxxxx> wrote:
>
> From the above result, it appears like everything is normal. Can you give any
> suggestions?

A "normal" device should not trigger

===========
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
============

In my setup, similar cases happened several times, caused by three kinds of problems:

(1) the disks were simply busy
For example, some hosting appliances use a lot of I/O during startup. Putting several hosting domUs on the same dom0 and starting them all at the same time makes startup take a loooooong time. When this happens:
- "iostat -x 3" on dom0 during the boot process will show that the disk is busy with high throughput
- there are no weird messages in syslog
- all you have to do is wait patiently

(2) problems on the SAN switches/connections or HW RAID controller
For example, when your SAN switch is rebooted. This blocks all disk I/O for some time, and in some cases can lead to data corruption. When this happens:
- "iostat -x 3" on dom0 (at the time the problem occurs) will show that the disk is busy with very low or no throughput
- depending on your setup, you might get "rejecting I/O to offline device" messages (check the CONSOLE to be sure, not just /var/log/messages)
- sometimes the problem seems to "fix itself" without you having to do anything

(3) broken disks or controller
Similar to (2), but this can also happen on local storage. Everything seems to work correctly, but accessing certain data takes a loooong time or fails. This one is the hardest to diagnose, but it sometimes has symptoms similar to (2).

From your earlier mail I suspect it was (3). Then again, from "After a few hours (may be 8-10hrs), all these VPS will come up automatically." it can also be (1).
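As a rough illustration of how cases (1) and (2) above differ in the "iostat -x" figures, here is a minimal Python sketch. It is not something I run in production. The thresholds, the function names, and the sample lines are all made up for illustration, and it assumes the output has "rkB/s", "wkB/s" and "%util" columns (depending on your sysstat version and flags, throughput may be reported in sectors instead).

```python
# Thresholds are illustrative assumptions, not values from this thread.
BUSY_UTIL_PCT = 90.0          # treat the disk as "busy" at or above this %util
LOW_THROUGHPUT_KB_S = 100.0   # treat read+write kB/s at or below this as "very low"


def parse_iostat_row(header, row):
    """Map the column names of an iostat header line onto one device row."""
    names = header.split()
    values = row.split()
    # The first column is the device name; the rest are numeric.
    stats = {name: float(value) for name, value in zip(names[1:], values[1:])}
    stats["device"] = values[0]
    return stats


def classify(stats):
    """Return a coarse label for one parsed device row."""
    util = stats.get("%util", 0.0)
    throughput = stats.get("rkB/s", 0.0) + stats.get("wkB/s", 0.0)
    if util < BUSY_UTIL_PCT:
        return "not-saturated"
    if throughput <= LOW_THROUGHPUT_KB_S:
        # Busy but moving almost no data: looks like case (2) or (3).
        return "busy-low-throughput"
    # Busy and moving a lot of data: looks like case (1).
    return "busy-high-throughput"


if __name__ == "__main__":
    # Hypothetical sample lines, typed by hand rather than captured from a real host.
    header = "Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util"
    row = "sda 0.00 12.00 150.00 300.00 20480.00 40960.00 273.07 45.10 98.20 2.20 99.50"
    stats = parse_iostat_row(header, row)
    print(stats["device"], classify(stats))
```

A "busy-low-throughput" result only tells you the disk is stuck, not why, so it cannot separate (2) from (3) on its own. You still need the console messages for that.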
To be sure though, you'll need some diagnostics from the time the problem occurred:
- how was disk throughput at that time (check with "iostat -x 3" or similar commands)
- were there any weird messages on the CONSOLE or in /var/log/messages at that time (depending on the problem, it is possible that error messages were not written to /var/log/messages)
- what was the domU load at that time. Did all domUs use 100% CPU?

Note that some diagnostics have to be done at the time the problem occurs, not AFTER.

Good luck!

Regards,

Fajar

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users