
Re: [Xen-users] Xen system hang or freeze



It would be interesting to know whether sar data was captured during this time. From this you could track whether there was any process creation or destruction occurring.

It might also be worth adding a cron entry that appends the output of lsof to a file every N minutes (with logrotate keeping the file size in check), to see whether you can capture what changed in the running system when this "lockup" occurred.

Also worth collecting ps output every minute.
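
A minimal sketch of that kind of periodic capture, written in Python rather than as a cron one-liner so the timestamping is explicit. The function names, the log directory, and the choice of `ps auxww` and `lsof -n` are assumptions for illustration, not anything taken from the affected systems. The fsync call is also an addition of this sketch: on a hard lockup anything still sitting in the page cache is lost, so each sample is pushed to disk before the script exits.

```python
import os
import subprocess
import time


def append_snapshot(cmd, log_path, timeout=60):
    """Run cmd and append its output, under a timestamp header, to log_path."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
        body = result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Record the failure instead of silently losing the sample.
        body = "snapshot failed: {}\n".format(exc)
    with open(str(log_path), "a") as log:
        log.write("=== {} {} ===\n".format(stamp, " ".join(cmd)))
        log.write(body)
        log.flush()
        os.fsync(log.fileno())  # force the sample to disk before a possible hang
    return body


def collect_once(log_dir="/var/log/lockup-debug"):
    """Take one ps sample and one lsof sample (log_dir is a hypothetical path)."""
    os.makedirs(log_dir, exist_ok=True)
    append_snapshot(["ps", "auxww"], os.path.join(log_dir, "ps.log"))
    append_snapshot(["lsof", "-n"], os.path.join(log_dir, "lsof.log"))
```

A script that calls collect_once() could then be run from cron every minute (or every N minutes for the lsof part), with logrotate pointed at the log directory. After a lockup, the last header in each file shows how close to the hang the final sample was taken.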
On Apr 3, 2009, at 11:59 AM, Paraic Gallagher wrote:



2009/4/3 Nick Anderson <nick@xxxxxxxxxxxx>
On Fri, Apr 03, 2009 at 03:56:28PM +0100, Paraic Gallagher wrote:
> I am running xen 3.0.3, with CentOS 5.2 based Dom0
> (kernel-xen-2.6.18-92.1.22.el5)
> Recently I have noticed some complete system lockups on a few different
> servers. Neither Dom0 nor any of the guests responds to pings, and connecting a
> keyboard and monitor to the system only shows a blank screen. Nothing is
> written to logs at time of lockup.

I have seen similar issues with one of my servers. I have yet to nail
down the issue.

Specs:
Distro: Debian Etch
Kernel: 2.6.18-6-xen-amd64
CPU: 2x Quad-Core AMD Opteron(tm) Processor 2350
Memory: 16G
Disk: 3ware 9650LE with 8 drive Raid6
Xen: 3.2 (from debian repo)

All vms are LVM backed. Not running any HVM guests.
 
Thanks for the response. After searching the net for a few weeks with no luck
in finding similar issues, I was beginning to think I was going crazy!

Some further details:
I have seen the issue on two types of server, Dell PE 1950 and 2950:
2x Quad core Intel Xeon E5410@xxxxxxx
Memory 4G and 16G
Disk, PERC 6/i 1.11, 2x250 Raid1, ST3250620NS Rev: 3BKT

All vms are LVM backed on this system except for Dom0.

For a while I was seeing soft lockup warnings for a CPU scrolling on the
console and thought that might be the cause. Unfortunately, after updating
the kernel the errors went away but I have had another lockup since then.

I've found a fairly set pattern, though nothing in the timing that would
let me predict a lockup.

A VM typically goes unresponsive first. If left unchecked for long
enough, the host will lock. If caught in time, I have had limited
success running xm destroy on the domU. Most of the time, though,
running xm destroy on the domU causes the host to lock immediately,
requiring a hard reboot.

The most recent lockup was a bit different from what I had seen in the
past.

The domU locked up (no output on the domU console). xm destroy locked
dom0. I rebooted with a remote power strip. dom0 and all domUs came
back up. Nothing in the logs, as usual. 10 minutes later dom0 was locked
again. I drove to the datacenter, and about 30-45 minutes after the
lock the machine became responsive again (according to the monitoring
server); I was able to display a website running on a VM. Then the
machine went unresponsive again, not responding to physical console
access either. Another hard reboot and things are OK.

That was the first time I had ever had so many lockups so close
together. Typically the lockups seem to be 1-2 weeks apart.

I have even tried setting up netconsole on dom0 to catch kernel
errors, with no success.
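
For anyone repeating the netconsole attempt, the receiving side can be as simple as a UDP listener on a second machine. The following is only a sketch under assumptions: the function names and the log path are hypothetical, and port 6666 is used because it is the default target port in the kernel's netconsole documentation; it must match whatever port dom0's netconsole is actually configured to send to.

```python
import socket
import time


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def log_one(sock, log_path, bufsize=65535):
    """Receive one datagram and append it, timestamped, to log_path."""
    data, addr = sock.recvfrom(bufsize)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    # Kernel messages are not guaranteed to be valid UTF-8, so decode leniently.
    text = data.decode("utf-8", "replace").rstrip("\n")
    line = "{} {} {}\n".format(stamp, addr[0], text)
    with open(log_path, "a") as log:
        log.write(line)
    return line


def serve(log_path="netconsole.log", port=6666):
    """Log datagrams until interrupted (log_path is a hypothetical name)."""
    sock = open_listener(port=port)
    try:
        while True:
            log_one(sock, log_path)
    finally:
        sock.close()
```

Timestamping on the receiver means the last line logged before a hang can be lined up against the monitoring server's record of when the host stopped responding, even if dom0 never writes anything to its own logs.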

From the description this seems to be quite a similar problem; however, I haven't
noticed the guest VMs locking up prior to Dom0. Something to keep an eye on.

Are you running a particular load on the system at the time, or is it somewhat
idle? In my case the system seems to be idle before a lockup.

rgds,
Paraic.
 


--
Nick Anderson <nick@xxxxxxxxxxxx>
http://www.cmdln.org


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

