[Xen-users] nothing but problems with ocfs2 on sles11
I've been having terrible problems with ocfs2 getting corrupted. (Of course this is after I said on this list a couple of months ago that I've been using it for a while without issues!)

I have two sets of SLES11 servers, and each set shares its own ocfs2 volume. I started having problems with the original set of servers and opened a ticket with Novell. They wanted me to completely update the systems. Since they were running critical VMs I didn't feel comfortable doing that, so I installed two more servers with their own ocfs2 volume. These two I then patched completely. Unfortunately, these two servers started exhibiting their own corruption problems. Just copying my virtual disk files and running some VMs would cause the ocfs2 volume to get corrupted. Right now it's at a point where I can't even fix it with fsck.ocfs2. I'm told the ticket has been escalated to the ocfs2 devs.

Earlier this week I had a problem with one of the original servers and had to hard restart it. This is a problem I've always had with xen after it runs for a long time: sometimes it will have memory allocation issues, can't start VMs, etc. Worse yet, there's no way to restart it nicely, because the VMs will not shut down and you can't get on the console or ssh in to shut down the server cleanly. The only option (that I know of) is to hard reset the box. Of course this can have side effects. In this case everything came back up ok, but I could see there was corruption.

I asked Novell and they said I should unmount the volumes, run fsck.ocfs2 and make sure it's clean, then restart everything. This was on Monday, and since my critical machines were up and running, I couldn't afford to have them down right then. So I decided 2AM this morning was a good time to take these systems down, run the fsck and then get them back up. I thought this would be fairly simple, take 30-60 mins, and get things stable for a while longer while we work on the ocfs2 issue with Novell.
Unfortunately, after running fsck.ocfs2 and making sure the volume was clean, my VMs would not all come back up. I could get 4 or 5 of them up, but not the rest. After an unhealthy and very stressful investigation I found that the ocfs2 volume is going read-only. I'm waiting for a call back from Novell right now. It seems that once my ocfs2 volume gets corrupted there's no way to fix it or make it stable again.

Our storage is on a Xiotech Magnitude 4000 3D. Each xen server is assigned the same vdisk that is used for ocfs2. We use file-based disks for our VMs. Performance-wise this does the job for us. It makes them very easy to move around, copy for new VMs, etc.

What other options should I look at besides ocfs2?

I also have a call in to our Xiotech admin to create new disks that I can assign directly to my servers (one for each) so I can copy my VMs over and get them up and running, just in case Novell is not able to get a resolution for me. I'm confident that the VMs will be stable once they are running on "local" storage.

Sorry this got so long, but I don't think I can take much more stress around the stability of my xen servers. I've also looked at XenServer, which seems to be really stable and has nice features, but you also lose a lot of portability. It's hard for me to explain, but on sles/xen it's incredibly easy to create sles VMs. It's also nice to be able to mount disk files if needed, copy them, etc.

If anyone gets this far into the message I'd appreciate any suggestions.

Thanks a lot,
James