
[Xen-devel] remus failure -xen 4.0.1: xc_restore failed only at some heavy workload


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Kyungjin Yoo <athleta@xxxxxxx>
  • Date: Tue, 14 Sep 2010 12:05:13 -0400
  • Delivery-date: Thu, 23 Sep 2010 05:26:13 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

I have been experimenting with Remus and have run into problems with its failover.

I set up dom0 and domU as shown below; the backup server is set up identically to the primary.

Ubuntu 9.10
Xen 4.0.1-rc2
Kernel for dom0: 2.6.32.18
Kernel for domU: 2.6.18.8

With an idle guest running, I start Remus on the primary server and then destroy the guest or kill Remus.
In that case failover works, and the guest moves from the primary server to the backup server.

Under load, however, it fails. For these experiments I run SPECweb or a kernel compile in the guest while Remus runs on the primary server.
When the guest is destroyed or Remus is killed, the guest does not survive on the backup server, even though checkpointing was in progress beforehand. While checkpointing, the guest was listed in the 'p' (paused) state on the backup server, but after the failure it disappeared.

xend.log on the backup server shows this error:

----

[XXXX-XX-XX 13:56:50 6038] ERROR (XendCheckpoint:357) /usr/lib/xen/bin/xc_restore 36 92 1 2 0 0 0 0 failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 309, in restore
    forkHelper(cmd, fd, handler.handler, True)
  File "/usr/lib/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 411, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib/xen/bin/xc_restore 36 92 1 2 0 0 0 0 failed
[XXXX-XX-XX 13:56:50 6038] ERROR (XendDomain:1175) Restore failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/xen/xend/XendDomain.py", line 1159, in domain_restore_fd
    dominfo = XendCheckpoint.restore(self, fd, paused=paused, relocating=relocating)
  File "/usr/lib/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 358, in restore
    raise exn
XendError: /usr/lib/xen/bin/xc_restore 36 92 1 2 0 0 0 0 failed
 
----
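
When comparing several failover attempts, it can help to pull just the failed xc_restore invocations out of xend.log. The following is a minimal sketch in Python, not a Xen tool: the function name find_restore_failures and the regular expression are illustrative assumptions, and the pattern only matches the ERROR line format shown in the excerpt above. The arguments are returned as raw strings, since their meaning is not stated here.

```python
import re
import sys

# Matches xend.log ERROR lines of the form seen in the excerpt, e.g.
# [XXXX-XX-XX 13:56:50 6038] ERROR (XendCheckpoint:357) /usr/lib/xen/bin/xc_restore 36 92 1 2 0 0 0 0 failed
FAILED_RESTORE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\] ERROR \((?P<origin>[\w.]+:\d+)\) "
    r"(?P<helper>\S*xc_restore)(?P<args>(?: \S+)*) failed$"
)


def find_restore_failures(lines):
    """Return one dict per 'xc_restore ... failed' ERROR line (hypothetical helper)."""
    failures = []
    for line in lines:
        match = FAILED_RESTORE.match(line.rstrip("\n"))
        if match:
            failures.append({
                "stamp": match.group("stamp"),       # bracketed prefix, kept verbatim
                "origin": match.group("origin"),     # reporting module and line
                "helper": match.group("helper"),     # path to the xc_restore binary
                "args": match.group("args").split(),  # raw argument strings
            })
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is passed on the command line; no default location is assumed.
    with open(sys.argv[1]) as log:
        for failure in find_restore_failures(log):
            print(failure["stamp"], failure["origin"], " ".join(failure["args"]))
```

Only the first ERROR line of each failure matches; the repeated XendError lines inside the tracebacks and the generic "Restore failed" line are skipped, so each failed restore is counted once.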

This looks very similar to an earlier question from Shriram Rajagopalan:
http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00369.html

This error also seems to have appeared with Xen live migration in the past. Remus shares functions with live migration, and the error is raised in one of those live migration functions.

Has anyone seen something similar, with either Remus or Xen live migration?
Has anyone found a cause or a solution?

I would appreciate any help with this.

Thank you.
Kyungjin.


 

