Re: [Xen-devel] [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
On Wed, Jun 27, 2012 at 9:46 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
> Shriram Rajagopalan writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process"):
> [...]

This is normal. You are suspending every 100ms. So, when you see ---ss-, you just ended up doing "xl list" right when the guest was suspended. :)
Do an "xl top" and you will see the guest's state oscillate from --b-- to --s--, depending on the checkpoint interval. Or run "xl list" multiple times.
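For illustration only (with "domU" standing in for the guest name), sampling the state column while remus is running shows the same thing:

  # watch the State field of "xl list" flip between blocked and suspended
  for i in $(seq 1 20); do
      xl list domU | awk 'NR==2 {print $5}'
      sleep 0.05
  done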
> After I killed the remus [...]

That is strange... xl remus has literally no networking support on the remus front, so it shouldn't affect anything in the guest. In fact I repeated your test on my box, where the guest was continuously pinging a host. Pings continued to work, and so did ssh.

> At the start, the guest prints this on its console: [...]

With the "-b" option the second argument (localhost|dummy) is ignored. Did you try the command without the -b option, i.e. xl remus -vvv -e domU localhost?
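For reference, the two invocations being compared (both taken from elsewhere in this thread, with "domU" as a placeholder for the guest name):

  # blackhole mode: -b discards the replication stream, so the destination argument is ignored
  xl remus -vvv -e -b -i 100 domU dummy

  # without -b: actual replication to the named destination (localhost here)
  xl remus -vvv -e domU localhost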
But I was partially able to reproduce some of your test results without your patches (i.e. on xen-unstable baseline). See end of mail for more details.
Ah that explains the qemu related calls.

My guest config (from tests on 32bit PV domU w/ suspend event channel support):

  kernel = "/home/kernels/vmlinuz-2.6.32.2-xenu"
  memory = 1024
  name = "xltest2"
  vcpus = 2
  vif = [ 'mac=00:16:3e:00:00:01,bridge=eth0' ]
  disk = [ 'phy:/dev/drbd1,xvda1,w' ]
  hostname = "rshriram-vm3"
  root = "/dev/xvda1 ro"
  extra = "console=xvc0 3"
  on_reboot = 'destroy'
  on_crash = 'coredump-destroy'

NB: This guest kernel has suspend-event-channel support, which is available in all suse kernels I suppose. If you would just like to use mine, the source tarball (2.6.32.2 version + kernel config) [...]
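If it helps, a rough way to check from dom0 whether a guest actually registered a suspend event channel (path quoted from memory, so treat this as a sketch rather than gospel):

  xenstore-read /local/domain/$(xl domid xltest2)/device/suspend/event-channel

This should print the event channel port for kernels that support it, and fail for kernels that do not.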
I also tested with 64-bit 3.3.0 PV kernel (w/o suspend-event channel support).

Guest config:

  kernel = "/home/kernels/vmlinuz-3.3.0-rc1-xenu"
  memory = 1024
  name = "xl-ubuntu-pv64"
  vcpus = 2
  vif = [ 'mac=00:16:3e:00:00:03, bridge=eth0' ]
  disk = [ 'phy:/dev/vgdrbd/ubuntu-pv64,xvda1,w' ]
  hostname = "rshriram-vm1"
  root = "/dev/xvda1 ro"
  extra = "console=hvc0 3"

With xen-unstable baseline:

Test 1. Blackhole replication
command:
  nohup xl remus -vvv -e -b -i 100 xl-ubuntu-pv64 dummy >blackhole.log 2>&1 &

result: works (networking included)

debug output:
  libxl: debug: libxl_dom.c:687:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
  libxl: debug: libxl_dom.c:691:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
  libxl: debug: libxl_dom.c:738:libxl__domain_suspend_common_callback: guest acknowledged suspend request
  libxl: debug: libxl_dom.c:742:libxl__domain_suspend_common_callback: wait for the guest to suspend
  libxl: debug: libxl_dom.c:754:libxl__domain_suspend_common_callback: guest has suspended
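The debug lines above are the PV suspend handshake over the xenstore control node. If you want to poke at it by hand from dom0, something along these lines works (timing dependent with a 100ms interval, and the path is quoted from memory, so treat it as a sketch):

  DOMID=$(xl domid xl-ubuntu-pv64)
  # reads "suspend" while a request is pending; the guest clears the node to acknowledge
  xenstore-read /local/domain/$DOMID/control/shutdown
  # the state column shows ---ss- once the guest has actually suspended
  xl list xl-ubuntu-pv64 | awk 'NR==2 {print $5}'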
caveat: killing remus doesn't do a proper cleanup, i.e. if you kill it while the domain is suspended (the point where libxl waits for the guest to suspend), it leaves the domain in the suspended state.
It's a pain. In the xend/python version, I added a handler (SIGUSR1), so that one could do "pkill -USR1 -f remus" and gracefully exit remus without wedging the domU.
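For reference, the xend-era teardown that handler enabled looked roughly like this (illustrative, with "domU" as a placeholder):

  pkill -USR1 -f remus      # the handler resumes the guest, then remus exits
  xm list domU              # guest should be back to running/blocked, not left suspended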
* I do not know if adding signal handlers is frowned upon in the xl land :) If there is some protocol in place to handle such things, I would be happy to send a patch that ensures that the guest is "resumed" while doing blackhole replication.
Test 2. Localhost replication w/ failover by destroying primary VM

command:
  nohup xl remus -vvv -b -i 100 xl-ubuntu-pv64 localhost >blackhole.log 2>&1 &

result: works (networking included)
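The failover step itself is not spelled out above; roughly it amounts to the following (illustrative only; how the backup instance shows up on the same host is not shown here, so check "xl list" first):

  xl list                        # note the domid of the primary
  xl destroy <primary-domid>     # kill the primary
  xl list                        # the backup should survive and take over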
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel