[Xen-devel] Migration - Still Issues?
All,

I am currently running a xen-2.0-testing snapshot from April 20. I'm having sporadic problems with migration.

I have two xen machines, 10.130.2.35 and 10.130.2.36, booting from a read-only, iso image loopback iscsi target served by a third machine. I'm using the Cisco iscsi-initiator and iscsi-init module for the boot. The iscsi has been solid so far. The scsi target ends up as /dev/sda in Dom 0 on both machines. I then reuse that same read-only device: as the following xenU config file shows, it gets exported as /dev/hda when a xenU is created:

kernel = "/boot/kernel-2.6.11-xen-2.0.5-domU"
ramdisk = "/boot/initrd"
memory = 64
name = "test"
vif = [ 'mac=00:55:4F:44:00:01' ]
disk = [ 'phy:sda,hda,r' ]
dhcp="dhcp"
root = "/dev/ram0 ro init=/linuxrc cdroot"

Everything boots just fine. The "test" xenU runs flawlessly; I can ssh into it and run whatever I like. No problems there. And it's surprisingly fast over iscsi, even though I've only got 100 Mbit Ethernet adapters.

BUT... I've been migrating between the machines, both live and non-live, with mixed success. In about 1 of every 10 migrations, I get the errors posted in the attached xfrd.log files. The .1 file is the source of the migration and the .2 is the destination. The other 9 of 10 times, it migrates just fine.

I don't seem to get these problems when I do not export /dev/sda to a domU. For example, if I use just a simple domU (using the same kernel) with no mounts and an initrd file system, I don't have these problems.

I saw mailing list messages a while back dealing with migration and the possibility of a crash under heavy network load. Further, I saw a patch that had been applied:

<QUOTE>
[PATCH] stream fixes for migration

I've attached a patch for libxutil/libxc. This fixes one of the hangs I've seen during migrations. It applies against 2.0 and 2.0-testing.

Changes:
* Encountering EOF or error when xfrd reads from stream could cause an infinite loop.
* Cleaned up the closing of streams.
* Fixed several memory leaks.

Signed-off-by: Charles Coffing <ccoffing@xxxxxxxxxx>
</QUOTE>

The version of 2.0-testing I'm using has this patch applied. But the comments in this patch imply that there are still more "hangs" during migration. Have I stumbled on another one of these? I believe this patch fixed a previous problem: I would get a looping hang under 2.0.5 stable, and I haven't seen that since going to 2.0-testing.

Am I wrong to assume that I can mount an iscsi target read-only twice? Or could hardware be a factor? For testing, I'm just running cheap-o VIA Rhine 100-TX controllers. I thought I would post this before shelling out for some Intel gig nics and gig switches, though.

Thank you very much for your help.

-James Henderson

2605 [INF] XFRD> Accepted connection from 127.0.0.1:1145 on 2
2759 [INF] XFRD> Xfr service for 127.0.0.1:1145
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr> (xfr.hello 1 0)
[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr> (xfr.migrate 5 "(domain (id 5) (name test) (memory 63) (maxmem 65536) (state -b---) (cpu 0) (cpu_time 0.137634952) (up_time 15.0879249573) (start_time 1114545755.39) (console (status listening) (id 11) (domain 5) (local_port 11) (remote_port 1) (console_port 9605)) (devices (vif (idx 0) (vif 0) (mac 00:55:4f:44:00:01)(vifname vif5.0) (evtchn 12 3) (index 0)) (vbd (idx 0) (vdev 768) (device 2048)(mode r) (dev hda) (uname phy:sda) (node sda) (index 0))) (config (vm (name test) (memory 64) (image (linux (kernel /boot/kernel-2.6.11-xen-2.0.5-domU) (ramdisk /boot/initrd) (ip :1.2.3.4::::eth0:dhcp) (root '/dev/ram0 ro init=/linuxrc cdroot'))) (device (vbd (uname phy:sda) (dev hda) (mode r))) (device (vif (mac 00:55:4F:44:00:01))))))" 10.130.2.36 8002 1 0)
[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_connect> addr=10.130.2.36:8002
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr> (xfr.err 0)
[DEBUG] Conn_sxpr< err=0
[1114545770.483473] xc_linux_save start 5
xc_linux_save start 5
[1114545770.485161] Saving memory pages: iter 1 0%
Saving memory pages: iter 1 0%FNI 189 : [1000000c,1020] pte=00be4063, mfn=00000be4, pfn=ffffffff [mfn]=deadbeef
6% 12% 18% 25% 31% 38% 44% 50% 56% 63% 69% 75% 82% 88% 95%
1: sent 16165, skipped 219,
1: sent 16165, skipped 219, delta 6695ms, dom0 21%, target 73%, sent 79Mb/s, dirtied 1Mb/s 260 pages
[1114545777.180435] Saving memory pages: iter 2 0%
2: sent 242, skipped 12,
2 0% 2: sent 242, skipped 12, delta 102ms, dom0 20%, target 79%, sent 77Mb/s, dirtied 3Mb/s 12 pages
[1114545777.283396] Saving memory pages: iter 3 0%
3: sent 0, skipped 12,
r 3 0% 3: sent 0, skipped 12,
[DEBUG] Conn_sxpr> (xfr.err 22)
[DEBUG] Conn_sxpr< err=0
Retry suspend domain (0)
[... "Retry suspend domain (0)" repeated many more times ...]
Unable to suspend domain. (0)
Unable to suspend domain. (0)
Domain appears not to have suspended: 0
Domain appears not to have suspended: 0
2759 [WRN] XFRD> Transfer errors:
2759 [WRN] XFRD> state=XFR_STATE err=1
2759 [INF] XFRD> Xfr service err=1

2515 [INF] XFRD> Accepted connection from 10.130.2.35:4227 on 2
2656 [INF] XFRD> Xfr service for 10.130.2.35:4227
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr> (xfr.hello 1 0)
[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr> (xfr.xfr 5)
[DEBUG] Conn_sxpr< err=0
[1114545766.260913] xc_linux_restore start
xc_linux_restore start
[1114545766.265957] Created domain 5
Created domain 5
(Domain-0 Domain-5)'domain id=5 name=test memory=64 console=9605 image=/boot/kernel-2.6.11-xen-2.0.5-domU'
[1114545766.340293] Reloading memory pages: 0%
Reloading memory pages: 6% 12% 18% 25% 31% 37% 43% 50% 56% 62% 68% 75% 81% 87% 93% 98% 98%
Error when reading from state file
Error when reading from state file
2656 [INF] XFRD> Xfr service err=1

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
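
Since the failure only shows up in roughly 1 of 10 migrations, it may help to scan a batch of saved xfrd.log files for the failure signatures seen in the two excerpts above. The following Python is only a rough sketch of that idea. The marker strings are copied from the logs in this post; the function name summarize_xfrd_log is hypothetical, and treating any nonzero "Xfr service err" value as a failure is an assumption, not documented xfrd behaviour.

```python
import re
import sys

# Failure markers copied verbatim from the xfrd.log excerpts in this post.
FAILURE_MARKERS = (
    "Unable to suspend domain",
    "Domain appears not to have suspended",
    "Error when reading from state file",
    "Transfer errors",
)

# Matches e.g. "Xfr service err=1". Assumption: nonzero means the transfer failed.
SERVICE_ERR = re.compile(r"Xfr service err=(\d+)")


def summarize_xfrd_log(text):
    """Count known failure markers, nonzero service errors and suspend retries."""
    markers = {m: text.count(m) for m in FAILURE_MARKERS if m in text}
    codes = [int(c) for c in SERVICE_ERR.findall(text) if int(c) != 0]
    retries = text.count("Retry suspend domain")
    return {
        "markers": markers,
        "service_errors": codes,
        "suspend_retries": retries,
    }


if __name__ == "__main__":
    # Usage: python scan_xfrd.py xfrd.log.1 xfrd.log.2
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            print(path, summarize_xfrd_log(fh.read()))
```

Run against the source-side log, a summary like this would show the suspend retries followed by "Unable to suspend domain"; against the destination-side log it would show "Error when reading from state file". Comparing the two across many attempts might indicate whether the source always fails first.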