
[Xen-users] Xen 4.4.0 Remus Is Not Working



Hi xen-users,

I've recently installed Xen 4.4.0 on two dom0 hosts (xnode1 and xnode2). A PV domU migrates between them without any problem, but Remus doesn't work. Every time I start Remus it looks fine at first, but the domU never drops its temporary "*-incoming" name on the receiving dom0 (xnode2). If I destroy the domU on xnode1 while Remus is syncing its state, the domU on xnode2 dies as well.

Here is example output from remus:

xnode1 ~ # xl -vvvvv remus tr1 xnode2
Password:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/185)
libxl: debug: libxl.c:709:libxl_domain_remus_start: ao 0x17cdb60: create: how=(nil) callback=(nil) poller=0x17cd880
libxl: debug: libxl_dom.c:1244:libxl__toolstack_save: domain=7 toolstack data size=8
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/185)
 Savefile contains xl domain config
libxl: debug: libxl.c:736:libxl_domain_remus_start: ao 0x17cdb60: inprogress: poller=0x17cd880, flags=i
libxl-save-helper: debug: starting save: Success
xc: detail: xc_domain_save: starting save of domid 7
xc: detail: Had 0 unexplained entries in p2m table
xc: Saving memory: iter 0 (last sent 0 skipped 0): 4096/65536 6%
xc: progress: Reloading memory pages: 4096/65536 6%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 8212/65536 12%
xc: progress: Reloading memory pages: 8192/65536 12%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 11284/65536 17%
xc: progress: Reloading memory pages: 11264/65536 17%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 15380/65536 23%
xc: progress: Reloading memory pages: 15360/65536 23%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 18452/65536 28%
xc: progress: Reloading memory pages: 18432/65536 28%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 22548/65536 34%
xc: progress: Reloading memory pages: 22528/65536 34%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 25620/65536 39%
xc: progress: Reloading memory pages: 25600/65536 39%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 29716/65536 45%
xc: progress: Reloading memory pages: 29696/65536 45%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 32788/65536 50%
xc: progress: Reloading memory pages: 32768/65536 50%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 36884/65536 56%
xc: progress: Reloading memory pages: 36864/65536 56%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 40980/65536 62%
xc: progress: Reloading memory pages: 40960/65536 62%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 44052/65536 67%
xc: progress: Reloading memory pages: 44032/65536 67%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 48148/65536 73%
xc: progress: Reloading memory pages: 48128/65536 73%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 51220/65536 78%
xc: progress: Reloading memory pages: 51200/65536 78%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 55316/65536 84%
xc: progress: Reloading memory pages: 55296/65536 84%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 58393/65536 89%
xc: progress: Reloading memory pages: 58368/65536 89%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 62522/65536 95%
xc: progress: Reloading memory pages: 62464/65536 95%
xc: Saving memory: iter 0 (last sent 0 skipped 0): 65536/65536  100%
xc: detail: delta 22754ms, dom0 18%, target 0%, sent 94Mb/s, dirtied 0Mb/s 84 pages
xc: Saving memory: iter 1 (last sent 65454 skipped 82): 65536/65536 100%
xc: detail: delta 31ms, dom0 12%, target 0%, sent 88Mb/s, dirtied 0Mb/s 0 pages
xc: Saving memory: iter 2 (last sent 84 skipped 0): 65536/65536  100%
xc: detail: Start last iteration
libxl: debug: libxl_dom.c:1074:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1078:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1125:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:1129:libxl__domain_suspend_common_callback: wait for the guest to suspend
xc: progress: Reloading memory pages: 65538/65536  100%
libxl: debug: libxl_dom.c:1143:libxl__domain_suspend_common_callback: guest has suspended
xc: detail: SUSPEND shinfo 0007daf8
xc: detail: delta 202ms, dom0 15%, target 0%, sent 0Mb/s, dirtied 25Mb/s 160 pages
xc: Saving memory: iter 3 (last sent 0 skipped 0): 65536/65536  100%
xc: detail: delta 2ms, dom0 0%, target 0%, sent 2621Mb/s, dirtied 2621Mb/s 160 pages
xc: detail: Total pages sent= 65698 (1.00x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
libxl: debug: libxl_dom.c:1074:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1078:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1125:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:1129:libxl__domain_suspend_common_callback: wait for the guest to suspend
libxl: debug: libxl_dom.c:1143:libxl__domain_suspend_common_callback: guest has suspended
xc: detail: SUSPEND shinfo 0007daf8
xc: detail: delta 201ms, dom0 0%, target 0%, sent 0Mb/s, dirtied 26Mb/s 160 pages
xc: Saving memory: iter 4 (last sent 160 skipped 0): 65536/65536  100%
xc: detail: delta 3ms, dom0 0%, target 0%, sent 1900Mb/s, dirtied 1900Mb/s 174 pages
xc: detail: Total pages sent= 65872 (1.01x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
........[skipped]............

-------------------------------------------------------------------------------
xnode1 ~ # xl list
Name                ID   Mem VCPUs      State   Time(s)
Domain-0             0  8191     4     r-----     200.8
tr1                  3   256     2     ---ss-       1.1
-------------------------------------------------------------------------------
xnode2 ~ # xl list
Name                ID   Mem VCPUs      State   Time(s)
Domain-0             0  8192     4     r-----     728.9
tr1--incoming       11   256     0     --p---       0.0
-------------------------------------------------------------------------------
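A minimal sketch, assuming Python is available on the dom0, of how a backup domain that is still stuck under its temporary name could be spotted from `xl list` output like the listing above. The function names `parse_xl_list` and `incoming_domains` and the sample text are illustrative only; they are not part of xl.

```python
def parse_xl_list(text):
    """Parse the text printed by `xl list` into a list of dicts.

    Assumes the default six-column layout shown above:
    Name, ID, Mem, VCPUs, State, Time(s).
    """
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        # Split from the right so the name column is kept whole.
        parts = line.rsplit(None, 5)
        if len(parts) != 6:
            continue  # ignore lines that do not look like a domain row
        name, domid, mem, vcpus, state, time_s = parts
        rows.append({
            "name": name,
            "id": int(domid),
            "mem": int(mem),
            "vcpus": int(vcpus),
            "state": state,
            "time": float(time_s),
        })
    return rows


def incoming_domains(rows):
    """Return names of domains still carrying the temporary suffix."""
    return [r["name"] for r in rows if r["name"].endswith("--incoming")]


# Sample input copied from the xnode2 listing in this post.
SAMPLE = """\
Name                ID   Mem VCPUs      State   Time(s)
Domain-0             0  8192     4     r-----     728.9
tr1--incoming       11   256     0     --p---       0.0
"""

if __name__ == "__main__":
    print(incoming_domains(parse_xl_list(SAMPLE)))
```

On a live system the sample text could be replaced by the captured output of `xl list`, for example through `subprocess.run`.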

Then I destroy the domU on xnode1:
xnode1 ~ # xl destroy tr1

........[continuation of xl remus output on xnode1]............
libxl: debug: libxl_dom.c:1074:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:1078:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1125:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:1129:libxl__domain_suspend_common_callback: wait for the guest to suspend
xc: error: rdexact failed (select returned 0): Internal error
xc: error: Error when reading batch size (110 = Connection timed out): Internal error
xc: error: error when buffering batch, finishing (110 = Connection timed out): Internal error
libxl: error: libxl_create.c:940:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:1022:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl.c:1384:libxl__destroy_domid: non-existant domain 14
libxl: error: libxl.c:1348:domain_destroy_callback: unable to destroy guest with domid 14
libxl: error: libxl_create.c:1320:domcreate_destruction_cb: unable to destroy domain 14 following failed creation
migration target: Domain creation failed (code -3).
libxl: error: libxl_dom.c:1151:libxl__domain_suspend_common_callback: guest did not suspend
xc: error: Suspend request failed: Internal error
xc: error: Domain appears not to have suspended: Internal error
xc: detail: Save exit of domid 7 with rc=1
libxl-save-helper: debug: complete r=1: Invalid argument
libxl: error: libxl_dom.c:1406:libxl__xc_domain_save_done: saving domain: domain responded to suspend request: Invalid argument
libxl: debug: libxl_event.c:1591:libxl__ao_complete: ao 0x17cdb60: complete, rc=-3
libxl: debug: libxl_event.c:1563:libxl__ao__destroy: ao 0x17cdb60: destroy
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.
libxl: debug: libxl.c:433:libxl_domain_resume: ao 0x17cdb60: create: how=(nil) callback=(nil) poller=0x17cd880
xc: error: Could not get domain info (3 = No such process): Internal error
libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed for domain 7: No such process
libxl: debug: libxl_event.c:1591:libxl__ao_complete: ao 0x17cdb60: complete, rc=-3
libxl: debug: libxl.c:436:libxl_domain_resume: ao 0x17cdb60: inprogress: poller=0x17cd880, flags=ic
libxl: debug: libxl_event.c:1563:libxl__ao__destroy: ao 0x17cdb60: destroy
xc: debug: hypercall buffer: total allocations:298 total releases:298
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:265 misses:2 toobig:31
xnode1 ~ #
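Because the sender and receiver write to the same terminal, several messages can end up fused on one line, which makes the errors hard to pick out. The following is a rough sketch, assuming the only message prefixes are the ones visible in the log above (`libxl`, `xc`, `libxl-save-helper` followed by `debug`, `detail`, `error` or `progress`), of how a captured `xl -vvvvv remus` log could be split into individual messages and reduced to its error lines. The helper names are hypothetical.

```python
import re

# Zero-width split point in front of every known message prefix,
# e.g. "libxl: error:" or "xc: detail:". The prefix list is an
# assumption based only on the log shown in this post.
_SPLIT = re.compile(
    r"(?=\b(?:libxl-save-helper|libxl|xc): (?:debug|detail|error|progress):)"
)
_ERROR = re.compile(r"(?:libxl-save-helper|libxl|xc): error:")


def split_messages(log_text):
    """Split a captured log into one message per list entry."""
    messages = []
    for line in log_text.splitlines():
        for part in _SPLIT.split(line):
            part = part.strip()
            if part:
                messages.append(part)
    return messages


def error_messages(log_text):
    """Return only the messages logged at error level."""
    return [m for m in split_messages(log_text) if _ERROR.match(m)]


# A fused line copied from the log above.
FUSED = (
    "xc: error: Error when reading batch size (110 = Connection timed out): "
    "Internal error xc: error: error when buffering batch, finishing "
    "(110 = Connection timed out): Internal error libxl: error: "
    "libxl_create.c:940:libxl__xc_domain_restore_done: restoring domain: "
    "Resource temporarily unavailable"
)

if __name__ == "__main__":
    for message in error_messages(FUSED):
        print(message)
```

Lines without a recognised prefix, such as "migration target: ..." or "xc: Saving memory: ...", are kept whole by `split_messages` and dropped by `error_messages`.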

The domU then dies on both hypervisors, and nothing is logged on xnode2.

domU configuration file:
-------------------------------------------------------------------------------
kernel = "/etc/xen/kernels/kernel-3.14.14-gentoo-domU"
memory = 256
name   = "tr1"
disk   = ['drbd:r1,xvda1,w']
root   = "/dev/xvda1 ro"
vcpus=2
-------------------------------------------------------------------------------

I'm currently using the 3.14.14 kernel for both dom0 and domU. Can anybody more experienced help me get Remus working, or at least give me a hint about what steps I can take to debug it further?

Thank you,
Konstantin
