Re: [Xen-devel] [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
At 07/18/2014 07:38 PM, Wen Congyang wrote:
> Virtual machine (VM) replication is a well-known technique for providing
> application-agnostic, software-implemented hardware fault tolerance, or
> "non-stop service". Currently, Remus provides this function, but it buffers
> all output packets until the corresponding checkpoint has been acknowledged
> by the backup, so the latency is unacceptable.
>
> At Xen Summit 2012, we introduced a new VM replication solution: COLO
> (COarse-grain LOck-stepping virtual machine). The presentation is at
> the following URL:
> http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service
>
> Here is the summary of the solution:
> From the client's point of view, as long as the client observes identical
> responses from the primary and secondary VMs, according to the service
> semantics, the secondary VM is a valid replica of the primary
> VM and can successfully take over when a hardware failure of the
> primary VM is detected.
>
> This patchset is an RFC and implements the framework of COLO:
> 1. Both the primary VM and the secondary VM are running
> 2. Do checkpoints
>
> This patchset is based on remus-v15 and uses migration v1. Only HVM
> guests are supported for now.
>
> TODO list:
> 1. rebase to remus-v17 or newer
> 2. support migration v2
> 3. NIC/disk replication
> 4. support PV guests
>
> Patch 1-3: bugfix
> Patch 4-6: temporarily update Remus so that its device code can be reused
> Patch 7-14: update some APIs which will be used by COLO
> Patch 15-22: COLO-related code
> Patch 23: hack patch, for testing only
> Patch 24-25: bugfix. We found this bug before rebasing COLO onto the newest
>   Xen, but we can no longer trigger it.
> Patch 26: A patch for qemu-xen

I have also pushed the code to GitHub:
https://github.com/wencongyang/xen/tree/colo

>
> Hong Tao (1):
>   copy the correct page to memory
>
> Wen Congyang (24):
>   csum the correct page
>   don't zero out ioreq page
>   don't touch remus in remus_device
>   rename remus device to checkpoint device
>   adjust the indentation
>   Refactor domain_suspend_callback_common()
>   Update libxl__domain_resume() for colo
>   Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo
>   Introduce a new internal API libxl__domain_unpause()
>   Update libxl__domain_unpause() to support qemu-xen
>   support to resume uncooperative HVM guests
>   update datecopier to support sending data only
>   introduce a new API to aync read data from fd
>   Update libxl_save_msgs_gen.pl to support return data from xl to xc
>   Allow slave sends data to master
>   secondary vm suspend/resume/checkpoint code
>   primary vm suspend/get_dirty_pfn/resume/checkpoint code
>   xc_domain_save: flush cache before calling callbacks->postcopy() in
>     colo mode
>   COLO: xc related codes
>   send store mfn and console mfn to xl before resuming secondary vm
>   implement the cmdline for COLO
>   HACK: do checkpoint per 20ms
>   fix vm entry fail
>   sync mmu before resuming secondary vm
>
>  docs/man/xl.pod.1 | 9 +-
>  tools/libxc/xc_domain.c | 9 +
>  tools/libxc/xc_domain_restore.c | 74 +-
>  tools/libxc/xc_domain_save.c | 66 +-
>  tools/libxc/xc_resume.c | 20 +-
>  tools/libxc/xenctrl.h | 2 +
>  tools/libxc/xenguest.h | 40 +
>  tools/libxl/Makefile | 3 +-
>  tools/libxl/libxl.c | 102 ++-
>  tools/libxl/libxl.h | 3 +-
>  tools/libxl/libxl_aoutils.c | 81 +-
>  ...xl_remus_device.c => libxl_checkpoint_device.c} | 266 ++++---
>  tools/libxl/libxl_colo.h | 48 ++
>  tools/libxl/libxl_colo_restore.c | 882 +++++++++++++++++++++
>  tools/libxl/libxl_colo_save.c | 602 ++++++++++++++
>  tools/libxl/libxl_create.c | 131 ++-
>  tools/libxl/libxl_dom.c | 424 ++++++----
>  tools/libxl/libxl_internal.h | 262 ++++--
>  tools/libxl/libxl_netbuffer.c | 85 +-
>  tools/libxl/libxl_nonetbuffer.c | 14 +-
>  tools/libxl/libxl_qmp.c | 10 +
>  tools/libxl/libxl_remus_disk_drbd.c | 54 +-
>  tools/libxl/libxl_save_callout.c | 37 +-
>  tools/libxl/libxl_save_helper.c | 17 +
>  tools/libxl/libxl_save_msgs_gen.pl | 74 +-
>  tools/libxl/libxl_types.idl | 12 +-
>  tools/libxl/xl_cmdimpl.c | 54 +-
>  tools/libxl/xl_cmdtable.c | 3 +-
>  xen/arch/x86/domctl.c | 15 +
>  xen/arch/x86/hvm/save.c | 6 +
>  xen/arch/x86/hvm/vmx/vmcs.c | 8 +
>  xen/arch/x86/hvm/vmx/vmx.c | 8 +
>  xen/include/asm-x86/hvm/hvm.h | 1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 1 +
>  xen/include/public/domctl.h | 1 +
>  xen/include/xen/hvm/save.h | 2 +
>  36 files changed, 2895 insertions(+), 531 deletions(-)
>  rename tools/libxl/{libxl_remus_device.c => libxl_checkpoint_device.c} (47%)
>  create mode 100644 tools/libxl/libxl_colo.h
>  create mode 100644 tools/libxl/libxl_colo_restore.c
>  create mode 100644 tools/libxl/libxl_colo_save.c

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
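The cover letter's summary says the secondary VM is a valid replica as long as the client observes identical responses from both VMs, and patch 23 ("HACK: do checkpoint per 20ms") checkpoints on a fixed period. The C sketch below illustrates how those two checkpoint triggers could be expressed. It is not code from the patchset: `struct packet`, `enum colo_action`, `colo_compare()`, `colo_periodic_due()` and `COLO_CHECKPOINT_PERIOD_MS` are hypothetical names, and byte-for-byte equality is a simplifying assumption standing in for comparison "according to the service semantics". Per the TODO list, NIC replication (and therefore output comparison) is not implemented in this RFC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one outbound packet captured from a VM's virtual NIC. */
struct packet {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical: what the primary host does with a pair of responses. */
enum colo_action {
    COLO_RELEASE,    /* responses match: send the primary's packet to the client */
    COLO_CHECKPOINT  /* responses diverge: checkpoint primary state to the secondary */
};

/* Hypothetical period, mirroring the "do checkpoint per 20ms" test hack. */
#define COLO_CHECKPOINT_PERIOD_MS 20UL

/* Simplifying assumption: exact byte equality instead of a
 * service-semantics comparison. */
static bool packets_identical(const struct packet *p, const struct packet *s)
{
    return p->len == s->len && memcmp(p->data, s->data, p->len) == 0;
}

/* Decide whether the secondary is still a valid replica for this response. */
static enum colo_action colo_compare(const struct packet *primary,
                                     const struct packet *secondary)
{
    assert(primary != NULL && secondary != NULL);
    return packets_identical(primary, secondary) ? COLO_RELEASE
                                                 : COLO_CHECKPOINT;
}

/* Periodic trigger: true once at least one period has elapsed since the
 * last checkpoint (timestamps assumed to be monotonic milliseconds). */
static bool colo_periodic_due(unsigned long now_ms, unsigned long last_ms)
{
    return now_ms - last_ms >= COLO_CHECKPOINT_PERIOD_MS;
}
```

For example, two identical 3-byte responses yield `COLO_RELEASE`, while changing any byte yields `COLO_CHECKPOINT`. Releasing a response as soon as it matches, rather than holding every packet until a checkpoint completes, is the contrast with Remus that the cover letter draws.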