Re: [Xen-devel] [PATCH 6/7] remus: implement remus replicated checkpointing disk
> @@ -1463,7 +1468,10 @@ static int libxl__remus_domain_resume_callback(void *data)
>      if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
>          return 0;
>
> -    /* REMUS TODO: Deal with disk. */
> +    /* Deal with disk. */
> +    if (libxl__remus_disk_preresume(dss->remus_state))
> +        return 0;
> +
>      return 1;
>  }
>

Bug. I think I mentioned this last time. The disk needs to be resumed
before the domain is resumed. Just move the libxl__domain_resume() call
below the disk preresume check in the snippet above.

> +typedef struct libxl__remus_disk_type {
> +    /* checkpointing */
> +    int (*postsuspend)(libxl__remus_disk *remus_disk);
> +    int (*preresume)(libxl__remus_disk *remus_disk);
> +    int (*commit)(libxl__remus_disk *remus_disk);
> +
> +    /*
> +     * Return value:
> +     *   1: the disk is not this type or the script is still running
> +     *   0: the disk is this type
> +     *  -1: error
> +     */
> +    int (*match)(libxl__domain_suspend_state *dss,
> +                 const libxl_device_disk *disk,
> +                 libxl_async_exec *async_exec,
> +                 void *disk_state);
> +
> +    /*
> +     * This is synchronous callback. Return value:
> +     *   0: setup is done
> +     *  -1: error
> +     *
> +     */
> +    int (*setup)(libxl__remus_disk *remus_disk);
> +
> +    /*
> +     * Return value:
> +     *   1: the script is still running
> +     *   0: the script is done
> +     *  -1: error
> +     */
> +    int (*teardown)(libxl__remus_disk *remus_disk,
> +                    libxl_async_exec *async_exec);
> +
> +    /* the size of the private data */
> +    int size;
> +} libxl__remus_disk_type;
> +

This vtable approach is neat. I am fine with the disk checkpoint
approach you have taken.

Something that might be worth thinking about: the old remus code used
this approach for both disk and network buffering. Given that this code
is going in a similar direction, I suggest hoisting this structure up to
an abstract buffer type, with setup, teardown, postsuspend, preresume
and commit callbacks.

For disks, semantically,
  setup       [..]
  teardown    [..]
  postsuspend [start flushing buffered writes to backup host]
  preresume   [wait until all writes have been flushed to backup host]
  commit      [no-op]

For network devices, semantically,
  setup       [..]
  teardown    [..]
  postsuspend [no-op]
  preresume   [start_new_epoch - libnl call]
  commit      [release_prev_epoch - libnl call]

This way, in domain_suspend_done, the only thing we need to do is
  foreach remus buffer
      buffer.postsuspend()

Similarly, in resume_callback()
  foreach remus buffer
      buffer.preresume()
  domain_resume()

and in remus_checkpoint_dm_saved()
  foreach remus buffer
      buffer.commit()

Lai, I can take a crack at it if you would like.

shriram
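
To make the abstract buffer suggestion a bit more concrete, here is a minimal C sketch of what such a type and its three "foreach" call sites might look like. Everything in it is an assumption for illustration only: remus_buffer, remus_buffer_ops, remus_buffers_run, the phase enum, the trace buffer and the stub callbacks are hypothetical names, not existing libxl code, and the 0 / -1 return convention for callbacks is assumed. The real postsuspend/preresume/commit bodies (disk flushing, libnl epoch calls) are replaced by stubs that only record the call order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical abstract buffer type (not libxl API).
 * Assumed convention: callbacks return 0 on success, -1 on error. */
typedef struct remus_buffer remus_buffer;

typedef struct remus_buffer_ops {
    int (*setup)(remus_buffer *buf);
    int (*teardown)(remus_buffer *buf);
    int (*postsuspend)(remus_buffer *buf);
    int (*preresume)(remus_buffer *buf);
    int (*commit)(remus_buffer *buf);
} remus_buffer_ops;

struct remus_buffer {
    const char *name;             /* label used only by the trace below */
    const remus_buffer_ops *ops;  /* per-type vtable */
    void *priv;                   /* per-device private data */
};

/* Call trace so the ordering of callbacks can be inspected. */
static char trace[256];

static int record(const char *who, const char *what)
{
    size_t used = strlen(trace);
    int n = snprintf(trace + used, sizeof(trace) - used, "%s.%s;", who, what);
    return (n < 0 || (size_t)n >= sizeof(trace) - used) ? -1 : 0;
}

/* Stubs standing in for the real work. */
static int noop(remus_buffer *b)            { (void)b; return 0; }
static int log_postsuspend(remus_buffer *b) { return record(b->name, "postsuspend"); }
static int log_preresume(remus_buffer *b)   { return record(b->name, "preresume"); }
static int log_commit(remus_buffer *b)      { return record(b->name, "commit"); }

/* Disk: postsuspend starts the flush, preresume waits for it, commit is a no-op. */
static const remus_buffer_ops disk_ops = {
    .setup = noop, .teardown = noop,
    .postsuspend = log_postsuspend, .preresume = log_preresume, .commit = noop,
};

/* Network: postsuspend is a no-op, preresume starts a new epoch,
 * commit releases the previous one. */
static const remus_buffer_ops net_ops = {
    .setup = noop, .teardown = noop,
    .postsuspend = noop, .preresume = log_preresume, .commit = log_commit,
};

static remus_buffer demo_bufs[] = {
    { "disk", &disk_ops, NULL },
    { "net",  &net_ops,  NULL },
};

enum remus_phase { PHASE_POSTSUSPEND, PHASE_PRERESUME, PHASE_COMMIT };

/* "foreach remus buffer: buffer.<phase>()", stopping at the first error. */
static int remus_buffers_run(remus_buffer *bufs, size_t n, enum remus_phase phase)
{
    assert(bufs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const remus_buffer_ops *ops = bufs[i].ops;
        int (*cb)(remus_buffer *) =
            phase == PHASE_POSTSUSPEND ? ops->postsuspend :
            phase == PHASE_PRERESUME   ? ops->preresume   : ops->commit;
        if (cb(&bufs[i]) < 0)
            return -1;
    }
    return 0;
}

/* Hypothetical stand-in for libxl__domain_resume(). */
static int domain_resume_stub(void) { return record("domain", "resume"); }

/* Mirrors the quoted callback's convention: 1 on success, 0 on failure.
 * All buffers are pre-resumed before the domain itself is resumed. */
static int resume_callback(remus_buffer *bufs, size_t n)
{
    if (remus_buffers_run(bufs, n, PHASE_PRERESUME) < 0)
        return 0;
    if (domain_resume_stub() < 0)
        return 0;
    return 1;
}
```

With this shape, domain_suspend_done would call remus_buffers_run(..., PHASE_POSTSUSPEND) and remus_checkpoint_dm_saved would call remus_buffers_run(..., PHASE_COMMIT); the no-op entries keep the call sites free of per-type special cases. The asynchronous match/teardown handling from the quoted patch is deliberately left out of this sketch.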