[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 6/7] remus: implement remus replicated checkpointing disk
On 04/04/2014 12:41 AM, Shriram Rajagopalan wrote: >> @@ -1463,7 +1468,10 @@ static int libxl__remus_domain_resume_callback(void >> *data) >> if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) >> return 0; >> >> - /* REMUS TODO: Deal with disk. */ >> + /* Deal with disk. */ >> + if (libxl__remus_disk_preresume(dss->remus_state)) >> + return 0; >> + >> return 1; >> } >> > > Bug. I think I mentioned this last time. Disk needs to be resumed before the > domain is resumed. Just move the domain resume call below the above > code snippet. > > >> +typedef struct libxl__remus_disk_type { >> + /* checkpointing */ >> + int (*postsuspend)(libxl__remus_disk *remus_disk); >> + int (*preresume)(libxl__remus_disk *remus_disk); >> + int (*commit)(libxl__remus_disk *remus_disk); >> + >> + /* >> + * Return value: >> + * 1: the disk is not this type or the script is still running >> + * 0: the disk is this type >> + * -1: error >> + */ >> + int (*match)(libxl__domain_suspend_state *dss, >> + const libxl_device_disk *disk, >> + libxl_async_exec *async_exec, >> + void *disk_state); >> + >> + /* >> + * This is synchronous callback. Return value: >> + * 0: setup is done >> + * -1: error >> + * >> + */ >> + int (*setup)(libxl__remus_disk *remus_disk); >> + >> + /* >> + * Return value: >> + * 1: the script is still running >> + * 0: the script is done >> + * -1: error >> + */ >> + int (*teardown)(libxl__remus_disk *remus_disk, >> + libxl_async_exec *async_exec); >> + >> + /* the size of the private data */ >> + int size; >> +} libxl__remus_disk_type; >> + > > > This vtable approach is neat. I am fine with the current disk > checkpoint approach you have taken. > > Something that might be worth thinking about: > The old remus code used this approach for both the disk and network buffering. > Given that this code is going in a similar direction, I suggest > hoisting this structure > up to an abstract buffer type, with setup, teardown, postsuspend, preresume > and > commit callbacks. > > For disks, semantically, > setup [..] > teardown [..] > postsuspend [start flushing buffered writes to backup host] > preresume [wait until all writes have been flushed to backup host] > commit [no-op] > > For network devices, semantically, > setup [..] > teardown [..] > postsuspend [no-op] > preresume [start_new_epoch - libnl call] > commit [release_prev_epoch - libnl call] > > This way, in domain_suspend_done, the only thing we need to do is > foreach remus buffer > buffer.postsuspend() > > Similarly, in resume_callback() > > foreach remus buffer > buffer.preresume() > domain_resume() > > > in remus_checkpoint_dm_saved() > foreach remus buffer > buffer.commit() > > Lai, I can take an crack at it if you would like. > Your idea is great, I look forward to your work on it. > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxx > http://lists.xen.org/xen-devel > . > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |