[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2



On 11/05/15 11:48, Hongyang Yang wrote:
>
> On 05/11/2015 05:00 PM, Andrew Cooper wrote:
>> On 11/05/15 07:28, Hongyang Yang wrote:
>>> On 05/09/2015 02:12 AM, Andrew Cooper wrote:
>>>> On 08/05/15 10:33, Yang Hongyang wrote:
>>>>> This patchset implement the Remus support for Migration v2 but
>>>>> without
>>>>> memory compressing.
> [...]
>>
>>>
>>> <last iter of memory>
>>>
>>>> end_of_checkpoint()
>>>> Checkpoint record
>>>
>>> ctx->save.callbacks->postcopy()
>>> this callback should not be omitted, it do some necessary work before
>>> resume
>>> primary (such as call Remus devices preresume callbacks to ensure the
>>> disk
>>> data is consistent) and then resume the primary guest. I think this
>>> callback should be renamed to ctx->save.callbacks->resume().
>>
>> That looks to be a useful cleanup (and answers one of my questions of
>> what exactly postcopy was)
>>
>>>
>>>>               ctx->save.callbacks->checkpoint()
>>>>                                   libxl qemu record
>>>
>>> Maybe we should add another callback to send qemu record instead of
>>> using checkpoint callback. We can call it
>>> ctx->save.callbacks->save_qemu()
>>
>> This is another layering violation.  libxc should not prescribe what
>> libxl might or might not do.  One example we are experimenting with in
>> XenServer at the moment is support for multiple emulators attached to a
>> single domain, which would necessitate two LIBXL_EMULATOR records to be
>> sent per checkpoint.  libxl might also want to send an updated json blob
>> or such.
>
> Ok, so we'd better not introduce save_qemu callback.
>
>>
>>> Then in checkpoint callback, we only call remus devices commit
>>> callbacks(
>>> which will release the network buffer etc...) then decide whether we
>>> need to
>>> do another checkpoint or quit checkpointed stream.
>>> With Remus, checkpoint callback only wait for 200ms(can be specified
>>> by -i)
>>> then return.
>>> With COLO, checkpoint callback will ask COLO proxy if we need to do a
>>> checkpoint, will return when COLO proxy module indicate a checkpoint
>>> is needed.
>>
>> That sounds like COLO wants a should_checkpoint() callback which
>> separates the decision to make a checkpoint from the logic of
>> implementing a checkpoint.
>
> We use checkpoint callback to do should_checkpoint() thing currently.
> libxc will check the return value of checkpoint callback.

But that causes a chicken & egg problem.

I am planning to use a CHECKPOINT record to synchronise the transfer of
ownership of the FD between libxc and libxl.  Therefore, a CHECKPOINT
record must be in the stream ahead of the checkpoint() callback, as
libxl will then write/read some records in itself.

As a result, the checkpoint() callback itself can't be used to gate
whether a CHECKPOINT record is written by libxc.

>
>>
>>>
>>>>                                   ...
>>>>                                   libxl end-of-checkpoint record
>>>>               ctx->save.callbacks->checkpoint() returns
>>>> start_of_checkpoint()
>>>
>>> ctx->save.callbacks->suspend()
>>>
>>>> <memory>
>>>> end_of_checkpoint()
>>>> Checkpoint record
>>>> etc...
>>>>
>>>> This will eventually allow both libxc and libxl to send checkpoint
>>>> data
>>>> (and by the looks of it, remove the need for postcopy()).  With this
>>>> libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
>>>> current qemu situation, but I would prefer not to be also retrofitting
>>>> libxc checkpoint records when doing the libxl/migv2 work.
>>>>
>>>> Does this look plausible in for Remus (and eventually COLO) support?
>>>
>>> With comments above, I would suggest the save flow as below:
>>>
>>> libxc writes:                   libxl writes:
>>>
>>> live migration:
>>> Image Header
>>> Domain Header
>>> start_of_stream()
>>> start_of_checkpoint()
>>> <live memory>
>>> ctx->save.callbacks->suspend()
>>> <last iter memory>
>>> end_of_checkpoint()
>>> if ( checkpointd )
>>>    End of Checkpoint record
>>>    /*If resotre side receives this record, input fd should be handed to
>>> libxl*/
>>> else
>>>    goto end
>>>
>>> loop of checkpointed stream:
>>> ctx->save.callbacks->resume()
>>> ctx->save.callbacks->save_qemu()
>>>                                  libxl qemu record
>>>                                  ...
>>>                                  libxl end-of-checkpoint record
>>> /*If resotre side receives this record, input fd should be handed to
>>> libxc*/
>>> ctx->save.callbacks->save_qemu() returns
>>> ctx->save.callbacks->checkpoint()
>>> start_of_checkpoint()
>>> ctx->save.callbacks->suspend()
>>> <memory>
>>> end_of_checkpoint()
>>> End of Checkpoint record
>>> goto 'loop of checkpointed stream'
>>>
>>> end:
>>> END record
>>> /*If resotre side receives this record, input fd should be handed to
>>> libxl*/
>>>
>>>
>>> In order to keep it simple, we can keep the current
>>> ctx->save.callbacks->checkpoint() as it is, which do the save_qemu
>>> thing, call
>>> Remus devices commit callbacks and then decide whether we need a
>>> checkpoint. We
>>> can also combine the ctx->save.callbacks->resume() with
>>> ctx->save.callbacks->checkpoint(), with only one checkpoint()
>>> callback, we do
>>> the following things:
>>>   - Call Remus devices preresume callbacks
>>>   - Resume the primary
>>>   - Save qemu records
>>>   - Call Remus devices commit callbacks
>>>   - Decide whether we need a checkpoint
>>>
>>> Overall, there are 3 options for the save flow:
>>> 1. keep the current callbacks, rename postcopy() to resume()
>>> 2. split the checkpoint() callback to save_qemu() and checkpoint()
>>> 3. combine the current postcopy() and checkpoint()
>>> Which one do you think is the best?
>>
>> I have a 4th alternative in mind, but would like your feedback from my
>> comments in this email first.
>
> So what's the 4th alternative?

I have some corrections to my patch series based on David's feedback,
and your comments.  After that, it should hopefully be far easier to
describe.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.