
Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2





On 05/11/2015 07:01 PM, Andrew Cooper wrote:
On 11/05/15 11:48, Hongyang Yang wrote:

On 05/11/2015 05:00 PM, Andrew Cooper wrote:
On 11/05/15 07:28, Hongyang Yang wrote:
On 05/09/2015 02:12 AM, Andrew Cooper wrote:
On 08/05/15 10:33, Yang Hongyang wrote:
This patchset implements Remus support for Migration v2, but without
memory compression.
[...]


<last iter of memory>

end_of_checkpoint()
Checkpoint record

ctx->save.callbacks->postcopy()
this callback should not be omitted; it does some necessary work before
resuming the primary (such as calling the Remus devices' preresume
callbacks to ensure the disk data is consistent) and then resumes the
primary guest. I think this callback should be renamed to
ctx->save.callbacks->resume().

That looks to be a useful cleanup (and answers one of my questions of
what exactly postcopy was)
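
For reference, the rename would only touch the callback table.  A
minimal sketch, modelled on the save_callbacks structure in
tools/libxc/include/xenguest.h (unrelated fields omitted; the resume()
name is the proposal above, not committed code):

/* Sketch only: the rename is the proposal above, not committed code. */
struct save_callbacks {
    /* Suspend the domain before the final round of memory is sent. */
    int (*suspend)(void *data);

    /* Previously postcopy(): run the Remus device preresume hooks so
     * that disk state is consistent, then resume the primary guest. */
    int (*resume)(void *data);

    /* Decide whether to start another checkpoint or end the stream. */
    int (*checkpoint)(void *data);

    /* Opaque token passed back to each callback. */
    void *data;
};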


               ctx->save.callbacks->checkpoint()
                                   libxl qemu record

Maybe we should add another callback to send the qemu record instead of
using the checkpoint callback. We could call it
ctx->save.callbacks->save_qemu().

This is another layering violation.  libxc should not prescribe what
libxl might or might not do.  One example we are experimenting with in
XenServer at the moment is support for multiple emulators attached to a
single domain, which would necessitate two LIBXL_EMULATOR records to be
sent per checkpoint.  libxl might also want to send an updated json blob
or such.

Ok, so we'd better not introduce a save_qemu callback.


Then in the checkpoint callback, we only call the Remus devices' commit
callbacks (which release the network buffer etc.), then decide whether
we need to do another checkpoint or quit the checkpointed stream.
With Remus, the checkpoint callback only waits for 200ms (the interval
can be specified by -i) and then returns.
With COLO, the checkpoint callback will ask the COLO proxy whether we
need to do a checkpoint, and will return when the COLO proxy module
indicates that a checkpoint is needed.

That sounds like COLO wants a should_checkpoint() callback which
separates the decision to make a checkpoint from the logic of
implementing a checkpoint.

We currently use the checkpoint callback to make the should_checkpoint()
decision; libxc checks the return value of the checkpoint callback.
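
Roughly, that convention could look like this (a sketch only; the
constant and helper names are illustrative, not taken from the series):

/* Hypothetical interpretation of the checkpoint() return value.
 * Remus implements the callback as "sleep for the interval, then ask
 * for another checkpoint"; COLO would block until its proxy module
 * requests one. */
enum {
    CHECKPOINT_ABORT    = 0,    /* error: tear the stream down */
    CHECKPOINT_CONTINUE = 1,    /* take another checkpoint */
};

static int should_checkpoint(int (*checkpoint)(void *data), void *data)
{
    return checkpoint(data) == CHECKPOINT_CONTINUE ? 0 : -1;
}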

But that causes a chicken & egg problem.

I am planning to use a CHECKPOINT record to synchronise the transfer of
ownership of the FD between libxc and libxl.  Therefore, a CHECKPOINT
record must be in the stream ahead of the checkpoint() callback, as
libxl will then write/read some records itself.

The record name CHECKPOINT does not seem to match what you are planning
to do. In this case I think END-OF-CHECKPOINT, which represents the end
of the libxc side of a checkpoint, is better: when the libxc side of a
checkpoint ends, libxc should transfer ownership of the FD to libxl and
let libxl handle the following stream. The libxl side can likewise use
END-OF-CHECKPOINT as the sign to hand ownership of the FD back to libxc.
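
For concreteness, in stream terms this would just be one more record
type. A sketch, assuming the migration v2 record header layout (a
32-bit type and a 32-bit body length, with the body padded to an
8-octet boundary); the type value below is a made-up placeholder:

#include <stdint.h>

/* Migration v2 record header, per the stream spec. */
struct rec_hdr {
    uint32_t type;
    uint32_t length;    /* body length, excluding padding */
};

/* Hypothetical: an END-OF-CHECKPOINT record carries no body, so only
 * the 8-byte header crosses the wire before the FD changes hands. */
#define REC_TYPE_END_OF_CHECKPOINT 0x0000000eU  /* placeholder value */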


As a result, the checkpoint() callback itself can't be used to gate
whether a CHECKPOINT record is written by libxc.

I was wondering how you plan to do the FD transfer?





                                   ...
                                   libxl end-of-checkpoint record
               ctx->save.callbacks->checkpoint() returns
start_of_checkpoint()

ctx->save.callbacks->suspend()

<memory>
end_of_checkpoint()
Checkpoint record
etc...

This will eventually allow both libxc and libxl to send checkpoint
data
(and by the looks of it, remove the need for postcopy()).  With this
libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
current qemu situation, but I would prefer not to be also retrofitting
libxc checkpoint records when doing the libxl/migv2 work.

Does this look plausible in for Remus (and eventually COLO) support?

With the comments above in mind, I would suggest the save flow below:

libxc writes:                   libxl writes:

live migration:
Image Header
Domain Header
start_of_stream()
start_of_checkpoint()
<live memory>
ctx->save.callbacks->suspend()
<last iter memory>
end_of_checkpoint()
if ( checkpointed )
    End of Checkpoint record
    /* If the restore side receives this record, the input fd should
       be handed to libxl */
else
    goto end

loop of checkpointed stream:
ctx->save.callbacks->resume()
ctx->save.callbacks->save_qemu()
                                  libxl qemu record
                                  ...
                                  libxl end-of-checkpoint record
/* If the restore side receives this record, the input fd should be
   handed to libxc */
ctx->save.callbacks->save_qemu() returns
ctx->save.callbacks->checkpoint()
start_of_checkpoint()
ctx->save.callbacks->suspend()
<memory>
end_of_checkpoint()
End of Checkpoint record
goto 'loop of checkpointed stream'

end:
END record
/* If the restore side receives this record, the input fd should be
   handed to libxl */
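
In C-ish form, the same flow would read roughly as below (a sketch
only: every identifier is a placeholder for the step of the same name
in the listing, not an existing symbol, and error handling is omitted):

struct ctx;  /* stands in for the real save context */

/* Placeholders for the stream-writing steps and callbacks above. */
int write_headers(struct ctx *c);        /* Image + Domain header */
int send_memory_live(struct ctx *c);     /* <live memory>         */
int send_memory_final(struct ctx *c);    /* <last iter memory>    */
int write_end_of_checkpoint_rec(struct ctx *c);
int write_end_rec(struct ctx *c);
int cb_suspend(struct ctx *c), cb_resume(struct ctx *c);
int cb_save_qemu(struct ctx *c), cb_checkpoint(struct ctx *c);
int checkpointed(struct ctx *c);

int save_flow(struct ctx *c)
{
    write_headers(c);
    send_memory_live(c);
    cb_suspend(c);
    send_memory_final(c);

    if ( !checkpointed(c) )
        goto end;

    write_end_of_checkpoint_rec(c);      /* restore hands fd to libxl */

    for ( ;; )                           /* loop of checkpointed stream */
    {
        cb_resume(c);
        cb_save_qemu(c);                 /* libxl qemu records, then its
                                          * end-of-checkpoint record:
                                          * fd returns to libxc */
        if ( cb_checkpoint(c) <= 0 )
            break;

        cb_suspend(c);
        send_memory_final(c);
        write_end_of_checkpoint_rec(c);
    }

 end:
    write_end_rec(c);                    /* restore hands fd to libxl */
    return 0;
}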


In order to keep it simple, we can keep the current
ctx->save.callbacks->checkpoint() as it is: it does the save_qemu work,
calls the Remus devices' commit callbacks, and then decides whether we
need a checkpoint. We could also combine ctx->save.callbacks->resume()
with ctx->save.callbacks->checkpoint(); with only one checkpoint()
callback, we would do the following (see the sketch after this list):
   - Call Remus devices preresume callbacks
   - Resume the primary
   - Save qemu records
   - Call Remus devices commit callbacks
   - Decide whether we need a checkpoint
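
A sketch of that combined callback (all names below are illustrative
placeholders; returning 0 is assumed to tell libxc to end the stream):

/* Hypothetical helpers standing in for the five steps above. */
struct remus_state;
int remus_devices_preresume(struct remus_state *rs);
int resume_primary(struct remus_state *rs);
int save_qemu_records(struct remus_state *rs);
int remus_devices_commit(struct remus_state *rs);
int wait_for_next_checkpoint(struct remus_state *rs);

static int combined_checkpoint_cb(void *data)
{
    struct remus_state *rs = data;

    if ( remus_devices_preresume(rs) )   /* step 1 */
        return 0;
    if ( resume_primary(rs) )            /* step 2 */
        return 0;
    if ( save_qemu_records(rs) )         /* step 3: libxl records */
        return 0;
    if ( remus_devices_commit(rs) )      /* step 4: release buffers */
        return 0;

    /* Step 5: Remus sleeps for the interval; COLO asks its proxy. */
    return wait_for_next_checkpoint(rs);
}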

Overall, there are 3 options for the save flow:
1. keep the current callbacks, rename postcopy() to resume()
2. split the checkpoint() callback to save_qemu() and checkpoint()
3. combine the current postcopy() and checkpoint()
Which one do you think is best?

I have a 4th alternative in mind, but would like your feedback from my
comments in this email first.

So what's the 4th alternative?

I have some corrections to my patch series based on David's feedback,
and your comments.  After that, it should hopefully be far easier to
describe.

OK, I've addressed all the comments on my series and will wait for your
series to continue :-)


~Andrew


--
Thanks,
Yang.



 

