
Re: [Xen-devel] [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore





On 05/15/2015 05:27 PM, Ian Campbell wrote:
On Fri, 2015-05-15 at 17:19 +0800, Yang Hongyang wrote:

On 05/15/2015 05:09 PM, Ian Campbell wrote:
On Fri, 2015-05-15 at 09:32 +0800, Yang Hongyang wrote:

On 05/14/2015 09:05 PM, Ian Campbell wrote:
On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
With Remus, the restore flow should be:
the first full migration stream -> { periodic checkpoint streams }

Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CC: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
CC: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
CC: Wei Liu <wei.liu2@xxxxxxxxxx>
---
    tools/libxc/xc_sr_common.h  |  14 ++++++
    tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
    2 files changed, 117 insertions(+), 10 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index f8121e7..3bf27f1 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -208,6 +208,20 @@ struct xc_sr_context
                /* Plain VM, or checkpoints over time. */
                bool checkpointed;

+            /* Currently buffering records between checkpoints */
+            bool buffer_all_records;
+
+/*
+ * With Remus, we buffer the records sent by the primary at each checkpoint,
+ * so that if the primary fails, we can recover from the last
+ * checkpoint state.
+ * This should be enough because the primary only sends dirty pages at
+ * each checkpoint.

I'm not sure how it then follows that 1024 buffers is guaranteed to be
enough, unless there is something on the sending side arranging it to be
so?

There are only a few records at each checkpoint in my tests, mostly under 10,
probably because I don't do many operations in the guest. I thought this limit
could be adjusted later based on further testing.

For some reason I thought these buffers included the page data; is that
not true? I was expecting the bulk of the records to be dirty page data.

The page data is not stored in this buffer; only a pointer to it is stored
here (rec->data). This buffer holds the struct xc_sr_record entries themselves.
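
For reference, what gets buffered is just struct xc_sr_record from
tools/libxc/xc_sr_common.h, roughly (a sketch; field names as in the
current tree, worth double-checking):

    struct xc_sr_record
    {
        uint32_t type;    /* record type, e.g. REC_TYPE_PAGE_DATA */
        uint32_t length;  /* length of the payload pointed to by data */
        void *data;       /* payload; for page data this is where the pages live */
    };

So each buffered entry is only a small header plus a pointer; the page
contents themselves stay in the separately allocated rec->data.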

OK, so there are (approximately) as many xc_sr_records as there are
buffered dirty pages? I'd expect this could easily reach 1024 in some
circumstances (e.g. run a fork bomb in the domain or something).

No, a record may contain up to 1024 pages, so the number of records is less
than the number of dirty pages.
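
As a rough illustration (assuming the save side batches up to 1024 pfns per
PAGE_DATA record, which I believe is its MAX_BATCH_SIZE), the number of
page-data records for N dirty pages is about N/1024, plus a handful of
non-page records:

    /* Rough estimate only: one PAGE_DATA record can carry up to 1024 pages. */
    static unsigned page_data_records(unsigned dirty_pages)
    {
        const unsigned batch = 1024;  /* assumed MAX_BATCH_SIZE on the save side */
        return (dirty_pages + batch - 1) / batch;
    }

So e.g. ~100,000 dirty pages would still only need on the order of 100
page-data records.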


Since you and Andy both have doubts about this, I have to reconsider;
perhaps there should be no limit. Even if the 1024 limit works for
most cases, there might be cases that exceed it. So I will
add another member 'allocated_rec_num' to the context; when
'buffered_rec_num' exceeds 'allocated_rec_num', I will reallocate the buffer.
The initial buffer size will be 1024 records, which should work for most cases.
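
Something along these lines (a sketch only, for xc_sr_restore.c; the exact
growth policy and error handling here are illustrative, not final):

    #define DEFAULT_BUF_RECORDS 1024

    /* Append a record to the buffer, growing the array when
     * buffered_rec_num reaches allocated_rec_num. */
    static int buffer_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
    {
        xc_interface *xch = ctx->xch;

        if ( ctx->restore.buffered_rec_num >= ctx->restore.allocated_rec_num )
        {
            unsigned new_alloc_num = ctx->restore.allocated_rec_num ?
                ctx->restore.allocated_rec_num * 2 : DEFAULT_BUF_RECORDS;
            struct xc_sr_record *p = realloc(ctx->restore.buffered_records,
                                             new_alloc_num * sizeof(*p));

            if ( !p )
            {
                ERROR("Unable to allocate memory for record buffers");
                return -1;
            }

            ctx->restore.buffered_records = p;
            ctx->restore.allocated_rec_num = new_alloc_num;
        }

        memcpy(&ctx->restore.buffered_records[ctx->restore.buffered_rec_num++],
               rec, sizeof(*rec));

        return 0;
    }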

That seems easy enough to be worth doing even if I was wrong about the page
data.

done.









--
Thanks,
Yang.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

