[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 20/25] tools/libx{l, c}: add back channel to libxc



On Thu, 2015-07-16 at 14:29 +0800, Yang Hongyang wrote:
> On 07/15/2015 09:13 PM, Ian Campbell wrote:
> > On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
> >> In COLO mode, both VMs are running, and are considered in sync if the
> >> visible network traffic is identical.  After some time, they fall out of
> >> sync.
> >>
> >> At this point, the two VMs have definitely diverged.  Lets call the
> >> primary dirty bitmap set A, while the secondary dirty bitmap set B.
> >>
> >> Sets A and B are different.
> >>
> >> Under normal migration, the page data for set A will be sent form the
> >> primary to the secondary.
> >>
> >> However, the set difference B - A (lets call this C) is out-of-date on
> >> the secondary (with respect to the primary) and will not be sent by the
> >> primary, as it was not memory dirtied by the primary.  The secondary
> >> needs the page data for C to reconstruct an exact copy of the primary at
> >> the checkpoint.
> >>
> >> The secondary cannot calculate C as it doesn't know A.  Instead, the
> >> secondary must send B to the primary, at which point the primary
> >> calculates the union of A and B (lets call this D) which is all the
> >> pages dirtied by both the primary and the secondary, and sends all page
> >> data covered by D.
> >>
> >> In the general case, D is a superset of both A and B.  Without the
> >> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
> >> copy of the primary.
> >
> > When Andy (who wrote this) said this via email I replied [0] including:
> >
> >          According to the paper there is no need to resend because the
> >          secondary already has a non-dirty copy of any memory which is
> >          dirty in B but not A.
> >
> > So it is not the case that a checkpoint _can't_ reconstruct a valid copy
> > of the primary, clearly it is possible, but for some reason this
> > implementation chooses to deviate from the paper and does things in a
> > way where it indeed cannot reconstruct D but I've yet to see a
> > description of _why_ the implementation produced here differs from the
> > paper.
> >
> >> We transfer the dirty bitmap on libxc side, so we need to introduce back
> >> channel to libxc.
> >
> > I'm sure you have good practical reasons why the implementation differs
> > from the design and I would like to know what they are because the back
> > channel is adding extra complexity to libxc and libxl so I want to know
> > why it is justified, as I also said this in [1].
> >
> > Lastly Ian said in [2]:
> >
> >          To be clear, I have no problem if the design has changed since the
> >          paper was written.  I just want:
> >
> >           * A clear high-level explanation of the actually-implemented
> >             arrangements to exist somewhere
> >
> >           * The commit messages, or code, to refer to that explanation
> >
> > IMHO the addition of this extra commit message doesn't really meet at
> > least the first requirement. Please point us to an up to date design
> > document which describes COLO as actually implemented.
> 
> The original design of COLO is that:
> Secondary should maintain an exact copy of Primary memory. At every
> checkpoint, we receive the Primary memory into that copy, then flush
> the memory to Secondary.
> 
> We changed the original design to the current one, according to our
> following concerns:
> 1. The original design needs extra memory on Secondary host. When
>     there's multiple backups on one host, the memory cost is high.
> 2. The memory cache code will be another 1k+, it will make the
>     review more time consuming.
> 
> The best way to accomplish this is the COW which Andrew mentioned earlier,
> but that should be further improvement. We will certainly continue to
> improve this when COW is ready to use.
> 
> Is description above can solve your confusion? If so, I will add to
> the commit log.

I'd much prefer to see this incorporated into an updated design which
supersedes the existing paper and is referenced from the wiki etc.

It doesn't need to be anything like as formal as that paper, of course,
but I think it does need to cover the overall design of COLO as it is in
reality today as a whole and not just as a delta to the paper in the
commit log.

Such a document might also usefully _in_addition_ contain a section
explaining the delta from the original paper and the reasons for those.

Ian.

> >
> > Ian.
> >
> > [0] http://lists.xen.org/archives/html/xen-devel/2015-07/msg00090.html
> > [1] http://lists.xen.org/archives/html/xen-devel/2015-07/msg00148.html
> > [2] http://lists.xen.org/archives/html/xen-devel/2015-07/msg00101.html
> >
> >>
> >> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> >> commit message:
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >> CC: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> >> CC: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> >> CC: Wei Liu <wei.liu2@xxxxxxxxxx>
> >> ---
> >>   tools/libxc/include/xenguest.h   |  8 ++++----
> >>   tools/libxc/xc_domain_restore.c  |  4 ++--
> >>   tools/libxc/xc_domain_save.c     |  4 ++--
> >>   tools/libxc/xc_sr_restore.c      |  2 +-
> >>   tools/libxc/xc_sr_save.c         |  2 +-
> >>   tools/libxl/libxl_save_callout.c | 39 
> >> ++++++++++++++++++++++++++-------------
> >>   tools/libxl/libxl_save_helper.c  |  8 ++++++--
> >>   7 files changed, 42 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/tools/libxc/include/xenguest.h 
> >> b/tools/libxc/include/xenguest.h
> >> index 6e24b6c..4056955 100644
> >> --- a/tools/libxc/include/xenguest.h
> >> +++ b/tools/libxc/include/xenguest.h
> >> @@ -91,13 +91,13 @@ struct save_callbacks {
> >>   int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
> >> max_iters,
> >>                      uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
> >>                      struct save_callbacks* callbacks, int hvm,
> >> -                   int checkpointed_stream);
> >> +                   int checkpointed_stream, int back_fd);
> >>
> >>   /* Domain Save v2 */
> >>   int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
> >> max_iters,
> >>                       uint32_t max_factor, uint32_t flags,
> >>                       struct save_callbacks* callbacks, int hvm,
> >> -                    int checkpointed_stream);
> >> +                    int checkpointed_stream, int back_fd);
> >>
> >>   /* callbacks provided by xc_domain_restore */
> >>   struct restore_callbacks {
> >> @@ -140,7 +140,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
> >> uint32_t dom,
> >>                         unsigned long *console_mfn, domid_t console_domid,
> >>                         unsigned int hvm, unsigned int pae, int superpages,
> >>                         int checkpointed_stream,
> >> -                      struct restore_callbacks *callbacks);
> >> +                      struct restore_callbacks *callbacks, int back_fd);
> >>
> >>   /* Domain Restore v2 */
> >>   int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
> >> @@ -149,7 +149,7 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, 
> >> uint32_t dom,
> >>                          unsigned long *console_mfn, domid_t console_domid,
> >>                          unsigned int hvm, unsigned int pae, int 
> >> superpages,
> >>                          int checkpointed_stream,
> >> -                       struct restore_callbacks *callbacks);
> >> +                       struct restore_callbacks *callbacks, int back_fd);
> >>   /**
> >>    * xc_domain_restore writes a file to disk that contains the device
> >>    * model saved state.
> >> diff --git a/tools/libxc/xc_domain_restore.c 
> >> b/tools/libxc/xc_domain_restore.c
> >> index 3cd3483..63d1e6b 100644
> >> --- a/tools/libxc/xc_domain_restore.c
> >> +++ b/tools/libxc/xc_domain_restore.c
> >> @@ -1515,7 +1515,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
> >> uint32_t dom,
> >>                         unsigned long *console_mfn, domid_t console_domid,
> >>                         unsigned int hvm, unsigned int pae, int superpages,
> >>                         int checkpointed_stream,
> >> -                      struct restore_callbacks *callbacks)
> >> +                      struct restore_callbacks *callbacks, int back_fd)
> >>   {
> >>       DECLARE_DOMCTL;
> >>       xc_dominfo_t info;
> >> @@ -1578,7 +1578,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
> >> uint32_t dom,
> >>           return xc_domain_restore2(
> >>               xch, io_fd, dom, store_evtchn, store_mfn,
> >>               store_domid, console_evtchn, console_mfn, console_domid,
> >> -            hvm,  pae,  superpages, checkpointed_stream, callbacks);
> >> +            hvm,  pae,  superpages, checkpointed_stream, callbacks, 
> >> back_fd);
> >>       }
> >>
> >>       DPRINTF("%s: starting restore of new domid %u", __func__, dom);
> >> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> >> index 0da3cca..b111384 100644
> >> --- a/tools/libxc/xc_domain_save.c
> >> +++ b/tools/libxc/xc_domain_save.c
> >> @@ -803,7 +803,7 @@ static int save_tsc_info(xc_interface *xch, uint32_t 
> >> dom, int io_fd)
> >>   int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
> >> max_iters,
> >>                      uint32_t max_factor, uint32_t flags,
> >>                      struct save_callbacks* callbacks, int hvm,
> >> -                   int checkpointed_stream)
> >> +                   int checkpointed_stream, int back_fd)
> >>   {
> >>       xc_dominfo_t info;
> >>       DECLARE_DOMCTL;
> >> @@ -899,7 +899,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, 
> >> uint32_t dom, uint32_t max_iter
> >>       {
> >>           return xc_domain_save2(xch, io_fd, dom, max_iters,
> >>                                  max_factor, flags, callbacks, hvm,
> >> -                               checkpointed_stream);
> >> +                               checkpointed_stream, back_fd);
> >>       }
> >>
> >>       DPRINTF("%s: starting save of domid %u", __func__, dom);
> >> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> >> index bf1ee15..504463e 100644
> >> --- a/tools/libxc/xc_sr_restore.c
> >> +++ b/tools/libxc/xc_sr_restore.c
> >> @@ -720,7 +720,7 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, 
> >> uint32_t dom,
> >>                          unsigned long *console_gfn, domid_t console_domid,
> >>                          unsigned int hvm, unsigned int pae, int 
> >> superpages,
> >>                          int checkpointed_stream,
> >> -                       struct restore_callbacks *callbacks)
> >> +                       struct restore_callbacks *callbacks, int back_fd)
> >>   {
> >>       struct xc_sr_context ctx =
> >>           {
> >> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> >> index 6102b66..d12e5b1 100644
> >> --- a/tools/libxc/xc_sr_save.c
> >> +++ b/tools/libxc/xc_sr_save.c
> >> @@ -821,7 +821,7 @@ static int save(struct xc_sr_context *ctx, uint16_t 
> >> guest_type)
> >>   int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom,
> >>                       uint32_t max_iters, uint32_t max_factor, uint32_t 
> >> flags,
> >>                       struct save_callbacks* callbacks, int hvm,
> >> -                    int checkpointed_stream)
> >> +                    int checkpointed_stream, int back_fd)
> >>   {
> >>       xen_pfn_t nr_pfns;
> >>       struct xc_sr_context ctx =
> >> diff --git a/tools/libxl/libxl_save_callout.c 
> >> b/tools/libxl/libxl_save_callout.c
> >> index f393abc..f8c6cf0 100644
> >> --- a/tools/libxl/libxl_save_callout.c
> >> +++ b/tools/libxl/libxl_save_callout.c
> >> @@ -27,7 +27,7 @@
> >>    */
> >>   static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
> >>                          const char *mode_arg,
> >> -                       int stream_fd,
> >> +                       int stream_fd, int back_fd,
> >>                          const int *preserve_fds, int num_preserve_fds,
> >>                          const unsigned long *argnums, int num_argnums);
> >>
> >> @@ -50,6 +50,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, 
> >> libxl__domain_create_state *dcs,
> >>       /* Convenience aliases */
> >>       const uint32_t domid = dcs->guest_domid;
> >>       const int restore_fd = dcs->libxc_fd;
> >> +    const int send_fd = dcs->send_fd;
> >>       libxl__domain_build_state *const state = &dcs->build_state;
> >>
> >>       unsigned cbflags =
> >> @@ -72,7 +73,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, 
> >> libxl__domain_create_state *dcs,
> >>       shs->need_results = 1;
> >>       shs->toolstack_data_file = 0;
> >>
> >> -    run_helper(egc, shs, "--restore-domain", restore_fd, 0, 0,
> >> +    run_helper(egc, shs, "--restore-domain", restore_fd, send_fd, 0, 0,
> >>                  argnums, ARRAY_SIZE(argnums));
> >>   }
> >>
> >> @@ -96,7 +97,7 @@ void libxl__xc_domain_save(libxl__egc *egc, 
> >> libxl__domain_save_state *dss,
> >>       shs->caller_state = dss;
> >>       shs->need_results = 0;
> >>
> >> -    run_helper(egc, shs, "--save-domain", dss->fd,
> >> +    run_helper(egc, shs, "--save-domain", dss->fd, dss->recv_fd,
> >>                  NULL, 0,
> >>                  argnums, ARRAY_SIZE(argnums));
> >>       return;
> >> @@ -119,14 +120,29 @@ void 
> >> libxl__save_helper_init(libxl__save_helper_state *shs)
> >>   }
> >>
> >>   /*----- helper execution -----*/
> >> +static int dup_fd_helper(libxl__gc *gc, int fd, const char *what)
> >> +{
> >> +    int dup_fd = fd;
> >> +
> >> +    if (fd <= 2) {
> >> +        dup_fd = dup(fd);
> >> +        if (dup_fd < 0) {
> >> +            LOGE(ERROR,"dup %s", what);
> >> +            exit(-1);
> >> +        }
> >> +    }
> >> +    libxl_fd_set_cloexec(CTX, dup_fd, 0);
> >> +
> >> +    return dup_fd;
> >> +}
> >>
> >>   static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
> >> -                       const char *mode_arg, int stream_fd,
> >> +                       const char *mode_arg, int stream_fd, int back_fd,
> >>                          const int *preserve_fds, int num_preserve_fds,
> >>                          const unsigned long *argnums, int num_argnums)
> >>   {
> >>       STATE_AO_GC(shs->ao);
> >> -    const char *args[4 + num_argnums];
> >> +    const char *args[5 + num_argnums];
> >>       const char **arg = args;
> >>       int i, rc;
> >>
> >> @@ -154,6 +170,7 @@ static void run_helper(libxl__egc *egc, 
> >> libxl__save_helper_state *shs,
> >>       *arg++ = getenv("LIBXL_SAVE_HELPER") ?: LIBEXEC_BIN "/" 
> >> "libxl-save-helper";
> >>       *arg++ = mode_arg;
> >>       const char **stream_fd_arg = arg++;
> >> +    const char **back_fd_arg = arg++;
> >>       for (i=0; i<num_argnums; i++)
> >>           *arg++ = GCSPRINTF("%lu", argnums[i]);
> >>       *arg++ = 0;
> >> @@ -178,16 +195,12 @@ static void run_helper(libxl__egc *egc, 
> >> libxl__save_helper_state *shs,
> >>
> >>       pid_t pid = libxl__ev_child_fork(gc, &shs->child, helper_exited);
> >>       if (!pid) {
> >> -        if (stream_fd <= 2) {
> >> -            stream_fd = dup(stream_fd);
> >> -            if (stream_fd < 0) {
> >> -                LOGE(ERROR,"dup migration stream fd");
> >> -                exit(-1);
> >> -            }
> >> -        }
> >> -        libxl_fd_set_cloexec(CTX, stream_fd, 0);
> >> +        stream_fd = dup_fd_helper(gc, stream_fd, "migration stream fd");
> >>           *stream_fd_arg = GCSPRINTF("%d", stream_fd);
> >>
> >> +        back_fd = dup_fd_helper(gc, back_fd, "migration back channel fd");
> >> +        *back_fd_arg = GCSPRINTF("%d", back_fd);
> >> +
> >>           for (i=0; i<num_preserve_fds; i++)
> >>               if (preserve_fds[i] >= 0) {
> >>                   assert(preserve_fds[i] > 2);
> >> diff --git a/tools/libxl/libxl_save_helper.c 
> >> b/tools/libxl/libxl_save_helper.c
> >> index 4c9d34c..9de5694 100644
> >> --- a/tools/libxl/libxl_save_helper.c
> >> +++ b/tools/libxl/libxl_save_helper.c
> >> @@ -235,6 +235,7 @@ static struct restore_callbacks 
> >> helper_restore_callbacks;
> >>   int main(int argc, char **argv)
> >>   {
> >>       int r;
> >> +    int back_fd;
> >>
> >>   #define NEXTARG (++argv, assert(*argv), *argv)
> >>
> >> @@ -244,6 +245,7 @@ int main(int argc, char **argv)
> >>       if (!strcmp(mode,"--save-domain")) {
> >>
> >>           io_fd =                    atoi(NEXTARG);
> >> +        back_fd =                  atoi(NEXTARG);
> >>           uint32_t dom =             strtoul(NEXTARG,0,10);
> >>           uint32_t max_iters =       strtoul(NEXTARG,0,10);
> >>           uint32_t max_factor =      strtoul(NEXTARG,0,10);
> >> @@ -259,12 +261,14 @@ int main(int argc, char **argv)
> >>           setup_signals(save_signal_handler);
> >>
> >>           r = xc_domain_save2(xch, io_fd, dom, max_iters, max_factor, 
> >> flags,
> >> -                           &helper_save_callbacks, hvm, 
> >> checkpointed_stream);
> >> +                            &helper_save_callbacks, hvm, 
> >> checkpointed_stream,
> >> +                            back_fd);
> >>           complete(r);
> >>
> >>       } else if (!strcmp(mode,"--restore-domain")) {
> >>
> >>           io_fd =                    atoi(NEXTARG);
> >> +        back_fd =                  atoi(NEXTARG);
> >>           uint32_t dom =             strtoul(NEXTARG,0,10);
> >>           unsigned store_evtchn =    strtoul(NEXTARG,0,10);
> >>           domid_t store_domid =      strtoul(NEXTARG,0,10);
> >> @@ -289,7 +293,7 @@ int main(int argc, char **argv)
> >>                                 store_domid, console_evtchn, &console_mfn,
> >>                                 console_domid, hvm, pae, superpages,
> >>                                 checkpointed,
> >> -                              &helper_restore_callbacks);
> >> +                              &helper_restore_callbacks, back_fd);
> >>           helper_stub_restore_results(store_mfn,console_mfn,0);
> >>           complete(r);
> >>
> >
> >
> > .
> >
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.