[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4] tools: set migration constraints from cmdline



On Mon, 2013-02-04 at 13:41 +0000, Olaf Hering wrote:
> On Mon, Feb 04, Ian Campbell wrote:
> 
> > On Fri, 2013-02-01 at 19:34 +0000, Olaf Hering wrote:
> > > # HG changeset patch
> > > # User Olaf Hering <olaf@xxxxxxxxx>
> > > # Date 1359746832 -3600
> > > # Node ID 785c8f34e1f802106e53ca4d2c54dce55c8ee166
> > > # Parent  fd711ebdce9a58556d62c2daaf5d49ab06de4a3c
> > > tools: set migration constraints from cmdline
> > > 
> > > Add new options to xm/xl migrate to control the process of migration.
> > > The intention is to optionally abort the migration if it takes too long
> > > to migrate a busy guest due to the high number of dirty pages. Currently
> > > the guest is suspended to transfer the remaining dirty pages. This
> > > transfer can take too long, which can confuse the guest if its suspended
> > > for too long.
> > > 
> > > -M <number>   Number of iterations before final suspend (default: 30)
> > > --max_iters <number>
> > > 
> > > -m <factor>   Max amount of memory to transfer before final suspend 
> > > (default: 3*RAM)
> > > --max_factor <factor>
> > > 
> > > -N            Abort migration instead of doing final suspend.
> > > --no_suspend
> > 
> > Is this last one a debug option? (Having read the patch I think not, but
> > the description here doesn't really explain it)
> 
> No, its not debug. And the naming of that -N option is not really good,
> I agree.
> 
> What it is supposed to do is to avoid the final suspend+migrate+resume
> step when either there were too many iterations or too many pages
> transfered when the guest continues to dirty many pages. Its not
> predictable how long the guest will be suspended to transfer the
> remaining pages to the new host. I think even transfering 1GB RAM at
> 1000/GBs takes maybe 10 seconds and with large guests there may be
> several GB dirty pages, so that a guest may be suspended for a minute.
> This caused issues in a customer environment.
> 
> Instead the migration should be aborted and the guest should continue to
> run on the old host.

That's what I understood from reading the code, can we make the docs say
it too ;-)
> 
> 
> > >  =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
> > > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxc/xc_domain_save.c
> > > --- a/tools/libxc/xc_domain_save.c
> > > +++ b/tools/libxc/xc_domain_save.c
> > 
> > The changes to this file only seem to implement the -N and not the other
> > two?
> 
> xc_domain_save already has max_iters and max_factor as arguments.

Ah, I didn't appreciate that.

BTW did you want to add a way to override DEF_MIN_REMAINING?

> 
> > > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxl/libxl.h
> > > --- a/tools/libxl/libxl.h
> > > +++ b/tools/libxl/libxl.h
> > > @@ -500,12 +500,25 @@ int libxl_domain_create_restore(libxl_ct
> > >  void libxl_domain_config_init(libxl_domain_config *d_config);
> > >  void libxl_domain_config_dispose(libxl_domain_config *d_config);
> > > 
> > > -int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
> > > +int libxl_domain_suspend_0x040200(libxl_ctx *ctx, uint32_t domid, int fd,
> > >                           int flags, /* LIBXL_SUSPEND_* */
> > >                           const libxl_asyncop_how *ao_how)
> > >                           LIBXL_EXTERNAL_CALLERS_ONLY;
> > > +#ifdef LIBXL_API_VERSION
> > > +#if LIBXL_API_VERSION == 0x040200
> > > +#define libxl_domain_suspend libxl_domain_suspend_0x040200
> > 
> > int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
> >                          int flags, /* LIBXL_SUSPEND_* */
> >                          int max_iters, int max_factor,
> >                          const libxl_asyncop_how *ao_how)
> >                          LIBXL_EXTERNAL_CALLERS_ONLY;
> > #ifdef LIBXL_API_VERSION
> > #if LIBXL_API_VERSION == 0x040200
> > #define libxl_domain_suspend(ctx, domid, fd, flags, ao_how) \
> >             libxl_domain_suspend(ctx, domid, fd, flags, 0, 0, ao_how) 
> > #endif /* LIBXL_API_VERSION == 0x040200 */
> > #endif /* defined(LIBXL_API_VERSION) */
> > 
> > Should work?
> 
> That may work, have to try it. In the end if we make such a change it
> would serve as example for other upcoming API changes.

Agreed.

> > >  /* @param suspend_cancel [from xenctrl.h:xc_domain_resume( @param fast )]
> > >   *   If this parameter is true, use co-operative resume. The guest
> > > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxl/libxl_dom.c
> > > --- a/tools/libxl/libxl_dom.c
> > > +++ b/tools/libxl/libxl_dom.c
> > > @@ -1240,6 +1240,7 @@ void libxl__domain_suspend(libxl__egc *e
> > > 
> > >      dss->xcflags = (live ? XCFLAGS_LIVE : 0)
> > >            | (debug ? XCFLAGS_DEBUG : 0)
> > > +          | (dss->xlflags & LIBXL_SUSPEND_NO_FINAL_SUSPEND ? 
> > > XCFLAGS_DOMSAVE_NOSUSPEND : 0)
> > >            | (dss->hvm ? XCFLAGS_HVM : 0);
> > > 
> > >      dss->suspend_eventchn = -1;
> > 
> > > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxl/xl_cmdimpl.c
> > > --- a/tools/libxl/xl_cmdimpl.c
> > > +++ b/tools/libxl/xl_cmdimpl.c
> > > @@ -3172,7 +3172,7 @@ static int save_domain(uint32_t domid, c
> > > 
> > >      save_domain_core_writeconfig(fd, filename, config_data, config_len);
> > > 
> > > -    int rc = libxl_domain_suspend(ctx, domid, fd, 0, NULL);
> > > +    int rc = libxl_domain_suspend(ctx, domid, fd, 0, 0, 0, NULL);
> > 
> > Doesn't this need to pass down the selected values?
> 
> save_domain is different from migrate_domain, I think xl save will
> suspend the guest anyway?

Oh yes, I didn't spot this as the save path (despite it being right
there in the function name).

>  But I notice the -c option for "save", so
> perhaps it would be useful to catch busy guests as well here?

I'm not really sure what the usecase is for save -c. I'd be a bit
inclined to leave it until someone wants it I think.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.