
Re: [Xen-devel] [PATCH v5 RFC 13/14] tools/libxc: noarch save code




On Jun 19, 2014 2:52 PM, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 19/06/14 03:48, Wen Congyang wrote:
> > At 06/12/2014 02:14 AM, Andrew Cooper Wrote:
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >> Signed-off-by: Frediano Ziglio <frediano.ziglio@xxxxxxxxxx>
> >> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> >> ---
> >>  tools/libxc/saverestore/save.c |  545 +++++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 544 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
> >> index f6ad734..9ad43a5 100644
> >> --- a/tools/libxc/saverestore/save.c
> >> +++ b/tools/libxc/saverestore/save.c
> >> @@ -1,11 +1,554 @@
> >> +#include <assert.h>
> >> +#include <arpa/inet.h>
> >> +
> >>  #include "common.h"
> >>
> >> +/*
> >> + * Writes an Image header and Domain header into the stream.
> >> + */
> >> +static int write_headers(struct context *ctx, uint16_t guest_type)
> >> +{
> >> +    xc_interface *xch = ctx->xch;
> >> +    int32_t xen_version = xc_version(xch, XENVER_version, NULL);
> >> +    struct ihdr ihdr =
> >> +        {
> >> +            .marker  = IHDR_MARKER,
> >> +            .id      = htonl(IHDR_ID),
> >> +            .version = htonl(IHDR_VERSION),
> >> +            .options = htons(IHDR_OPT_LITTLE_ENDIAN),
> >> +        };
> >> +    struct dhdr dhdr =
> >> +        {
> >> +            .type       = guest_type,
> >> +            .page_shift = XC_PAGE_SHIFT,
> >> +            .xen_major  = (xen_version >> 16) & 0xffff,
> >> +            .xen_minor  = (xen_version)       & 0xffff,
> >> +        };
> >> +
> >> +    if ( xen_version < 0 )
> >> +    {
> >> +        PERROR("Unable to obtain Xen Version");
> >> +        return -1;
> >> +    }
> >> +
> >> +    if ( write_exact(ctx->fd, &ihdr, sizeof(ihdr)) )
> >> +    {
> >> +        PERROR("Unable to write Image Header to stream");
> >> +        return -1;
> >> +    }
> >> +
> >> +    if ( write_exact(ctx->fd, &dhdr, sizeof(dhdr)) )
> >> +    {
> >> +        PERROR("Unable to write Domain Header to stream");
> >> +        return -1;
> >> +    }
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +/*
> >> + * Writes an END record into the stream.
> >> + */
> >> +static int write_end_record(struct context *ctx)
> >> +{
> >> +    struct record end = { REC_TYPE_END, 0, NULL };
> >> +
> >> +    return write_record(ctx, &end);
> >> +}
> >> +
> >> +/*
> >> + * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
> >> + * is constructed in ctx->save.batch_pfns.
> >> + *
> >> + * This function:
> >> + * - gets the types for each pfn in the batch.
> >> + * - for each pfn with real data:
> >> + *   - maps and attempts to localise the pages.
> >> + * - construct and writes a PAGE_DATA record into the stream.
> >> + */
> >> +static int write_batch(struct context *ctx)
> >> +{
> >> +    xc_interface *xch = ctx->xch;
> >> +    xen_pfn_t *mfns = NULL, *types = NULL;
> >> +    void *guest_mapping = NULL;
> >> +    void **guest_data = NULL;
> >> +    void **local_pages = NULL;
> >> +    int *errors = NULL, rc = -1;
> >> +    unsigned i, p, nr_pages = 0;
> >> +    unsigned nr_pfns = ctx->save.nr_batch_pfns;
> >> +    void *page, *orig_page;
> >> +    uint64_t *rec_pfns = NULL;
> >> +    struct rec_page_data_header hdr = { 0 };
> >> +    struct record rec =
> >> +    {
> >> +        .type = REC_TYPE_PAGE_DATA,
> >> +    };
> >> +
> >> +    assert(nr_pfns != 0);
> >> +
> >> +    /* Mfns of the batch pfns. */
> >> +    mfns = malloc(nr_pfns * sizeof(*mfns));
> >> +    /* Types of the batch pfns. */
> >> +    types = malloc(nr_pfns * sizeof(*types));
> >> +    /* Errors from attempting to map the mfns. */
> >> +    errors = malloc(nr_pfns * sizeof(*errors));
> >> +    /* Pointers to page data to send.  Either mapped mfns or local allocations. */
> >> +    guest_data = calloc(nr_pfns, sizeof(*guest_data));
> >> +    /* Pointers to locally allocated pages.  Need freeing. */
> >> +    local_pages = calloc(nr_pfns, sizeof(*local_pages));
> > This function is called too many times, so we will allocate/free
> > memory again and again. It may affect the performance.
> >
> > I think we can allocate at setup stage, and only clear guest_data/
> > local_pages here.
>
> We likely can.  It is currently like this because it allowed valgrind to
> do a fantastic job of spotting errors when flushing an incomplete batch
> at the end of each iteration.
>
> It should be possible to consolidate the allocations and use valgrind
> client requests to achieve the same effect, although at this point my
> effort is far more focused to getting something which works correctly
> ready in the 4.5 timeframe.
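Just to illustrate what Wen is suggesting, while keeping the valgrind coverage
you mention -- a rough, untested sketch (the struct and function names below
are made up, not from this series; it needs valgrind's memcheck.h at build
time):

    #include <stdlib.h>
    #include <string.h>
    #include <xenctrl.h>              /* xen_pfn_t */
    #include <valgrind/memcheck.h>    /* VALGRIND_MAKE_MEM_UNDEFINED */

    /* Hypothetical container for the per-batch arrays, allocated once. */
    struct batch_bufs {
        xen_pfn_t *mfns, *types;
        int *errors;
        void **guest_data, **local_pages;
    };

    /* Allocate the arrays once, at setup time, sized for the largest batch. */
    static int batch_bufs_setup(struct batch_bufs *b, unsigned max_batch)
    {
        b->mfns        = malloc(max_batch * sizeof(*b->mfns));
        b->types       = malloc(max_batch * sizeof(*b->types));
        b->errors      = malloc(max_batch * sizeof(*b->errors));
        b->guest_data  = calloc(max_batch, sizeof(*b->guest_data));
        b->local_pages = calloc(max_batch, sizeof(*b->local_pages));

        return (b->mfns && b->types && b->errors &&
                b->guest_data && b->local_pages) ? 0 : -1;
    }

    /* Called at the start of each batch instead of malloc()/calloc(). */
    static void batch_bufs_reset(struct batch_bufs *b, unsigned nr_pfns)
    {
        /* Only the pointer arrays need to start out zeroed. */
        memset(b->guest_data,  0, nr_pfns * sizeof(*b->guest_data));
        memset(b->local_pages, 0, nr_pfns * sizeof(*b->local_pages));

        /* Tell valgrind the remaining arrays are undefined again, so reads
         * of stale entries in a short final batch still get reported. */
        VALGRIND_MAKE_MEM_UNDEFINED(b->mfns,   nr_pfns * sizeof(*b->mfns));
        VALGRIND_MAKE_MEM_UNDEFINED(b->types,  nr_pfns * sizeof(*b->types));
        VALGRIND_MAKE_MEM_UNDEFINED(b->errors, nr_pfns * sizeof(*b->errors));
    }
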
>
> >
> >> +
> >> +    if ( !mfns || !types || !errors || !guest_data || !local_pages )
> >> +    {
> >> +        ERROR("Unable to allocate arrays for a batch of %u pages",
> >> +              nr_pfns);
> >> +        goto err;
> >> +    }
> >> +
> >> +    for ( i = 0; i < nr_pfns; ++i )
> >> +    {
> >> +        types[i] = mfns[i] = ctx->ops.pfn_to_gfn(ctx, ctx->save.batch_pfns[i]);
> >> +
> >> +        /* Likely a ballooned page. */
> >> +        if ( mfns[i] == INVALID_MFN )
> >> +            set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
> >> +    }
> >> +
> >> +    rc = xc_get_pfn_type_batch(xch, ctx->domid, nr_pfns, types);
> >> +    if ( rc )
> >> +    {
> >> +        PERROR("Failed to get types for pfn batch");
> >> +        goto err;
> >> +    }
> >> +    rc = -1;
> >> +
> >> +    for ( i = 0; i < nr_pfns; ++i )
> >> +    {
> >> +        switch ( types[i] )
> >> +        {
> >> +        case XEN_DOMCTL_PFINFO_BROKEN:
> >> +        case XEN_DOMCTL_PFINFO_XALLOC:
> >> +        case XEN_DOMCTL_PFINFO_XTAB:
> >> +            continue;
> >> +        }
> >> +
> >> +        mfns[nr_pages++] = mfns[i];
> >> +    }
> >> +
> >> +    if ( nr_pages > 0 )
> >> +    {
> >> +        guest_mapping = xc_map_foreign_bulk(
> >> +            xch, ctx->domid, PROT_READ, mfns, errors, nr_pages);
> >> +        if ( !guest_mapping )
> >> +        {
> >> +            PERROR("Failed to map guest pages");
> >> +            goto err;
> >> +        }
> > To support remus, we will map/unmap guest memory again and again. It
> > also affects the performance. We can cache guest mapping here.
> >
> > Thanks
> > Wen Congyang
> >
> >
>
> I am not aware of the current code caching mappings into the guest.
> 64bit toolstacks would be fine setting up mappings for every gfn and
> reusing the mappings, but this really won't work for 32bit toolstacks.
>
> What remus does appear to do is have a limited cache of
> previously-written pages for XOR+RLE compression.
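Right. For anyone following along, the idea behind that cache is roughly the
following (illustrative C only, not the actual remus/libxc code; the names and
the toy encoding are made up):

    #include <stdint.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    /* One cache slot per tracked pfn: the page contents as last sent. */
    struct cached_page {
        uint8_t prev[PAGE_SIZE];
    };

    /*
     * XOR the new page against the cached copy and run-length encode the
     * (mostly zero) delta.  "out" must hold a worst case of 2 * PAGE_SIZE
     * bytes.  Returns the number of bytes written.
     */
    static size_t xor_rle_encode(struct cached_page *slot,
                                 const uint8_t *new_page, uint8_t *out)
    {
        uint8_t delta[PAGE_SIZE];
        size_t i, o = 0;

        for ( i = 0; i < PAGE_SIZE; ++i )
            delta[i] = slot->prev[i] ^ new_page[i];

        /* Toy RLE: a zero byte plus a run length encodes a run of zeros;
         * non-zero delta bytes are emitted literally (they can never be 0,
         * so the stream stays decodable). */
        for ( i = 0; i < PAGE_SIZE; )
        {
            if ( delta[i] == 0 )
            {
                uint8_t run = 0;
                while ( i < PAGE_SIZE && delta[i] == 0 && run < 255 )
                    ++i, ++run;
                out[o++] = 0;
                out[o++] = run;
            }
            else
                out[o++] = delta[i++];
        }

        /* The receiver applies the delta, so cache what it now holds. */
        memcpy(slot->prev, new_page, PAGE_SIZE);
        return o;
    }
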
>

I think Wen is referring to the 'mapping/unmapping' in batches of 4MB. This roughly doubles the checkpoint time, and it gets worse as the guest's memory size grows beyond 1GB.

What we used to do earlier (not in mainline libxc code) was to map the entire domain memory into libxc before the infinite loop. This, as you said, has some restrictions, but it halves the checkpointing time (i.e. halves the time for which guest execution is suspended).
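For reference, something along these lines (very rough; the helper and the gfn
translation are simplified sketches, not code from this series --
xc_map_foreign_bulk is the existing libxc call):

    #include <stdlib.h>
    #include <sys/mman.h>   /* PROT_READ */
    #include <xenctrl.h>

    /*
     * Map every page of the guest once, before entering the checkpoint
     * loop, and keep the mapping for the lifetime of the stream.  Only
     * realistic for a 64bit toolstack, as noted above.
     */
    static void *map_whole_guest(xc_interface *xch, uint32_t domid,
                                 unsigned int nr_pfns)
    {
        xen_pfn_t *pfns = malloc(nr_pfns * sizeof(*pfns));
        int *errs = malloc(nr_pfns * sizeof(*errs));
        void *mapping = NULL;
        unsigned int i;

        if ( !pfns || !errs )
            goto out;

        /* Assumes gfn == pfn (HVM-style); a PV guest needs a p2m lookup. */
        for ( i = 0; i < nr_pfns; ++i )
            pfns[i] = i;

        /* Ballooned pages show up in errs[] and simply stay unmapped. */
        mapping = xc_map_foreign_bulk(xch, domid, PROT_READ,
                                      pfns, errs, nr_pfns);

     out:
        free(pfns);
        free(errs);
        return mapping;
    }
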

Shriram

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

