[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1 of 2] xenalyze: decode PV_HYPERCALL_V2 records



On Mon, Oct 1, 2012 at 6:52 PM, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
> Newer version of Xen produce TRC_PV_HYPERCALL_V2 records instead of
> the older TRC_PV_HYPERCALL format.  This updated format doesn't
> included the IP but it does include select hypercall arguments.
>
> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>

Thanks -- I did a bunch of optimization work on xenalyze late last
year, so I'm afraid I'm going to be pretty picky about some things to
avoid losing that effort. :-)

> diff --git a/pv.h b/pv.h
> new file mode 100644
> --- /dev/null
> +++ b/pv.h
> @@ -0,0 +1,41 @@
> +/*
> + * PV event decoding.
> + *
> + * Copyright (C) 2012 Citrix Systems R&D Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + */
> +#ifndef __PV_H
> +
> +#include "analyze.h"
> +#include "trace.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define ARG_MISSING 0x0
> +#define ARG_32BIT 0x1
> +#define ARG_64BIT 0x2
> +
> +#define MMU_UPDATE_PREEMPTED          (~(~0U>>1))
> +
> +static inline uint32_t pv_hypercall_op(const struct record_info *ri)
> +{
> +    return ri->d[0] & ~TRC_PV_HYPERCALL_V2_ARG_MASK;
> +}
> +
> +static inline int pv_hypercall_arg_present(const struct record_info *ri, int 
> arg)
> +{
> +    return (ri->d[0] >> (20 + 2*arg)) & 0x3;
> +}

Try to avoid an integer multiply here; (arg<<1) or (arg+arg) please.

> @@ -6523,6 +6537,168 @@ void pv_summary(struct pv_data *pv) {
>      }
>  }
>
> +uint64_t pv_hypercall_arg(const struct record_info *ri, int arg)
> +{
> +    int i, word;
> +
> +    for (i = 0, word = 1; i < 6 && word < ri->extra_words; i++) {
> +        int present = pv_hypercall_arg_present(ri, i);
> +
> +        /* Is this the argument we're looking for? */
> +        if (i == arg) {
> +            switch (present) {
> +            case ARG_MISSING:
> +                return 0;
> +            case ARG_32BIT:
> +                return ri->d[word];
> +            case ARG_64BIT:
> +                return ((uint64_t)ri->d[word + 1] << 32) | ri->d[word];
> +            }
> +        }
> +
> +        /* Skip over any words for this argument. */
> +        word += present;
> +    }

So at the moment, this is only called if you're in dump mode, which is
fairly slow already; but it may at some point be called in summary
mode, which I'd like to keep pretty fast.  This little loop will
basically be O(n^2) with the number of recorded arguments.  Since
we're expecting the hypercall arguments xenalyze asks for to match the
arguments put into the trace, it seems like we should be able to do
better.

Would it make sense to do something like pass in an array of what
arguments it expects to get, and then have the function double-check
that they match properly, and then replaces the "expected argument
number" with "traced value" (returning an error if they don't match)?

Alternately, since pv_hypercall_arg() returns 0 for a non-existent arg
anyway, couldn't you have an array, initialized to 0, and filled in as
appropriate?

> +static void pv_hypercall_print_args(const struct record_info *ri)
> +{
> +    int i, word;
> +
> +    for (i = 0, word = 1; i < 6 && word < ri->extra_words; i++) {
> +        int present = pv_hypercall_arg_present(ri, i);
> +
> +        switch (present) {
> +        case ARG_MISSING:
> +            printf(" ??");
> +            break;
> +        case ARG_32BIT:
> +            printf(" %08x", ri->d[word]);
> +            break;
> +        case ARG_64BIT:
> +            printf(" %016"PRIu64"", ((uint64_t)ri->d[word + 1] << 32) | 
> ri->d[word]);
> +            break;
> +        }
> +
> +        word += present;
> +    }
> +}
> +
> +static const char *grant_table_op_cmd_to_str(uint32_t cmd)
> +{
> +    const char *cmd_str[] = {
> +        "map_grant_ref", "unmap_grant_ref", "setup_table", "dump_table",
> +        "transfer", "copy", "query_size", "unmap_and_replace",
> +        "set_version", "get_status_frames", "get_version", "swap_grant_ref",
> +    };
> +    static char buf[32];
> +
> +    if (cmd <= 11)
> +        return cmd_str[cmd];
> +
> +    snprintf(buf, sizeof(buf), "unknown (%d)", cmd);

Just FYI, in my analysis, time spent in libc's vprintf() takes up the
vast majority of the time of xenalyze's dump mode -- even just doing
things like printing %s (which you would think would be more or less a
strcpy).  In this case (and the others in this patch) the sprintf()
should be the uncommon case, so they're fine.  But just a heads-up
that I'll be picky about sprintfs() in hot paths. :-)

Thanks,
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.