
Re: [RFC PATCH v2 1/2] xen/memory : Add a stats_table resource type


  • To: Matias Ezequiel Vara Larsen <matiasevara@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 13 Dec 2022 18:02:55 +0100
  • Cc: Matias Ezequiel Vara Larsen <matias.vara@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 13 Dec 2022 17:03:06 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.10.2022 14:39, Matias Ezequiel Vara Larsen wrote:
> This commit proposes a new mechanism to query the RUNSTATE_running counter for
> a given vcpu from a dom0 userspace application. This commit proposes to expose
> that counter by using the acquire_resource interface. The current mechanism
> relies on XEN_DOMCTL_getvcpuinfo, which holds the single global domctl_lock
> for the entire hypercall and iterates over every vcpu in the system on every
> update, thus impacting operations that share that lock.
> 
> This commit proposes to expose vcpu RUNSTATE_running via the
> xenforeignmemory interface, thus avoiding the need to issue the hypercall
> and hold the lock. For that purpose, a new resource type named stats_table
> is added. The first frame of this resource stores per-vcpu counters. The
> frame has one entry of type struct vcpu_stats per vcpu. The allocation of
> this frame only happens if the resource is requested. The frame is released
> after the domain is destroyed.
> 
> Note that the updating of this counter is in a hot path; thus, in this
> commit, copying only happens if it is specifically required.
> 
> Note that the exposed structure is extensible in two ways. First, the
> structure vcpu_stats can be extended with new per-vcpu counters while it
> fits in a frame.

I'm afraid I don't see how this is "extensible". I would recommend that
you outline for yourself what a change to actually add such a 2nd counter
would look like. While doing that, keep in mind that whatever changes
you make may not break existing consumers.
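To make this concrete for yourself: a 2nd counter would presumably need
appending to the struct, something like (a sketch only - the new field's
name is made up here):

    struct vcpu_stats {
        uint32_t version;
        uint64_t runstate_running_time;
        /* Hypothetical 2nd counter; existing fields must keep their offsets. */
        uint64_t runstate_runnable_time;
    };

Existing consumers would still need to find the original fields at their
old offsets, and they'd need a way of telling whether the new field is
populated at all.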

It's also not clear what you mean with "fits in a frame": struct
shared_vcpustatspage is a container for an array with a single element.
I may guess (looking at just the public interface) that this really is
meant to be a flexible array (and hence should be marked as such - see
other uses of XEN_FLEX_ARRAY_DIM in the public headers). Yet if that's
the case, then a single page already won't suffice for a domain with
sufficiently many vCPU-s.
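I.e. I'd expect something along these lines (a sketch, following the
existing public-header idiom):

    struct shared_vcpustatspage {
        struct vcpu_stats vcpu_info[XEN_FLEX_ARRAY_DIM];
    };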

> Second, new frames can be added in case new counters are required.

Are you talking of "new counters" here which aren't "new per-vcpu
counters"? Or else what's the difference from the 1st way?

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -741,6 +741,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
>  
>      ioreq_server_destroy_all(d);
>  
> +    stats_free_vcpu_mfn(d);

How come this lives here? Surely this new feature should be not only
guest-type independent, but also arch-agnostic? Clearly your putting
the new data in struct domain (and not in struct arch_domain or yet
deeper in the hierarchy) indicates you may have meant to make it so.

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1078,6 +1078,12 @@ unsigned int ioreq_server_max_frames(const struct domain *d)
>      return nr;
>  }
>  
> +unsigned int stats_table_max_frames(const struct domain *d)
> +{
> +    /* One frame per 512 vcpus. */
> +    return 1;
> +}

As alluded to earlier already - 1 isn't going to be suitable for
arbitrary size domains. (Yes, HVM domains are presently limited to
128 vCPU-s, but as per above this shouldn't be a HVM-only feature.)
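What I'd expect instead is something scaling with the vCPU count, e.g.
(a sketch only - the per-frame capacity wants deriving from the struct
size rather than being hard-coded):

    unsigned int stats_table_max_frames(const struct domain *d)
    {
        /* As many frames as needed to hold one entry per vCPU. */
        return DIV_ROUND_UP(d->max_vcpus,
                            PAGE_SIZE / sizeof(struct vcpu_stats));
    }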

> @@ -1162,6 +1171,88 @@ static int acquire_vmtrace_buf(
>      return nr_frames;
>  }
>  
> +void stats_free_vcpu_mfn(struct domain * d)
> +{
> +    struct page_info *pg = d->vcpustats_page.pg;
> +
> +    if ( !pg )
> +        return;
> +
> +    d->vcpustats_page.pg = NULL;
> +
> +    if ( d->vcpustats_page.va )
> +        unmap_domain_page_global(d->vcpustats_page.va);
> +
> +    d->vcpustats_page.va = NULL;

We ought to gain UNMAP_DOMAIN_PAGE_GLOBAL() for purposes like this one,
paralleling UNMAP_DOMAIN_PAGE().
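I.e., paralleling the existing macro, something like:

    #define UNMAP_DOMAIN_PAGE_GLOBAL(p) do { \
        unmap_domain_page_global(p);         \
        (p) = NULL;                          \
    } while ( false )

which would also take care of clearing the field for you.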

> +    put_page_alloc_ref(pg);
> +    put_page_and_type(pg);
> +}
> +
> +static int stats_vcpu_alloc_mfn(struct domain *d)
> +{
> +    struct page_info *pg;
> +
> +    pg = alloc_domheap_page(d, MEMF_no_refcount);
> +
> +    if ( !pg )
> +        return -ENOMEM;
> +
> +    if ( !get_page_and_type(pg, d, PGT_writable_page) ) {

Style: Brace placement (more elsewhere).

> +        put_page_alloc_ref(pg);

This is not allowed when what you may put is the last reference.
See other examples we have in the tree.
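The pattern used elsewhere (the ioreq server page allocation, iirc) is
to crash the offending domain instead of dropping the allocation ref:

    if ( !get_page_and_type(pg, d, PGT_writable_page) )
    {
        /*
         * The domain can't possibly know about this page yet, so failure
         * here is a clear indication of something fishy going on.
         */
        domain_crash(d);
        return -ENODATA;
    }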

> +        return -ENODATA;
> +    }
> +
> +    d->vcpustats_page.va = __map_domain_page_global(pg);
> +    if ( !d->vcpustats_page.va )
> +        goto fail;
> +
> +    d->vcpustats_page.pg = pg;
> +    clear_page(d->vcpustats_page.va);

I guess this should happen before you globally announce the
address.
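I.e. roughly (a sketch, clearing through a local pointer and only then
publishing it):

    void *va = __map_domain_page_global(pg);

    if ( !va )
        goto fail;

    clear_page(va);
    d->vcpustats_page.pg = pg;
    d->vcpustats_page.va = va;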

> +    return 1;

Functions returning -errno on error want to return 0 on success,
unless e.g. multiple success indicators are needed.

> +fail:

Style: Label indentation.

> +    put_page_alloc_ref(pg);
> +    put_page_and_type(pg);
> +
> +    return -ENOMEM;
> +}
> +
> +static int acquire_stats_table(struct domain *d,
> +                                unsigned int id,
> +                                unsigned int frame,
> +                                unsigned int nr_frames,
> +                                xen_pfn_t mfn_list[])

Style: Indentation.

> +{
> +    mfn_t mfn;
> +    int rc;
> +    unsigned int i;
> +
> +    if ( !d )
> +        return -ENOENT;
> +
> +    for ( i = 0; i < nr_frames; i++ )
> +    {
> +        switch ( i )
> +        {
> +        case XENMEM_resource_stats_frame_vcpustats:

Isn't this supposed to be indexed by "id" (which presently you ignore
altogether, which can't be right)?
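I.e. the outer selection wants to be on the sub-resource, along the
lines of (a sketch; quite possibly the constant then wants renaming):

    switch ( id )
    {
    case XENMEM_resource_stats_frame_vcpustats:
        for ( i = 0; i < nr_frames; i++ )
        {
            /* ... fill mfn_list[i] for this sub-resource ... */
        }
        break;

    default:
        return -EINVAL;
    }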

> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -264,6 +264,8 @@ static inline void vcpu_runstate_change(
>  {
>      s_time_t delta;
>      struct sched_unit *unit = v->sched_unit;
> +    shared_vcpustatspage_t * vcpustats_va;

Style: Stray blank (more elsewhere).

> @@ -287,6 +289,20 @@ static inline void vcpu_runstate_change(
>      }
>  
>      v->runstate.state = new_state;
> +
> +    vcpustats_va = (shared_vcpustatspage_t*)d->vcpustats_page.va;

There should be no need for a cast here.

> +    if ( vcpustats_va )
> +    {
> +     vcpustats_va->vcpu_info[v->vcpu_id].version =

Style: Hard tab.

> +         version_update_begin(vcpustats_va->vcpu_info[v->vcpu_id].version);
> +        smp_wmb();
> +        memcpy(&vcpustats_va->vcpu_info[v->vcpu_id].runstate_running_time,
> +               &v->runstate.time[RUNSTATE_running],
> +               sizeof(v->runstate.time[RUNSTATE_running]));

Why memcpy() and not a plain assignment?
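I.e. simply:

        vcpustats_va->vcpu_info[v->vcpu_id].runstate_running_time =
            v->runstate.time[RUNSTATE_running];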

> +        smp_wmb();
> +        vcpustats_va->vcpu_info[v->vcpu_id].version =
> +            version_update_end(vcpustats_va->vcpu_info[v->vcpu_id].version);
> +    }

Overall latching &vcpustats_va->vcpu_info[v->vcpu_id] into a helper
variable would likely help readability quite a bit.
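E.g. (a sketch, also folding in the plain assignment suggested above):

    struct vcpu_stats *stats = &vcpustats_va->vcpu_info[v->vcpu_id];

    stats->version = version_update_begin(stats->version);
    smp_wmb();
    stats->runstate_running_time = v->runstate.time[RUNSTATE_running];
    smp_wmb();
    stats->version = version_update_end(stats->version);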

> --- a/xen/include/public/vcpu.h
> +++ b/xen/include/public/vcpu.h
> @@ -235,6 +235,22 @@ struct vcpu_register_time_memory_area {
> typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
>  DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
>  
> +struct vcpu_stats{

Style: Missing blank.

> +    /* If the least-significant bit of the version number is set then an update

Style: Comment layout.
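I.e. the usual multi-line form, along the lines of:

    /*
     * If the least-significant bit of the version number is set, an
     * update is in progress.
     */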

Jan
