Re: [RFC PATCH v2 1/2] xen/memory : Add a stats_table resource type
On 07.10.2022 14:39, Matias Ezequiel Vara Larsen wrote:
> This commit proposes a new mechanism to query the RUNSTATE_running counter for
> a given vcpu from a dom0 userspace application. It proposes to expose that
> counter by using the acquire_resource interface. The current mechanism
> relies on XEN_DOMCTL_getvcpuinfo, which holds a single global domctl_lock for
> the entire hypercall and iterates over every vcpu in the system for every
> update, thus impacting operations that share that lock.
>
> This commit proposes to expose vcpu RUNSTATE_running via the
> xenforeignmemory interface, thus avoiding issuing the hypercall and holding
> the lock. For that purpose, a new resource type named stats_table is added.
> The first frame of this resource stores per-vcpu counters, with one entry of
> type struct vcpu_stats per vcpu. The frame is allocated only if the resource
> is requested, and is released after the domain is destroyed.
>
> Note that this counter is updated on a hot path; thus, in this commit,
> copying only happens when it is specifically required.
>
> Note that the exposed structure is extensible in two ways. First, the
> structure vcpu_stats can be extended with new per-vcpu counters while it
> fits in a frame.
I'm afraid I don't see how this is "extensible". I would recommend that
you outline for yourself what a change to actually add such a 2nd counter
would look like. While doing that, keep in mind that whatever changes
you make may not break existing consumers.
It's also not clear what you mean with "fits in a frame": struct
shared_vcpustatspage is a container for an array with a single element.
I may guess (looking at just the public interface) that this really is
meant to be a flexible array (and hence should be marked as such - see
other uses of XEN_FLEX_ARRAY_DIM in the public headers). Yet if that's
the case, then a single page already won't suffice for a domain with
sufficiently many vCPU-s.
> Second, new frames can be added in case new counters are required.
Are you talking of "new counters" here which aren't "new per-vcpu
counters"? Or else what's the difference from the 1st way?
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -741,6 +741,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
>
> ioreq_server_destroy_all(d);
>
> + stats_free_vcpu_mfn(d);
How come this lives here? Surely this new feature should be not only
guest-type independent, but also arch-agnostic? Clearly, your putting
the new data in struct domain (and not in struct arch_domain or yet
deeper in the hierarchy) indicates you may have meant to make
it so.
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1078,6 +1078,12 @@ unsigned int ioreq_server_max_frames(const struct domain *d)
> return nr;
> }
>
> +unsigned int stats_table_max_frames(const struct domain *d)
> +{
> + /* One frame per 512 vcpus. */
> + return 1;
> +}
As alluded to earlier already - 1 isn't going to be suitable for
arbitrary size domains. (Yes, HVM domains are presently limited to
128 vCPU-s, but as per above this shouldn't be a HVM-only feature.)
> @@ -1162,6 +1171,88 @@ static int acquire_vmtrace_buf(
> return nr_frames;
> }
>
> +void stats_free_vcpu_mfn(struct domain * d)
> +{
> + struct page_info *pg = d->vcpustats_page.pg;
> +
> + if ( !pg )
> + return;
> +
> + d->vcpustats_page.pg = NULL;
> +
> + if ( d->vcpustats_page.va )
> + unmap_domain_page_global(d->vcpustats_page.va);
> +
> + d->vcpustats_page.va = NULL;
We ought to gain UNMAP_DOMAIN_PAGE_GLOBAL() for purposes like this one,
paralleling UNMAP_DOMAIN_PAGE().
> + put_page_alloc_ref(pg);
> + put_page_and_type(pg);
> +}
> +
> +static int stats_vcpu_alloc_mfn(struct domain *d)
> +{
> + struct page_info *pg;
> +
> + pg = alloc_domheap_page(d, MEMF_no_refcount);
> +
> + if ( !pg )
> + return -ENOMEM;
> +
> + if ( !get_page_and_type(pg, d, PGT_writable_page) ) {
Style: Brace placement (more elsewhere).
> + put_page_alloc_ref(pg);
This is not allowed when what you may put is the last reference.
See other examples we have in the tree.
> + return -ENODATA;
> + }
> +
> + d->vcpustats_page.va = __map_domain_page_global(pg);
> + if ( !d->vcpustats_page.va )
> + goto fail;
> +
> + d->vcpustats_page.pg = pg;
> + clear_page(d->vcpustats_page.va);
I guess this should happen before you globally announce the
address.
> + return 1;
Functions returning -errno on error want to return 0 on success,
unless e.g. multiple success indicators are needed.
> +fail:
Style: Label indentation.
> + put_page_alloc_ref(pg);
> + put_page_and_type(pg);
> +
> + return -ENOMEM;
> +}
> +
> +static int acquire_stats_table(struct domain *d,
> + unsigned int id,
> + unsigned int frame,
> + unsigned int nr_frames,
> + xen_pfn_t mfn_list[])
Style: Indentation.
> +{
> + mfn_t mfn;
> + int rc;
> + unsigned int i;
> +
> + if ( !d )
> + return -ENOENT;
> +
> + for ( i = 0; i < nr_frames; i++ )
> + {
> + switch ( i )
> + {
> + case XENMEM_resource_stats_frame_vcpustats:
Isn't this supposed to be indexed by "id" (which presently you ignore
altogether, which can't be right)?
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -264,6 +264,8 @@ static inline void vcpu_runstate_change(
> {
> s_time_t delta;
> struct sched_unit *unit = v->sched_unit;
> + shared_vcpustatspage_t * vcpustats_va;
Style: Stray blank (more elsewhere).
> @@ -287,6 +289,20 @@ static inline void vcpu_runstate_change(
> }
>
> v->runstate.state = new_state;
> +
> + vcpustats_va = (shared_vcpustatspage_t*)d->vcpustats_page.va;
There should be no need for a cast here.
> + if ( vcpustats_va )
> + {
> + vcpustats_va->vcpu_info[v->vcpu_id].version =
Style: Hard tab.
> + version_update_begin(vcpustats_va->vcpu_info[v->vcpu_id].version);
> + smp_wmb();
> + memcpy(&vcpustats_va->vcpu_info[v->vcpu_id].runstate_running_time,
> + &v->runstate.time[RUNSTATE_running],
> + sizeof(v->runstate.time[RUNSTATE_running]));
Why memcpy() and not a plain assignment?
> + smp_wmb();
> + vcpustats_va->vcpu_info[v->vcpu_id].version =
> + version_update_end(vcpustats_va->vcpu_info[v->vcpu_id].version);
> + }
Overall latching &vcpustats_va->vcpu_info[v->vcpu_id] into a helper
variable would likely help readability quite a bit.
> --- a/xen/include/public/vcpu.h
> +++ b/xen/include/public/vcpu.h
> @@ -235,6 +235,22 @@ struct vcpu_register_time_memory_area {
> typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
> DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
>
> +struct vcpu_stats{
Style: Missing blank.
> + /* If the least-significant bit of the version number is set then an
> update
Style: Comment layout.
Jan