
Re: [RFC PATCH v2 1/2] xen/memory : Add a stats_table resource type


  • To: Matias Ezequiel Vara Larsen <matiasevara@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 16 Feb 2023 16:10:08 +0100
  • Cc: Matias Ezequiel Vara Larsen <matias.vara@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 16 Feb 2023 15:10:18 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 16.02.2023 15:48, Matias Ezequiel Vara Larsen wrote:
> On Tue, Dec 13, 2022 at 06:02:55PM +0100, Jan Beulich wrote:
>> On 07.10.2022 14:39, Matias Ezequiel Vara Larsen wrote:
>>> This commit proposes a new mechanism to query the RUNSTATE_running
>>> counter for a given vcpu from a dom0 userspace application. This commit
>>> proposes to expose that counter by using the acquire_resource interface.
>>> The current mechanism relies on XEN_DOMCTL_getvcpuinfo: it holds a single
>>> global domctl_lock for the entire hypercall and iterates over every vcpu
>>> in the system for every update, thus impacting operations that share that
>>> lock.
>>>
>>> This commit proposes to expose vcpu RUNSTATE_running via the
>>> xenforeignmemory interface, thus avoiding the need to issue the hypercall
>>> and hold the lock. For that purpose, a new resource type named
>>> stats_table is added. The first frame of this resource stores per-vcpu
>>> counters. The frame has one entry of type struct vcpu_stats per vcpu. The
>>> allocation of this frame only happens if the resource is requested. The
>>> frame is released when the domain is destroyed.
>>>
>>> Note that the updating of this counter is in a hot path; thus, in this
>>> commit, copying only happens if it is specifically required.
>>>
>>> Note that the exposed structure is extensible in two ways. First, the
>>> structure vcpu_stats can be extended with new per-vcpu counters while it
>>> fits in a frame.
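
For context, a dom0 consumer of this interface would presumably map the
proposed resource along the following lines (a sketch only -
XENMEM_resource_stats_table, the id/frame layout and the helper name below
are whatever the series ends up defining, not existing interfaces):

    /* Sketch: map the (proposed) stats_table resource of @domid read-only. */
    #include <sys/mman.h>
    #include <xenforeignmemory.h>

    static void *map_vcpu_stats(xenforeignmemory_handle *fmem, domid_t domid,
                                unsigned long nr_frames,
                                xenforeignmemory_resource_handle **fres)
    {
        void *addr = NULL;

        /* Resource type and id values are placeholders from this proposal. */
        *fres = xenforeignmemory_map_resource(fmem, domid,
                                              XENMEM_resource_stats_table,
                                              /* id */ 0, /* frame */ 0,
                                              nr_frames, &addr,
                                              PROT_READ, 0);

        return *fres ? addr : NULL;
    }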
>>
>> I'm afraid I don't see how this is "extensible". I would recommend that
>> you outline for yourself what a change to actually add such a 2nd
>> counter would look like. While doing that, keep in mind that whatever
>> changes you make may not break existing consumers.
>>
>> It's also not clear what you mean with "fits in a frame": struct
>> shared_vcpustatspage is a container for an array with a single element.
>> I may guess (looking at just the public interface) that this really is
>> meant to be a flexible array (and hence should be marked as such - see
>> other uses of XEN_FLEX_ARRAY_DIM in the public headers). Yet if that's
>> the case, then a single page already won't suffice for a domain with
>> sufficiently many vCPU-s.
>>
> 
> I tackled this by using "d->max_vcpus" to calculate the number of frames
> required for a given guest. Also, I added a new type-specific resource
> named XENMEM_resource_stats_table_id_vcpustats to host per-vcpu counters.
> I completely forgot the "id" in the previous series.
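
Presumably the frame count then ends up being something like the following
(a sketch, assuming the vcpu_stats records are simply packed into
successive frames):

    nr_frames = DIV_ROUND_UP(d->max_vcpus * sizeof(struct vcpu_stats),
                             PAGE_SIZE);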

May I suggest that, before you submit a new version of your patches, you
draft (and then perhaps submit for commenting) a layout of the data
structures you want to introduce, including how they interact and what
"granularity" (global, per-domain, per-vCPU, per-pCPU, or the like) each
of them has. While doing that, as previously suggested, put yourself in
the position of someone later wanting to add another counter. With the
initial layout in place, such an extension should then end up being pretty
mechanical; if it isn't, the arrangement likely needs further adjustment.
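
To illustrate, one shape such a layout could take (purely a sketch - the
extra fields and their names are illustrative, not a concrete proposal) is
a per-vCPU record carrying a version or size indicator, so that counters
can only ever be appended and existing consumers keep working:

    /*
     * Sketch only; field names beyond vcpu_stats / shared_vcpustatspage
     * are made up for illustration.
     */
    struct vcpu_stats {
        /* Layout version (or record size) as filled in by Xen. */
        uint32_t version;
        uint32_t pad;
        /* RUNSTATE_running time; further counters would only be appended. */
        uint64_t runstate_running_time;
    };

    struct shared_vcpustatspage {
        struct vcpu_stats vcpu_info[XEN_FLEX_ARRAY_DIM];
    };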

>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -741,6 +741,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
>>>  
>>>      ioreq_server_destroy_all(d);
>>>  
>>> +    stats_free_vcpu_mfn(d);
>>
>> How come this lives here? Surely this new feature should be not only
>> guest-type independent, but also arch-agnostic? Clearly, your putting
>> the new data in struct domain (and not in struct arch_domain or yet
>> deeper in the hierarchy) indicates you may have been meaning to make
>> it so.
>>
> 
> The whole feature is meant to be guest-type independent and arch-agnostic.
> Would it be better to put it in xen/common/domain.c:domain_kill()?

Likely, and the earlier this is (safely) possible, the better.
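
I.e., roughly (a sketch - the exact call site within the common teardown
path still needs working out):

    /*
     * Instead of calling this from the HVM-only
     * hvm_domain_relinquish_resources(), free the stats frame from common
     * code during domain destruction, e.g. early in domain_kill().
     */
    stats_free_vcpu_mfn(d);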

>>> @@ -1162,6 +1171,88 @@ static int acquire_vmtrace_buf(
>>>      return nr_frames;
>>>  }
>>>  
>>> +void stats_free_vcpu_mfn(struct domain * d)
>>> +{
>>> +    struct page_info *pg = d->vcpustats_page.pg;
>>> +
>>> +    if ( !pg )
>>> +        return;
>>> +
>>> +    d->vcpustats_page.pg = NULL;
>>> +
>>> +    if ( d->vcpustats_page.va )
>>> +        unmap_domain_page_global(d->vcpustats_page.va);
>>> +
>>> +    d->vcpustats_page.va = NULL;
>>
>> We ought to gain UNMAP_DOMAIN_PAGE_GLOBAL() for purposes like this one,
>> paralleling UNMAP_DOMAIN_PAGE().
>>
> 
> I do not understand this comment. Could you elaborate it?

The last four lines of code would be better collapsed into a single one,
using the mentioned, yet-to-be-introduced construct. I assume you did
look up UNMAP_DOMAIN_PAGE() to spot how it differs from
unmap_domain_page()?
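
The new construct would simply parallel the existing one, i.e. something
like the following sketch (note that unmap_domain_page_global() may
additionally need to tolerate a NULL argument, so that the open-coded
NULL check above can really go away):

    #define UNMAP_DOMAIN_PAGE_GLOBAL(p) do {   \
        unmap_domain_page_global(p);           \
        (p) = NULL;                            \
    } while ( false )

with which the tail of stats_free_vcpu_mfn() shrinks to

    UNMAP_DOMAIN_PAGE_GLOBAL(d->vcpustats_page.va);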

>>> +static int stats_vcpu_alloc_mfn(struct domain *d)
>>> +{
>>> +    struct page_info *pg;
>>> +
>>> +    pg = alloc_domheap_page(d, MEMF_no_refcount);
>>> +
>>> +    if ( !pg )
>>> +        return -ENOMEM;
>>> +
>>> +    if ( !get_page_and_type(pg, d, PGT_writable_page) ) {
>>
>> Style: Brace placement (more elsewhere).
>>
>>> +        put_page_alloc_ref(pg);
>>
>> This is not allowed when what you may put is the last reference.
>> See other examples we have in the tree.
>>
> 
> I do not understand this comment. Could you point me to an example? I used
> ioreq_server_alloc_mfn() as an example, but it may not be a good one.

That's an okay example; what's not okay is that you altered what is
done there. There is a reason that the other function doesn't use
put_page_alloc_ref() like you do. And I would assume you've looked
up put_page_alloc_ref() and found the comment there that explains
things.
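
For illustration, the reference handling pattern there looks roughly like
the following (an abridged sketch with a made-up function name; the point
is that put_page_alloc_ref() is only ever called while the type/general
reference from get_page_and_type() is still held):

    static int alloc_shared_page(struct domain *d, struct page_info **pgp,
                                 void **vap)
    {
        struct page_info *pg = alloc_domheap_page(d, MEMF_no_refcount);
        void *va;

        if ( !pg )
            return -ENOMEM;

        if ( !get_page_and_type(pg, d, PGT_writable_page) )
        {
            /*
             * Only the allocation reference exists at this point, so
             * dropping it via put_page_alloc_ref() could free the page
             * (see the comment in put_page_alloc_ref()).  The ioreq code
             * crashes the domain here instead.
             */
            domain_crash(d);
            return -ENODATA;
        }

        va = __map_domain_page_global(pg);
        if ( !va )
        {
            put_page_alloc_ref(pg);   /* safe: type ref still held */
            put_page_and_type(pg);
            return -ENOMEM;
        }

        *pgp = pg;
        *vap = va;

        return 0;
    }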

>>> +        return -ENODATA;
>>> +    }
>>> +
>>> +    d->vcpustats_page.va = __map_domain_page_global(pg);
>>> +    if ( !d->vcpustats_page.va )
>>> +        goto fail;
>>> +
>>> +    d->vcpustats_page.pg = pg;
>>> +    clear_page(d->vcpustats_page.va);
>>
>> I guess this should happen before you globally announce the
>> address.
>>
> 
> If I understand correctly, I should invoke clear_page() before I assign the
> address to "d->vcpustats_page.va". Am I right?

Yes.
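
I.e. the tail of stats_vcpu_alloc_mfn() would become something like (sketch):

    va = __map_domain_page_global(pg);
    if ( !va )
        goto fail;

    clear_page(va);              /* scrub before anyone can see the page */
    d->vcpustats_page.pg = pg;
    d->vcpustats_page.va = va;   /* announce only once contents are sane */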

Jan



 

