
Re: [RFC PATCH] SYSCTL_numainfo.memsize: Switch spanned to present memory



On 03/12/2024 12:37, Jan Beulich wrote:
On 03.12.2024 12:12, Bernhard Kaindl wrote:
This is the 2nd part of my submission to fix the NUMA node memsize
returned in xen_sysctl_meminfo[].memsize by the XEN_SYSCTL_numainfo
hypercall to not count MMIO memory holes etc. but only memory pages.

For this, we introduced NODE_DATA->node_present_pages as a prereq.
With the prereq merged in master, I send this 2nd part for review:

This RFC is for changing the value of xen_sysctl_meminfo[]->memsize
from NODE_DATA->node_spanned_pages << PAGE_SHIFT
   to NODE_DATA->node_present_pages << PAGE_SHIFT
for returning total present NUMA node memory instead of spanned range.

Sample of struct xen_sysctl_meminfo[].* as presented by xl info -n:

xl info -n:
[...]
node:    memsize    memfree    distances
    0:  -> 67584 <-   60672      10,21
    1:     65536      60958      21,10

The -> memsize <- marked here is the value that we'd like to fix:
The current value based on node_spanned_pages is often 2TB too large.
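
To make the difference concrete, here is a minimal C sketch of the two
computations. The struct and function names are illustrative assumptions,
not Xen's actual code, and PAGE_SHIFT is assumed to be 12 (4 KiB pages).

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two per-node counters discussed above. */
struct node_pages {
    uint64_t spanned_pages; /* first to last page of the node, holes included */
    uint64_t present_pages; /* only pages actually backed by memory */
};

/* Current behaviour: memsize derived from the spanned range, in bytes. */
static uint64_t memsize_spanned(const struct node_pages *np)
{
    return np->spanned_pages << PAGE_SHIFT;
}

/* Proposed behaviour: memsize derived from present pages only, in bytes. */
static uint64_t memsize_present(const struct node_pages *np)
{
    return np->present_pages << PAGE_SHIFT;
}
```

For a node spanning 67584 MiB that contains an assumed 2048 MiB hole,
memsize_spanned() yields 67584 MiB while memsize_present() yields 65536 MiB.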

We're currently not using these often false memsize values in XenServer
according to my code review, and Andrew seemed to confirm this as well.

I think that the same is likely true for other Xen toolstacks, but of course
to review this change or propose an alternative is the purpose of this RFC.

Thanks,
Bernhard

All of the above reads like a cover letter. What's missing is a patch
description, part of which would be to clarify whether the field is
indeed unused except for display purposes, or why respective users would
at least not regress from this change. What's also unclear is what
comments you're actually after (i.e. what question(s) you want to have
answered), seeing this is tagged RFC.
[...]
Jan

Hi Jan!

The answer I'm looking for is which users to check, or to check with.

For example, I know that Xapi can use xen_sysctl_meminfo[].memfree to
get a preference about which NUMA node to use when creating a domain
(when the new mode `numa_affinity_policy.best_effort` is enabled):
https://xapi-project.github.io/new-docs/toolstack/features/NUMA/

A potential use of xen_sysctl_meminfo.memsize in Xen toolstacks is
less clear to me:

The only potential use would be if some Xen toolstack would not like
to solely rely on [nid].memfree for NUMA placement.

The question is if there are other NUMA aware toolstacks besides Xapi,
that would try to use it for e.g. planning the placement of domains.

My search in the Xapi and Xen repos only turned up a debug printf() in
xen-api's xen-api/xenopsd and in xen only the output of xl info -n.

It seems questionable to me that any other toolstacks would rely on it,
especially as the value it currently returns is off by as much as 2GB on
some machines. I'd expect that this bug would have affected code using it.

The answers I am looking for are acknowledgements of that, or references
to users that might currently use .memsize (and could be affected).

Alternatively, I'd hope to get an idea of what the method would be to
create a new revision of the numainfo hypercall:

I guess it would be to add a new #define XEN_SYSCTL_numainfo_v2,
and if v2 is called, return [].memsize using [nid].node_present_pages instead?
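
A rough sketch of that idea, only to illustrate the selection logic. The
enum, its values, the struct and the helper are hypothetical and are not
taken from Xen's public headers:

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical op identifiers; not the numbers used in Xen's sysctl.h. */
enum numainfo_op {
    NUMAINFO_V1 = 1, /* existing behaviour: spanned range */
    NUMAINFO_V2 = 2, /* proposed behaviour: present pages only */
};

/* Hypothetical per-node counters, mirroring the NODE_DATA fields. */
struct numa_node_counts {
    uint64_t node_spanned_pages;
    uint64_t node_present_pages;
};

/* Return memsize in bytes according to the hypercall revision requested. */
static uint64_t numainfo_memsize(enum numainfo_op op,
                                 const struct numa_node_counts *nc)
{
    uint64_t pages = (op == NUMAINFO_V2) ? nc->node_present_pages
                                         : nc->node_spanned_pages;

    return pages << PAGE_SHIFT;
}
```

Existing callers would keep getting the spanned value, and only callers
that explicitly ask for v2 would see the present-pages value.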

Kind regards,
Bernhard



 

