[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] Fix XEN_SYSCTL_numainfo[node].memsize for memory holes
On 24.09.2024 12:14, Bernhard Kaindl wrote: > Fix a long-standing issue (known at since 2014) with the numainfo call. > > The Hypercall `XEN_SYSCTL_numainfo` returns `memsize` for each NUMA node: > > xl info -n: > node: memsize memfree distances > 0: -> 67584 <- 60672 10,21 <- memsize is off by 2048 MB > 1: 65536 60958 21,10 > > So far, `memsize` is calculated from `NODE_DATA->node_spanned_pages`. > It includes memory holes, leading to wrong memsize on x86. Depending on what "memsize" means, it is or is not wrong that way. I'm not sure we can change it like that, at the very least not without bumping the interface version and proving that in-tree uses (if any) are either unaffected or improved. > This patch gets the sum of E820_RAM entries for each NUMA node on boot, > stores it in NODE_DATA->node_present_pages and uses it for `memsize`. > > It also increases it like `total_pages` on memory_add() for memory hotplug. > > The new NODE_DATA->node_present_pages can be slighly lower than the > physical node's RAM due to reserved memory for some of the NUMA nodes. The introduction and maintenance of ->node_present_pages wants to be a separate, prereq change imo. > --- a/xen/arch/x86/x86_64/mm.c > +++ b/xen/arch/x86/x86_64/mm.c > @@ -1333,6 +1333,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, > unsigned int pxm) > /* We can't revert any more */ > share_hotadd_m2p_table(&info); > transfer_pages_to_heap(&info); > + /* Update the node's present pages (like the total_pages of the system) > */ > + NODE_DATA(node)->node_present_pages += epfn - spfn; Nit: Blank line ahead of the insertion please. > --- a/xen/common/numa.c > +++ b/xen/common/numa.c > @@ -504,10 +504,22 @@ void __init setup_node_bootmem(nodeid_t nodeid, paddr_t > start, paddr_t end) > { > unsigned long start_pfn = paddr_to_pfn(start); > unsigned long end_pfn = paddr_to_pfn(end); > + paddr_t map_start, map_end; > + int i = 0, err; arch_get_ram_range()'s first parameter is unsigned int. > NODE_DATA(nodeid)->node_start_pfn = start_pfn; > NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn; > > + /* Add RAM pages within the node to get the present pages for memsize > infos */ > + NODE_DATA(nodeid)->node_present_pages = 0; > + while ( (err = arch_get_ram_range(i++, &map_start, &map_end)) != -ENOENT > ) { > + if ( err || map_start >= end || map_end <= start ) > + continue; /* Skip non-RAM and maps outside of the node's memory > range */ > + /* Add memory that is in the node's memory range (within start and > end): */ > + map_start = max(map_start, start); > + map_end = min(map_end, end); > + NODE_DATA(nodeid)->node_present_pages += (map_end - map_start) >> > PAGE_SHIFT; > + } Style (whole block): Brace placement, line length. I'm also not convinced the actual calculation is correct: Neither map_start nor map_end need to be page aligned aiui, and hence the present result doesn't necessarily give the actual number of pages (that are usable, and hence meaningful to the consumer of the field). Blank line here please. > @@ -675,7 +687,7 @@ static void cf_check dump_numa(unsigned char key) > mfn_t mfn = _mfn(node_start_pfn(i) + 1); > > printk("NODE%u start->%lu size->%lu free->%lu\n", > - i, node_start_pfn(i), node_spanned_pages(i), > + i, node_start_pfn(i), node_present_pages(i), "size" here really can mean two things. I'd suggest to keep printing node_spanned_pages() and add printing of node_present_pages(). Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |