Re: [PATCH] x86/NUMA: improve memnode_shift calculation for multi node system
On Tue, Sep 27, 2022 at 06:20:35PM +0200, Jan Beulich wrote:
> SRAT may describe individual nodes using multiple ranges. When they're
> adjacent (with or without a gap in between), only the start of the first
> such range actually needs accounting for. Furthermore the very first
> range doesn't need considering of its start address at all, as it's fine
> to associate all lower addresses (with no memory) with that same node.
> For this to work, the array of ranges needs to be sorted by address -
> adjust logic accordingly in acpi_numa_memory_affinity_init().

Speaking for myself (due to my lack of knowledge of the NUMA stuff), I
would benefit from a bit of context on why and how memnode_shift is
used.

> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> On my Dinar and Rome systems this changes memnodemapsize to a single
> page. Originally they used 9 / 130 pages (with shifts going from 8 / 6
> to 15 / 16) respectively, resulting from lowmem gaps [A0,FF] / [A0,BF].
>
> This goes on top of "x86/NUMA: correct memnode_shift calculation for
> single node system".
>
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -127,7 +127,8 @@ static int __init extract_lsb_from_nodes
>          epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
>          if ( spdx >= epdx )
>              continue;
> -        bitfield |= spdx;
> +        if ( i && (!nodeids || nodeids[i - 1] != nodeids[i]) )
> +            bitfield |= spdx;
>          if ( !i || !nodeids || nodeids[i - 1] != nodeids[i] )
>              nodes_used++;
>          if ( epdx > memtop )
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -312,6 +312,7 @@ acpi_numa_memory_affinity_init(const str
> 	unsigned pxm;
> 	nodeid_t node;
> 	unsigned int i;
> +	bool next = false;
>
> 	if (srat_disabled())
> 		return;
> @@ -413,14 +414,37 @@ acpi_numa_memory_affinity_init(const str
> 	       node, pxm, start, end - 1,
> 	       ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE ? " (hotplug)" : "");
>
> -	node_memblk_range[num_node_memblks].start = start;
> -	node_memblk_range[num_node_memblks].end = end;
> -	memblk_nodeid[num_node_memblks] = node;
> +	/* Keep node_memblk_range[] sorted by address. */
> +	for (i = 0; i < num_node_memblks; ++i)
> +		if (node_memblk_range[i].start > start ||
> +		    (node_memblk_range[i].start == start &&

Maybe I'm confused, but won't .start == start mean we have overlapping
ranges?

> +		     node_memblk_range[i].end > end))
> +			break;
> +
> +	memmove(&node_memblk_range[i + 1], &node_memblk_range[i],
> +		(num_node_memblks - i) * sizeof(*node_memblk_range));
> +	node_memblk_range[i].start = start;
> +	node_memblk_range[i].end = end;
> +
> +	memmove(&memblk_nodeid[i + 1], &memblk_nodeid[i],
> +		(num_node_memblks - i) * sizeof(*memblk_nodeid));
> +	memblk_nodeid[i] = node;
> +
>  	if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> -		__set_bit(num_node_memblks, memblk_hotplug);
> +		next = true;
>  		if (end > mem_hotplug)
>  			mem_hotplug = end;
>  	}
> +	for (; i <= num_node_memblks; ++i) {
> +		bool prev = next;
> +
> +		next = test_bit(i, memblk_hotplug);
> +		if (prev)
> +			__set_bit(i, memblk_hotplug);
> +		else
> +			__clear_bit(i, memblk_hotplug);

Nit: I think you could avoid doing the clear for the last bit, ie:

else if (i != num_node_memblks)
	__clear_bit(...);

But I'm not sure it's worth adding the logic, it just makes the code
more complicated to follow.

Thanks, Roger.
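As a bit of the context asked for above: memnode_shift is (essentially) what
phys_to_nid() uses to index memnodemap[], i.e. memnodemap[pdx >> memnode_shift],
so the shift has to be small enough that no aligned 1 << memnode_shift block of
pages straddles a node boundary. The stand-alone sketch below is illustration
only, not the Xen code (the real logic lives in extract_lsb_from_nodes(), and
the boundary pdx values here are made up); it shows how ORing the boundary
addresses that need honouring and taking the lowest set bit yields the shift,
and why dropping a low boundary such as a 0xA0 lowmem-gap start lets the shift
grow considerably.

/*
 * Illustration only -- NOT the Xen implementation.  The boundary pdx
 * values are made up; 0xA0 mimics the start of a range following a
 * lowmem gap such as the [A0,FF] one mentioned in the description.
 */
#include <stdio.h>

static unsigned int shift_from_boundaries(const unsigned long *pdx,
                                          unsigned int n)
{
    unsigned long bitfield = 0;
    unsigned int i;

    /* Collect the low-order bits of every boundary that must be honoured. */
    for ( i = 0; i < n; i++ )
        bitfield |= pdx[i];

    /* The lowest set bit is the largest shift that splits no node. */
    return bitfield ? __builtin_ctzl(bitfield)
                    : sizeof(unsigned long) * 8 - 1;
}

int main(void)
{
    const unsigned long with_lowmem_gap[] = { 0xa0, 0x100000, 0x1080000 };
    const unsigned long without_lowmem_gap[] = { 0x100000, 0x1080000 };

    printf("shift counting the 0xA0 boundary: %u\n",
           shift_from_boundaries(with_lowmem_gap, 3));     /* 5 */
    printf("shift ignoring the 0xA0 boundary: %u\n",
           shift_from_boundaries(without_lowmem_gap, 2));  /* 19 */
    return 0;
}

Roughly, the map needs one entry per 1 << memnode_shift block of pdx space up
to the top of memory, so a larger shift directly shrinks it -- which is where
the "single page" memnodemapsize figure in the patch description comes from.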
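To spell the nit out, the trailing loop would then look like this (untested
sketch, merely Roger's else-if dropped into the quoted loop):

	/*
	 * Variant from the nit above: the bit one past the previously
	 * last entry should never have been set, so the clear can be
	 * skipped for i == num_node_memblks.
	 */
	for (; i <= num_node_memblks; ++i) {
		bool prev = next;

		next = test_bit(i, memblk_hotplug);
		if (prev)
			__set_bit(i, memblk_hotplug);
		else if (i != num_node_memblks)
			__clear_bit(i, memblk_hotplug);
	}

Whether that is clearer than the unconditional (and harmless) __clear_bit() is
exactly the doubt raised above.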