
[PATCH] x86/NUMA: improve memnode_shift calculation for multi node system


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 27 Sep 2022 18:20:35 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Tue, 27 Sep 2022 16:20:47 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

SRAT may describe an individual node using multiple ranges. When a node's
ranges are adjacent, or separated only by a gap holding no memory, only the
start of the node's first such range actually needs accounting for in the
memnode_shift calculation. Furthermore the start address of the very first
range doesn't need considering at all, as it is fine to associate all lower
addresses (holding no memory) with that same node. For this to work, the
array of ranges needs to be sorted by address - adjust the logic in
acpi_numa_memory_affinity_init() accordingly.

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
---
On my Dinar and Rome systems this change shrinks memnodemapsize to a single
page each. Previously they used 9 / 130 pages respectively (with the shifts
going from 8 / 6 to 15 / 16), resulting from the lowmem gaps [A0,FF] /
[A0,BF].
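
For a rough sense of the savings (illustrative numbers, not taken from the
systems above): memnodemap[] holds one nodeid_t (a single byte) per
1 << memnode_shift PDX chunk, so it occupies about memtop >> memnode_shift
bytes. With memtop around 2^25 pages (i.e. 128 GiB), a shift of 6 needs
2^19 bytes = 128 4k pages, while a shift of 16 needs only 512 bytes and
hence a single page.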

This goes on top of "x86/NUMA: correct memnode_shift calculation for
single node system".
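
For reference, below is a self-contained sketch of the shift derivation this
optimizes (modelled loosely on extract_lsb_from_nodes() in
xen/arch/x86/numa.c; the PDX conversion, empty-range filtering and the
NULL-nodeids case are elided, and a compiler builtin stands in for Xen's
find_first_bit()):

#include <stdint.h>

struct node { uint64_t spdx, epdx; }; /* page-index range [spdx, epdx) */

/*
 * Every boundary between distinct nodes must coincide with a
 * memnodemap[] entry boundary, i.e. be a multiple of 1 << shift.
 * With the array sorted by address and a node's ranges adjacent to
 * one another, only the start of each node's first range is such a
 * boundary; the lowest bit set across them bounds the usable shift.
 */
static unsigned int shift_from_ranges(const struct node *nodes,
                                      unsigned int nr,
                                      const uint8_t *nodeids)
{
    uint64_t bitfield = 0;
    unsigned int i;

    for ( i = 1; i < nr; i++ )
        if ( nodeids[i - 1] != nodeids[i] )
            bitfield |= nodes[i].spdx;

    /* A single node imposes no constraint; any shift will do. */
    return bitfield ? __builtin_ctzll(bitfield) : 63;
}

The point of the patch is the loop starting at i = 1 together with the
node-change check: once the ranges are sorted, a node's second and later
ranges no longer force low address bits into the bitfield, allowing a
larger shift and hence a smaller map.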

--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -127,7 +127,8 @@ static int __init extract_lsb_from_nodes
         epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
         if ( spdx >= epdx )
             continue;
-        bitfield |= spdx;
+        if ( i && (!nodeids || nodeids[i - 1] != nodeids[i]) )
+            bitfield |= spdx;
         if ( !i || !nodeids || nodeids[i - 1] != nodeids[i] )
             nodes_used++;
         if ( epdx > memtop )
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -312,6 +312,7 @@ acpi_numa_memory_affinity_init(const str
        unsigned pxm;
        nodeid_t node;
        unsigned int i;
+       bool next = false;
 
        if (srat_disabled())
                return;
@@ -413,14 +414,37 @@ acpi_numa_memory_affinity_init(const str
               node, pxm, start, end - 1,
               ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE ? " (hotplug)" : "");
 
-       node_memblk_range[num_node_memblks].start = start;
-       node_memblk_range[num_node_memblks].end = end;
-       memblk_nodeid[num_node_memblks] = node;
+       /* Keep node_memblk_range[] sorted by address. */
+       for (i = 0; i < num_node_memblks; ++i)
+               if (node_memblk_range[i].start > start ||
+                   (node_memblk_range[i].start == start &&
+                    node_memblk_range[i].end > end))
+                       break;
+
+       memmove(&node_memblk_range[i + 1], &node_memblk_range[i],
+               (num_node_memblks - i) * sizeof(*node_memblk_range));
+       node_memblk_range[i].start = start;
+       node_memblk_range[i].end = end;
+
+       memmove(&memblk_nodeid[i + 1], &memblk_nodeid[i],
+               (num_node_memblks - i) * sizeof(*memblk_nodeid));
+       memblk_nodeid[i] = node;
+
        if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
-               __set_bit(num_node_memblks, memblk_hotplug);
+               next = true;
                if (end > mem_hotplug)
                        mem_hotplug = end;
        }
+       for (; i <= num_node_memblks; ++i) {
+               bool prev = next;
+
+               next = test_bit(i, memblk_hotplug);
+               if (prev)
+                       __set_bit(i, memblk_hotplug);
+               else
+                       __clear_bit(i, memblk_hotplug);
+       }
+
        num_node_memblks++;
 }
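
As an aside, the loop at the end of the hunk inserts one bit into
memblk_hotplug[] at position i, shifting all higher bits up by one so the
bitmap stays index-aligned with node_memblk_range[] and memblk_nodeid[]
after the two memmove()s. A standalone model of just that bit insertion,
assuming for brevity a bitmap small enough to fit one word (the real code's
test_bit() / __set_bit() / __clear_bit() also cover multi-word bitmaps):

#include <stdbool.h>

/*
 * Insert "new_bit" at "pos", moving bits pos..nbits-1 up by one.
 * Assumes nbits < BITS_PER_LONG so the grown bitmap still fits.
 */
static void bitmap_insert_bit(unsigned long *bitmap, unsigned int pos,
                              unsigned int nbits, bool new_bit)
{
    bool next = new_bit;
    unsigned int i;

    for ( i = pos; i <= nbits; i++ )
    {
        bool prev = next;

        next = *bitmap & (1UL << i);
        if ( prev )
            *bitmap |= 1UL << i;
        else
            *bitmap &= ~(1UL << i);
    }
}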
 