
[PATCH] x86/NUMA: correct off-by-1 in node map size calculation


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 27 Sep 2022 16:13:59 +0200
  • Delivery-date: Tue, 27 Sep 2022 14:14:13 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

extract_lsb_from_nodes() accumulates "memtop" from all PDXes one past
the covered ranges. Hence the maximum address which can validly be used
to index the node map is one below this value, and we may currently set
up a node map with an unused (and never initialized) trailing entry. In
boundary cases this may also mean we dynamically allocate a page when
the static (64-entry) map would suffice.
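
As a stand-alone illustration (made-up numbers, not part of the patch):
with a single range covering PDXes [0x10000,0x17fff] and a hypothetical
shift of 15, "memtop" accumulates to 0x18000, the highest index actually
used is 2, and the old formula asks for one entry more than needed.

#include <stdio.h>

int main(void)
{
    unsigned long memtop = 0x18000; /* one past the last covered PDX */
    unsigned int shift = 15;        /* hypothetical memnode_shift */

    printf("highest index used: %lu\n", (memtop - 1) >> shift);  /* 2 */
    printf("old map size: %lu\n", (memtop >> shift) + 1);        /* 4 */
    printf("new map size: %lu\n", ((memtop - 1) >> shift) + 1);  /* 3 */

    return 0;
}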

While there, also correct the comment ahead of the function, for it to
match the actual code: Linux commit 54413927f022 ("x86-64:
x86_64-make-the-numa-hash-function-nodemap-allocation fix fix") removed
the ORing in of the end address before we actually cloned their code.

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
---
Really the shift value may end up needlessly small when there's
discontiguous memory. Within a gap, any address can be taken for the
node boundary, and hence neither the end of the lower range nor the
start of the higher range necessarily is the best address to use. For
example with these two node ranges (numbers are frame addresses)

[10000,17fff]
[28000,2ffff]

we'd calculate the shift as 15 when 16 or even 17 (because the start of
the 1st range can also be ignored) would do. I haven't tried to properly
prove it yet, but it looks to me as if the top bit of the XOR of the
(inclusive) end of the lower range and the start of the higher range is
what would want accumulating (of course requiring the entries to be
sorted, or to be processed in address order). This would then
"naturally" exclude the lowest range start and the highest range end.
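
For the record, below is a quick stand-alone sketch of that idea (purely
exploratory, not a tested or proposed change): it assumes a 64-bit
unsigned long, gcc builtins, and ranges which are disjoint and sorted by
address; candidate_shift() is just a made-up name for illustration.

/* Sketch only: accumulate, for each pair of adjacent ranges, the top bit
 * of (inclusive end of lower) ^ (start of higher); the usable shift is
 * then the lowest such bit, which leaves the start of the first and the
 * end of the last range out of the picture. */
#include <stdio.h>

struct range { unsigned long start, end; }; /* "end" is exclusive here */

static unsigned int candidate_shift(const struct range *r, unsigned int n)
{
    unsigned long bitfield = 0;
    unsigned int i;

    for ( i = 0; i + 1 < n; ++i )
    {
        /* Non-zero for disjoint, sorted ranges. */
        unsigned long x = (r[i].end - 1) ^ r[i + 1].start;

        /* Keep only the top set bit of the XOR. */
        bitfield |= 1UL << (63 - __builtin_clzl(x));
    }

    return bitfield ? __builtin_ctzl(bitfield) : 63;
}

int main(void)
{
    /* The example above: [0x10000,0x17fff] and [0x28000,0x2ffff]. */
    struct range r[] = { { 0x10000, 0x18000 }, { 0x28000, 0x30000 } };

    printf("candidate shift: %u\n", candidate_shift(r, 2)); /* 17 */

    return 0;
}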

--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -110,7 +110,7 @@ static int __init allocate_cachealigned_
 }
 
 /*
- * The LSB of all start and end addresses in the node map is the value of the
+ * The LSB of all start addresses in the node map is the value of the
  * maximum possible shift.
  */
 static int __init extract_lsb_from_nodes(const struct node *nodes,
@@ -135,7 +135,7 @@ static int __init extract_lsb_from_nodes
         i = BITS_PER_LONG - 1;
     else
         i = find_first_bit(&bitfield, sizeof(unsigned long)*8);
-    memnodemapsize = (memtop >> i) + 1;
+    memnodemapsize = ((memtop - 1) >> i) + 1;
     return i;
 }
 
