
Re: [Xen-devel] Logical NUMA error during boot, and RFC patch



On 28/06/12 10:51, Jan Beulich wrote:
>>>> On 27.06.12 at 21:10, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> XenServer has recently acquired a quad-socket AMD Interlagos server.
>> While playing around with it, I discovered a logical error in how Xen
>> detects NUMA nodes.
>>
>> The server has 8 NUMA nodes, 4 of which have memory attached (the even
>> nodes - see SRAT.dsl attached).  This means that node_set_online(nodeid)
>> gets called only for the nodes with memory attached.  Later, in
>> srat_detect_node(), node gets set to 0 if it was NUMA_NO_NODE, or if not
>> node_online().  This leads to all the processors on the odd nodes being
>> assigned to node 0, even though the odd nodes are present (see
>> interlagos-xl-info-n.log).
>>
>> I present an RFC patch which changes srat_detect_node() to call
>> node_set_online() for each node, which appears to fix the logic.
>>
>> Is this a sensible place to set the node online, or is there a better
>> way to fix this logic?
> While the place looks sensible, it has the possible problem of
> potentially adding bits to the online map pretty late in the game.
>
> As the memory-related invocations of node_set_online() come
> out of numa_initmem_init()/acpi_scan_nodes(), perhaps the
> (boot time) CPU-related ones should be done there too (I'd
> still keep the adjustment you're already doing, to also cover
> hotplug CPUs)?
>
> Jan
>

I have been doing quite a bit more testing this morning, and have come
to some sad conclusions.

This specific server is a Dell R815 loaner with 8x 4GiB DIMMs, 2 DIMMs
hanging off each socket.  As each socket is an Interlagos processor,
there are 4 memory controllers (with 8 DIMM slots, as they are
dual-channel).

What this means is that, per socket, one node has half of its available
DIMM slots filled while the other node has no memory at all.  The
performance implications are severe, but as it appears that almost all
of the RAM combinations selectable on the Dell website will lead to
similarly poor performance, I can foresee many systems like this in the
future.  (I don't wish to single Dell out here; it simply happens to be
the vendor of the server I am testing.  Other server vendors suffer from
the same issue.)

As to the problem at hand, I will investigate the NUMA code some more
and see about setting the nodes up earlier.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

