Xen project Mailing List

Re: [Xen-devel] Re: NUMA and SMP

To: "Petersson, Mats" <Mats.Petersson@xxxxxxx>

From: Emmanuel Ackaouy <ack@xxxxxxxxxxxxx>

Date: Tue, 16 Jan 2007 14:55:37 +0100

Cc: Anthony Liguori <aliguori@xxxxxxxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, David Pilger <pilger.david@xxxxxxxxx>, Ryan Harper <ryanh@xxxxxxxxxx>

Delivery-date: Tue, 16 Jan 2007 05:55:22 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On the topic of NUMA:

I'd like to dispute the assumption that a NUMA-aware OS can actually make good decisions about the initial placement of memory in a reasonable hardware ccNUMA system.

How does the OS know on which node a particular chunk of memory will be most accessed? The truth is that unless the application or person running the application is herself NUMA-aware and can provide placement hints or directives, the OS will seldom beat a round-robin / interleave or random placement strategy.

To illustrate, consider an app which lays out a bunch of data in memory in a single thread and then spawns worker threads to process it.
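The scenario above can be made concrete with a toy model. This is only a sketch under stated assumptions, not a measurement and not a description of how Xen or any particular OS places memory: it assumes one worker pinned per node, counts every access equally, and takes "place memory close to the initial thread" to mean putting every page on that thread's node. All function names are hypothetical. It compares that placement with round-robin interleaving for two access patterns: workers that each read their own slice of the data, and a pipeline in which every stage reads all of it.

```python
from collections import Counter


def first_touch(num_pages, init_node=0):
    """Assumed 'near the initial thread' policy: every page lands on init_node."""
    return [init_node] * num_pages


def interleave(num_pages, num_nodes):
    """Round-robin policy: page p lands on node p % num_nodes."""
    return [page % num_nodes for page in range(num_pages)]


def partitioned(num_pages, num_nodes):
    """Worker w (pinned to node w) reads only its own contiguous slice."""
    per_worker = num_pages // num_nodes
    return [(w, page)
            for w in range(num_nodes)
            for page in range(w * per_worker, (w + 1) * per_worker)]


def pipeline(num_pages, num_nodes):
    """Every worker is a pipeline stage that reads every page."""
    return [(w, page) for w in range(num_nodes) for page in range(num_pages)]


def evaluate(placement, accesses):
    """Return (fraction of remote accesses, share served by the busiest node)."""
    total = len(accesses)
    remote = sum(1 for worker, page in accesses if placement[page] != worker)
    served = Counter(placement[page] for _, page in accesses)
    return remote / total, max(served.values()) / total


if __name__ == "__main__":
    pages, nodes = 64, 4  # arbitrary illustrative sizes
    placements = {
        "first-touch": first_touch(pages),
        "interleave": interleave(pages, nodes),
    }
    patterns = {
        "partitioned": partitioned(pages, nodes),
        "pipeline": pipeline(pages, nodes),
    }
    for policy, placement in placements.items():
        for pattern, accesses in patterns.items():
            remote, busiest = evaluate(placement, accesses)
            print(f"{policy:12s} {pattern:12s} "
                  f"remote={remote:.2f} busiest-node share={busiest:.2f}")
```

In this model, with 4 nodes, both placements give the same fraction of remote accesses for either pattern (3 of every 4), but placing everything near the initial thread sends every access to a single node's memory, while interleaving spreads them evenly across nodes. The model says nothing about real latency or bandwidth figures.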

Is the OS to place memory close to the initial thread? How can it possibly know how many threads will eventually process the data?

Even if the OS knew how many threads will eventually crunch the data, it cannot possibly know at placement time if each thread will work on an assigned data subset (and if so, which one) or if it will act as a pipeline stage with all the data being passed from one thread to the next.

If you go beyond initial memory placement or start considering memory migration, then it's even harder to win because you have to pay copy and stall penalties during migrations. So you have to be real smart about predicting the future to do better than your ~10-40% memory bandwidth and latency hit associated with doing simple memory interleaving on a modern hardware-ccNUMA system.

And it gets worse for you when your app is successfully taking advantage of the memory cache hierarchy because its performance is less impacted by raw memory latency and bandwidth.

Things also get more difficult on a time-sharing host with competing apps.

There is a strong argument for making hypervisors and OSes NUMA aware in the sense that:

1- They know about system topology

2- They can export this information up the stack to applications and users

3- They can take in directives from users and applications to partition the host and place some threads and memory in specific partitions.

4- They use an interleaved (or random) initial memory placement strategy by default.

The argument that the OS on its own -- without user or application directives -- can make better placement decisions than round-robin or random placement is -- in my opinion -- flawed.

I also am skeptical that the complexity associated with page migration strategies would be worthwhile: If you got it wrong the first time, what makes you think you'll do better this time?

Emmanuel.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
