== {{Anchor|Basic concept}}Overview ==
[[Image:NUMA.svg|right|300px|thumb|One possible architecture of a NUMA system. The processors connect to the bus or crossbar by connections of varying thickness/number. This shows that different CPUs have different access priorities to memory based on their relative location.]]

Modern CPUs operate considerably faster than the main memory they use. In the early days of computing and data processing, the CPU generally ran slower than its own memory. The performance lines of processors and memory crossed in the 1960s with the advent of the first [[supercomputer]]s. Since then, CPUs increasingly have found themselves "starved for data" and having to stall while waiting for data to arrive from memory (e.g. for Von Neumann architecture-based computers, see [[Von Neumann architecture#Von Neumann bottleneck|Von Neumann bottleneck]]). Many supercomputer designs of the 1980s and 1990s focused on providing high-speed memory access as opposed to faster processors, allowing the computers to work on large data sets at speeds other systems could not approach.

Limiting the number of memory accesses provided the key to extracting high performance from a modern computer. For commodity processors, this meant installing an ever-increasing amount of high-speed [[cache memory]] and using increasingly sophisticated algorithms to avoid [[cache miss]]es. But the dramatic increase in size of the operating systems and of the applications run on them has generally overwhelmed these cache-processing improvements. Multi-processor systems without NUMA make the problem considerably worse. Now a system can starve several processors at the same time, notably because only one processor can access the computer's memory at a time.<ref>{{cite web | url = https://www.usenix.org/legacy/event/atc11/tech/final_files/Blagodurov.pdf | title = A Case for NUMA-aware Contention Management on Multicore Systems | date = 2011-05-02 | access-date = 2014-01-27 | author1 = Sergey Blagodurov | author2 = Sergey Zhuravlev | author3 = Mohammad Dashti | author4 = Alexandra Fedorova | publisher = Simon Fraser University }}</ref>

NUMA attempts to address this problem by providing separate memory for each processor, avoiding the performance hit when several processors attempt to address the same memory. For problems involving spread data (common for [[Server (computing)|server]]s and similar applications), NUMA can improve the performance over a single shared memory by a factor of roughly the number of processors (or separate memory banks).<ref name="acm-zmajo">{{cite web | url = http://people.inf.ethz.ch/zmajo/publications/11-systor.pdf | title = Memory System Performance in a NUMA Multicore Multiprocessor | year = 2011 | access-date = 2014-01-27 | author1 = Zoltan Majo | author2 = Thomas R. Gross | publisher = ACM | archive-url = https://web.archive.org/web/20130612210800/http://people.inf.ethz.ch/zmajo/publications/11-systor.pdf | archive-date = 2013-06-12 | url-status = dead }}</ref> Another approach to addressing this problem is the [[multi-channel memory architecture]], in which a linear increase in the number of memory channels increases the memory access concurrency linearly.<ref>{{cite web | publisher = Infineon Technologies North America and Kingston Technology | date = September 2003 | url = http://www.kingston.com/newtech/MKF_520DDRwhitepaper.pdf | archive-url = https://web.archive.org/web/20110929024052/http://www.kingston.com/newtech/MKF_520DDRwhitepaper.pdf | title = Intel Dual-Channel DDR Memory Architecture White Paper | edition = Rev. 1.0 | format = PDF, 1021 [[kilobyte|KB]] | access-date = 2007-09-06 | archive-date = 2011-09-29}}</ref>

Of course, not all data ends up confined to a single task, which means that more than one processor may require the same data. To handle these cases, NUMA systems include additional hardware or software to move data between memory banks. This operation slows the processors attached to those banks, so the overall speed increase due to NUMA heavily depends on the nature of the running tasks.<ref name="acm-zmajo" />
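As an illustration of how software can exploit this per-processor memory, the following is a minimal sketch (not drawn from the sources cited above) of NUMA-aware allocation on Linux using the libnuma API; the node number and buffer size are arbitrary assumptions for the example.

<syntaxhighlight lang="c">
/* Minimal sketch of NUMA-aware allocation on Linux with libnuma.
 * Assumptions: a NUMA-capable kernel and the libnuma development
 * package are installed; build with:  gcc numa_demo.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    int max_node = numa_max_node();      /* highest NUMA node number */
    size_t size  = 64 * 1024 * 1024;     /* 64 MiB, arbitrary example size */

    /* Allocate memory on node 0 and run the calling thread on that node,
     * so the thread reads local memory instead of a remote bank. */
    void *buf = numa_alloc_onnode(size, 0);
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return EXIT_FAILURE;
    }
    numa_run_on_node(0);

    memset(buf, 0, size);                /* touch the pages to place them */
    printf("Allocated %zu bytes on node 0 of %d nodes\n", size, max_node + 1);

    numa_free(buf, size);
    return EXIT_SUCCESS;
}
</syntaxhighlight>

A similar placement can be requested without changing the program by launching it under the <code>numactl</code> utility, for example <code>numactl --cpunodebind=0 --membind=0 ./program</code>.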