===von Neumann bottleneck===<!-- linked in [[Random-access memory]] -->
The use of the same bus to fetch instructions and data leads to the ''von Neumann bottleneck'': the limited [[throughput]] (data transfer rate) between the [[central processing unit]] (CPU) and memory, compared to the amount of memory. Because the single bus can access only one of the two classes of memory at a time, throughput is lower than the rate at which the CPU can work. This seriously limits the effective processing speed when the CPU is required to perform minimal processing on large amounts of data: the CPU is continually<!-- http://en.wiktionary.org/wiki/continual#Usage_notes --> [[Wait state|forced to wait]] for needed data to move to or from memory. Since CPU speed and memory size have increased much faster than the throughput between them, the bottleneck has become more of a problem, and its severity increases with every new generation of CPU.

The von Neumann bottleneck was described by [[John Backus]] in his 1977 ACM [[Turing Award]] lecture. According to Backus:

<blockquote>Surely there must be a less primitive way of making big changes in the store than by pushing vast numbers of [[Word (computer architecture)|words]] back and forth through the von Neumann bottleneck. Not only is this tube a literal bottleneck for the data traffic of a problem, but, more importantly, it is an intellectual bottleneck that has kept us tied to word-at-a-time thinking instead of encouraging us to think in terms of the larger conceptual units of the task at hand. Thus programming is basically planning and detailing the enormous traffic of words through the von Neumann bottleneck, and much of that traffic concerns not significant data itself, but where to find it.<ref name="backus">{{cite journal |doi=10.1145/359576.359579 |s2cid=16367522 |title=Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs |author-last=Backus |author-first=John W. |author-link=John Backus |journal=Communications of the ACM |volume=21 |issue=8 |date=August 1978 |pages=613–641 |doi-access=free }}</ref><ref>{{cite web |url=http://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD692.html |title=E. W. Dijkstra Archive: A review of the 1977 Turing Award Lecture |access-date=2008-07-11 |author-first=Edsger W. |author-last=Dijkstra |author-link=Edsger W. Dijkstra}}</ref></blockquote>

====Mitigations====
{{update section|reason=Everything in the list has been implemented in ordinary desktop computers to some degree, often extensively. NUMA has been common on multichip-module workstation processors for years now. Are they helping this problem? Has the problem space changed?|date=April 2025}}
There are several known methods for mitigating the von Neumann performance bottleneck. Each of the following can improve performance, chiefly by reducing the number of accesses to main memory or by hiding their latency:
* Providing a [[CPU cache|cache]] between the CPU and the [[main memory]].
* Providing separate caches or separate access paths for data and instructions (the so-called [[Modified Harvard architecture]]).
* Using [[branch predictor]] algorithms and logic.
* Providing a limited CPU stack or other on-chip [[scratchpad memory]] to reduce memory access.
* Implementing the CPU and the [[memory hierarchy]] as a [[System on a chip|system on chip]], providing greater [[locality of reference]] and thus reducing latency and increasing throughput between [[processor register]]s and main memory.
The problem can also be sidestepped somewhat by using [[parallel computing]], for example with the [[non-uniform memory access]] (NUMA) architecture; this approach is commonly employed by [[supercomputer]]s. It is less clear whether the ''intellectual bottleneck'' that Backus criticized has changed much since 1977. Backus's proposed solution has not had a major influence.{{citation needed|date=December 2010}} Modern [[functional programming]] and [[object-oriented programming]] are much less geared towards "pushing vast numbers of words back and forth"{{how?|reason=Objects are just vast numbers of words arranged into clear structures. Functional languages potentially push even more data around since the functions themselves may be passed around as parameters and they'll eventually be working on data|date=April 2025}} than earlier languages like [[FORTRAN]] were, but internally, that is still what computers spend much of their time doing, even highly parallel supercomputers.{{citation needed|date=April 2025}}
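The effect of the cache and locality-of-reference mitigations listed above can be illustrated with a short C sketch (an illustrative example, not drawn from the cited sources; the matrix size and timing method are arbitrary choices). Both functions below perform exactly the same additions and differ only in their memory access pattern; on typical cached hardware the sequential (row-major) traversal runs several times faster, because far fewer of its loads have to cross the memory bus to main memory.

<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4096  /* arbitrary matrix dimension, chosen only for illustration */

static double a[N][N];

/* Row-major traversal: consecutive accesses touch adjacent addresses, so
 * most loads are served from the CPU cache rather than from main memory. */
static double sum_rows(void) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-major traversal: each access jumps N * sizeof(double) bytes,
 * defeating the cache and forcing far more traffic over the memory bus. */
static double sum_cols(void) {
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    clock_t t0, t1;

    t0 = clock();
    double r = sum_rows();
    t1 = clock();
    printf("row-major:    sum=%g  %.3f s\n", r, (double)(t1 - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    double c = sum_cols();
    t1 = clock();
    printf("column-major: sum=%g  %.3f s\n", c, (double)(t1 - t0) / CLOCKS_PER_SEC);

    return 0;
}
</syntaxhighlight>

In both cases the CPU performs the same arithmetic; the difference in running time comes almost entirely from how often it must wait on data moving across the CPU–memory path.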