Amdahl's law
{{short description|Formula in computer architecture}}
[[File:AmdahlsLaw.svg|thumb|400px|Amdahl's Law demonstrates the theoretical maximum [[speedup]] of an overall system and the concept of diminishing returns. The number of processors is plotted on a logarithmic axis against the speedup on a linear axis. If exactly 50% of the work can be parallelized, the best possible speedup is 2 times. If 95% of the work can be parallelized, the best possible speedup is 20 times. According to the law, even with an infinite number of processors, the speedup is constrained by the unparallelizable portion.]]
In [[computer architecture]], '''Amdahl's law''' (or '''Amdahl's argument'''<ref>{{cite journal | last = Rodgers | first = David P. | journal = ACM SIGARCH Computer Architecture News | date = June 1985 | title = Improvements in multiprocessor system design | volume = 13 | issue = 3 | pages = 225–231 [p. 226] | issn = 0163-5964 | publisher = [[Association for Computing Machinery|ACM]] | location = New York, NY, USA | doi = 10.1145/327070.327215 | isbn=0-8186-0634-7 | s2cid = 7083878 }}</ref>) is a formula that shows how much faster a task can be completed when more resources are added to the system. The law can be stated as:

<blockquote>"the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used".<ref>{{Cite book |last=Reddy |first=Martin |title=API Design for C++ |page=210 |date=2011 |place=[[Burlington, Massachusetts]] |publisher=[[Morgan Kaufmann Publishers]] |doi=10.1016/C2010-0-65832-9 |isbn=978-0-12-385003-4 |lccn=2010039601 |oclc=666246330}}</ref></blockquote>

It is named after [[computer scientist]] [[Gene Amdahl]], and was presented at the [[American Federation of Information Processing Societies]] (AFIPS) Spring Joint Computer Conference in 1967. Amdahl's law is often used in [[parallel computing]] to predict the theoretical [[speedup]] when using multiple processors.

== Definition ==
In the context of Amdahl's law, speedup can be defined as:<ref name=":1" />

: <math>\text{Speedup} = \frac{\text{Performance for the entire task when enhancements are applied}}{\text{Performance for the same task without those enhancements}}</math>

or

: <math>\text{Speedup} = \frac{\text{Execution time for the entire task without enhancements}}{\text{Execution time for the same task when enhancements are applied}}</math>

Amdahl's law can be formulated in the following way:<ref name=":0">{{Citation |last=Bakos |first=Jason D. |title=Chapter 2 - Multicore and data-level optimization: OpenMP and SIMD |date=2016-01-01 |work=Embedded Systems |pages=49–103 |editor-last=Bakos |editor-first=Jason D. |url=https://linkinghub.elsevier.com/retrieve/pii/B978012800342800002X |access-date=2024-11-18 |place=Boston |publisher=Morgan Kaufmann |doi=10.1016/b978-0-12-800342-8.00002-x |isbn=978-0-12-800342-8 |url-access=subscription }}</ref>

: <math>\text{Speedup}_\text{overall} = \frac{1}{(1 - \text{Time}_{\text{optimized}}) + \frac{\text{Time}_{\text{optimized}}}{\text{Speedup}_{\text{optimized}}}}</math>

where
* <math>\text{Speedup}_\text{overall}</math> represents the total speedup of a program;
* <math>\text{Time}_{\text{optimized}}</math> represents the proportion of time spent on the portion of the code where improvements are made;
* <math>\text{Speedup}_{\text{optimized}}</math> represents the extent of the improvement.

The <math>\text{Speedup}_\text{overall}</math> is frequently much lower than one might expect.
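The formula is straightforward to evaluate numerically. The short [[Python (programming language)|Python]] sketch below is purely illustrative (the function name <code>overall_speedup</code> is not from any library); it implements the formula above and can be used to check the two examples discussed next.

<syntaxhighlight lang="python">
def overall_speedup(time_optimized: float, speedup_optimized: float) -> float:
    """Amdahl's law: overall speedup when a fraction `time_optimized` of the
    original execution time is accelerated by a factor `speedup_optimized`."""
    return 1.0 / ((1.0 - time_optimized) + time_optimized / speedup_optimized)

print(overall_speedup(0.10, 10_000))  # ~1.11: 10% of the time sped up 10,000 times
print(overall_speedup(0.99, 100))     # ~50.3: 99% of the time sped up 100 times
</syntaxhighlight>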
For instance, if a programmer enhances a part of the code that represents 10% of the total execution time (i.e. <math>\text{Time}_{\text{optimized}}</math> of 0.10) and achieves a <math>\text{Speedup}_{\text{optimized}}</math> of 10,000, then <math>\text{Speedup}_\text{overall}</math> becomes about 1.11, i.e. only an 11% improvement in the total execution speed of the program. So, despite a massive improvement in one section, the overall benefit is quite small. In another example, if the programmer optimizes a section that accounts for 99% of the execution time (i.e. <math>\text{Time}_{\text{optimized}}</math> of 0.99) with a speedup factor of 100 (i.e. <math>\text{Speedup}_{\text{optimized}}</math> of 100), the <math>\text{Speedup}_\text{overall}</math> only reaches about 50. This indicates that half of the potential performance gain (<math>\text{Speedup}_\text{overall}</math> would reach 100 if 100% of the execution time were covered) is lost due to the remaining 1% of execution time that was not improved.<ref name=":0" />

== Implications ==
{{aig |1=section |date=May 2025}}
Amdahl's law has the following implications:<ref>{{Cite book |title=The Art of Multiprocessor Programming, Revised Reprint |date=22 May 2012 |publisher=Morgan Kaufmann |isbn=9780123973375}}</ref><ref>{{Cite book |title=Programming Many-Core Chips |isbn=9781441997395 |last1=Vajda |first1=András |date=10 June 2011 |publisher=Springer }}</ref>
* '''Diminishing Returns''': Adding more processors gives diminishing returns. Beyond a certain point, each additional processor contributes little further speedup.
* '''Limited Speedup''': Even with many processors, there is a limit to how much faster a task can be completed, because of the parts of the task that cannot be parallelized.

== Limitations ==
{{aig |1=section |date=May 2025}}
Amdahl's law has the following limitations:<ref>{{Cite book |last=Amdahl |first=Gene M. |chapter=Validity of the single processor approach to achieving large scale computing capabilities |date=1967-04-18 |title=Proceedings of the April 18-20, 1967, spring joint computer conference on - AFIPS '67 (Spring) |chapter-url=https://dl.acm.org/doi/10.1145/1465482.1465560 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=483–485 |doi=10.1145/1465482.1465560 |isbn=978-1-4503-7895-6}}</ref><ref name=":1">{{Cite book |title=Computer Architecture: A Quantitative Approach |date=2003 |publisher=Morgan Kaufmann |isbn=978-8178672663}}</ref><ref>{{Cite book |title=Parallel Computer Architecture A Hardware/Software Approach |date=1999 |publisher=Elsevier Science |isbn=9781558603431}}</ref>
* '''Assumption of Fixed Workload''': Amdahl's law assumes that the workload remains constant. It does not account for dynamic or increasing workloads, which can impact the effectiveness of parallel processing.
* '''Overhead Ignored''': Amdahl's law neglects overheads associated with [[Concurrency (computer science)|concurrency]], including coordination, [[Synchronization (computer science)|synchronization]], [[inter-process communication]], and [[concurrency control]]. Notably, merging data from multiple [[Thread (computing)|threads]] or [[Process (computing)|processes]] incurs significant overhead due to [[conflict resolution]], [[data consistency]], versioning, and synchronization.<ref>{{Cite book |title=Concurrent Programming: Algorithms, Principles, and Foundations |date=23 December 2012 |publisher=Springer |isbn=978-3642320262}}</ref>
* '''Neglecting Extrinsic Factors''': Amdahl's law addresses computational parallelism, neglecting extrinsic factors such as data persistence, I/O operations, and memory access overheads, and assumes idealized conditions.
* '''Scalability Issues''': While it highlights the limits of parallel speedup, it does not address practical scalability issues, such as the cost and complexity of adding more processors.
* '''Non-Parallelizable Work''': Amdahl's law emphasizes the non-parallelizable portion of the task as a bottleneck but does not provide solutions for reducing or optimizing this portion.
* '''Assumes Homogeneous Processors''': It assumes that all processors are identical and contribute equally to speedup, which may not be the case in heterogeneous computing environments.

Amdahl's law applies only to cases where the problem size is fixed. In practice, as more computing resources become available, they tend to get used on larger problems (larger datasets), and the time spent in the parallelizable part often grows much faster than the inherently serial work. In this case, [[Gustafson's law]] gives a less pessimistic and more realistic assessment of the parallel performance.<ref>{{cite book |last1=McCool |first1=Michael |title=Structured Parallel Programming: Patterns for Efficient Computation |last2=Reinders |first2=James |last3=Robison |first3=Arch |publisher=Elsevier |year=2013 |isbn=978-0-12-415993-8 |pages=61}}</ref>

The [[Neil J. Gunther#Universal Scalability Law|Universal Scalability Law]] (USL), developed by [[Neil J. Gunther]], extends Amdahl's law and accounts for the additional overhead due to [[inter-process communication]]. USL quantifies scalability based on parameters such as contention and coherency.<ref>{{Cite book |last=Gunther |first=Neil |title=Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services |year=2007 |isbn=978-3540261384}}</ref>

== Derivation ==
A task executed by a system whose resources are improved compared to an initial similar system can be split up into two parts:
* a part that does not benefit from the improvement of the resources of the system;
* a part that benefits from the improvement of the resources of the system.

An example is a computer program that processes files. A part of that program may scan the directory of the disk and create a list of files internally in memory. After that, another part of the program passes each file to a separate [[Thread (computing)|thread]] for processing. The part that scans the directory and creates the file list cannot be sped up on a parallel computer, but the part that processes the files can.

The execution time of the whole task before the improvement of the resources of the system is denoted as <math>T</math>. It includes the execution time of the part that would not benefit from the improvement of the resources and the execution time of the one that would benefit from it. The fraction of the execution time of the task that would benefit from the improvement of the resources is denoted by <math>p</math>.
The fraction corresponding to the part that would not benefit from it is therefore {{nowrap|<math>1 - p</math>}}. Then:

: <math>T = (1 - p)T + pT.</math>

After the improvement of the resources, the execution of the part that benefits from them is accelerated by the factor <math>s</math>. Consequently, the execution time of the part that does not benefit from it remains the same, while the part that benefits from it becomes:

: <math>\frac{p}{s}T.</math>

The theoretical execution time <math>T(s)</math> of the whole task after the improvement of the resources is then:

: <math>T(s) = (1 - p)T + \frac{p}{s}T.</math>

Amdahl's law gives the theoretical [[speedup]] in [[Latency (engineering)|latency]] of the execution of the whole task ''at fixed workload <math>W</math>'', which yields

: <math>S_\text{latency}(s) = \frac{TW}{T(s)W} = \frac{T}{T(s)} = \frac{1}{1 - p + \frac{p}{s}}.</math>

=== Parallel programs ===
If 30% of the execution time can be sped up, ''p'' will be 0.3; if the improvement makes the affected part twice as fast, ''s'' will be 2. Amdahl's law states that the overall speedup of applying the improvement will be:

: <math>S_\text{latency} = \frac{1}{1 - p + \frac{p}{s}} = \frac{1}{1 - 0.3 + \frac{0.3}{2}} = 1.18.</math>

For example, assume that we are given a serial task which is split into four consecutive parts, whose fractions of the total execution time are {{nowrap|1=''p''<sub>1</sub> = 0.11}}, {{nowrap|1=''p''<sub>2</sub> = 0.18}}, {{nowrap|1=''p''<sub>3</sub> = 0.23}}, and {{nowrap|1=''p''<sub>4</sub> = 0.48}} respectively. Then we are told that the 1st part is not sped up, so {{nowrap|1=''s''<sub>1</sub> = 1}}, while the 2nd part is sped up 5 times, so {{nowrap|1=''s''<sub>2</sub> = 5}}, the 3rd part is sped up 20 times, so {{nowrap|1=''s''<sub>3</sub> = 20}}, and the 4th part is sped up 1.6 times, so {{nowrap|1=''s''<sub>4</sub> = 1.6}}. By using Amdahl's law, the overall speedup is

: <math>S_\text{latency} = \frac{1}{\frac{p_1}{s_1} + \frac{p_2}{s_2} + \frac{p_3}{s_3} + \frac{p_4}{s_4}} = \frac{1}{\frac{0.11}{1} + \frac{0.18}{5} + \frac{0.23}{20} + \frac{0.48}{1.6}} = 2.19.</math>

Notice how the 5-times and 20-times speedups of the 2nd and 3rd parts, respectively, do not have much effect on the overall speedup when the 4th part (48% of the execution time) is accelerated by only 1.6 times.

=== Serial programs ===
[[File:Optimizing-different-parts.svg|thumb|400px|Assume that a task has two independent parts, ''A'' and ''B''. Part ''B'' takes roughly 25% of the time of the whole computation. By working very hard, one may be able to make this part 5 times faster, but this reduces the time of the whole computation only slightly. In contrast, one may need to perform less work to make part ''A'' perform twice as fast. This will make the computation much faster than by optimizing part ''B'', even though the speedup of part ''B'' is greater in ratio (5 times versus 2 times).]]
For example, with a serial program in two parts ''A'' and ''B'' for which {{nowrap|1=''T''<sub>''A''</sub> = 3 s}} and {{nowrap|1=''T''<sub>''B''</sub> = 1 s}},
* if part ''B'' is made to run 5 times faster, that is {{nowrap|1=''s'' = 5}} and {{nowrap|1=''p'' = ''T''<sub>''B''</sub>/(''T''<sub>''A''</sub> + ''T''<sub>''B''</sub>) = 0.25}}, then <math display="block">S_\text{latency} = \frac{1}{1 - 0.25 + \frac{0.25}{5}} = 1.25;</math>
* if part ''A'' is made to run 2 times faster, that is {{nowrap|1=''s'' = 2}} and {{nowrap|1=''p'' = ''T''<sub>''A''</sub>/(''T''<sub>''A''</sub> + ''T''<sub>''B''</sub>) = 0.75}}, then <math display="block">S_\text{latency} = \frac{1}{1 - 0.75 + \frac{0.75}{2}} = 1.60.</math>

Therefore, making part ''A'' run 2 times faster is better than making part ''B'' run 5 times faster. The percentage improvement in speed can be calculated as

: <math>\text{percentage improvement} = 100 \left(1 - \frac{1}{S_\text{latency}}\right).</math>

* Improving part ''A'' by a factor of 2 will increase overall program speed by a factor of 1.60, which makes it 37.5% faster than the original computation.
* However, improving part ''B'' by a factor of 5, which presumably requires more effort, will achieve an overall speedup factor of only 1.25, which makes it 20% faster.

=== Optimizing the sequential part of parallel programs ===
If the non-parallelizable part is optimized by a factor of {{nowrap|<math>O</math>}}, then

: <math>T(O,s) = (1 - p)\frac{T}{O} + \frac{p}{s}T.</math>

It follows from Amdahl's law that the speedup due to parallelism is given by

: <math>S_\text{latency}(O,s) = \frac{T(O)}{T(O,s)} = \frac{(1 - p)\frac{1}{O} + p}{\frac{1 - p}{O} + \frac{p}{s}}.</math>

When <math>s = 1</math>, we have <math>S_\text{latency}(O,s) = 1</math>, meaning that the speedup is measured with respect to the execution time after the non-parallelizable part is optimized. When <math>s = \infty</math>,

: <math>S_\text{latency}(O,\infty) = \frac{T(O)}{T(O,\infty)} = \frac{(1 - p)\frac{1}{O} + p}{\frac{1 - p}{O}} = 1 + \frac{p}{1 - p}O.</math>

If <math>1 - p = 0.4</math>, <math>O = 2</math> and <math>s = 5</math>, then:

: <math>S_\text{latency}(O,s) = \frac{T(O)}{T(O,s)} = \frac{0.4 \cdot \frac{1}{2} + 0.6}{\frac{0.4}{2} + \frac{0.6}{5}} = 2.5.</math>

=== Transforming sequential parts of parallel programs into parallelizable parts ===
Next, we consider the case wherein the non-parallelizable part is reduced by a factor of {{nowrap|<math>O'</math>}}, and the parallelizable part is correspondingly increased. Then

: <math>T'(O',s) = \frac{1 - p}{O'}T + \left(1 - \frac{1 - p}{O'}\right)\frac{T}{s}.</math>

It follows from Amdahl's law that the speedup due to parallelism is given by

: <math>S'_\text{latency}(O',s) = \frac{T'(O')}{T'(O',s)} = \frac{1}{\frac{1 - p}{O'} + \left(1 - \frac{1 - p}{O'}\right)\frac{1}{s}}.</math>

== Relation to the law of diminishing returns ==
Amdahl's law is often conflated with the [[law of diminishing returns]], whereas only a special case of applying Amdahl's law demonstrates the law of diminishing returns. If one picks optimally (in terms of the achieved speedup) what is to be improved, then one will see monotonically decreasing improvements as one improves.
If, however, one picks non-optimally, after improving a sub-optimal component and moving on to improve a more optimal component, one can see an increase in the return. Note that it is often rational to improve a system in an order that is "non-optimal" in this sense, given that some improvements are more difficult or require longer development time than others.

Amdahl's law does represent the law of diminishing returns if one is considering what sort of return one gets by adding more processors to a machine, while running a fixed-size computation that will use all available processors to their capacity. Each new processor added to the system will add less usable power than the previous one. Each time one doubles the number of processors, the speedup ratio will diminish, as the total throughput heads toward the limit of 1/(1 − ''p'').

This analysis neglects other potential bottlenecks such as [[memory bandwidth]] and I/O bandwidth. If these resources do not scale with the number of processors, then merely adding processors provides even lower returns.

An implication of Amdahl's law is that to speed up real applications which have both serial and parallel portions, [[heterogeneous computing]] techniques are required.<ref>{{cite journal |first1=Mark D. |last1=Hill |first2=Michael R. |last2=Marty |title=Amdahl's Law in the Multicore Era |journal=[[Computer (magazine)|Computer]] |volume=41 |issue=7 |pages=33–38 |year=2008 |doi=10.1109/MC.2008.209 |citeseerx=10.1.1.221.8635 }}</ref> There are novel speedup and energy consumption models based on a more general representation of heterogeneity, referred to as the normal form heterogeneity, that support a wide range of heterogeneous many-core architectures. These modelling methods aim to predict system power efficiency and performance ranges, and facilitate research and development at the hardware and system software levels.<ref>{{Cite journal |last1=Rafiev |first1=Ashur |last2=Al-Hayanni |first2=Mohammed A. N. |last3=Xia |first3=Fei |last4=Shafik |first4=Rishad |last5=Romanovsky |first5=Alexander |last6=Yakovlev |first6=Alex |date=2018-07-01 |title=Speedup and Power Scaling Models for Heterogeneous Many-Core Systems |url=https://ieeexplore.ieee.org/document/8255653 |journal=IEEE Transactions on Multi-Scale Computing Systems |volume=4 |issue=3 |pages=436–449 |doi=10.1109/TMSCS.2018.2791531 |s2cid=52287374 |issn=2332-7766}}</ref><ref>{{Cite journal |last1=Al-hayanni |first1=Mohammed A. Noaman |last2=Xia |first2=Fei |last3=Rafiev |first3=Ashur |last4=Romanovsky |first4=Alexander |last5=Shafik |first5=Rishad |last6=Yakovlev |first6=Alex |date=July 2020 |title=Amdahl's law in the context of heterogeneous many-core systems – a survey |journal=IET Computers & Digital Techniques |language=en |volume=14 |issue=4 |pages=133–148 |doi=10.1049/iet-cdt.2018.5220 |s2cid=214415079 |issn=1751-8601 |doi-access=free }}</ref>

== See also ==
* [[Gustafson's law]]
* [[Neil J. Gunther#Universal Law of Computational Scalability|Universal Law of Computational Scalability]]
* [[Analysis of parallel algorithms]]
* [[Critical path method]]
* [[Moore's law]]
* [[List of eponymous laws]]

== References ==
{{reflist}}

== Further reading ==
* {{cite journal | first = Gene M. | last = Amdahl | url = https://inst.eecs.berkeley.edu/~n252/paper/Amdahl.pdf | doi = 10.1145/1465482.1465560 | title = Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities | journal = AFIPS Conference Proceedings | issue = 30 | pages = 483–485 | year = 1967 | s2cid = 195607370 }}

== External links ==
{{commons category}}
* {{citation |author=Gene M. Amdahl |hdl=11299/104341 |title=Oral history interview with Gene M. Amdahl |publisher=[[Charles Babbage Institute]], University of Minnesota |year=1989}}. Amdahl discusses his graduate work at the University of Wisconsin and his design of [[Wisconsin Integrally Synchronized Computer|WISC]]. He discusses his role in the design of several computers for IBM, including the [[IBM Stretch|STRETCH]], [[IBM 701]], and [[IBM 704]], as well as his work with [[Nathaniel Rochester (computer scientist)|Nathaniel Rochester]] and IBM's management of the design process. He also mentions work with [[TRW Inc.|Ramo-Wooldridge]], [[Aeronutronic]], and [[Computer Sciences Corporation]].
* [http://demonstrations.wolfram.com/AmdahlsLaw/ "Amdahl's Law"] by Joel F. Klein, [[Wolfram Demonstrations Project]] (2007)
* [https://research.cs.wisc.edu/multifacet/amdahl/ Amdahl's Law in the Multicore Era] (July 2008)

{{Computer laws}}
{{Parallel Computing}}

[[Category:Analysis of parallel algorithms]]
[[Category:Computer architecture statements]]