Editing Algorithmic efficiency

{{Distinguish|text=[[program optimization]], [[optimizing compiler]], [[loop optimization]], [[object code optimizer]]}}
{{multiple|
{{More citations needed|date=January 2024}}
{{Tone|date=January 2024}}{{short description|Property of an algorithm}}
}}
{{Use dmy dates|date=February 2023}}
In [[computer science]], '''algorithmic efficiency''' is a property of an [[algorithm]] which relates to the amount of [[computational resource]]s used by the algorithm. Algorithmic efficiency can be thought of as analogous to engineering [[productivity]] for a repeating or continuous process.

For maximum efficiency it is desirable to minimize resource usage. However, different resources such as [[Time complexity|time]] and [[Space complexity|space]] complexity cannot be compared directly, so which of two algorithms is considered to be more efficient often depends on which measure of efficiency is considered most important.

For example, [[bubble sort]] and [[timsort]] are both [[Sorting algorithm|algorithms to sort a list]] of items from smallest to largest. Bubble sort organizes the list in time proportional to the number of elements squared (<math display="inline">O(n^2)</math>, see [[Big O notation]]), but only requires a small amount of extra [[computer memory|memory]] which is constant with respect to the length of the list (<math display="inline">O(1)</math>). Timsort sorts the list in time [[linearithmic]] (proportional to a quantity times its logarithm) in the list's length (<math display="inline">O(n\log n)</math>), but has a space requirement [[Proportionality (mathematics)|linear]] in the length of the list (<math display="inline">O(n)</math>). If large lists must be sorted at high speed for a given application, timsort is a better choice; however, if minimizing the [[memory footprint]] of the sorting is more important, bubble sort is a better choice.

==Background==
The importance of efficiency with respect to time was emphasized by [[Ada Lovelace]] in 1843 as applied to [[Charles Babbage]]'s mechanical analytical engine:
<blockquote>"In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selections amongst them for the purposes of a calculating engine. One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation"<ref>{{
Citation
 | last1 = Green | first1 = Christopher
 | title = Classics in the History of Psychology
 | url  = http://psychclassics.yorku.ca/Lovelace/lovelace.htm
 | access-date = 19 May 2013
}}</ref></blockquote>

Early [[electronic computer]]s had both limited [[clock cycle|speed]] and limited [[random access memory]]. Therefore, a [[space–time trade-off]] occurred. A [[Task (computing)|task]] could use a fast algorithm using a lot of memory, or it could use a slow algorithm using little memory. The engineering trade-off was therefore to use the fastest algorithm that could fit in the available memory.

Modern computers are significantly faster than early computers and have a much larger amount of memory available ([[Orders of magnitude (computing)|gigabytes instead of kilobytes]]). Nevertheless, [[Donald Knuth]] emphasized that efficiency is still an important consideration:

 <blockquote> "In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering"<ref name="Knuth1974">{{
Citation
 |last1=Knuth 
 |first1=Donald 
 |title=Structured Programming with go-to Statements 
 |journal=Computing Surveys 
 |volume=6 
 |issue=4 
 |pages=261–301 
 |year=1974 
 |url=http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf 
 |access-date=19 May 2013 
 |url-status=dead 
 |archive-url=https://web.archive.org/web/20090824073244/http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf 
 |archive-date=24 August 2009 
|doi=10.1145/356635.356640 
 |citeseerx=10.1.1.103.6084 
 |s2cid=207630080 
 }}</ref></blockquote>

==Overview==
An algorithm is considered efficient if its resource consumption, also known as computational cost, is at or below some acceptable level. Roughly speaking, 'acceptable' means:  it will run in a reasonable amount of time or space on an available computer, typically as a [[function (mathematics)|function]] of the size of the input. Since the 1950s computers have seen dramatic increases in both the available computational power and in the available amount of memory, so current acceptable levels would have been unacceptable even 10 years ago. In fact, thanks to the [[Moore's law|approximate doubling of computer power every 2 years]], tasks that are acceptably efficient on modern [[smartphone]]s and [[embedded system]]s may have been unacceptably inefficient for industrial [[server (computing)|server]]s 10 years ago.

Computer manufacturers frequently bring out new models, often with higher [[computer performance|performance]]. Software costs can be quite high, so in some cases the simplest and cheapest way of getting higher performance might be to just buy a faster computer, provided it is [[Backward compatibility|compatible]] with an existing computer.

There are many ways in which the resources used by an algorithm can be measured: the two most common measures are speed and memory usage; other measures could include transmission speed, temporary disk usage, long-term disk usage, power consumption, [[total cost of ownership]], [[response time (technology)|response time]] to external stimuli, etc. Many of these measures depend on the size of the input to the algorithm, i.e. the amount of data to be processed. They might also depend on the way in which the data is arranged; for example, some [[sorting algorithm]]s perform poorly on data which is already sorted, or which is sorted in reverse order.

In practice, there are other factors which can affect the efficiency of an algorithm, such as requirements for accuracy and/or reliability. As detailed below, the way in which an algorithm is implemented can also have a significant effect on actual efficiency, though many aspects of this relate to [[optimization (computer science)|optimization]] issues.

===Theoretical analysis===
In the theoretical [[analysis of algorithms]], the normal practice is to estimate their complexity in the asymptotic sense. The most commonly used notation to describe resource consumption or "complexity" is [[Donald Knuth]]'s [[Big O notation]], representing the complexity of an algorithm as a function of the size of the input <math display="inline">n</math>. Big O notation is an [[Asymptote|asymptotic]] measure of function complexity, where <math display="inline">f(n) = O\bigl( g(n)\bigr)</math> roughly means the time requirement for an algorithm is proportional to <math>g(n)</math>, omitting [[lower-order terms]] that contribute less than <math>g(n)</math> to the growth of the function as <math display="inline">n</math> [[limit (mathematics)|grows arbitrarily large]]. This estimate may be misleading when <math display="inline">n</math> is small, but is generally sufficiently accurate when <math display="inline">n</math> is large as the notation is asymptotic. For example, bubble sort may be faster than [[merge sort]] when only a few items are to be sorted; however either implementation is likely to meet performance requirements for a small list. Typically, programmers are interested in algorithms that [[Scalability|scale]] efficiently to large input sizes, and merge sort is preferred over bubble sort for lists of length encountered in most data-intensive programs.

Some examples of Big O notation applied to algorithms' asymptotic time complexity include:

{| class="wikitable"
|-
!Notation !! Name !! Examples
|-
|<math>O(1)</math>||[[Constant time|constant]] || Finding the median from a sorted list of measurements; Using a constant-size [[lookup table]]; Using a suitable [[hash function]] for looking up an item.
|-
|<math>O(\log n)</math>||[[Logarithmic time|logarithmic]] || Finding an item in a sorted array with a [[Binary search algorithm|binary search]] or a balanced search [[Tree data structure|tree]] as well as all operations in a [[Binomial heap]].
|-
|<math>O(n)</math>||[[linear time|linear]] || Finding an item in an unsorted list or a malformed tree (worst case) or in an unsorted array; Adding two ''n''-bit integers by [[Ripple carry adder|ripple carry]].
|-
|<math>O(n\log n)</math>||[[Linearithmic time|linearithmic]], loglinear, or quasilinear || Performing a [[Fast Fourier transform]]; [[heapsort]], [[quicksort]] ([[Best, worst and average case|best and average case]]), or [[merge sort]]
|-
|<math>O(n^2)</math>||[[quadratic time|quadratic]] ||[[Multiplication|Multiplying]] two ''n''-digit numbers by [[Long multiplication|a simple algorithm]]; [[bubble sort]] (worst case or naive implementation), [[Shell sort]], quicksort ([[Best, worst and average case|worst case]]), [[selection sort]] or [[insertion sort]]
|-
|<math>O(c^n),\;c>1</math>||[[exponential time|exponential]] || Finding the optimal (non-[[Travelling salesman problem#Heuristic and approximation algorithms|approximate]]) solution to the [[travelling salesman problem]] using [[dynamic programming]]; [[Boolean satisfiability problem|determining if two logical statements are equivalent]] using [[brute-force search]]
|}

===Measuring performance===
For new versions of software or to provide comparisons with competitive systems, [[benchmark (computing)|benchmark]]s are sometimes used, which assist with gauging an algorithms relative performance. If a new [[sorting algorithm|sort algorithm]] is produced, for example, it can be compared with its predecessors to ensure that at least it is efficient as before with known data, taking into consideration any functional improvements. Benchmarks can be used by customers when comparing various products from alternative suppliers to estimate which product will best suit their specific requirements in terms of functionality and performance. For example, in the [[mainframe computer|mainframe]] world certain proprietary [[Mainframe sort merge|sort]] products from independent software companies such as [[Syncsort]] compete with products from the major suppliers such as [[IBM]] for speed.

Some benchmarks provide opportunities for producing an analysis comparing the relative speed of various compiled and interpreted languages for example<ref name="fourmilab.ch" /><ref>{{cite web|url=http://www.roylongbottom.org.uk/whetstone.htm#anchorPC2 |title=Whetstone Benchmark History |publisher=Roylongbottom.org.uk |access-date=14 December 2011}}</ref>
and [[The Computer Language Benchmarks Game]] compares the performance of implementations of typical programming problems in several programming languages.

Even creating "[[do it yourself]]" benchmarks can demonstrate the relative performance of different programming languages, using a variety of user specified criteria. This is quite simple, as a "Nine language performance roundup" by Christopher W. Cowell-Shah demonstrates by example.<ref>{{Cite web|url=http://www.osnews.com/story/5602|title=Nine Language Performance Round-up: Benchmarking Math & File I/O|author=OSNews Staff|website=osnews.com|access-date=2018-09-18}}</ref>

===Implementation concerns===
Implementation issues can also have an effect on efficiency, such as the choice of programming language, or the way in which the algorithm is actually coded,<ref name="KriegelSchubert2016">{{cite journal|last1=Kriegel|first1=Hans-Peter|author-link=Hans-Peter Kriegel|last2=Schubert|first2=Erich|last3=Zimek|first3=Arthur|author-link3=Arthur Zimek|title=The (black) art of runtime evaluation: Are we comparing algorithms or implementations?|journal=Knowledge and Information Systems|volume=52|issue=2|year=2016|pages=341–378|issn=0219-1377|doi=10.1007/s10115-016-1004-2|s2cid=40772241}}</ref> or the choice of a [[compiler]] for a particular language, or the [[compiler optimization|compilation options]] used, or even the [[operating system]] being used. In many cases a language implemented by an [[interpreter (computing)|interpreter]] may be much slower than a language implemented by a compiler.<ref name="fourmilab.ch">{{cite web|url=http://www.fourmilab.ch/fourmilog/archives/2005-08/000567.html |title=Floating Point Benchmark: Comparing Languages (Fourmilog: None Dare Call It Reason) |publisher=Fourmilab.ch |date=4 August 2005 |access-date=14 December 2011}}</ref> See the articles on [[just-in-time compilation]] and [[interpreted language]]s.

There are other factors which may affect time or space issues, but which may be outside of a programmer's control; these include [[data alignment]], [[granularity#Data granularity|data granularity]], [[locality of reference|cache locality]], [[cache coherence|cache coherency]], [[garbage collection (computer science)|garbage collection]], [[instruction-level parallelism]], [[Multithreading (disambiguation)|multi-threading]]<!--Intentional link to DAB page--> (at either a hardware or software level), [[Simultaneous multithreading|simultaneous multitasking]], and [[subroutine]] calls.<ref name="steele1997">Guy Lewis Steele, Jr. "Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO". MIT AI Lab. AI Lab Memo AIM-443. October 1977.[http://dspace.mit.edu/handle/1721.1/5753]</ref>

Some processors have capabilities for [[vector processor|vector processing]], which allow a [[SIMD|single instruction to operate on multiple operands]]; it may or may not be easy for a programmer or compiler to use these capabilities. Algorithms designed for sequential processing may need to be completely redesigned to make use of [[parallel computing|parallel processing]], or they could be easily reconfigured. As [[parallel computing|parallel]] and [[distributed computing]] grow in importance in the late 2010s, more investments are being made into efficient [[high-level programming language|high-level]] [[Application programming interface|API]]s for parallel and distributed computing systems such as [[CUDA]], [[TensorFlow]], [[Apache Hadoop|Hadoop]], [[OpenMP]] and [[Message Passing Interface|MPI]].

Another problem which can arise in programming is that processors compatible with the same [[instruction set architecture|instruction set]] (such as [[x86-64]] or [[ARM architecture|ARM]]) may implement an instruction in different ways, so that instructions which are relatively fast on some models may be relatively slow on other models. This often presents challenges to [[optimizing compiler]]s, which must have extensive knowledge of the specific [[Central processing unit|CPU]] and other hardware available on the compilation target to best optimize a program for performance. In the extreme case, a compiler may be forced to [[software emulation|emulate]] instructions not supported on a compilation target platform, forcing it to [[code generation (compiler)|generate code]] or [[linking (computing)|link]] an external [[library (computing)|library call]] to produce a result that is otherwise incomputable on that platform, even if it is natively supported and more efficient in hardware on other platforms. This is often the case in [[embedded system]]s with respect to [[floating-point arithmetic]], where small and [[low-power computing|low-power]] [[microcontroller]]s often lack hardware support for floating-point arithmetic and thus require computationally expensive software routines to produce floating point calculations.

==Measures of resource usage==
Measures are normally expressed as a function of the size of the input <math>\scriptstyle {n}</math>.

The two most common measures are:
* ''Time'': how long does the algorithm take to complete?
* ''Space'': how much working memory (typically RAM) is needed by the algorithm? This has two aspects: the amount of memory needed by the code (auxiliary space usage), and the amount of memory needed for the data on which the code operates (intrinsic space usage).

For computers whose power is supplied by a battery (e.g. [[laptop]]s and [[smartphone]]s), or for very long/large calculations (e.g. [[supercomputer]]s), other measures of interest are:
* ''Direct power consumption'': power needed directly to operate the computer.
* ''Indirect power consumption'': power needed for cooling, lighting, etc.

{{As of|2018}}, power consumption is growing as an important metric for computational tasks of all types and at all scales ranging from [[Embedded system|embedded]] [[Internet of things]] devices to [[system-on-chip]] devices to [[server farm]]s. This trend is often referred to as [[green computing]].

Less common measures of computational efficiency may also be relevant in some cases:

*''Transmission size'': bandwidth could be a limiting factor. [[Data compression]] can be used to reduce the amount of data to be transmitted. Displaying a picture or image (e.g. [[:File:Google.png|Google logo]]) can result in transmitting tens of thousands of bytes (48K in this case) compared with transmitting six bytes for the text "Google". This is important for [[I/O bound computing]] tasks.
*''External space'': space needed on a disk or other external memory device; this could be for temporary storage while the algorithm is being carried out, or it could be long-term storage needed to be carried forward for future reference.
*''Response time'' ([[latency (engineering)|latency]]): this is particularly relevant in a [[real-time computing|real-time application]] when the computer system must [[event-driven programming|respond quickly to some external event]].
*''Total cost of ownership'': particularly if a computer is dedicated to one particular algorithm.

===Time===
====Theory====
[[Analysis of algorithms]], typically using concepts like [[time complexity]], can be used to get an estimate of the running time as a function of the size of the input data. The result is normally expressed using [[Big O notation]]. This is useful for comparing algorithms, especially when a large amount of data is to be processed. More detailed estimates are needed to compare algorithm performance when the amount of data is small, although this is likely to be of less importance. [[Parallel algorithm]]s may be [[Analysis of parallel algorithms|more difficult to analyze]].

====Practice====
A [[benchmark (computing)|benchmark]] can be used to assess the performance of an algorithm in practice. Many programming languages have an available function which provides [[CPU time]] usage. For long-running algorithms the elapsed time could also be of interest. Results should generally be averaged over several tests.

Run-based profiling can be very sensitive to hardware configuration and the possibility of other programs or tasks running at the same time in a [[multi-processing]] and [[multi-programming]] environment.

This sort of test also depends heavily on the selection of a particular programming language, compiler, and compiler options, so algorithms being compared must all be implemented under the same conditions.

===Space===
This section is concerned with use of memory resources ([[Processor register|registers]], [[cache (computing)|cache]], [[Random-access memory|RAM]], [[virtual memory]], [[Auxiliary memory|secondary memory]]) while the algorithm is being executed. As for time analysis above, [[Analysis of algorithms|analyze]] the algorithm, typically using [[space complexity]] analysis to get an estimate of the run-time memory needed as a function as the size of the input data. The result is normally expressed using [[Big O notation]].

There are up to four aspects of memory usage to consider:
* The amount of memory needed to hold the code for the algorithm.
* The amount of memory needed for the [[Input (computer science)|input data]].
* The amount of memory needed for any [[Output (computing)|output data]]. 
**Some algorithms, such as sorting, often [[In-place algorithm|rearrange the input data]] and do not need any additional space for output data. This property is referred to as "[[In-place algorithm|in-place]]" operation.
* The amount of memory needed as working space during the calculation.
**This includes [[local variable]]s and any [[local variables, recursion and reentrancy|stack space needed]] by [[function call|routines called]] during a calculation; this stack space can be significant for algorithms which use [[Recursion (computer science)|recursive]] techniques.

Early electronic computers, and early home computers, had relatively small amounts of working memory. For example, the 1949 [[Electronic Delay Storage Automatic Calculator]] (EDSAC) had a maximum working memory of 1024 17-bit words, while the 1980 Sinclair [[ZX80]] came initially with 1024 8-bit bytes of working memory. In the late 2010s, it is typical for [[personal computer]]s to have between 4 and 32 [[gigabyte|GB]] of RAM, an increase of over 300 million times as much memory.

==== Caching and memory hierarchy ====
{{Further|Memory hierarchy}}
Modern computers can have relatively large amounts of memory (possibly gigabytes), so having to squeeze an algorithm into a confined amount of memory is not the kind of problem it used to be. However, the different types of memory and their relative access speeds can be significant:

* [[Processor register]]s, are the fastest memory with the least amount of space. Most direct computation on modern computers occurs with source and destination operands in registers before being updated to the cache, main memory and virtual memory if needed. On a [[CPU core|processor core]], there are typically on the order of hundreds of bytes or fewer of register availability, although a [[register file]] may contain more physical registers than [[Instruction set architecture|architectural]] registers defined in the instruction set architecture.
* [[CPU cache|Cache memory]] is the second fastest, and second smallest, available in the memory hierarchy. Caches are present in processors such as CPUs or GPUs, where they are typically implemented in [[Static random-access memory|static RAM]], though they can also be found in peripherals such as disk drives. Processor caches often have their own [[Cache hierarchy|multi-level hierarchy]]; lower levels are larger, slower and typically [[shared cache|shared]] between [[processor core]]s in [[multi-core processor]]s. In order to process operands in cache memory, a [[processor (computing)|processing unit]] must fetch the data from the cache, perform the operation in registers and write the data back to the cache. This operates at speeds comparable (about 2-10 times slower) with the CPU or GPU's [[arithmetic logic unit]] or [[floating-point unit]] if in the [[L1 cache]].<ref name="CompArc:QuantApp">{{cite book |last1=Hennessy |first1=John L |last2=Patterson |first2=David A |last3=Asanović |first3=Krste |author-link3=Krste Asanović |last4=Bakos |first4=Jason D |last5=Colwell |first5=Robert P |last6=Bhattacharjee |first6=Abhishek |last7=Conte |first7=Thomas M |last8=Duato |first8=José |last9=Franklin |first9=Diana |last10=Goldberg |first10=David |author-link11=Norman Jouppi|last11=Jouppi |first11=Norman P |last12=Li |first12=Sheng |last13=Muralimanohar |first13=Naveen |last14=Peterson |first14=Gregory D |last15=Pinkston |first15=Timothy Mark |last16=Ranganathan |first16=Prakash |last17=Wood |first17=David Allen |last18=Young |first18=Clifford |last19=Zaky |first19=Amr |title=Computer Architecture: a Quantitative Approach |date=2011 |publisher=Elsevier Science |isbn=978-0128119051 |edition=Sixth |language=en|oclc=983459758 }}</ref> It is about 10 times slower if there is an L1 [[cache miss]] and it must be retrieved from and written to the [[L2 cache]], and a further 10 times slower if there is an L2 cache miss and it must be retrieved from an [[L3 cache]], if present.
* [[Main memory|Main physical memory]] is most often implemented in [[dynamic RAM]] (DRAM). The main memory is much larger (typically [[gigabyte]]s compared to ≈8 [[megabyte]]s) than an L3 CPU cache, with read and write latencies typically 10-100 times slower.<ref name="CompArc:QuantApp" /> {{As of|2018}}, RAM is increasingly implemented [[system-on-chip|on-chip]] of processors, as CPU or [[GPU memory]].{{Citation needed |date=July 2024}}
* [[Paged memory]], often used for [[virtual memory]] management, is memory stored in [[secondary storage]] such as a [[hard disk]], and is an extension to the [[memory hierarchy]] which allows use of a potentially larger storage space, at the cost of much higher latency, typically around 1000 times slower than a [[cache miss]] for a value in RAM.<ref name="CompArc:QuantApp" /> While originally motivated to create the impression of higher amounts of memory being available than were truly available, virtual memory is more important in contemporary usage for its [[time-space tradeoff]] and enabling the usage of [[virtual machine]]s.<ref name="CompArc:QuantApp" /> Cache misses from main memory are called [[page fault]]s, and incur huge performance penalties on programs.

An algorithm whose memory needs will fit in cache memory will be much faster than an algorithm which fits in main memory, which in turn will be very much faster than an algorithm which has to resort to paging. Because of this, [[cache replacement policies]] are extremely important to high-performance computing, as are [[Cache-aware model|cache-aware programming]] and [[data alignment]]. To further complicate the issue, some systems have up to three levels of cache memory, with varying effective speeds. Different systems will have different amounts of these various types of memory, so the effect of algorithm memory needs can vary greatly from one system to another.

In the early days of electronic computing, if an algorithm and its data would not fit in main memory then the algorithm could not be used. Nowadays the use of virtual memory appears to provide much more memory, but at the cost of performance. Much higher speed can be obtained if an algorithm and its data fit in cache memory; in this case minimizing space will also help minimize time. This is called the [[principle of locality]], and can be subdivided into [[locality of reference]], [[spatial locality]], and [[temporal locality]]. An algorithm which will not fit completely in cache memory but which exhibits locality of reference may perform reasonably well.

==See also==
* [[Analysis of algorithms]]—how to determine the resources needed by an algorithm
* [[Benchmark (computing)|Benchmark]]—a method for measuring comparative execution times in defined cases
* [[Best, worst and average case]]—considerations for estimating execution times in three scenarios
* [[Compiler optimization]]—compiler-derived optimization
* [[Computational complexity theory]]
* [[Computer performance]]—computer hardware metrics
* [[Empirical algorithmics]]—the practice of using empirical methods to study the behavior of algorithms
* [[Program optimization]]
* [[Profiling (computer programming)|Performance analysis]]—methods of measuring actual performance of an algorithm at run-time

==References==
{{Reflist}}

{{Computer science}}
{{Software quality}}
{{Authority control}}

[[Category:Analysis of algorithms]]
[[Category:Computer performance]]
[[Category:Software optimization]]
[[Category:Software quality]]