==Historical notes and algorithm development==
The original purpose of the algorithm described by Needleman and Wunsch was to find similarities in the amino acid sequences of two proteins.<ref name=Needleman />

Needleman and Wunsch describe their algorithm explicitly for the case in which the alignment is scored solely on matches and mismatches and gaps carry no penalty (''d'' = 0). The original 1970 publication suggests the [[recursion]]
<math>F_{ij} = \max_{h<i,k<j} \{ F_{h,j-1}+S(A_{i},B_{j}), F_{i-1,k}+S(A_i,B_j) \}</math>.

The corresponding dynamic programming algorithm takes cubic time. The paper also points out that the recursion can accommodate arbitrary gap penalization formulas:

<blockquote>
A penalty factor, a number subtracted for every gap made, may be assessed as a barrier to allowing the gap. The penalty factor could be a function of the size and/or direction of the gap. [page 444]
</blockquote>
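
As an illustration of why the recursion above leads to cubic running time, the following Python sketch evaluates it directly for the case without gap penalty. The names <code>match_score</code> and <code>nw_original_cubic</code>, the zero-valued boundary row and column, and the 1/0 match/mismatch scoring are assumptions made for this example rather than details taken from the 1970 paper. The sketch also assumes that all pair scores are non-negative.

```python
def match_score(x, y):
    # Assumed scoring for this example: 1 for a match, 0 for a mismatch.
    return 1 if x == y else 0


def nw_original_cubic(a, b, score=match_score):
    """Best alignment score without gap penalty, via the row/column-scan recursion."""
    n, m = len(a), len(b)
    # F[i][j]: best score of an alignment that pairs a[i-1] with b[j-1].
    # Row 0 and column 0 are boundary zeros (assumption: empty prefix scores 0).
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Scan column j-1 over all rows h < i ...
            prev_col = max(F[h][j - 1] for h in range(i))
            # ... and row i-1 over all columns k < j.
            prev_row = max(F[i - 1][k] for k in range(j))
            F[i][j] = max(prev_col, prev_row) + score(a[i - 1], b[j - 1])
            best = max(best, F[i][j])
    # Trailing gaps are free, so the optimum is the largest cell value.
    return best
```

Each of the ''nm'' cells scans part of one row and one column, so the total work grows roughly as ''nm''(''n'' + ''m''), which is cubic when the sequences have similar lengths.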

A faster dynamic programming algorithm with quadratic running time for the same problem (no gap penalty) was introduced by [[David Sankoff]] in 1972.<ref name=Sankoff>{{cite journal | doi=10.1073/pnas.69.1.4 | journal=Proceedings of the National Academy of Sciences of the USA | volume=69 | issue=1 | pages=4–6 | year=1972 | author=Sankoff D | title=Matching sequences under deletion/insertion constraints | pmid=4500555 | pmc=427531| bibcode=1972PNAS...69....4S | doi-access=free }}</ref>
Similar quadratic-time algorithms were discovered independently
by T. K. Vintsyuk<ref name=Vintsyuk>{{cite journal | journal=Kibernetika | volume=4 | pages=81–88 | year=1968  | author=Vintsyuk TK | title=Speech discrimination by dynamic programming| doi=10.1007/BF01074755 | s2cid=123081024 }}</ref> in 1968 for speech processing
([[Dynamic time warping|"time warping"]]), and by Robert A. Wagner and [[Michael J. Fischer]]<ref name=WagnerFischer>{{cite journal |vauthors=Wagner RA, Fischer MJ | journal = [[Journal of the ACM]] | title=The string-to-string correction problem | volume=21 | issue=1 | year=1974 | pages=168–173 | doi=10.1145/321796.321811| s2cid = 13381535 | doi-access=free }}</ref> in 1974 for string matching.
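
The quadratic running time can be illustrated with the familiar textbook form of the dynamic program, in which each cell depends only on its three neighbours. The sketch below is written in that textbook form and is not a transcription of the formulations in the cited papers. The function names and the 1/0 match/mismatch scoring are assumptions made for this example.

```python
def match_score(x, y):
    # Assumed scoring for this example: 1 for a match, 0 for a mismatch.
    return 1 if x == y else 0


def alignment_score_quadratic(a, b, score=match_score):
    """Best global alignment score when gaps carry no penalty."""
    n, m = len(a), len(b)
    # G[i][j]: best score for aligning the prefixes a[:i] and b[:j].
    G = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = max(
                G[i - 1][j],                                   # gap in b (free)
                G[i][j - 1],                                   # gap in a (free)
                G[i - 1][j - 1] + score(a[i - 1], b[j - 1]),   # pair a[i-1] with b[j-1]
            )
    return G[n][m]
```

Every cell is filled in constant time, so the table of (''n'' + 1)(''m'' + 1) entries is completed in time proportional to ''nm''.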

Needleman and Wunsch formulated their problem in terms of maximizing similarity. Another possibility is to minimize the [[Levenshtein distance|edit distance]] between sequences, introduced by [[Vladimir Levenshtein]]. Peter H. Sellers showed<ref name=Sellers>{{cite journal | doi=10.1137/0126070 | title=On the theory and computation of evolutionary distances | author=Sellers PH | journal = SIAM Journal on Applied Mathematics | volume = 26 | issue = 4 | pages = 787–793 | year = 1974}}</ref> in 1974 that the two problems are equivalent.
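
One simple correspondence of this kind can be checked numerically. It is offered as an illustration and is not claimed to be Sellers's own construction. Assume unit-cost edit distance (substitution and insertion/deletion each cost 1) and, for a constant ''c'', a similarity that scores a matching pair ''c'', a mismatching pair ''c'' − 1, and each gap column ''c''/2 − 1. Because an alignment with ''p'' paired columns and ''q'' gap columns satisfies 2''p'' + ''q'' = ''n'' + ''m'', the similarity and the distance of every alignment sum to ''c''(''n'' + ''m'')/2. Maximizing one therefore minimizes the other. The helper names below are hypothetical.

```python
def global_dp(a, b, pair, gap, pick):
    """Generic global alignment table; pick is max (similarity) or min (distance)."""
    n, m = len(a), len(b)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = T[i - 1][0] + gap      # a[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        T[0][j] = T[0][j - 1] + gap      # b[:j] aligned entirely to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            T[i][j] = pick(
                T[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                T[i - 1][j] + gap,
                T[i][j - 1] + gap,
            )
    return T[n][m]


def edit_distance(a, b):
    # Unit costs: 0 for a match, 1 for a substitution, 1 per gap column.
    return global_dp(a, b, lambda x, y: 0 if x == y else 1, 1, min)


def similarity(a, b, c=2):
    # Assumed scores: c for a match, c - 1 for a mismatch, c/2 - 1 per gap column.
    return global_dp(a, b, lambda x, y: c if x == y else c - 1, c / 2 - 1, max)


def check_identity(a, b, c=2):
    # True when similarity + distance equals c * (n + m) / 2.
    return similarity(a, b, c) + edit_distance(a, b) == c * (len(a) + len(b)) / 2
```

With ''c'' = 2 the gap score is zero, so the similarity side reduces to an alignment without gap penalty.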

The Needleman–Wunsch algorithm is still widely used for optimal [[Sequence alignment#Global and local alignments|global alignment]], particularly when the quality of the global alignment is of the utmost importance. However, its time and space costs are both proportional to the product of the lengths of the two sequences, which makes it unsuitable for long sequences.

More recent work has focused on reducing the time and space cost of the algorithm while maintaining alignment quality. For example, the Fast Optimal Global Sequence Alignment Algorithm (FOGSAA) was published in 2013;<ref>{{cite journal|last1=Chakraborty|first1=Angana|last2=Bandyopadhyay|first2=Sanghamitra|title=FOGSAA: Fast Optimal Global Sequence Alignment Algorithm|journal=Scientific Reports|date=29 April 2013|volume=3|pages=1746|doi=10.1038/srep01746|pmid=23624407|pmc=3638164|bibcode=2013NatSR...3.1746C}}</ref> its authors report that it aligns nucleotide and protein sequences faster than other optimal global alignment methods, including the Needleman–Wunsch algorithm. The paper claims that, compared with the Needleman–Wunsch algorithm, FOGSAA achieves a time gain of 70–90% for highly similar nucleotide sequences (more than 80% similarity) and 54–70% for sequences with 30–80% similarity.