Editing Needleman–Wunsch algorithm (section)

==Advanced presentation of algorithm==
Scores for aligned characters are specified by a [[similarity matrix]]. Here, {{math|''S''(''a'', ''b'')}} is the similarity of characters ''a'' and ''b''. It uses a linear [[gap penalty]], here called {{mvar|d}}.

For example, if the similarity matrix was
{| class="wikitable"
! scope="col" |
! scope="col" | A
! scope="col" | G
! scope="col" | C
! scope="col" | T
|- style="text-align: right;"
! scope="row" | A
| 10 || −1 || −3 || −4
|- style="text-align: right;"
! scope="row" | G
| −1 || 7 || −5 || −3
|- style="text-align: right;"
! scope="row" | C
| −3 || −5 || 9 || 0
|- style="text-align: right;"
! scope="row" | T
| −4 || −3 || 0 || 8
|}

then the alignment:
 AGACTAGTTAC
 CGA---GACGT
with a gap penalty of −5, would have the following score:
:{{math|''S''(A,C) + ''S''(G,G) + ''S''(A,A) + (3 × ''d'') + ''S''(G,G) + ''S''(T,A) + ''S''(T,C) + ''S''(A,G) + ''S''(C,T)}}
:= −3 + 7 + 10 − (3 × 5) + 7 + (−4) + 0 + (−1) + 0 = 1

To find the alignment with the highest score, a two-dimensional [[Array data structure|array]] (or [[Matrix (mathematics)|matrix]]) ''F'' is allocated. The entry in row ''i'' and column ''j'' is denoted here by
<math>F_{ij}</math>. There is one row for each character in sequence ''A'', and one column for each character in sequence ''B''. Thus, if aligning sequences of sizes ''n'' and ''m'', the amount of memory used is in <math>O(nm)</math>. [[Hirschberg's algorithm]] only holds a subset of the array in memory and uses <math>\Theta(\min \{n,m\})</math> space, but is otherwise similar to Needleman-Wunsch (and still requires <math>O(nm)</math> time).

As the algorithm progresses, the <math>F_{ij}</math> will be assigned to be the optimal score for the alignment of the first <math>i=0,\dotsc,n</math> characters in ''A'' and the first <math>j=0,\dotsc,m</math> characters in ''B''. The [[Bellman equation#Bellman's principle of optimality|principle of optimality]] is then applied as follows:
* Basis:
:<math>F_{0j} = d*j</math>
:<math>F_{i0} = d*i</math>
* Recursion, based on the principle of optimality:
:<math>F_{ij} = \max(F_{i-1,j-1} + S(A_{i}, B_{j}), \; F_{i,j-1} + d, \; F_{i-1,j} + d)</math>

The pseudo-code for the algorithm to compute the F matrix therefore looks like this:
 d ← Gap penalty score
 '''for''' i = 0 '''to''' '''length'''(A)
     F(i,0) ← d * i
 '''for''' j = 0 '''to''' '''length'''(B)
     F(0,j) ← d * j
 '''for''' i = 1 '''to''' '''length'''(A)
     '''for''' j = 1 '''to''' '''length'''(B)
     {
         Match ← F(i−1, j−1) + S(A<sub>i</sub>, B<sub>j</sub>)
         Delete ← F(i−1, j) + d
         Insert ← F(i, j−1) + d
         F(i,j) ← '''max'''(Match, Insert, Delete)
     }
Once the ''F'' matrix is computed, the entry <math>F_{nm}</math> gives the maximum score among all possible alignments. To compute an alignment that actually gives this score, you start from the bottom right cell, and compare the value with the three possible sources (Match, Insert, and Delete above) to see which it came from. If Match, then <math>A_i</math> and <math>B_j</math> are aligned, if Delete, then <math>A_i</math> is aligned with a gap, and if Insert, then <math>B_j</math> is aligned with a gap. (In general, more than one choice may have the same value, leading to alternative optimal alignments.)
 AlignmentA ← ""
 AlignmentB ← ""
 i ← '''length'''(A)
 j ← '''length'''(B)
 '''while''' (i > 0 '''or''' j > 0)
 {
     '''if''' (i > 0 '''and''' j > 0 '''and''' F(i, j) == F(i−1, j−1) + S(A<sub>i</sub>, B<sub>j</sub>))
     {
         AlignmentA ← A<sub>i</sub> + AlignmentA
         AlignmentB ← B<sub>j</sub> + AlignmentB
         i ← i − 1
         j ← j − 1
     }
     '''else''' '''if''' (i > 0 '''and''' F(i, j) == F(i−1, j) + d)
     {
         AlignmentA ← A<sub>i</sub> + AlignmentA
         AlignmentB ← "−" + AlignmentB
         i ← i − 1
     }
     '''else'''
     {
         AlignmentA ← "−" + AlignmentA
         AlignmentB ← B<sub>j</sub> + AlignmentB
         j ← j − 1
     }
 }