== The algorithm ==
[[File:Neighbor joining 7 taxa start to finish diagram.svg|thumb|right|400px|Starting with a star tree (A), the Q matrix is calculated and used to choose a pair of nodes for joining, in this case f and g. These are joined to a newly created node, u, as shown in (B). The part of the tree shown as solid lines is now fixed and will not be changed in subsequent joining steps. The distances from node u to the nodes a-e are computed from equation ({{EquationNote|3}}). This process is then repeated, using a matrix of just the distances between the nodes, a,b,c,d,e, and u, and a Q matrix derived from it. In this case u and e are joined to the newly created v, as shown in (C). Two more iterations lead first to (D), and then to (E), at which point the algorithm is done, as the tree is fully resolved.]]
Neighbor joining takes a [[distance matrix]], which specifies the distance between each pair of [[taxon|taxa]], as input.
The algorithm starts with a completely unresolved tree, whose topology corresponds to that of a [[star network]], and iterates over the following steps until the tree is completely resolved and all branch lengths are known:

# Based on the current distance matrix, calculate a matrix <math>Q</math> (defined below).
# Find the pair of distinct taxa i and j (i.e. with <math>i \neq j</math>) for which <math>Q(i,j)</math> is smallest. Make a new node that joins the taxa i and j, and connect the new node to the central node. For example, in part (B) of the figure at right, node u is created to join f and g.
# Calculate the distance from each of the taxa in the pair to this new node.
# Calculate the distance from each of the taxa outside of this pair to the new node.
# Start the algorithm again, replacing the pair of joined neighbors with the new node and using the distances calculated in the previous step.
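The five steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name <code>neighbor_joining</code>, the internal node labels <code>u0</code>, <code>u1</code>, ... and the example matrix are assumptions made for this sketch, ties in <math>Q</math> are broken by taking the first pair encountered, and the loop runs until only two nodes remain, so the final three-node star is resolved with the same formulas instead of being handled separately.

```python
def neighbor_joining(names, dist):
    """Return the tree as a list of (node, node, branch_length) edges.

    names: list of taxon labels
    dist:  symmetric distance matrix (list of lists) in the same order
    """
    # Dict-of-dicts so that nodes can be added and dropped by label.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0

    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(d[a][b] for b in nodes) for a in nodes}

        # Steps 1 and 2: evaluate Q for every pair and keep the smallest.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, f, g = best
        u = f"u{next_id}"  # hypothetical label for the new internal node
        next_id += 1

        # Step 3: branch lengths from the joined pair to the new node.
        delta_f = 0.5 * d[f][g] + (total[f] - total[g]) / (2 * (n - 2))
        delta_g = d[f][g] - delta_f
        edges.append((f, u, delta_f))
        edges.append((g, u, delta_g))

        # Step 4: distances from every remaining node to the new node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (f, g):
                d_uk = 0.5 * (d[f][k] + d[g][k] - d[f][g])
                d[u][k] = d_uk
                d[k][u] = d_uk

        # Step 5: replace the joined pair with the new node and repeat.
        nodes = [k for k in nodes if k not in (f, g)] + [u]

    # The last two nodes are connected by a single branch.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Illustrative five-taxon distance matrix (made up for this example).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
edges = neighbor_joining(["a", "b", "c", "d", "e"], D)
for node_1, node_2, length in edges:
    print(node_1, node_2, length)
```

A tree on five taxa has seven branches, so the sketch returns seven edges for this input.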

=== The Q-matrix ===
Based on a distance matrix relating the <math>n</math> taxa, calculate the <math>n \times n</math> matrix <math>Q</math> as follows:

{{NumBlk|:| <math>Q(i,j)=(n-2)d(i,j)-\sum_{k=1}^n d(i,k) - \sum_{k=1}^n d(j,k)</math>|{{EquationRef|1}}}}

where <math>d(i,j)</math> is the distance between taxa <math>i</math> and <math>j</math>.
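As an illustration, equation ({{EquationNote|1}}) can be evaluated with a few lines of Python. The function name <code>q_matrix</code> and the example distances are assumptions made for this sketch. The diagonal is left at zero because only pairs with <math>i \neq j</math> are considered when searching for the minimum.

```python
def q_matrix(d):
    """Compute Q(i,j) = (n-2) d(i,j) - sum_k d(i,k) - sum_k d(j,k)."""
    n = len(d)
    row_sums = [sum(row) for row in d]  # sum_k d(i,k) for every i
    q = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # diagonal entries are never used
                q[i][j] = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
    return q


# Illustrative five-taxon distance matrix (made up for this example).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
Q = q_matrix(D)
print(Q[0][1])  # -50, the smallest entry, so taxa 0 and 1 are joined first
```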

=== Distance from the pair members to the new node ===
For each of the taxa in the pair being joined, use the following formula to calculate the distance to the new node:

{{NumBlk|:| <math>\delta(f,u)=\frac{1}{2}d(f,g)+\frac{1}{2(n-2)} \left [ \sum_{k=1}^n d(f,k) - \sum_{k=1}^n d(g,k) \right ] \quad </math> |{{EquationRef|2}}}}
and:
: <math>\delta(g,u)=d(f,g)-\delta(f,u) \quad </math>

Taxa <math>f</math> and <math>g</math> are the paired taxa and <math>u</math> is the newly created node. The branch joining <math>f</math> to <math>u</math> and the branch joining <math>g</math> to <math>u</math>, together with their lengths <math>\delta(f,u)</math> and <math>\delta(g,u)</math>, are part of the tree which is gradually being created; they neither affect nor are affected by later neighbor-joining steps.
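A small Python sketch of equation ({{EquationNote|2}}) is given below. The function name <code>pair_branch_lengths</code> and the example distances are assumptions made for illustration; <code>f</code> and <code>g</code> are row indices into the current distance matrix.

```python
def pair_branch_lengths(d, f, g):
    """Return (delta(f,u), delta(g,u)) for the pair f, g being joined."""
    n = len(d)
    sum_f = sum(d[f])  # sum_k d(f,k)
    sum_g = sum(d[g])  # sum_k d(g,k)
    delta_f = 0.5 * d[f][g] + (sum_f - sum_g) / (2 * (n - 2))
    delta_g = d[f][g] - delta_f
    return delta_f, delta_g


# Illustrative five-taxon distance matrix (made up for this example).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(pair_branch_lengths(D, 0, 1))  # (2.0, 3.0)
```

The two lengths always add up to <math>d(f,g)</math>, here 5.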

=== Distance of the other taxa from the new node ===
For each taxon not considered in the previous step, we calculate the distance to the new node as follows:

{{NumBlk|:|  <math>d(u,k)=\frac{1}{2} [d(f,k)+d(g,k)-d(f,g)]</math> |{{EquationRef|3}}}}

where <math>u</math> is the new node, <math>k</math> is the node whose distance to <math>u</math> is being calculated, and <math>f</math> and <math>g</math> are the members of the pair just joined.
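The update in equation ({{EquationNote|3}}) can be sketched in Python as follows. The function name <code>join_pair</code>, the choice to place the new node in the last row and column, and the example distances are assumptions made for this illustration.

```python
def join_pair(d, f, g):
    """Return the reduced distance matrix after joining f and g.

    Rows and columns f and g are dropped, and the new node u is
    appended as the last row and column.
    """
    keep = [k for k in range(len(d)) if k not in (f, g)]
    # d(u,k) = 1/2 [d(f,k) + d(g,k) - d(f,g)] for every remaining k
    to_u = [0.5 * (d[f][k] + d[g][k] - d[f][g]) for k in keep]
    new_d = [[d[a][b] for b in keep] + [to_u[i]] for i, a in enumerate(keep)]
    new_d.append(to_u + [0.0])
    return new_d


# Illustrative five-taxon distance matrix (made up for this example).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for row in join_pair(D, 0, 1):
    print(row)
```

The returned matrix is one row and one column smaller than the input and can be passed directly to the next iteration.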

=== Complexity ===
Neighbor joining on a set of <math>n</math> taxa requires <math>n-3</math> iterations. At each iteration one has to build and search a <math>Q</math> matrix. Initially the <math>Q</math> matrix has size <math>n\times n</math>, at the next iteration <math>(n-1)\times(n-1)</math>, and so on. Implementing this in a straightforward way leads to an algorithm with a time complexity of <math>O(n^3)</math>;<ref>{{Cite journal|date=November 1988|title=A note on the neighbor-joining algorithm of Saitou and Nei.|url=https://academic.oup.com/mbe/article/5/6/729/1044339/A-note-on-the-neighborjoining-algorithm-of-Saitou|journal=Molecular Biology and Evolution|language=en|doi=10.1093/oxfordjournals.molbev.a040527|pmid=3221794|issn=1537-1719|last1=Studier|first1=J. A.|last2=Keppler|first2=K. J.|volume=5|issue=6|pages=729–31|doi-access=free}}</ref> implementations exist which use heuristics to do much better than this on average.<ref>{{Cite journal|last1=Mailund|first1=Thomas|last2=Brodal|first2=GerthS|last3=Fagerberg|first3=Rolf|last4=Pedersen|first4=ChristianNS|last5=Phillips|first5=Derek|date=2006|title=Recrafting the neighbor-joining method|url= |journal=BMC Bioinformatics|volume=7|issue=1|pages=29|doi=10.1186/1471-2105-7-29|pmc=3271233|pmid=16423304 |doi-access=free }}</ref>
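
The cubic bound can be seen by summing the work over the iterations. As a sketch, assume that the row sums of the distance matrix are computed once per iteration, so that building and searching a <math>Q</math> matrix on <math>m</math> nodes takes <math>O(m^2)</math> time. The matrix size <math>m</math> runs from <math>n</math> down to 4 over the <math>n-3</math> iterations, giving

: <math>\sum_{m=4}^{n} m^2 \le \sum_{m=1}^{n} m^2 = \frac{n(n+1)(2n+1)}{6} = O(n^3).</math>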