
==Algorithms==

In all of the algorithms below, {{mvar|m}} is the number of edges in the graph and {{mvar|n}} is the number of vertices.

=== Classic algorithms ===
The first algorithm for finding a minimum spanning tree was developed by Czech scientist [[Otakar Borůvka]] in 1926 (see [[Borůvka's algorithm]]). Its purpose was to provide efficient electrical coverage of [[Moravia]]. The algorithm proceeds in a sequence of stages. In each stage, called a ''Borůvka step'', it identifies a forest {{mvar|F}} consisting of the minimum-weight edge incident to each vertex in the graph {{mvar|G}}, then forms the graph {{math|1=''G''{{sub|1}} = ''G'' \ ''F''}} as the input to the next step. Here {{math|''G'' \ ''F''}} denotes the graph derived from {{mvar|G}} by contracting the edges in {{mvar|F}} (by the [[#Cut property|Cut property]], these edges belong to the MST). Each Borůvka step takes linear time. Since the number of vertices is reduced by at least half in each step, there are at most {{math|''O''(log ''n'')}} steps, and Borůvka's algorithm takes {{math|''O''(''m'' log ''n'')}} time.<ref name=PettieRamachandran2002/>
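
The Borůvka steps described above can be sketched in Python as follows. This is only an illustrative sketch, not a reference implementation: the function name <code>boruvka_mst</code> and the edge format <code>(weight, u, v)</code> are assumptions made for the example, and a union-find structure stands in for explicit edge contraction, so a step here is not strictly linear-time.

```python
def boruvka_mst(n, edges):
    """Sketch of Boruvka's algorithm.

    Assumptions (not from the article): vertices are 0..n-1 and edges is a
    list of distinct (weight, u, v) tuples. Ties between equal weights are
    broken by comparing the whole tuple.
    """
    parent = list(range(n))

    def find(x):
        # Find the representative of x's component, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    components = n
    while components > 1:
        # One Boruvka step: pick the cheapest edge leaving each component.
        cheapest = {}
        for edge in edges:
            _, u, v = edge
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge lies inside a component already merged
            for root in (ru, rv):
                if root not in cheapest or edge < cheapest[root]:
                    cheapest[root] = edge
        if not cheapest:
            break  # the graph is disconnected; return a spanning forest
        # "Contract" the chosen edges by merging their components.
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                tree.append((w, u, v))
                components -= 1
    return tree
```

For example, on a 4-vertex graph with edges <code>(1,0,1), (2,1,2), (3,2,3), (4,0,3), (5,0,2)</code>, a single step already selects the three lightest edges.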

A second algorithm is [[Prim's algorithm]], which was invented by [[Vojtěch Jarník]] in 1930 and rediscovered by [[Robert C. Prim|Prim]] in 1957 and [[Edsger W. Dijkstra|Dijkstra]] in 1959. It grows the MST ({{mvar|T}}) one edge at a time. Initially, {{mvar|T}} contains an arbitrary vertex. In each step, {{mvar|T}} is augmented with a least-weight edge {{math|(''x'',''y'')}} such that {{mvar|x}} is in {{mvar|T}} and {{mvar|y}} is not yet in {{mvar|T}}. By the [[#Cut property|Cut property]], all edges added to {{mvar|T}} are in the MST. Its running time is either {{math|''O''(''m'' log ''n'')}} or {{math|''O''(''m'' + ''n'' log ''n'')}}, depending on the data structures used (for example, a binary heap or a [[Fibonacci heap]], respectively).
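
A minimal Python sketch of this growth process, using the standard-library binary heap <code>heapq</code> (which corresponds to the {{math|''O''(''m'' log ''n'')}} variant), might look as follows. The function name <code>prim_mst</code> and the adjacency-list format are assumptions made for the example.

```python
import heapq


def prim_mst(adj, start):
    """Sketch of Prim's algorithm with a binary heap.

    Assumptions (not from the article): adj maps each vertex to a list of
    (weight, neighbour) pairs, the graph is undirected and connected, and
    vertices are mutually comparable (e.g. all ints) so that heap entries
    with equal weights can be ordered.
    """
    in_tree = {start}
    # Candidate edges (weight, x, y) with x inside the tree.
    heap = [(w, start, y) for w, y in adj[start]]
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < len(adj):
        w, x, y = heapq.heappop(heap)
        if y in in_tree:
            continue  # stale entry: both endpoints are already in the tree
        # (x, y) is a least-weight edge leaving the tree, so add it.
        in_tree.add(y)
        tree.append((w, x, y))
        for w2, z in adj[y]:
            if z not in in_tree:
                heapq.heappush(heap, (w2, y, z))
    return tree
```

Stale heap entries are skipped lazily rather than updated in place, which keeps the sketch short at the cost of a larger heap.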

A third algorithm commonly in use is [[Kruskal's algorithm]], which also takes {{math|''O''(''m'' log ''n'')}} time.
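
As a rough illustration, Kruskal's algorithm can be sketched in Python by sorting the edges and merging components with a union-find structure; the sort dominates and gives the {{math|''O''(''m'' log ''n'')}} bound. The function name <code>kruskal_mst</code> and the <code>(weight, u, v)</code> edge format are assumptions made for the example.

```python
def kruskal_mst(n, edges):
    """Sketch of Kruskal's algorithm.

    Assumptions (not from the article): vertices are 0..n-1 and edges is a
    list of (weight, u, v) tuples.
    """
    parent = list(range(n))

    def find(x):
        # Find the representative of x's component, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Scan edges from lightest to heaviest.
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            # The edge joins two different components, so it creates no cycle.
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```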

A fourth algorithm, not as commonly used, is the [[reverse-delete algorithm]], which is the reverse of Kruskal's algorithm. Its runtime is {{math|O(''m'' log ''n'' (log log ''n''){{sup|3}})}}.
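
The idea of reverse-delete can be sketched naively in Python as below. This sketch rechecks connectivity from scratch after every tentative deletion, so it does not achieve the running time quoted above, which relies on a more sophisticated dynamic-connectivity structure. The function name <code>reverse_delete_mst</code> and the edge format are assumptions made for the example.

```python
def reverse_delete_mst(n, edges):
    """Naive sketch of the reverse-delete algorithm.

    Assumptions (not from the article): vertices are 0..n-1 with n >= 1,
    edges is a list of distinct (weight, u, v) tuples, and the input graph
    is connected.
    """

    def connected(edge_list):
        # Depth-first search from vertex 0 over the given edges.
        adj = {v: [] for v in range(n)}
        for _, u, v in edge_list:
            adj[u].append(v)
            adj[v].append(u)
        seen = {0}
        stack = [0]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return len(seen) == n

    remaining = sorted(edges)
    # Scan edges from heaviest to lightest.
    for edge in sorted(edges, reverse=True):
        trial = [e for e in remaining if e != edge]
        if connected(trial):
            # Deleting the edge keeps the graph connected, so drop it.
            remaining = trial
    return remaining
```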

All four of these are [[greedy algorithm]]s. Since they run in polynomial time, the problem of finding such trees is in '''[[FP (complexity)|FP]]''', and related [[decision problem]]s such as determining whether a particular edge is in the MST or determining if the minimum total weight exceeds a certain value are in '''[[P (complexity)|P]]'''.

=== Faster algorithms ===
Several researchers have sought algorithms with better asymptotic running times than the classic ones.

In a comparison model, in which the only allowed operations on edge weights are pairwise comparisons, {{harvtxt|Karger|Klein|Tarjan|1995}} found a [[Expected linear time MST algorithm|linear time randomized algorithm]] based on a combination of Borůvka's algorithm and the reverse-delete algorithm.<ref>{{citation |last1=Karger |first1=David R. |title=A randomized linear-time algorithm to find minimum spanning trees |journal=[[Journal of the Association for Computing Machinery]] |volume=42 |issue=2 |pages=321–328 |year=1995 |doi=10.1145/201019.201022 |mr=1409738 |s2cid=832583 |last2=Klein |first2=Philip N. |last3=Tarjan |first3=Robert E. |author1-link=David Karger |author-link2=Philip N. Klein |author3-link=Robert Tarjan |doi-access=free}}</ref><ref>{{citation
 | last1 = Pettie | first1 = Seth
 | last2 = Ramachandran | first2 = Vijaya | author2-link = Vijaya Ramachandran
 | contribution = Minimizing randomness in minimum spanning tree, parallel connectivity, and set maxima algorithms
 | location = San Francisco, California
 | pages = 713–722
 | title = Proc. 13th ACM-SIAM Symposium on Discrete Algorithms (SODA '02)
 | contribution-url = http://portal.acm.org/citation.cfm?id=545477
 | year = 2002| isbn = 9780898715132
 }}.</ref>

The fastest non-randomized comparison-based algorithm with known complexity, by [[Bernard Chazelle]], is based on the [[soft heap]], an approximate priority queue.<ref name=Chazelle2000>{{citation
 | last = Chazelle | first = Bernard | author-link = Bernard Chazelle
 | doi = 10.1145/355541.355562
 | mr = 1866456
 | issue = 6
 | journal = [[Journal of the Association for Computing Machinery]]
 | pages = 1028–1047
 | title = A minimum spanning tree algorithm with inverse-Ackermann type complexity
 | volume = 47
 | year = 2000| s2cid = 6276962 | doi-access = free
 }}.</ref><ref>{{citation
 | last = Chazelle | first = Bernard | author-link = Bernard Chazelle
 | doi = 10.1145/355541.355554
 | mr = 1866455
 | issue = 6
 | journal = [[Journal of the Association for Computing Machinery]]
 | pages = 1012–1027
 | title = The soft heap: an approximate priority queue with optimal error rate
 | volume = 47
 | year = 2000| s2cid = 12556140| doi-access = free
 }}.</ref> Its running time is {{math|''[[Big O notation|O]]''(''m'' α(''m'',''n''))}}, where {{math|α}} is the classical functional [[Ackermann function#Inverse|inverse of the Ackermann function]]. The function {{math|α}} grows extremely slowly, so that for all practical purposes it may be considered a constant no greater than 4; thus Chazelle's algorithm takes very close to linear time.

=== Linear-time algorithms in special cases ===

==== Dense graphs ====

If the graph is dense (i.e. {{math|''m''/''n'' ≥ log log log ''n''}}), then a deterministic algorithm by Fredman and Tarjan finds the MST in time {{math|O(''m'')}}.<ref>{{Cite journal | doi = 10.1145/28869.28874| title = Fibonacci heaps and their uses in improved network optimization algorithms| journal = Journal of the ACM| volume = 34| issue = 3| pages = 596| year = 1987| last1 = Fredman | first1 = M. L. | last2 = Tarjan | first2 = R. E. | s2cid = 7904683| doi-access = free}}</ref> The algorithm executes a number of phases. Each phase executes [[Prim's algorithm]] many times, each for a limited number of steps. The run-time of each phase is {{math|O(''m'' + ''n'')}}. If the number of vertices before a phase is {{mvar|n'}}, the number of vertices remaining after a phase is at most <math>\tfrac{n'}{2^{m/n'}}</math>. Hence, at most {{math|log*''n''}} phases are needed, which gives a linear run-time for dense graphs.<ref name=PettieRamachandran2002/>

There are other algorithms that work in linear time on dense graphs.<ref name=Chazelle2000/><ref>{{Cite journal | doi = 10.1007/bf02579168| title = Efficient algorithms for finding minimum spanning trees in undirected and directed graphs| journal = Combinatorica| volume = 6| issue = 2| pages = 109| year = 1986| last1 = Gabow | first1 = H. N. | author1-link = Harold N. Gabow | last2 = Galil | first2 = Z. | last3 = Spencer | first3 = T. | last4 = Tarjan | first4 = R. E. | s2cid = 35618095}}</ref>

==== Integer weights ====

If the edge weights are integers represented in binary, then deterministic algorithms are known that solve the problem in {{math|''O''(''m'' + ''n'')}} integer operations.<ref>{{citation
 | last1 = Fredman | first1 = M. L. | author1-link = Michael Fredman
 | last2 = Willard | first2 = D. E. | author2-link = Dan Willard
 | doi = 10.1016/S0022-0000(05)80064-9
 | mr = 1279413
 | issue = 3
 | journal = [[Journal of Computer and System Sciences]]
 | pages = 533–551
 | title = Trans-dichotomous algorithms for minimum spanning trees and shortest paths
 | volume = 48
 | year = 1994| doi-access = free
 }}.</ref>
Whether the problem can be solved ''deterministically'' for a ''general graph'' in ''linear time'' by a comparison-based algorithm remains an open question.

=== Decision trees ===
Given a graph {{mvar|G}} whose nodes and edges are fixed but whose weights are unknown, it is possible to construct a binary [[decision tree]] (DT) for calculating the MST for any permutation of weights. Each internal node of the DT contains a comparison between two edges, e.g. "Is the weight of the edge between {{mvar|x}} and {{mvar|y}} larger than the weight of the edge between {{mvar|w}} and {{mvar|z}}?". The two children of the node correspond to the two possible answers "yes" or "no". Each leaf of the DT holds a list of edges from {{mvar|G}} that form an MST. The runtime complexity of a DT is the largest number of queries required to find the MST, which is just the depth of the DT. A DT for a graph {{mvar|G}} is called ''optimal'' if it has the smallest depth of all correct DTs for {{mvar|G}}.
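
As a small illustration, a decision tree for the triangle graph on vertices {{mvar|a}}, {{mvar|b}}, {{mvar|c}} can be written as nested comparisons. The Python sketch below assumes distinct weights; the function name <code>triangle_mst</code> and the edge labels are hypothetical. Each <code>if</code> is an internal node, each <code>return</code> is a leaf, and the depth is 2.

```python
def triangle_mst(w_ab, w_bc, w_ca):
    """Decision tree of depth 2 for the triangle a-b-c.

    Assumption (not from the article): the three weights are distinct.
    The MST of a triangle is obtained by dropping its heaviest edge.
    """
    if w_ab > w_bc:                 # root query: is ab heavier than bc?
        if w_ab > w_ca:             # second query on the "yes" branch
            return {"bc", "ca"}     # leaf: ab is the heaviest edge
        return {"ab", "bc"}         # leaf: ca is the heaviest edge
    else:
        if w_bc > w_ca:             # second query on the "no" branch
            return {"ab", "ca"}     # leaf: bc is the heaviest edge
        return {"ab", "bc"}         # leaf: ca is the heaviest edge
```

Every input is resolved with exactly two weight comparisons, which matches the number of comparisons needed to find the largest of three values.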

For every integer {{mvar|r}}, it is possible to find optimal decision trees for all graphs on {{mvar|r}} vertices by [[brute-force search]]. This search proceeds in two steps.

'''A. Generating all potential DTs'''
* There are <math>2^{r \choose 2}</math> different graphs on {{mvar|r}} vertices.
* For each graph, an MST can always be found using {{math|''r''(''r'' – 1)}} comparisons, e.g. by [[Prim's algorithm]].
* Hence, the depth of an optimal DT is less than {{math|''r''{{sup|2}}}}.
* Hence, the number of internal nodes in an optimal DT is less than <math>2^{r^2}</math>.
* Every internal node compares two edges. The number of edges is at most {{math|''r''{{sup|2}}}}, so the number of different comparisons is at most {{math|''r''{{sup|4}}}}.
* Hence, the number of potential DTs is less than

<math>{(r^4)}^{(2^{r^2})} = r^{2^{(r^2+2)}}.</math>

'''B. Identifying the correct DTs'''
To check if a DT is correct, it should be checked on all possible permutations of the edge weights.
* The number of such permutations is at most {{math|(''r''{{sup|2}})!}}.
* For each permutation, solve the MST problem on the given graph using any existing algorithm, and compare the result to the answer given by the DT.
* The running time of any MST algorithm is at most {{math|''r''{{sup|2}}}}, so the total time required to check all permutations is at most {{math|(''r''{{sup|2}} + 1)!}}.

Hence, the total time required for finding an optimal DT for ''all'' graphs with {{mvar|r}} vertices is:<ref name=PettieRamachandran2002/>
:<math>2^{r \choose 2} \cdot r^{2^{(r^2+2)}} \cdot (r^2+1)!,</math>
which is less than
:<math>2^{2^{r^2+o(r)}}.</math>
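
The last bound can be checked by taking base-2 logarithms of the product. The following is a sketch of the estimate, assuming {{math|''r'' ≥ 2}} and using the crude bounds <math>\tbinom{r}{2} \le r^2</math> and <math>(r^2+1)! \le (r^2+1)^{r^2+1}</math>:
:<math>\log_2\!\left(2^{\binom{r}{2}} \cdot r^{2^{r^2+2}} \cdot (r^2+1)!\right) \le r^2 + 2^{r^2+2}\log_2 r + (r^2+1)\log_2(r^2+1) \le 2^{r^2+3}\log_2 r = 2^{r^2+3+\log_2\log_2 r}.</math>
Since {{math|3 + log{{sub|2}} log{{sub|2}} ''r''}} is {{math|''o''(''r'')}}, the exponent is {{math|''r''{{sup|2}} + ''o''(''r'')}}.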

{{See also|Decision tree model}}

=== Optimal algorithm ===

[[Seth Pettie]] and [[Vijaya Ramachandran]] have found a {{not a typo|provably}} optimal deterministic comparison-based minimum spanning tree algorithm.<ref name=PettieRamachandran2002>{{citation
 | last1 = Pettie | first1 = Seth
 | last2 = Ramachandran | first2 = Vijaya
 | doi = 10.1145/505241.505243
 | mr = 2148431
 | issue = 1
 | journal = [[Journal of the Association for Computing Machinery]]
 | pages = 16–34
 | title = An optimal minimum spanning tree algorithm
 | url = https://web.eecs.umich.edu/~pettie/papers/jacm-optmsf.pdf
 | volume = 49
 | year = 2002| s2cid = 5362916
 }}.</ref> The following is a simplified description of the algorithm.

# Let {{math|1=''r'' = log log log ''n''}}, where {{mvar|n}} is the number of vertices. Find all optimal decision trees on {{mvar|r}} vertices. This can be done in time {{math|''O''(''n'')}} (see [[#Decision trees|Decision trees]] above).
# Partition the graph into components with at most {{mvar|r}} vertices in each component. This partition uses a [[soft heap]], which "corrupts" a small number of the edges of the graph.
# Use the optimal decision trees to find an MST for the uncorrupted subgraph within each component.
# Contract each connected component spanned by the MSTs to a single vertex, and apply any algorithm which works on [[#Dense graphs|dense graphs]] in time {{math|''O''(''m'')}} to the contraction of the uncorrupted subgraph.
# Add back the corrupted edges to the resulting forest to form a subgraph guaranteed to contain the minimum spanning tree, and smaller by a constant factor than the starting graph. Apply the optimal algorithm recursively to this graph.

The runtime of all steps in the algorithm is {{math|''O''(''m'')}}, ''except for the step of using the decision trees''. The runtime of this step is unknown, but it has been proved to be optimal: no algorithm can do better than the optimal decision tree. Thus, this algorithm has the peculiar property that it is ''{{not a typo|provably}} optimal'' although its runtime complexity is ''unknown''.

=== Parallel and distributed algorithms ===
{{further|Parallel algorithms for minimum spanning trees}}
Research has also considered [[parallel algorithm]]s for the minimum spanning tree problem.
With a linear number of processors it is possible to solve the problem in {{math|''O''(log ''n'')}} time.<ref>{{citation
 | last1 = Chong | first1 = Ka Wong
 | last2 = Han | first2 = Yijie
 | last3 = Lam | first3 = Tak Wah
 | doi = 10.1145/375827.375847
 | mr = 1868718
 | issue = 2
 | journal = [[Journal of the Association for Computing Machinery]]
 | pages = 297–323
 | title = Concurrent threads and optimal parallel minimum spanning trees algorithm
 | volume = 48
 | year = 2001| s2cid = 1778676
 }}.</ref><ref>{{citation
 | last1 = Pettie | first1 = Seth
 | last2 = Ramachandran | first2 = Vijaya
 | doi = 10.1137/S0097539700371065
 | mr = 1954882
 | issue = 6
 | journal = [[SIAM Journal on Computing]]
 | pages = 1879–1895
 | title = A randomized time-work optimal parallel algorithm for finding a minimum spanning forest
 | volume = 31
 | year = 2002| url = http://www.eecs.umich.edu/~pettie/papers/sicomp-randmst.pdf
 }}.</ref>

The problem can also be approached in a [[distributed computing|distributed manner]]. If each node is considered a computer and no node knows anything except its own connected links, the [[distributed minimum spanning tree]] can still be computed.