Minimum spanning tree


A planar graph and its minimum spanning tree. Each edge is labeled with its weight, which here is roughly proportional to its length.

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.<ref name="Numpy and Scipy Documentation — Numpy and Scipy documentation">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> That is, it is a spanning tree whose sum of edge weights is as small as possible.<ref name="NetworkX 2.6.2 documentation">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> More generally, any edge-weighted undirected graph (not necessarily connected) has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

There are many use cases for minimum spanning trees. One example is a telecommunications company trying to lay cable in a new neighborhood. If it is constrained to bury the cable only along certain paths (e.g. roads), then there would be a graph containing the points (e.g. houses) connected by those paths. Some of the paths might be more expensive, because they are longer, or require the cable to be buried deeper; these paths would be represented by edges with larger weights. Currency is an acceptable unit for edge weight – there is no requirement for edge lengths to obey normal rules of geometry such as the triangle inequality. A spanning tree for that graph would be a subset of those paths that has no cycles but still connects every house; there might be several spanning trees possible. A minimum spanning tree would be one with the lowest total cost, representing the least expensive path for laying the cable.

Properties

Possible multiplicity

If there are n vertices in the graph, then each spanning tree has n − 1 edges.

Figure: a graph may have more than one minimum spanning tree. The two trees shown below the graph are two possible minimum spanning trees of the given graph.

There may be several minimum spanning trees of the same weight; in particular, if all the edge weights of a given graph are the same, then every spanning tree of that graph is minimum.

Uniqueness

If each edge has a distinct weight then there will be only one, unique minimum spanning tree. This is true in many realistic situations, such as the telecommunications company example above, where it's unlikely any two paths have exactly the same cost. This generalizes to spanning forests as well.

Proof:

Assume the contrary, that there are two different MSTs A and B.
Since A and B differ despite containing the same nodes, there is at least one edge that belongs to one but not the other. Among such edges, let e₁ be the one with least weight; this choice is unique because the edge weights are all distinct. Without loss of generality, assume e₁ is in A.
As B is an MST, {e₁} ∪ B must contain a cycle C with e₁.
As a tree, A contains no cycles, therefore C must have an edge e₂ that is not in A.
Since e₁ was chosen as the unique lowest-weight edge among those belonging to exactly one of A and B, the weight of e₂ must be greater than the weight of e₁.
As e₁ and e₂ are part of the cycle C, replacing e₂ with e₁ in B therefore yields a spanning tree with a smaller weight.
This contradicts the assumption that B is an MST.
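The multiplicity and uniqueness claims can be checked on a small graph by brute force. The sketch below is only an illustration: the function names and the example graph are assumptions made for this example, and enumerating all edge subsets is practical only for tiny inputs.

```python
from itertools import combinations


def spanning_trees(n, edges):
    """Yield every spanning tree of a graph on vertices 0..n-1 (brute force).

    `edges` is a list of (u, v, weight) tuples.
    """
    for subset in combinations(edges, n - 1):
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        acyclic = True
        for u, v, _ in subset:
            ru, rv = find(u), find(v)
            if ru == rv:  # this edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:  # n - 1 acyclic edges on n vertices form a spanning tree
            yield subset


def minimum_spanning_trees(n, edges):
    """Return all spanning trees of minimum total weight."""
    trees = list(spanning_trees(n, edges))
    best = min(sum(w for _, _, w in t) for t in trees)
    return [t for t in trees if sum(w for _, _, w in t) == best]


# A 4-cycle with one chord: distinct weights versus all-equal weights.
distinct = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (3, 0, 4), (0, 2, 5)]
equal = [(u, v, 1) for u, v, _ in distinct]
print(len(minimum_spanning_trees(4, distinct)))  # distinct weights
print(len(minimum_spanning_trees(4, equal)))     # equal weights
```

With distinct weights the search finds a single minimum spanning tree, while with equal weights every spanning tree of the example graph is minimum.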

More generally, if the edge weights are not all distinct then only the (multi-)set of weights in minimum spanning trees is certain to be unique; it is the same for all minimum spanning trees.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>

Minimum-cost subgraph

If the weights are positive, then a minimum spanning tree is, in fact, a minimum-cost subgraph connecting all vertices, since if a subgraph contains a cycle, removing any edge along that cycle will decrease its cost and preserve connectivity.

Cycle property

For any cycle C in the graph, if the weight of an edge e of C is larger than each of the individual weights of all other edges of C, then this edge cannot belong to an MST.

Proof: Assume the contrary, i.e. that e belongs to an MST T₁. Then deleting e will break T₁ into two subtrees with the two ends of e in different subtrees. The remainder of C reconnects the subtrees, hence there is an edge f of C with ends in different subtrees, i.e., it reconnects the subtrees into a tree T₂ with weight less than that of T₁, because the weight of f is less than the weight of e.

Cut property

Figure: the cut property of MSTs. T is the only MST of the given graph. Three edges of the original graph cross the cut shown; e is a minimum-weight edge among them, and is therefore part of the MST T.

For any cut C of the graph, if the weight of an edge e in the cut-set of C is strictly smaller than the weights of all other edges of the cut-set of C, then this edge belongs to all MSTs of the graph.

Proof: Assume that there is an MST T that does not contain e. Adding e to T will produce a cycle that crosses the cut once at e and crosses back at another edge e'. Deleting e' we get a spanning tree T ∖ {e'} ∪ {e} of strictly smaller weight than T. This contradicts the assumption that T was an MST.

By a similar argument, if more than one edge is of minimum weight across a cut, then each such edge is contained in some minimum spanning tree.
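As a small illustration of the cut property, the following sketch lists the minimum-weight edges crossing a given cut. The function name, the edge representation, and the example graph are assumptions chosen for this example.

```python
def lightest_crossing_edges(edges, side):
    """Return the minimum-weight edges with exactly one endpoint in `side`.

    `edges` is a list of (u, v, weight) tuples; `side` is a set of vertices
    defining the cut (side, all other vertices).
    """
    crossing = [(u, v, w) for u, v, w in edges if (u in side) != (v in side)]
    if not crossing:
        return []
    least = min(w for _, _, w in crossing)
    return [e for e in crossing if e[2] == least]


example = [("a", "b", 1), ("b", "c", 4), ("a", "c", 3), ("c", "d", 2)]
print(lightest_crossing_edges(example, {"a", "b"}))
```

If the returned list has a single edge, the cut property says that edge belongs to every MST; if it has several, each of them belongs to some MST.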

Minimum-cost edge

If the minimum cost edge e of a graph is unique, then this edge is included in any MST.

Proof: if e were not included in the MST, adding e to the MST would form a cycle, and removing any of the (larger cost) edges in that cycle would yield a spanning tree of smaller weight.

Contraction

If T is a tree of MST edges, then we can contract T into a single vertex while maintaining the invariant that the MST of the contracted graph plus T gives the MST for the graph before contraction.<ref name=PettieRamachandran2002/>

Algorithms

In all of the algorithms below, m is the number of edges in the graph and n is the number of vertices.

Classic algorithms

The first algorithm for finding a minimum spanning tree was developed by Czech scientist Otakar Borůvka in 1926 (see Borůvka's algorithm). Its purpose was an efficient electrical coverage of Moravia. The algorithm proceeds in a sequence of stages. In each stage, called a Borůvka step, it identifies a forest F consisting of the minimum-weight edge incident to each vertex in the graph G, then forms the graph G₁ = G \ F as the input to the next step. Here G \ F denotes the graph derived from G by contracting edges in F (by the cut property, these edges belong to the MST). Each Borůvka step takes linear time. Since the number of vertices is reduced by at least half in each step, Borůvka's algorithm takes O(m log n) time.<ref name=PettieRamachandran2002/>
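A minimal sketch of Borůvka's algorithm follows. It is one possible rendering rather than a reference implementation: it assumes a connected graph on vertices 0..n-1 given as (u, v, weight) tuples, and it tracks components with a union-find structure instead of building the contracted graph explicitly.

```python
def boruvka_mst(n, edges):
    """Borůvka's algorithm; returns the list of MST edges.

    Ties are broken by edge index, which gives a consistent total order
    on the edges.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, components = [], n
    while components > 1:
        # One Borůvka step: find the cheapest edge leaving each component.
        cheapest = {}
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge lies inside a component
            for r in (ru, rv):
                if r not in cheapest or (w, i) < cheapest[r][0]:
                    cheapest[r] = ((w, i), (u, v, w))
        if not cheapest:
            raise ValueError("graph is not connected")
        # Merge components along the selected edges (the "contraction").
        for _, (u, v, w) in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                tree.append((u, v, w))
                components -= 1
    return tree


example = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8), (3, 4, 3)]
print(boruvka_mst(5, example))
```

Each pass of the outer loop corresponds to one Borůvka step; the number of components at least halves per pass, which is where the logarithmic factor in the running time comes from.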

A second algorithm is Prim's algorithm, which was invented by Vojtěch Jarník in 1930 and rediscovered by Prim in 1957 and Dijkstra in 1959. It grows the MST (T) one edge at a time. Initially, T contains an arbitrary vertex. In each step, T is augmented with a least-weight edge (x, y) such that x is in T and y is not yet in T. By the cut property, all edges added to T are in the MST. Its run-time is either O(m log n) or O(m + n log n), depending on the data structures used.
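The growth process described above can be sketched with a binary heap from Python's standard `heapq` module. This is an illustrative version under stated assumptions: the adjacency-list format and function name are chosen for this example, the graph is assumed connected, and vertices must be mutually comparable because they act as tie-breakers inside heap entries.

```python
import heapq


def prim_mst(adj, start):
    """Prim's algorithm with a lazy binary heap.

    `adj` maps each vertex to a list of (neighbor, weight) pairs describing
    a connected undirected graph. Returns the MST as (x, y, weight) tuples.
    """
    in_tree = {start}
    tree = []
    frontier = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(frontier)
    while frontier and len(in_tree) < len(adj):
        w, x, y = heapq.heappop(frontier)
        if y in in_tree:
            continue  # stale entry: both endpoints are already in the tree
        in_tree.add(y)
        tree.append((x, y, w))
        for z, wz in adj[y]:
            if z not in in_tree:
                heapq.heappush(frontier, (wz, y, z))
    return tree


example = {
    0: [(1, 4), (2, 1)],
    1: [(0, 4), (2, 2), (3, 5)],
    2: [(0, 1), (1, 2), (3, 8)],
    3: [(1, 5), (2, 8), (4, 3)],
    4: [(3, 3)],
}
print(prim_mst(example, 0))
```

The heap may hold several entries per vertex, so this variant runs in O(m log n) time; the O(m + n log n) bound requires a priority queue with a cheaper decrease-key operation, such as a Fibonacci heap.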

A third algorithm commonly in use is Kruskal's algorithm, which also takes O(m log n) time.
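Kruskal's algorithm scans the edges in order of increasing weight and keeps an edge whenever it joins two different components. The sketch below assumes vertices 0..n-1 and (u, v, weight) tuples; the names are illustrative only.

```python
def kruskal_mst(n, edges):
    """Kruskal's algorithm with a union-find structure.

    Returns the MST edges in the order they were accepted. For a
    disconnected graph the result is a minimum spanning forest.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it cannot close a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


example = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8), (3, 4, 3)]
print(kruskal_mst(5, example))
```

Sorting dominates the cost, which gives the O(m log n) bound quoted above.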

A fourth algorithm, not as commonly used, is the reverse-delete algorithm, which is the reverse of Kruskal's algorithm. Its runtime is O(m log n (log log n)³).
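The reverse-delete idea can be sketched as follows: visit the edges from heaviest to lightest and delete an edge whenever the graph stays connected without it. This naive version rechecks connectivity with a full graph search after each tentative deletion, so it does not achieve the running time quoted above, which relies on a more sophisticated dynamic connectivity structure. The names and input format are assumptions for this example, and the input graph is assumed connected.

```python
def reverse_delete_mst(n, edges):
    """Naive reverse-delete on vertices 0..n-1; edges are (u, v, weight)."""

    def connected(edge_list):
        adj = {v: [] for v in range(n)}
        for u, v, _ in edge_list:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return len(seen) == n

    kept = sorted(edges, key=lambda e: e[2], reverse=True)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if connected(candidate):
            kept = candidate  # edge i was redundant; drop it
        else:
            i += 1  # edge i is a bridge of the current graph; keep it
    return kept


example = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8), (3, 4, 3)]
print(reverse_delete_mst(5, example))
```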

All four of these are greedy algorithms. Since they run in polynomial time, the problem of finding such trees is in FP, and related decision problems such as determining whether a particular edge is in the MST or determining if the minimum total weight exceeds a certain value are in P.

Faster algorithms

Several researchers have tried to find more computationally-efficient algorithms.

In a comparison model, in which the only allowed operations on edge weights are pairwise comparisons, Karger, Klein & Tarjan (1995) found a linear time randomized algorithm based on a combination of Borůvka's algorithm and the reverse-delete algorithm.<ref>Template:Citation</ref><ref>Template:Citation.</ref>

The fastest non-randomized comparison-based algorithm with known complexity, by Bernard Chazelle, is based on the soft heap, an approximate priority queue.<ref name=Chazelle2000>Template:Citation.</ref><ref>Template:Citation.</ref> Its running time is O(m α(m, n)), where α is the classical functional inverse of the Ackermann function. The function α grows extremely slowly, so that for all practical purposes it may be considered a constant no greater than 4; thus Chazelle's algorithm takes very close to linear time.

Linear-time algorithms in special cases

Dense graphs

If the graph is dense (i.e. m/n ≥ log log log n), then a deterministic algorithm by Fredman and Tarjan finds the MST in time O(m).<ref>Template:Cite journal</ref> The algorithm executes a number of phases. Each phase executes Prim's algorithm many times, each for a limited number of steps. The run-time of each phase is O(m + n). If the number of vertices before a phase is n', the number of vertices remaining after a phase is at most <math>\tfrac{n'}{2^{m/n'}}</math>. Hence, at most log* n phases are needed, which gives a linear run-time for dense graphs.<ref name=PettieRamachandran2002/>

There are other algorithms that work in linear time on dense graphs.<ref name=Chazelle2000/><ref>Template:Cite journal</ref>

Integer weights

If the edge weights are integers represented in binary, then deterministic algorithms are known that solve the problem in O(m + n) integer operations.<ref>Template:Citation.</ref> Whether the problem can be solved deterministically for a general graph in linear time by a comparison-based algorithm remains an open question.

Decision trees

Given a graph G where the nodes and edges are fixed but the weights are unknown, it is possible to construct a binary decision tree (DT) for calculating the MST for any permutation of weights. Each internal node of the DT contains a comparison between two edges, e.g. "Is the weight of the edge between x and y larger than the weight of the edge between w and z?". The two children of the node correspond to the two possible answers "yes" or "no". In each leaf of the DT, there is a list of edges from G that correspond to an MST. The runtime complexity of a DT is the largest number of queries required to find the MST, which is just the depth of the DT. A DT for a graph G is called optimal if it has the smallest depth of all correct DTs for G.
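To make the notion concrete, here is a hand-written decision tree for the triangle graph with edges ab, bc and ca, written as nested comparisons. It is a toy illustration constructed for this example, not taken from the cited work: the MST of a triangle is every edge except a heaviest one, and two comparisons suffice to identify such an edge, so the tree has depth 2.

```python
def triangle_mst(w_ab, w_bc, w_ca):
    """Decision tree of depth 2 for the MST of a triangle on a, b, c.

    Each `if` is one internal node (a single weight comparison) and each
    `return` is a leaf holding the edge set of an MST.
    """
    if w_ab > w_bc:
        if w_ab > w_ca:
            return {"bc", "ca"}  # ab is the heaviest edge
        return {"ab", "bc"}      # ca >= ab > bc, so ca is a heaviest edge
    if w_bc > w_ca:
        return {"ab", "ca"}      # bc is the heaviest edge
    return {"ab", "bc"}          # ca >= bc >= ab, so ca is a heaviest edge


print(sorted(triangle_mst(1, 2, 3)))
```

Only comparisons between edge weights are used, so the same tree is correct for every assignment of weights to the three edges.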

For every integer r, it is possible to find optimal decision trees for all graphs on r vertices by brute-force search. This search proceeds in two steps.

A. Generating all potential DTs

There are <math>2^{r \choose 2}</math> different graphs on r vertices.
For each graph, an MST can always be found using r(r − 1) comparisons, e.g. by Prim's algorithm.
Hence, the depth of an optimal DT is less than r².
Hence, the number of internal nodes in an optimal DT is less than <math>2^{r^2}</math>.
Every internal node compares two edges. The number of edges is at most r², so the number of different comparisons is at most r⁴.
Hence, the number of potential DTs is less than <math>{(r^4)}^{(2^{r^2})} = r^{2^{(r^2+2)}}</math>.

B. Identifying the correct DTs

To check if a DT is correct, it should be checked on all possible permutations of the edge weights.

The number of such permutations is at most (r²)!.
For each permutation, solve the MST problem on the given graph using any existing algorithm, and compare the result to the answer given by the DT.
The running time of any MST algorithm is at most r², so the total time required to check all permutations is at most (r² + 1)!.

Hence, the total time required for finding an optimal DT for all graphs with r vertices is:<ref name=PettieRamachandran2002/>

<math>2^{r \choose 2} \cdot r^{2^{(r^2+2)}} \cdot (r^2+1)!,</math>

which is less than

<math>2^{2^{r^2+o(r)}}.</math>

Optimal algorithm

Seth Pettie and Vijaya Ramachandran have found a provably optimal deterministic comparison-based minimum spanning tree algorithm.<ref name=PettieRamachandran2002>Template:Citation.</ref> The following is a simplified description of the algorithm.

Let r = log log log n, where n is the number of vertices. Find all optimal decision trees on r vertices. This can be done in time O(n) (see Decision trees above).
Partition the graph into components with at most r vertices in each component. This partition uses a soft heap, which "corrupts" a small number of the edges of the graph.
Use the optimal decision trees to find an MST for the uncorrupted subgraph within each component.
Contract each connected component spanned by the MSTs to a single vertex, and apply any algorithm which works on dense graphs in time O(m) to the contraction of the uncorrupted subgraph.
Add back the corrupted edges to the resulting forest to form a subgraph guaranteed to contain the minimum spanning tree, and smaller by a constant factor than the starting graph. Apply the optimal algorithm recursively to this graph.

The runtime of all steps in the algorithm is O(m), except for the step of using the decision trees. The runtime of this step is unknown, but it has been proved that it is optimal: no algorithm can do better than the optimal decision tree. Thus, this algorithm has the peculiar property that it is provably optimal although its runtime complexity is unknown.

Parallel and distributed algorithms

Research has also considered parallel algorithms for the minimum spanning tree problem. With a linear number of processors it is possible to solve the problem in O(log n) time.<ref>Template:Citation.</ref><ref>Template:Citation.</ref>

The problem can also be approached in a distributed manner. If each node is considered a computer and no node knows anything except its own connected links, one can still calculate the distributed minimum spanning tree.

MST on complete graphs with random weights

Alan M. Frieze showed that given a complete graph on n vertices, with edge weights that are independent identically distributed random variables with distribution function <math>F</math> satisfying <math>F'(0) > 0</math>, then as n approaches +∞ the expected weight of the MST approaches <math>\zeta(3)/F'(0)</math>, where <math>\zeta</math> is the Riemann zeta function (more specifically, <math>\zeta(3)</math> is Apéry's constant). Frieze and Steele also proved convergence in probability. Svante Janson proved a central limit theorem for the weight of the MST.

For uniform random weights in <math>[0,1]</math>, the exact expected size of the minimum spanning tree has been computed for small complete graphs.<ref>Template:Citation</ref>

Vertices	Expected size	Approximate expected size
2	1/2	0.5
3	3/4	0.75
4	31/35	0.8857143
5	893/924	0.9664502
6	278/273	1.0183151
7	30739/29172	1.053716
8	199462271/184848378	1.0790588
9	126510063932/115228853025	1.0979027
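The expected values for uniform weights can be approximated by simulation. The following Monte Carlo sketch is an illustration only: the function names, the seed, and the trial counts are arbitrary choices made for this example, and the estimates carry sampling error.

```python
import random
from itertools import combinations


def random_complete_mst_weight(n, rng):
    """MST weight of the complete graph K_n with i.i.d. Uniform(0, 1) weights."""
    edges = sorted((rng.random(), u, v) for u, v in combinations(range(n), 2))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0.0
    for w, u, v in edges:  # Kruskal's algorithm on the sorted edges
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
    return total


def estimate_expected_weight(n, trials, seed=0):
    """Average MST weight over `trials` independent random instances."""
    rng = random.Random(seed)
    return sum(random_complete_mst_weight(n, rng) for _ in range(trials)) / trials


for n in (2, 3, 4, 5):
    print(n, round(estimate_expected_weight(n, 20000), 3))
```

For small n the estimates should land close to the tabulated values. For uniform weights on [0, 1], F'(0) = 1, so as n grows the averages should drift toward ζ(3) ≈ 1.202.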

Fractional variant

There is a fractional variant of the MST, in which each edge is allowed to appear "fractionally". Formally, a fractional spanning set of a graph (V,E) is a nonnegative function f on E such that, for every non-trivial subset W of V (i.e., W is neither empty nor equal to V), the sum of f(e) over all edges connecting a node of W with a node of V\W is at least 1. Intuitively, f(e) represents the fraction of e that is contained in the spanning set. A minimum fractional spanning set is a fractional spanning set for which the sum <math>\sum_{e\in E} f(e)\cdot w(e)</math> is as small as possible.

If the fractions f(e) are forced to be in {0,1}, then the set T of edges with f(e)=1 is a spanning set, as every node or subset of nodes is connected to the rest of the graph by at least one edge of T. Moreover, if f minimizes <math>\sum_{e\in E} f(e)\cdot w(e)</math>, then the resulting spanning set is necessarily a tree, since if it contained a cycle, then an edge could be removed without affecting the spanning condition. So the minimum fractional spanning set problem is a relaxation of the MST problem, and can also be called the fractional MST problem.

The fractional MST problem can be solved in polynomial time using the ellipsoid method.<ref name=":1">Template:Cite Geometric Algorithms and Combinatorial Optimization</ref> However, if we add a requirement that f(e) must be half-integer (that is, f(e) must be in {0, 1/2, 1}), then the problem becomes NP-hard,<ref name=":1" /> since it includes as a special case the Hamiltonian cycle problem: in an <math>n</math>-vertex unweighted graph, a half-integer MST of weight <math>n/2</math> can only be obtained by assigning weight 1/2 to each edge of a Hamiltonian cycle.

Other variants

The Steiner tree of a subset of the vertices is the minimum tree that spans the given subset. Finding the Steiner tree is NP-complete.<ref>Template:Garey-Johnson. ND12</ref>

The k-minimum spanning tree (k-MST) is the tree that spans some subset of k vertices in the graph with minimum weight.
A set of k-smallest spanning trees is a subset of k spanning trees (out of all possible spanning trees) such that no spanning tree outside the subset has smaller weight.<ref>Template:Citation.</ref><ref>Template:Citation.</ref><ref>Template:Citation.</ref> (Note that this problem is unrelated to the k-minimum spanning tree.)
The Euclidean minimum spanning tree is a spanning tree of a graph with edge weights corresponding to the Euclidean distance between vertices which are points in the plane (or space).
The rectilinear minimum spanning tree is a spanning tree of a graph with edge weights corresponding to the rectilinear distance between vertices which are points in the plane (or space).
The distributed minimum spanning tree is an extension of MST to the distributed model, where each node is considered a computer and no node knows anything except its own connected links. The mathematical definition of the problem is the same but there are different approaches for a solution.
The capacitated minimum spanning tree is a tree that has a marked node (origin, or root) and each of the subtrees attached to the node contains no more than c nodes. c is called a tree capacity. Solving CMST optimally is NP-hard,<ref>Template:Citation</ref> but good heuristics such as Esau-Williams and Sharma produce solutions close to optimal in polynomial time.
The degree-constrained minimum spanning tree is an MST in which each vertex is connected to no more than d other vertices, for some given number d. The case d = 2 is a special case of the traveling salesman problem, so the degree-constrained minimum spanning tree is NP-hard in general.
An arborescence is a variant of MST for directed graphs. It can be solved in <math>O(E + V \log V)</math> time using the Chu–Liu/Edmonds algorithm.
A maximum spanning tree is a spanning tree with weight greater than or equal to the weight of every other spanning tree. Such a tree can be found with algorithms such as Prim's or Kruskal's after multiplying the edge weights by -1 and solving the MST problem on the new graph. A path in the maximum spanning tree is the widest path in the graph between its two endpoints: among all possible paths, it maximizes the weight of the minimum-weight edge.<ref>Template:Citation.</ref> Maximum spanning trees find applications in parsing algorithms for natural languages<ref>Template:Cite conference</ref> and in training algorithms for conditional random fields.
The dynamic MST problem concerns the update of a previously computed MST after an edge weight change in the original graph or the insertion/deletion of a vertex.<ref>Template:Citation.</ref><ref>Template:Citation.</ref><ref>Template:Citation.</ref>
The minimum labeling spanning tree problem is to find a spanning tree with least types of labels if each edge in a graph is associated with a label from a finite label set instead of a weight.<ref>Template:Citation.</ref>
A bottleneck edge is the highest weighted edge in a spanning tree. A spanning tree is a minimum bottleneck spanning tree (or MBST) if the graph does not contain a spanning tree with a smaller bottleneck edge weight. An MST is necessarily an MBST (provable by the cut property), but an MBST is not necessarily an MST.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref><ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>

A minimum-cost spanning tree game is a cooperative game in which the players have to share among them the costs of constructing the optimal spanning tree.
The optimal network design problem is the problem of computing a set, subject to a budget constraint, which contains a spanning tree, such that the sum of shortest paths between every pair of nodes is as small as possible.
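The negation trick mentioned above for maximum spanning trees can be sketched as follows. The helper names and input format (vertices 0..n-1, edges as (u, v, weight) tuples) are assumptions for this example; Kruskal's algorithm is used as the MST solver, but any correct MST algorithm would serve.

```python
def kruskal(n, edges):
    """Minimum spanning tree (or forest) via Kruskal's algorithm."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def maximum_spanning_tree(n, edges):
    """Negate the weights, solve the MST problem, then restore the signs."""
    negated = [(u, v, -w) for u, v, w in edges]
    return [(u, v, -w) for u, v, w in kruskal(n, negated)]


example = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8), (3, 4, 3)]
print(maximum_spanning_tree(5, example))
```

The minimum edge weight along the tree path between two vertices then gives the width of a widest path between them in the original graph.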

Applications

Minimum spanning trees have direct applications in the design of networks, including computer networks, telecommunications networks, transportation networks, water supply networks, and electrical grids (which they were first invented for, as mentioned above).<ref>Template:Citation</ref> They are invoked as subroutines in algorithms for other problems, including the Christofides algorithm for approximating the traveling salesman problem,<ref>Nicos Christofides, Worst-case analysis of a new heuristic for the travelling salesman problem, Report 388, Graduate School of Industrial Administration, CMU, 1976.</ref> approximating the multi-terminal minimum cut problem (which is equivalent in the single-terminal case to the maximum flow problem),<ref>Template:Cite journal</ref> and approximating the minimum-cost weighted perfect matching.<ref>Template:Cite conference</ref>

Other practical applications based on minimal spanning trees include:

Taxonomy.<ref>Template:Cite journal</ref>
Cluster analysis: clustering points in the plane,<ref>Template:Cite conference</ref> single-linkage clustering (a method of hierarchical clustering),<ref>Template:Cite journal</ref> graph-theoretic clustering,<ref>Template:Cite journal</ref> and clustering gene expression data.<ref>Template:Cite journal</ref>
Constructing trees for broadcasting in computer networks.<ref>Template:Cite journal</ref>
Image registration<ref>Template:Cite conference</ref> and segmentation<ref>P. Felzenszwalb, D. Huttenlocher: Efficient Graph-Based Image Segmentation. IJCV 59(2) (September 2004)</ref> – see minimum spanning tree-based segmentation.
Curvilinear feature extraction in computer vision.<ref>Template:Cite journal</ref>
Handwriting recognition of mathematical expressions.<ref>Template:Cite book</ref>
Circuit design: implementing efficient multiple constant multiplications, as used in finite impulse response filters.<ref>Template:Cite conference</ref>
Regionalisation of socio-geographic areas, the grouping of areas into homogeneous, contiguous regions.<ref>Template:Cite journal</ref>
Comparing ecotoxicology data.<ref>Template:Cite journal</ref>
Topological observability in power systems.<ref>Template:Cite journal</ref>
Measuring homogeneity of two-dimensional materials.<ref>Template:Cite journal</ref>
Minimax process control.<ref>Template:Citation</ref>
Minimum spanning trees can also be used to describe financial markets.<ref>Mantegna, R. N. (1999). Hierarchical structure in financial markets. The European Physical Journal B-Condensed Matter and Complex Systems, 11(1), 193–197.</ref><ref>Djauhari, M., & Gan, S. (2015). Optimality problem of network topology in stocks market analysis. Physica A: Statistical Mechanics and Its Applications, 419, 108–114.</ref> A correlation matrix can be created by calculating a coefficient of correlation between any two stocks. This matrix can be represented topologically as a complex network and a minimum spanning tree can be constructed to visualize relationships.
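The financial-market construction can be sketched in a few lines. The text does not say how correlations become edge weights; the sketch below assumes the commonly used transform d = sqrt(2(1 − ρ)), so that highly correlated stocks are close together. The ticker labels and the correlation values are made up for illustration.

```python
import math


def correlation_mst(labels, corr):
    """Build an MST over assets from a symmetric correlation matrix.

    Assumption: the distance between assets i and j is
    sqrt(2 * (1 - corr[i][j])). Returns the tree as pairs of labels.
    """
    n = len(labels)
    edges = sorted(
        (math.sqrt(2.0 * (1.0 - corr[i][j])), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:  # Kruskal's algorithm on the distance graph
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((labels[i], labels[j]))
    return tree


# Hypothetical tickers and correlations, for illustration only.
tickers = ["A", "B", "C", "D"]
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
print(correlation_mst(tickers, corr))
```

In this toy input the tree links the two strongly correlated pairs first and then joins them through the strongest remaining correlation.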
