Editing Prim's algorithm

{{Short description|Method for finding minimum spanning trees}}
[[File:PrimAlgDemo.gif|200px|thumb|A demo for Prim's algorithm based on Euclidean distance]]

In [[computer science]], '''Prim's algorithm''' is a [[greedy algorithm]] that finds a [[minimum spanning tree]] for a [[Weighted graph|weighted]] [[undirected graph]]. This means it finds a subset of the [[edge (graph theory)|edge]]s that forms a [[Tree (graph theory)|tree]] that includes every [[Vertex (graph theory)|vertex]], where the total weight of all the [[graph theory|edges]] in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.

The algorithm was developed in 1930 by [[Czech people|Czech]] mathematician [[Vojtěch Jarník]]<ref>{{citation
 | last = Jarník | first = V. | author-link = Vojtěch Jarník
 | journal = Práce Moravské Přírodovědecké Společnosti
 | language = cs
 | volume = 6
 | issue = 4
 | pages = 57–63
 | title = O jistém problému minimálním
 |trans-title=About a certain minimal problem
 | year = 1930| hdl = 10338.dmlcz/500726 }}.</ref> and later rediscovered and republished by [[computer scientist]]s [[Robert C. Prim]] in 1957<ref>{{citation
 | title = Shortest connection networks And some generalizations
 | last = Prim | first = R. C. | author-link = Robert C. Prim
 | date = November 1957
 | journal = [[Bell System Technical Journal]]
 | volume = 36
 | issue = 6
 | pages = 1389–1401
 | doi = 10.1002/j.1538-7305.1957.tb01515.x
 | url = https://archive.org/details/bstj36-6-1389
| bibcode = 1957BSTJ...36.1389P }}.</ref> and [[Edsger W. Dijkstra]] in 1959.<ref>{{citation
 | last = Dijkstra | first = E. W. | author-link = Edsger W. Dijkstra
 | date = December 1959
 | journal = Numerische Mathematik
 | volume = 1
 | issue = 1
 | pages = 269–271
 | title = A note on two problems in connexion with graphs
 | url = http://www-m3.ma.tum.de/twiki/pub/MN0506/WebHome/dijkstra.pdf
 | doi = 10.1007/BF01386390
 | citeseerx = 10.1.1.165.7577
| s2cid = 123284777 }}.</ref> Therefore, it is also sometimes called the '''Jarník's algorithm''',<ref>{{citation
 | title=Algorithms
 | edition=4th
 | first1=Robert |last1=Sedgewick |author1-link=Robert Sedgewick (computer scientist)
 | first2=Kevin Daniel |last2=Wayne
 | publisher=Addison-Wesley
 | year=2011
 | page=628
 | isbn=978-0-321-57351-3
 | url=https://books.google.com/books?id=MTpsAQAAQBAJ&pg=PA628
}}.</ref> '''Prim–Jarník algorithm''',<ref>{{citation|title=Discrete Mathematics and Its Applications|edition=7th|first=Kenneth|last=Rosen|publisher=McGraw-Hill Science|year=2011|url=https://books.google.com/books?id=6EJOCAAAQBAJ&pg=PA798|page=798}}.</ref> '''Prim–Dijkstra algorithm'''<ref name="chertar">{{citation
 | last1 = Cheriton | first1 = David | author1-link = David Cheriton
 | last2 = Tarjan | first2 = Robert Endre | author2-link = Robert Tarjan
 | doi = 10.1137/0205051
 | issue = 4
 | journal = [[SIAM Journal on Computing]]
 | mr = 0446458
 | pages = 724–742
 | title = Finding minimum spanning trees
 | volume = 5
 | year = 1976}}.</ref>
or the '''DJP algorithm'''.<ref name="pr02">{{citation
 | title = An optimal minimum spanning tree algorithm
 | last1 = Pettie | first1 = Seth
 | last2 = Ramachandran | first2 = Vijaya | author2-link = Vijaya Ramachandran
 | date = January 2002
 | journal = Journal of the ACM
 | issue = 1
 | volume = 49
 | pages = 16–34
 | mr = 2148431
 | doi = 10.1145/505241.505243
 | citeseerx = 10.1.1.110.7670
 | s2cid = 5362916 | url = http://www.cs.utexas.edu/~vlr/papers/optmsf-jacm.pdf
}}.</ref>

Other well-known algorithms for this problem include [[Kruskal's algorithm]] and [[Borůvka's algorithm]].<ref>{{citation|first=Robert Endre|last=Tarjan|author-link=Robert Tarjan|title=Data Structures and Network Algorithms|series=CBMS-NSF Regional Conference Series in Applied Mathematics|volume=44|publisher=[[Society for Industrial and Applied Mathematics]]|year=1983|contribution=Chapter 6. Minimum spanning trees. 6.2. Three classical algorithms|pages=72–77}}.</ref> These algorithms find the minimum spanning forest in a possibly disconnected graph; in contrast, the most basic form of Prim's algorithm only finds minimum spanning trees in connected graphs. However, running Prim's algorithm separately for each [[Connected component (graph theory)|connected component]] of the graph, it can also be used to find the minimum spanning forest.<ref>{{citation|title=Graph Algorithms in the Language of Linear Algebra|volume=22|series=Software, Environments, and Tools|first1=Jeremy|last1=Kepner|first2=John|last2=Gilbert|publisher=[[Society for Industrial and Applied Mathematics]]|year=2011|isbn=9780898719901|page=55|url=https://books.google.com/books?id=JBXDc83jRBwC&pg=PA55}}.</ref> In terms of their asymptotic [[time complexity]], these three algorithms are equally fast for [[sparse graph]]s, but slower than other more sophisticated algorithms.<ref name="pr02"/><ref name="chertar"/>
However, for graphs that are sufficiently dense, Prim's algorithm can be made to run in [[linear time]], meeting or improving the time bounds for other algorithms.<ref name="tarjan83p77">{{harvtxt|Tarjan|1983}}, p.&nbsp;77.</ref>

[[File:Prim's algorithm.svg|right|120px|thumb|Prim's algorithm starting at vertex A. In the third step, edges BD and AB both have weight 2, so BD is chosen arbitrarily. After that step, AB is no longer a candidate for addition to the tree because it links two nodes that are already in the tree.]]

== Description ==
The algorithm may informally be described as performing the following steps:
{{ordered list
 |  Initialize a tree with a single vertex, chosen arbitrarily from the graph.
 | Grow the tree by one edge: Of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.
 | Repeat step 2 (until all vertices are in the tree).
}}

In more detail, it may be implemented following the [[pseudocode]] below.
 '''function''' Prim(vertices, edges) '''is'''
     '''for''' '''each''' vertex '''in''' vertices '''do'''
         cheapestCost[vertex] ← ∞
         cheapestEdge[vertex] ← null
 
     explored ← empty set
     unexplored ← set containing all vertices
 
     startVertex ← any element of vertices
     cheapestCost[startVertex] ← 0
 
     '''while''' unexplored '''is''' not empty '''do'''
         // Select vertex in unexplored with minimum cost
         currentVertex ← vertex in unexplored with minimum cheapestCost[vertex]
         unexplored.remove(currentVertex)
         explored.add(currentVertex)
 
         '''for''' '''each''' edge (currentVertex, neighbor) '''in''' edges '''do'''
             '''if''' neighbor '''in''' unexplored '''and''' weight(currentVertex, neighbor) < cheapestCost[neighbor] THEN
                 cheapestCost[neighbor] ← weight(currentVertex, neighbor)
                 cheapestEdge[neighbor] ← (currentVertex, neighbor)
 
     resultEdges ← empty list
     '''for''' '''each''' vertex '''in''' vertices '''do'''
         '''if''' cheapestEdge[vertex] ≠ null '''THEN'''
             resultEdges.append(cheapestEdge[vertex])
 
     '''return''' resultEdges
As described above, the starting vertex for the algorithm will be chosen arbitrarily, because the first iteration of the main loop of the algorithm will have a set of vertices in ''Q'' that all have equal weights, and the algorithm will automatically start a new tree in ''F'' when it completes a spanning tree of each connected component of the input graph. The algorithm may be modified to start with any particular vertex ''s'' by setting ''C''[''s''] to be a number smaller than the other values of ''C'' (for instance, zero), and it may be modified to only find a single spanning tree rather than an entire spanning forest (matching more closely the informal description) by stopping whenever it encounters another vertex flagged as having no associated edge.

Different variations of the algorithm differ from each other in how the set ''Q'' is implemented: as a simple [[linked list]] or [[Array data structure|array]] of vertices, or as a more complicated [[priority queue]] data structure. This choice leads to differences in the [[time complexity]] of the algorithm. In general, a priority queue will be quicker at finding the vertex ''v'' with minimum cost, but will entail more expensive updates when the value of ''C''[''w''] changes.

== Time complexity ==
[[File:MAZE 30x20 Prim.ogv|thumb|Prim's algorithm has many applications, such as in the [[maze generation|generation]] of this maze, which applies Prim's algorithm to a randomly weighted [[grid graph]].]]

The time complexity of Prim's algorithm depends on the data structures used for the graph and for ordering the edges by weight, which can be done using a [[priority queue]]. The following table shows the typical choices:

{| class="wikitable"
! Minimum edge weight data structure !! Time complexity (total)
|-
| [[adjacency matrix]], searching || <math>O(|V|^2)</math>
|-
| [[binary heap]] and [[adjacency list]] || <math>O((|V| + |E|) \log |V|) = O(|E| \log |V|)</math>
|-
| [[Fibonacci heap]] and [[adjacency list]] || <math>O(|E| + |V| \log |V|)</math>
|}

A simple implementation of Prim's, using an [[adjacency matrix]] or an [[adjacency list]] graph representation and linearly searching an array of weights to find the minimum weight edge to add, requires [[Big-O notation|O]](<nowiki>|</nowiki>V<nowiki>|</nowiki><sup>2</sup>) running time. However, this running time can be greatly improved by using [[Heap (data structure)|heaps]] to implement finding minimum weight edges in the algorithm's inner loop.

A first improved version uses a heap to store all edges of the input graph, ordered by their weight. This leads to an O(|E| log |E|) worst-case running time. But storing vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed [[minimum spanning tree]] (MST) (or infinity if no such edge exists). Every time a vertex ''v'' is chosen and added to the MST, a decrease-key operation is performed on all vertices ''w'' outside the partial MST such that ''v'' is connected to ''w'', setting the key to the minimum of its previous value and the edge cost of (''v'',''w'').

Using a simple [[binary heap]] data structure, Prim's algorithm can now be shown to run in time [[Big-O notation|O]](<nowiki>|</nowiki>E<nowiki>|</nowiki> log <nowiki>|</nowiki>V<nowiki>|</nowiki>) where <nowiki>|</nowiki>E<nowiki>|</nowiki> is the number of edges and <nowiki>|</nowiki>V<nowiki>|</nowiki> is the number of vertices. Using a more sophisticated [[Fibonacci heap]], this can be brought down to [[Big-O notation|O]](<nowiki>|</nowiki>E<nowiki>|</nowiki> + <nowiki>|</nowiki>V<nowiki>|</nowiki> log <nowiki>|</nowiki>V<nowiki>|</nowiki>), which is [[Asymptotic computational complexity|asymptotically faster]] when the graph is [[Dense graph|dense]] enough that <nowiki>|</nowiki>E<nowiki>|</nowiki> is [[Big-O notation#Family of Bachmann.E2.80.93Landau notations|ω]](<nowiki>|</nowiki>V<nowiki>|</nowiki>), and [[linear time]] when |E| is at least |V|&nbsp;log&nbsp;|V|. For graphs of even greater density (having at least |V|<sup>''c''</sup> edges for some ''c''&nbsp;>&nbsp;1), Prim's algorithm can be made to run in linear time even more simply, by using a [[d-ary heap|''d''-ary heap]] in place of a Fibonacci heap.<ref name="tarjan83p77"/><ref>{{citation
 | last = Johnson | first = Donald B. | author-link = Donald B. Johnson
 | date = December 1975
 | doi = 10.1016/0020-0190(75)90001-0
 | issue = 3
 | journal = Information Processing Letters
 | pages = 53–57
 | title = Priority queues with update and finding minimum spanning trees
 | volume = 4}}.</ref>

[[File:Prim's algorithm proof.svg|thumb|150px|Demonstration of proof. In this case, the graph ''Y<sub>1</sub>'' = ''Y'' − ''f'' + ''e'' is already equal to ''Y''. In general, the process may need to be repeated.]]

== Proof of correctness ==
Let ''P'' be a connected, weighted [[graph theory|graph]].  At every iteration of Prim's algorithm, an edge must be found that connects a vertex in a subgraph to a vertex outside the subgraph.  Since  ''P'' is connected, there will always be a path to every vertex.  The output ''Y'' of Prim's algorithm is a [[Tree (graph theory)|tree]], because the edge and vertex added to tree ''Y'' are connected. 

Let ''Y<sub>1</sub>'' be a minimum spanning tree of graph P. If ''Y<sub>1</sub>''=''Y'' then ''Y'' is a minimum spanning tree. Otherwise, let ''e'' be the first edge added during the construction of tree ''Y'' that is not in tree ''Y<sub>1</sub>'', and ''V'' be the set of vertices connected by the edges added before edge ''e''.  Then one endpoint of edge ''e'' is in set ''V'' and the other is not.  Since tree ''Y<sub>1</sub>'' is a spanning tree of graph ''P'', there is a path in tree ''Y<sub>1</sub>'' joining the two endpoints.  As one travels along the path, one must encounter an edge ''f'' joining a vertex in set ''V'' to one that is not in set ''V''.  Now, at the iteration when edge ''e'' was added to tree ''Y'', edge ''f'' could also have been added and it would be added instead of edge ''e'' if its weight was less than ''e'', and since edge ''f'' was not added, we conclude that

:<math>w(f) \ge w(e).</math>

Let tree ''Y<sub>2</sub>'' be the graph obtained by removing edge ''f'' from and adding edge ''e'' to tree ''Y<sub>1</sub>''.  It is easy to show that tree ''Y<sub>2</sub>'' is connected, has the same number of edges as tree ''Y<sub>1</sub>'', and the total weights of its edges is not larger than that of tree ''Y<sub>1</sub>'', therefore it is also a minimum spanning tree of graph ''P'' and it contains edge ''e'' and all the edges added before it during the construction of set ''V''.  Repeat the steps above and we will eventually obtain a minimum spanning tree of graph ''P'' that is identical to tree ''Y''.  This shows ''Y'' is a minimum spanning tree. The minimum spanning tree allows for the first subset of the sub-region to be expanded into a larger subset ''X'', which we assume to be the minimum.

== Parallel algorithm ==
[[File:Distributed adjacency matrix for parallel prim.png|thumb|The adjacency matrix distributed between multiple processors for parallel Prim's algorithm. In each iteration of the algorithm, every processor updates its part of ''C'' by inspecting the row of the newly inserted vertex in its set of columns in the adjacency matrix. The results are then collected and the next vertex to include in the MST is selected globally.]]

The main loop of Prim's algorithm is inherently sequential and thus not [[parallel algorithm|parallelizable]]. However, the [[#step3c|inner loop]], which determines the next edge of minimum weight that does not form a cycle, can be parallelized by dividing the vertices and edges between the available processors.<ref name="grama2003">{{citation|title=Introduction to Parallel Computing|last1=Grama|first1=Ananth|last2=Gupta|first2=Anshul|last3=Karypis|first3=George|last4=Kumar|first4=Vipin|year=2003|isbn=978-0201648652|pages=444–446|publisher=Addison-Wesley }}</ref> The following [[pseudocode]] demonstrates this.

{{ordered list
  | Assign each processor <math>P_i</math> a set <math>V_i</math> of consecutive vertices of length <math>\tfrac{|V|}{|P|}</math>.
  | Create C, E, F, and Q as in the [[#sequential_algorithm|sequential algorithm]] and divide C, E, as well as the graph between all processors such that each processor holds the incoming edges to its set of vertices. Let <math>C_i</math>, <math>E_i</math> denote the parts of ''C'', ''E'' stored on processor <math>P_i</math>.
  | Repeat the following steps until ''Q'' is empty:
{{ordered list|type=a
  | On every processor: find the vertex <math>v_i</math> having the minimum value in <math>C_i</math>[<math>v_i</math>] (local solution).
  | [[Reduction Operator|Min-reduce]] the local solutions to find the vertex ''v'' having the minimum possible value of ''C''[''v''] (global solution).
  | [[Broadcasting (computing)|Broadcast]] the selected node to every processor.
  | Add ''v'' to ''F'' and, if ''E''[''v''] is not the special flag value, also add ''E''[''v''] to ''F''.
  | On every processor: update <math>C_i</math> and <math>E_i</math> as in the sequential algorithm.
}} 
 | Return ''F''
}}

This algorithm can generally be implemented on distributed machines<ref name="grama2003" /> as well as on shared memory machines.<ref>{{citation|last1=Quinn|first1=Michael J.|last2=Deo|first2=Narsingh|date=1984|title=Parallel graph algorithms|journal=ACM Computing Surveys |volume=16 |issue=3|pages=319–348|doi=10.1145/2514.2515|s2cid=6833839|doi-access=free}}</ref> The running time is <math>O(\tfrac{|V|^2}{|P|}) + O(|V| \log |P|)</math>, assuming that the ''reduce'' and ''broadcast'' operations can be performed in <math>O(\log |P|)</math>.<ref name="grama2003" /> A variant of Prim's algorithm for shared memory machines, in which Prim's sequential algorithm is being run in parallel, starting from different vertices, has also been explored.<ref>{{citation|last=Setia|first=Rohit|date=2009|title=A new parallel algorithm for minimum spanning tree problem|journal=Proc. International Conference on High Performance Computing (HiPC)|url=https://ncit-cluster.grid.pub.ro/trac/PP2009/export/157/proiecte/pgraph/Documentation/parallelspannintree.pdf}}</ref> It should, however, be noted that more sophisticated algorithms exist to solve the [[distributed minimum spanning tree]] problem in a more efficient manner.

== See also ==
*[[Dijkstra's algorithm]], a very similar algorithm for the [[shortest path problem]]
*[[Greedoid|Greedoid<nowiki/>s]] offer a general way to understand the correctness of Prim's algorithm

== References ==
{{Reflist|30em}}

== External links ==
* [https://meyavuz.wordpress.com/2017/03/10/prims-algorithm-animation-for-randomly-distributed-points Prim's Algorithm progress on randomly distributed points]
*{{Commons category-inline|Prim's algorithm}}

{{Graph traversal algorithms}}

[[Category:Graph algorithms]]
[[Category:Spanning tree]]
[[Category:Articles containing proofs]]
[[Category:Articles containing video clips]]
[[Category:Greedy algorithms]]
[[Category:Articles with example pseudocode]]