== History ==

Disjoint-set forests were first described by [[Bernard A. Galler]] and [[Michael J. Fischer]] in 1964.<ref name="Galler1964">{{cite journal|first1=Bernard A.|last1=Galler|author1-link=Bernard A. Galler|first2=Michael J.|last2=Fischer|author2-link=Michael J. Fischer|title=An improved equivalence algorithm|journal=[[Communications of the ACM]]|volume=7|issue=5|date=May 1964|pages=301–303|doi=10.1145/364099.364331|s2cid=9034016 |doi-access=free}}. The paper originating disjoint-set forests.</ref> In 1973, [[John Hopcroft|Hopcroft]] and [[Jeffrey Ullman|Ullman]] bounded their time complexity by <math>O(\log^{*}(n))</math>, the [[iterated logarithm]] of <math>n</math>.<ref name="Hopcroft1973">{{cite journal|last1=Hopcroft|first1=J. E.|author1-link=John Hopcroft|last2=Ullman|first2=J. D.|author2-link=Jeffrey Ullman|year=1973|title=Set Merging Algorithms|journal=SIAM Journal on Computing|volume=2|issue=4|pages=294&ndash;303|doi=10.1137/0202024}}</ref> In 1975, [[Robert Tarjan]] was the first to prove the <math>O(m\alpha(n))</math> upper bound on the algorithm's time complexity for <math>m</math> operations on <math>n</math> elements, where <math>\alpha</math> is the [[Ackermann function#Inverse|inverse Ackermann function]].<ref name="Tarjan1984">{{cite journal|first1=Robert E.|last1=Tarjan|author1-link=Robert E. Tarjan|first2=Jan|last2=van Leeuwen|author2-link=Jan van Leeuwen|title=Worst-case analysis of set union algorithms|journal=Journal of the ACM|volume=31|issue=2|pages=245–281|year=1984|doi= 10.1145/62.2160|s2cid=5363073 |doi-access=free}}</ref> He also proved this bound to be tight. In 1979, he showed that it is also a lower bound for a certain class of algorithms, which includes the Galler–Fischer structure.<ref name="Tarjan1979">{{cite journal|first=Robert Endre|last=Tarjan|author-link=Robert E. Tarjan|year=1979|title=A class of algorithms which require non-linear time to maintain disjoint sets|journal=Journal of Computer and System Sciences|volume=18|issue=2|pages=110&ndash;127|doi=10.1016/0022-0000(79)90042-4|doi-access=free }}</ref> In 1989, [[Michael Fredman|Fredman]] and [[Michael Saks (mathematician)|Saks]] showed that <math>\Omega(\alpha(n))</math> (amortized) words of <math>O(\log n)</math> bits must be accessed by ''any'' disjoint-set data structure per operation,<ref name="Fredman1989">{{cite book|first1=M.|last1=Fredman|author-link=Michael Fredman|first2=M.|last2=Saks|title=Proceedings of the twenty-first annual ACM symposium on Theory of computing - STOC '89 |chapter=The cell probe complexity of dynamic data structures |pages=345&ndash;354|date=May 1989|doi=10.1145/73007.73040|isbn=0897913078|s2cid=13470414|quote=Theorem 5: Any CPROBE(log ''n'') implementation of the set union problem requires Ω(''m'' α(''m'', ''n'')) time to execute ''m'' Find's and ''n''&minus;1 Union's, beginning with ''n'' singleton sets. |doi-access=free}}</ref> thereby proving the optimality of the data structure in this model.

In 1991, Galil and Italiano published a survey of data structures for disjoint sets.<ref name="Galil1991">{{cite journal|first1=Z.|last1=Galil|first2=G.|last2=Italiano|title=Data structures and algorithms for disjoint set union problems|journal=ACM Computing Surveys|volume=23|issue=3|pages=319&ndash;344|year=1991|doi=10.1145/116873.116878|s2cid=207160759 }}</ref>

In 1991, Richard J. Anderson and Heather Woll described a parallelized version of Union–Find that never needs to block.<ref name="Anderson1994">{{cite conference|first1=Richard J.|last1=Anderson|first2=Heather|last2=Woll|title=Wait-free Parallel Algorithms for the Union-Find Problem|conference=23rd ACM Symposium on Theory of Computing|year=1991|pages=370&ndash;380}}</ref>

In 2007, Sylvain Conchon and Jean-Christophe Filliâtre developed a semi-[[persistent data structure|persistent]] version of the disjoint-set forest data structure and formalized its correctness using the [[proof assistant]] [[Coq (software)|Coq]].<ref name="Conchon2007">{{cite conference|first1=Sylvain|last1=Conchon|first2=Jean-Christophe|last2=Filliâtre|contribution=A Persistent Union-Find Data Structure|title=ACM SIGPLAN Workshop on ML|location=Freiburg, Germany|date=October 2007|url=https://www.lri.fr/~filliatr/puf/}}</ref> "Semi-persistent" means that previous versions of the structure are efficiently retained, but accessing a previous version invalidates later ones. Their fastest implementation performs almost as efficiently as the non-persistent algorithm. The authors do not give a complexity analysis.

Variants of disjoint-set data structures with better performance on a restricted class of problems have also been considered. Gabow and Tarjan showed that if the possible unions are restricted in certain ways, then a truly linear time algorithm is possible.<ref name="Gabow1985">{{cite journal|first1=Harold N.|last1=Gabow|first2=Robert Endre|last2=Tarjan|title=A linear-time algorithm for a special case of disjoint set union|journal=Journal of Computer and System Sciences|volume=30|issue=2|year=1985|pages=209–221|issn=0022-0000|doi=10.1016/0022-0000(85)90014-5}}</ref>