Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Hierarchical clustering
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Complexity == The standard algorithm for '''hierarchical agglomerative clustering''' (HAC) has a [[time complexity]] of <math>\mathcal{O}(n^3)</math> and requires <math>\Omega(n^2)</math> memory, which makes it too slow for even medium data sets. However, for some special cases, optimal efficient agglomerative methods (of complexity <math>\mathcal{O}(n^2)</math>) are known: '''SLINK'''<!--boldface per WP:R#PLA--><ref name="SLINK">{{cite journal | author=R. Sibson | title=SLINK: an optimally efficient algorithm for the single-link cluster method | journal=The Computer Journal | volume=16 | issue=1 | pages=30β34 | year=1973 | publisher=British Computer Society | url=http://www.cs.gsu.edu/~wkim/index_files/papers/sibson.pdf | doi=10.1093/comjnl/16.1.30| doi-access=free }}</ref> for [[Single-linkage clustering|single-linkage]] and CLINK<ref name="CLINK">{{cite journal | author=D. Defays | title=An efficient algorithm for a complete-link method | journal=The Computer Journal | volume=20 | issue=4 | pages=364β6 | year=1977 | publisher=British Computer Society | doi=10.1093/comjnl/20.4.364| doi-access= }}</ref> for [[complete-linkage clustering]]. With a [[heap (data structure)|heap]], the runtime of the general case can be reduced to <math>\mathcal{O}(n^2 \log n)</math>, an improvement on the aforementioned bound of <math>\mathcal{O}(n^3)</math>, at the cost of further increasing the memory requirements. In many cases, the memory overheads of this approach are too large to make it practically usable. Methods exist which use [[quadtree]]s that demonstrate <math>\mathcal{O}(n^2)</math> total running time with <math>\mathcal{O}(n)</math> space.<ref name=DE>{{Cite journal |last=Eppstein |first=David |date=2001-12-31 |title=Fast hierarchical clustering and other applications of dynamic closest pairs |url=https://dl.acm.org/doi/10.1145/351827.351829 |journal=ACM Journal of Experimental Algorithmics |volume=5 |pages=1βes |doi=10.1145/351827.351829 |issn=1084-6654|arxiv=cs/9912014 }}</ref> Divisive clustering with an exhaustive search is <math>\mathcal{O}(2^n)</math>, but it is common to use faster heuristics to choose splits, such as [[k-means clustering|''k''-means]].
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)