Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Hierarchical clustering
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Divisive clustering == The basic principle of divisive clustering was published as the DIANA (DIvisive ANAlysis clustering) algorithm.<ref>{{cite book |first1=L. |last1=Kaufman |first2=P.J. |last2=Rousseeuw |chapter=6. Divisive Analysis (Program DIANA) |title=Finding Groups in Data: An Introduction to Cluster Analysis |chapter-url=https://books.google.com/books?id=YeFQHiikNo0C&pg=PA253 |orig-year=1990 |date=2009 |publisher=Wiley |isbn=978-0-470-31748-8 |pages=253β279}}</ref> Initially, all data is in the same cluster, and the largest cluster is split until every object is separate. Because there exist <math>O(2^n)</math> ways of splitting each cluster, heuristics are needed. DIANA chooses the object with the maximum average dissimilarity and then moves all objects to this cluster that are more similar to the new cluster than to the remainder. Informally, DIANA is not so much a process of "dividing" as it is of "hollowing out": each iteration, an existing cluster (e.g. the initial cluster of the entire dataset) is chosen to form a new cluster inside of it. Objects progressively move to this nested cluster, and hollow out the existing cluster. Eventually, all that's left inside a cluster is nested clusters that grew there, without it owning any loose objects by itself. Formally, DIANA operates in the following steps: # Let <math>C_0 = \{1\dots n\}</math> be the set of all <math>n</math> object indices and <math>\mathcal{C} = \{C_0\}</math> the set of all formed clusters so far. # Iterate the following until <math>|\mathcal{C}| = n</math>: ## Find the current cluster with 2 or more objects that has the largest diameter: <math>C_* = \arg\max_{C\in \mathcal{C}} \max_{i_1,i_2\in C} \delta(i_1,i_2)</math> ## Find the object in this cluster with the most dissimilarity to the rest of the cluster: <math>i^* = \arg\max_{i\in C_*} \frac{1}{|C_*|-1}\sum_{j\in C_*\setminus\{i\}} \delta(i,j)</math> ## Pop <math>i^*</math> from its old cluster <math>C_*</math> and put it into a new ''splinter group'' <math>C_\textrm{new} = \{i^*\}</math>. ## As long as <math>C_*</math> isn't empty, keep migrating objects from <math>C_*</math> to add them to <math>C_\textrm{new}</math>. To choose which objects to migrate, don't just consider dissimilarity to <math>C_*</math>, but also adjust for dissimilarity to the splinter group: let <math>i^* = \arg\max_{i\in C} D(i)</math> where we define <math>D(i) = \frac{1}{|C_*|-1}\sum_{j\in C_*\setminus\{i\}} \delta(i,j) - \frac{1}{|C_\textrm{new}|}\sum_{j\in C_\textrm{new}} \delta(i,j)</math>, then either stop iterating when <math>D(i^*) < 0</math>, or migrate <math>i^*</math>. ## Add <math>C_\textrm{new}</math> to <math>\mathcal{C}</math>. Intuitively, <math>D(i)</math> above measures how strongly an object wants to leave its current cluster, but it is attenuated when the object wouldn't fit in the splinter group either. Such objects will likely start their own splinter group eventually. The dendrogram of DIANA can be constructed by letting the splinter group <math>C_\textrm{new}</math> be a child of the hollowed-out cluster <math>C_*</math> each time. This constructs a tree with <math>C_0</math> as its root and <math>n</math> unique single-object clusters as its leaves.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)