== Applications ==
{{more citations needed section|date=November 2016}}

===Biology, computational biology and bioinformatics===
{{See also|Distance matrices in phylogeny}}
; [[Plant]] and [[animal]] [[ecology]]
:Cluster analysis is used to describe and to make spatial and temporal comparisons of communities (assemblages) of organisms in heterogeneous environments. It is also used in [[Systematics|plant systematics]] to generate artificial [[Phylogeny|phylogenies]] or clusters of organisms (individuals) at the species, genus or higher level that share a number of attributes.
; [[Transcriptomics]]
:Clustering is used to build groups of [[genes]] with related expression patterns (also known as coexpressed genes), as in the [[HCS clustering algorithm]].<ref>{{Cite journal|last=Johnson|first=Stephen C.|s2cid=930698|date=1967-09-01|title=Hierarchical clustering schemes|journal=Psychometrika|language=en|volume=32|issue=3|pages=241–254|doi=10.1007/BF02289588|pmid=5234703|issn=1860-0980}}</ref><ref>{{Cite journal|last1=Hartuv|first1=Erez|last2=Shamir|first2=Ron|date=2000-12-31|title=A clustering algorithm based on graph connectivity|journal=Information Processing Letters|volume=76|issue=4|pages=175–181|doi=10.1016/S0020-0190(00)00142-3|issn=0020-0190}}</ref> Often such groups contain functionally related proteins, such as [[enzyme]]s for a specific [[metabolic pathway|pathway]], or genes that are co-regulated. High throughput experiments using [[expressed sequence tag]]s (ESTs) or [[DNA microarray]]s can be a powerful tool for [[genome annotation]]{{snd}}a general aspect of [[genomics]].
; [[Sequence analysis]]
:[[Sequence clustering]] is used to group homologous sequences into [[list of gene families|gene families]].<ref>{{Cite journal|last1=Remm|first1=Maido|last2=Storm|first2=Christian E. V.|last3=Sonnhammer|first3=Erik L. L.|date=2001-12-14|title=Automatic clustering of orthologs and in-paralogs from pairwise species comparisons|journal=Journal of Molecular Biology|volume=314|issue=5|pages=1041–1052|doi=10.1006/jmbi.2000.5197|issn=0022-2836|pmid=11743721}}</ref> Gene families are an important concept in [[bioinformatics]] and in [[evolutionary biology]] more generally. See evolution by [[gene duplication]].
; High-throughput [[genotype|genotyping]] platforms
:Clustering algorithms are used to automatically assign genotypes.<ref>{{Cite journal|last1=Botstein|first1=David|last2=Cox|first2=David R.|last3=Risch|first3=Neil|last4=Olshen|first4=Richard|last5=Curb|first5=David|last6=Dzau|first6=Victor J.|last7=Chen|first7=Yii-Der I.|last8=Hebert|first8=Joan|last9=Pesich|first9=Robert|date=2001-07-01|title=High-Throughput Genotyping with Single Nucleotide Polymorphisms|url=http://genome.cshlp.org/content/11/7/1262|journal=Genome Research|language=en|volume=11|issue=7|pages=1262–1268|doi=10.1101/gr.157801|issn=1088-9051|pmid=11435409|pmc=311112}}</ref>
; [[Human genetic clustering]]
:The similarity of genetic data is used in clustering to infer population structures.

===[[Medicine]]===
; [[Medical imaging]]
:On [[PET scan]]s, cluster analysis can be used to differentiate between types of [[tissue (biology)|tissue]] in a three-dimensional image.<ref>{{cite journal |last1=Filipovych |first1=Roman |last2=Resnick |first2=Susan M. |last3=Davatzikos |first3=Christos|title=Semi-supervised Cluster Analysis of Imaging Data |journal=NeuroImage |date=2011|volume=54 |issue=3 |pages=2185–2197 |doi=10.1016/j.neuroimage.2010.09.074|pmc=3008313 |pmid=20933091}}</ref>
; Analysis of antimicrobial activity
:Cluster analysis can be used to analyse patterns of antibiotic resistance, to classify antimicrobial compounds according to their mechanism of action, and to classify antibiotics according to their antibacterial activity.
; IMRT segmentation
:Clustering can be used to divide a fluence map into distinct regions for conversion into deliverable fields in MLC-based radiation therapy.

===Business and marketing===
; [[Market research]]
:Cluster analysis is widely used in market research when working with multivariate data from [[Statistical survey|surveys]] and test panels. Market researchers use cluster analysis to partition the general [[population]] of [[consumer]]s into market segments and to better understand the relationships between different groups of consumers and potential [[customers]]. The results are used in [[market segmentation]], [[positioning (marketing)|product positioning]], [[new product development]] and the selection of test markets.
; Grouping of shopping items
:Clustering can be used to group all the shopping items available on the web into a set of unique products. For example, all the items on eBay can be grouped into unique products (eBay does not have the concept of a [[Stock-keeping unit|SKU]]).

===[[World Wide Web]]===
; Social network analysis
:In the study of [[social network]]s, clustering may be used to recognize [[communities]] within large groups of people.
; Search result grouping
:In the process of intelligent grouping of files and websites, clustering may be used to create a more relevant set of search results compared to normal search engines like [[Google]].{{citation needed|date=July 2018}} There are currently a number of web-based clustering tools such as [[Clusty]]. Clustering may also be used to return a more comprehensive set of results in cases where a search term could refer to vastly different things. Each distinct use of the term corresponds to a unique cluster of results, allowing a ranking algorithm to return comprehensive results by picking the top result from each cluster.<ref name="mitpressjournals.org">{{Cite journal |doi = 10.1162/COLI_a_00148|title = Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction|journal = Computational Linguistics|volume = 39|issue = 3|pages = 709–754|year = 2013|last1 = Di Marco|first1 = Antonio|last2 = Navigli|first2 = Roberto|s2cid = 1775181}}</ref>
; Slippy map optimization
:[[Flickr]]'s map of photos and other map sites use clustering to reduce the number of markers on a map.{{citation needed|date=May 2023}} This both makes the map faster and reduces the amount of visual clutter.

===[[Computer science]]===
; [[Software evolution]]
:Clustering is useful in software evolution as it helps to reduce legacy properties in code by reforming functionality that has become dispersed. It is a form of restructuring and hence is a way of direct preventative maintenance.
; [[Image segmentation]]
:Image segmentation is the process of dividing a digital image into multiple meaningful regions or segments to simplify or change the representation of an image, making it easier to analyze. These segments may correspond to different objects, parts of objects, or background areas. The goal is to assign a label to every pixel in the image so that pixels with similar attributes are grouped together.
:This process is used in fields like medical imaging, computer vision, satellite imaging, and in daily applications like face detection and photo editing.
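:As an illustrative sketch (not drawn from the cited sources), the pixel-labelling goal described above can be approximated by clustering pixel colors with a basic ''k''-means loop. The function name <code>kmeans_segment</code>, the toy image, and the brightness-based choice of starting centers are assumptions made for this example; practical implementations more often use random or k-means++ initialization and a library routine.

```python
import numpy as np


def kmeans_segment(image, k, n_iter=20):
    """Assign each pixel of an (H, W, C) array to one of k color clusters."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)

    # Deterministic start (an assumption made here for reproducibility):
    # take k pixels evenly spaced along the brightness ordering.
    order = np.argsort(pixels.sum(axis=1))
    picks = np.linspace(0, len(pixels) - 1, k).astype(int)
    centers = pixels[order[picks]].copy()

    for _ in range(n_iter):
        # Squared Euclidean distance from every pixel to every center: shape (N, k).
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its pixels; empty clusters stay put.
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)

    return labels.reshape(h, w), centers


# Hypothetical toy image: dark left half, bright right half, slight noise.
rng = np.random.default_rng(0)
toy = np.zeros((8, 8, 3))
toy[:, :4] = 0.1
toy[:, 4:] = 0.9
toy += rng.normal(0.0, 0.02, size=toy.shape)

labels, centers = kmeans_segment(toy, k=2)
```

:In this toy case the returned label map separates the dark half from the bright half. On real photographs the result depends strongly on ''k'' and on the chosen feature space (for example color alone versus color plus pixel position).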
:[[File:Aurora_borealis_over_Eielson_Air_Force_Base,_Alaska.jpg|alt=The aurora borealis, or northern lights, above Bear Lake, Alaska|thumb|300x300px|The aurora borealis, or northern lights, above Bear Lake, Alaska]][[File:Polarlicht_2_kmeans_16_large.png|alt=Polarlicht 2 kmeans 16 large|thumb|300x300px|Image after running k-means clustering with ''k = 16''.]]'''Clustering in Image Segmentation:'''
:Clustering plays a significant role in image segmentation. It groups pixels into clusters based on similarity without needing labeled data. These clusters then define segments within the image.
:
:Commonly used clustering algorithms for image segmentation include:
:# '''[[K-means clustering|''K''-means Clustering]]:''' One of the most popular and straightforward methods. Pixels are treated as data points in a feature space (usually defined by color or intensity) and grouped into ''k'' clusters. Each pixel is assigned to the nearest cluster center, and the centers are updated iteratively.
:# '''[[Mean shift|Mean Shift Clustering]]:''' A non-parametric method that does not require specifying the number of clusters in advance. It identifies clusters by locating dense areas of data points in the feature space.
:# '''[[Fuzzy clustering|Fuzzy ''C''-means]]:''' Unlike ''k''-means, which assigns pixels to exactly one cluster, fuzzy ''c''-means allows each pixel to belong to multiple clusters with varying degrees of membership.
:
; [[Evolutionary algorithms]]
:Clustering may be used to identify different niches within the population of an evolutionary algorithm so that reproductive opportunity can be distributed more evenly amongst the evolving species or subspecies.
; [[Recommender systems]]
: Recommender systems suggest items, products, or other users to an individual based on their past behavior and current preferences. These systems sometimes use clustering algorithms to predict a user's unknown preferences by analyzing the preferences and activities of other users within the same cluster. Cluster analysis is not the only approach to recommendation systems; for example, some systems leverage graph theory. Recommendation algorithms that utilize cluster analysis often fall into one of three main categories: collaborative filtering, content-based filtering, and a hybrid of the two. <br>
:'''Collaborative Filtering Recommendation Algorithm'''
: Collaborative filtering works by analyzing large amounts of data on user behavior, preferences, and activities to predict what a user might like based on similarities with others. It detects patterns in how users rate items and groups similar users or items into distinct "neighborhoods". Recommendations are then generated by leveraging the ratings of content from others within the same neighborhood. The algorithm can focus on either user-based or item-based grouping depending on the context.<ref name="ReviewCluster">{{cite arXiv |eprint=2109.12839 |last1=Beregovskaya |first1=Irina |last2=Koroteev |first2=Mikhail |title=Review of Clustering-Based Recommender Systems |date=2021 |class=cs.IR }}</ref>
: [[File:FlowDiagram for Recommendation Systems.png|thumb|Flow diagram that shows a basic and generic approach to recommendation systems and how they utilize clustering.]] <br>
:'''Content-Based Filtering Recommendation Algorithm'''
: Content-based filtering uses item descriptions and a user's preference profile to recommend items with similar characteristics to those the user previously liked. It evaluates the distance between feature vectors of item clusters, or "neighborhoods". The user's past interactions are represented as a weighted feature vector, which is compared to these clusters. Recommendations are generated by identifying the cluster evaluated to be the closest in distance to the user's preferences.<ref name="ReviewCluster" />
: <br>
:'''Hybrid Recommendation Algorithms'''
: Hybrid recommendation algorithms combine collaborative and content-based filtering to better meet the requirements of specific use cases. In certain cases this approach leads to more effective recommendations. Common strategies include: (1) running collaborative and content-based filtering separately and combining the results, (2) adding onto one approach with specific features of the other, and (3) integrating both methods into one model.<ref name="ReviewCluster" />
; [[Markov chain Monte Carlo|Markov chain Monte Carlo methods]]
:Clustering is often utilized to locate and characterize extrema in the target distribution.
; [[Anomaly detection]]
:Anomalies/outliers are typically – be it explicitly or implicitly – defined with respect to clustering structure in data.
; [[Natural language processing]]
:Clustering can be used to resolve [[lexical ambiguity]].<ref name="mitpressjournals.org"/>
; [[DevOps]]
:Clustering has been used to analyse the effectiveness of DevOps teams.<ref name="stateofdevopsreport">{{cite report |url=https://services.google.com/fh/files/misc/2022_state_of_devops_report.pdf|title=2022 Accelerate State of DevOps Report|publisher=Google Cloud's DevOps Research and Assessment (DORA)|date=29 September 2022|pages=8, 14, 74 }}</ref>

===Social science===
; [[Sequence analysis in social sciences]]
:Cluster analysis is used, for example, to identify patterns of family life trajectories, professional careers, and daily or weekly time use.
; [[Crime analysis]]
:Cluster analysis can be used to identify areas where there are greater incidences of particular types of crime. By identifying these distinct areas or "hot spots" where a similar crime has happened over a period of time, it is possible to manage law enforcement resources more effectively.
; [[Educational data mining]]
:Cluster analysis is used, for example, to identify groups of schools or students with similar properties.
; Typologies
:From poll data, projects such as those undertaken by the Pew Research Center use cluster analysis to discern typologies of opinions, habits, and demographics that may be useful in politics and marketing.

===Others===
; Field robotics
:Clustering algorithms are used for robotic situational awareness to track objects and detect outliers in sensor data.<ref>{{cite journal | last1 = Bewley | first1 = A. | title = Real-time volume estimation of a dragline payload | journal = IEEE International Conference on Robotics and Automation | volume = 2011 | pages = 1571–1576 |display-authors=etal}}</ref>
; [[Mathematical chemistry]]
:Cluster analysis is used to find structural similarity among chemical compounds; for example, 3000 chemical compounds were clustered in the space of 90 [[topological index|topological indices]].<ref>{{cite journal | last1 = Basak | first1 = S.C. | last2 = Magnuson | first2 = V.R. | last3 = Niemi | first3 = C.J. | last4 = Regal | first4 = R.R. | title = Determining Structural Similarity of Chemicals Using Graph Theoretic Indices | journal = Discr. Appl. Math. |volume=19 | issue = 1–3 | year= 1988 | pages = 17–44 | doi=10.1016/0166-218x(88)90004-2| doi-access = free }}</ref>
; [[Climatology]]
:Cluster analysis is used to find weather regimes or preferred sea level pressure atmospheric patterns.<ref>{{cite journal | last1 = Huth | first1 = R. | year = 2008 | title = Classifications of Atmospheric Circulation Patterns: Recent Advances and Applications | journal = Ann. N.Y. Acad. Sci. | volume = 1146 | issue = 1 | pages = 105–152 |display-authors=etal| bibcode = 2008NYASA1146..105H | doi = 10.1196/annals.1446.019 | pmid = 19076414 | s2cid = 22655306 | url = https://opus.bibliothek.uni-augsburg.de/opus4/files/40082/40082.pdf }}</ref>
; Finance
:Cluster analysis has been used to group stocks into sectors.<ref>{{Cite journal|last=Arnott|first=Robert D.|date=1980-11-01|title=Cluster Analysis and Stock Price Comovement|journal=Financial Analysts Journal|volume=36|issue=6|pages=56–62|doi=10.2469/faj.v36.n6.56|issn=0015-198X}}</ref>
; Petroleum geology
:Cluster analysis is used to reconstruct missing bottom hole core data or missing log curves in order to evaluate reservoir properties.
; Geochemistry
:Cluster analysis is used to group chemical properties measured at different sample locations.