== Probabilistic methods ==
Two of the main methods used in unsupervised learning are [[Principal component analysis|principal component analysis]] and [[cluster analysis]]. Cluster analysis is used in unsupervised learning to group, or segment, datasets with shared attributes in order to extrapolate algorithmic relationships.<ref name="tds-ul" /> Cluster analysis is a branch of [[machine learning]] that groups data that has not been [[Labeled data|labelled]], classified or categorized. Instead of responding to feedback, cluster analysis identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data. This approach also helps detect anomalous data points that do not fit into any group.

A central application of unsupervised learning is [[density estimation]] in [[statistics]],<ref name="JordanBishop2004" /> though unsupervised learning encompasses many other domains involving summarizing and explaining data features. It can be contrasted with supervised learning: whereas supervised learning intends to infer a [[conditional probability distribution]] conditioned on the label of the input data, unsupervised learning intends to infer an [[a priori probability]] distribution.

=== Approaches ===
Some of the most common algorithms used in unsupervised learning are (1) clustering, (2) anomaly detection, and (3) approaches for learning latent variable models. Each approach uses several methods as follows:
* [[Data clustering|Clustering]] methods include [[hierarchical clustering]],<ref name="Hastie" /> [[k-means]]<ref name="tds-kmeans" /> (a minimal sketch follows this list), [[mixture models]], [[model-based clustering]], [[DBSCAN]], and the [[OPTICS algorithm]]
* [[Anomaly detection]] methods include [[Local Outlier Factor]] and [[Isolation Forest]]
* Approaches for learning [[latent variable model]]s include the [[Expectation–maximization algorithm|expectation–maximization algorithm]] (EM), the [[Method of moments (statistics)|method of moments]], and [[blind signal separation]] techniques ([[principal component analysis]], [[independent component analysis]], [[non-negative matrix factorization]], [[singular value decomposition]])
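As a concrete illustration of clustering unlabelled data, the following is a minimal k-means sketch in Python (assuming only NumPy; the function and variable names are illustrative, not from any reference implementation). It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points:

<syntaxhighlight lang="python">
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster
        # (kept in place if the cluster happens to be empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments stable: converged
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs; k-means recovers the grouping without labels.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)
</syntaxhighlight>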
=== Method of moments ===
One of the statistical approaches for unsupervised learning is the [[Method of moments (statistics)|method of moments]]. In the method of moments, the unknown parameters of interest in the model are related to the moments of one or more random variables, and thus these unknown parameters can be estimated given the moments. The moments are usually estimated from samples empirically. The basic moments are first- and second-order moments. For a random vector, the first-order moment is the [[mean]] vector, and the second-order moment is the [[covariance matrix]] (when the mean is zero). Higher-order moments are usually represented using [[tensors]], which are the generalization of matrices to higher orders as multi-dimensional arrays.

In particular, the method of moments has been shown to be effective in learning the parameters of [[latent variable model]]s. Latent variable models are statistical models where, in addition to the observed variables, a set of latent variables also exists which is not observed. A highly practical example of latent variable models in machine learning is [[topic modeling]], which is a statistical model for generating the words (observed variables) in a document based on the topic (latent variable) of the document. In topic modeling, the words in a document are generated according to different statistical parameters when the topic of the document changes. It has been shown that the method of moments (tensor decomposition techniques) consistently recovers the parameters of a large class of latent variable models under some assumptions.<ref name="TensorLVMs" />

The [[Expectation–maximization algorithm|expectation–maximization algorithm]] (EM) is also one of the most practical methods for learning latent variable models. However, it can get stuck in local optima, and the algorithm is not guaranteed to converge to the true unknown parameters of the model. In contrast, for the method of moments, global convergence is guaranteed under some conditions.
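As a brief illustration of the empirical moment estimation described in this section, the first-order moment of a random vector is estimated by the sample mean and the second-order (central) moment by the sample covariance. A minimal sketch, assuming NumPy (all names are illustrative):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
# Draw samples from a 2-dimensional Gaussian with known parameters.
true_mean = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(true_mean, true_cov, size=10_000)

# First-order moment: the sample mean vector.
mean_hat = X.mean(axis=0)
# Second-order central moment: the sample covariance matrix.
cov_hat = np.cov(X, rowvar=False)

# With enough samples, the estimates approach the true parameters.
print(mean_hat)  # approximately [1.0, -2.0]
print(cov_hat)   # approximately [[2.0, 0.5], [0.5, 1.0]]
</syntaxhighlight>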
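The EM alternation described above can likewise be sketched for a simple latent variable model: a two-component one-dimensional Gaussian mixture, where the component label of each point is the unobserved latent variable. This is a minimal sketch assuming NumPy and SciPy (the function name and initialisation strategy are illustrative, not from the cited sources); as noted above, EM may converge only to a local optimum depending on initialisation:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, n_iters=200):
    """EM for a 2-component 1-D Gaussian mixture."""
    # Crude initialisation; EM may only reach a local optimum.
    w = np.array([0.5, 0.5])          # mixture weights
    mu = np.array([x.min(), x.max()]) # component means
    var = np.array([x.var(), x.var()])# component variances
    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each point.
        dens = w * norm.pdf(x[:, None], mu, np.sqrt(var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted stats.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Mixture of N(0,1) and N(5,1); the component labels are never observed.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 700)])
w, mu, var = em_gmm_1d(x)  # w close to [0.3, 0.7], mu close to [0, 5]
</syntaxhighlight>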