==Examples==

===A financial model===
[[File:Normal Distribution PDF.png|thumb|250x250px|The [[normal distribution]] plotted with different means and variances]]
Financial returns often behave differently in normal situations and during crisis times. A mixture model<ref>Dinov, ID. "[http://repositories.cdlib.org/socr/EM_MM/ Expectation Maximization and Mixture Modeling Tutorial]". ''[http://repositories.cdlib.org/escholarship California Digital Library]'', Statistics Online Computational Resource, Paper EM_MM, http://repositories.cdlib.org/socr/EM_MM, December 9, 2008</ref> for return data seems reasonable. Sometimes the model used is a [[jump-diffusion model]], or a mixture of two normal distributions. See {{slink|Financial economics#Challenges and criticism}} and {{slink|Financial risk management#Banking}} for further context.

===House prices===
Assume that we observe the prices of ''N'' different houses. Different types of houses in different neighborhoods will have vastly different prices, but the price of a particular type of house in a particular neighborhood (e.g., a three-bedroom house in a moderately upscale neighborhood) will tend to cluster fairly closely around the mean. One possible model of such prices would be to assume that the prices are accurately described by a mixture model with ''K'' different components, each distributed as a [[normal distribution]] with unknown mean and variance, with each component specifying a particular combination of house type and neighborhood. Fitting this model to observed prices, e.g., using the [[expectation-maximization algorithm]], would tend to cluster the prices according to house type/neighborhood and reveal the spread of prices in each type/neighborhood. (Note that for values such as prices or incomes that are guaranteed to be positive and which tend to grow [[exponential growth|exponentially]], a [[log-normal distribution]] might actually be a better model than a normal distribution.)
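As a concrete illustration, the following minimal sketch fits such a model with the EM-based <code>GaussianMixture</code> estimator from scikit-learn; the price data and cluster parameters are invented for the example:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical prices drawn from two house-type/neighborhood clusters.
rng = np.random.default_rng(0)
prices = np.concatenate([
    rng.normal(250_000, 20_000, 300),  # e.g., modest neighborhood
    rng.normal(600_000, 60_000, 200),  # e.g., upscale neighborhood
]).reshape(-1, 1)

# Fit a K=2 normal mixture by expectation-maximization.
gmm = GaussianMixture(n_components=2, random_state=0).fit(prices)

print(gmm.means_.ravel())                 # estimated cluster means
print(np.sqrt(gmm.covariances_).ravel())  # estimated within-cluster spreads
print(gmm.weights_)                       # estimated mixture weights
</syntaxhighlight>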
===Topics in a document===
Assume that a document is composed of ''N'' different words from a total vocabulary of size ''V'', where each word corresponds to one of ''K'' possible topics. The distribution of such words could be modelled as a mixture of ''K'' different ''V''-dimensional [[categorical distribution]]s. A model of this sort is commonly termed a [[topic model]]. Note that [[expectation maximization]] applied to such a model will typically fail to produce realistic results, due (among other things) to the [[overfitting|excessive number of parameters]]. Additional assumptions are typically necessary to get good results; typically two sorts of additional components are added to the model:
#A [[prior distribution]] is placed over the parameters describing the topic distributions, using a [[Dirichlet distribution]] with a [[concentration parameter]] that is set significantly below 1, so as to encourage sparse distributions (where only a small number of words have significantly non-zero probabilities).
#Some sort of additional constraint is placed over the topic identities of words, to take advantage of natural clustering.
#*For example, a [[Markov chain]] could be placed on the topic identities (i.e., the latent variables specifying the mixture component of each observation), corresponding to the fact that nearby words belong to similar topics. (This results in a [[hidden Markov model]], specifically one where a [[prior distribution]] is placed over state transitions that favors transitions that stay in the same state.)
#*Another possibility is the [[latent Dirichlet allocation]] model, which divides up the words into ''D'' different documents and assumes that in each document only a small number of topics occur with any frequency; a sketch follows this list.
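The latent Dirichlet allocation variant can be sketched with scikit-learn's <code>LatentDirichletAllocation</code>; the tiny corpus below is invented, and the Dirichlet priors are set below 1 to encourage sparsity as described above:

<syntaxhighlight lang="python">
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented toy corpus; each document mixes only a few topics.
docs = [
    "stock market trading returns volatility",
    "market crash risk volatility returns",
    "goal match team player score",
    "team season player coach match",
]
counts = CountVectorizer().fit_transform(docs)

# Concentration parameters below 1 encourage sparse
# document-topic and topic-word distributions.
lda = LatentDirichletAllocation(
    n_components=2, doc_topic_prior=0.1, topic_word_prior=0.1, random_state=0,
).fit(counts)

print(lda.transform(counts))  # per-document topic proportions
</syntaxhighlight>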
===Handwriting recognition===
The following example is based on an example in [[Christopher M. Bishop]], ''Pattern Recognition and Machine Learning''.<ref>{{cite book | last = Bishop | first = Christopher | title = Pattern Recognition and Machine Learning | publisher = Springer | location = New York | year = 2006 | isbn = 978-0-387-31073-2 }}</ref>

Imagine that we are given an ''N''×''N'' black-and-white image that is known to be a scan of a hand-written digit between 0 and 9, but we don't know which digit is written. We can create a mixture model with <math>K=10</math> different components, where each component is a vector of size <math>N^2</math> of [[Bernoulli distribution]]s (one per pixel). Such a model can be trained with the [[expectation-maximization algorithm]] on an unlabeled set of hand-written digits, and will effectively cluster the images according to the digit being written. The same model could then be used to recognize the digit of another image simply by holding the parameters constant, computing the probability of the new image under each possible digit (a trivial calculation), and returning the digit that yields the highest probability.
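Since scikit-learn offers no Bernoulli mixture estimator, a compact EM implementation can be written directly; the sketch below assumes binarized images supplied as a 0/1 matrix with one row per image:

<syntaxhighlight lang="python">
import numpy as np

def fit_bernoulli_mixture(X, K=10, n_iter=50, seed=0):
    """EM for a mixture of products of Bernoulli distributions.
    X: (n_images, n_pixels) array of 0/1 pixel values."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(K, 1.0 / K)
    probs = rng.uniform(0.25, 0.75, size=(K, d))  # per-pixel Bernoulli means

    for _ in range(n_iter):
        # E-step: responsibilities of each component for each image.
        log_r = (X @ np.log(probs).T + (1 - X) @ np.log(1 - probs).T
                 + np.log(weights))
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate mixture weights and pixel probabilities.
        nk = resp.sum(axis=0)
        weights = nk / n
        probs = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)

    return weights, probs

def classify(x, weights, probs):
    """Return the component (digit cluster) with the highest posterior."""
    log_p = np.log(probs) @ x + np.log(1 - probs) @ (1 - x) + np.log(weights)
    return int(np.argmax(log_p))
</syntaxhighlight>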
===Assessing projectile accuracy (a.k.a. circular error probable, CEP)===
Mixture models apply in the problem of directing multiple projectiles at a target (as in air, land, or sea defense applications), where the physical and/or statistical characteristics of the projectiles differ among the projectiles. An example might be shots from multiple munitions types or shots from multiple locations directed at one target. The combination of projectile types may be characterized as a Gaussian mixture model.<ref>Spall, J. C. and Maryak, J. L. (1992). "A feasible Bayesian estimator of quantiles for projectile accuracy from non-i.i.d. data." ''Journal of the American Statistical Association'', vol. 87 (419), pp. 676–681. {{JSTOR|2290205}}</ref> Further, a well-known measure of accuracy for a group of projectiles is the [[circular error probable]] (CEP), which is the number ''R'' such that, on average, half of the group of projectiles falls within the circle of radius ''R'' about the target point. The mixture model can be used to determine (or estimate) the value ''R'', since it properly captures the different types of projectiles.

===Direct and indirect applications===
The financial example above is one direct application of the mixture model, a situation in which we assume an underlying mechanism so that each observation belongs to one of some number of different sources or categories. This underlying mechanism may or may not, however, be observable. In this form of mixture, each of the sources is described by a component probability density function, and its mixture weight is the probability that an observation comes from this component.

In an indirect application of the mixture model we do not assume such a mechanism. The mixture model is simply used for its mathematical flexibility. For example, a mixture of two [[normal distribution]]s with different means may result in a density with two [[Mode (statistics)|modes]], which is not modeled by standard parametric distributions. Another example is the ability of mixture distributions to model fatter tails than a single Gaussian, making them candidates for modeling more extreme events.

===Predictive maintenance===
Mixture model-based clustering is also predominantly used to identify the state of a machine in [[predictive maintenance]]. Density plots are used to analyze the density of high-dimensional features. If multi-modal densities are observed, then it is assumed that a finite set of densities is formed by a finite set of normal mixtures. A multivariate Gaussian mixture model is used to cluster the feature data into ''k'' groups, where ''k'' represents each state of the machine. The machine state can be a normal state, a power-off state, or a faulty state.<ref>{{Cite book|url=https://www.researchgate.net/publication/322900854|title=Fault Class Prediction in Unsupervised Learning using Model-Based Clustering Approach|last1=Amruthnath|first1=Nagdev|last2=Gupta|first2=Tarun|date=2018-02-02|doi=10.13140/rg.2.2.22085.14563|publisher=Unpublished}}</ref> Each formed cluster can be diagnosed using techniques such as spectral analysis. In recent years, this has also been widely used in other areas, such as early fault detection.<ref>{{Cite book|url=https://www.researchgate.net/publication/322869981|title=A Research Study on Unsupervised Machine Learning Algorithms for Fault Detection in Predictive Maintenance|last1=Amruthnath|first1=Nagdev|last2=Gupta|first2=Tarun|date=2018-02-01|doi=10.13140/rg.2.2.28822.24648|publisher=Unpublished}}</ref>

===Fuzzy image segmentation===
[[File:Movie.gif|thumb|An example of Gaussian mixture in image segmentation with grey histogram]]
In image processing and computer vision, traditional [[image segmentation]] models often assign to one [[pixel]] only one exclusive pattern. In fuzzy or soft segmentation, any pattern can have certain "ownership" over any single pixel. If the patterns are Gaussian, fuzzy segmentation naturally results in Gaussian mixtures. Combined with other analytic or geometric tools (e.g., phase transitions over diffusive boundaries), such spatially regularized mixture models could lead to more realistic and computationally efficient segmentation methods.<ref>{{cite journal | last = Shen | first = Jianhong (Jackie) | title = A stochastic-variational model for soft Mumford-Shah segmentation | date = 2006 | volume = 2006 | pages = 2–16 | journal = International Journal of Biomedical Imaging | doi = 10.1155/IJBI/2006/92329 | pmid = 23165059 | pmc = 2324060 | bibcode = 2006IJBI.200649515H | doi-access = free }}</ref>

===Point set registration===
Probabilistic mixture models such as [[Gaussian mixture model]]s (GMM) are used to resolve [[point set registration]] problems in image processing and computer vision. For pair-wise point set registration, one point set is regarded as the centroids of the mixture model, and the other point set is regarded as data points (observations). State-of-the-art methods include [[Point set registration#Coherent point drift|coherent point drift]] (CPD)<ref>{{cite journal | last1 = Myronenko | first1 = Andriy | last2 = Song | first2 = Xubo | title = Point set registration: Coherent point drift | number = 12 | volume = 32 | year = 2010 | pages = 2262–2275 | journal = IEEE Trans. Pattern Anal. Mach. Intell. | doi = 10.1109/TPAMI.2010.46 | pmid = 20975122 | arxiv = 0905.2635 | s2cid = 10809031 }}</ref> and [[Student's t-distribution]] mixture models (TMM).<ref>{{cite journal | last1 = Ravikumar | first1 = Nishant | last2 = Gooya | first2 = Ali | last3 = Cimen | first3 = Serkan | last4 = Frangi | first4 = Alejandro | last5 = Taylor | first5 = Zeike | title = Group-wise similarity registration of point sets using Student's t-mixture model for statistical shape models | volume = 44 | year = 2018 | pages = 156–176 | journal = Med. Image Anal. | doi = 10.1016/j.media.2017.11.012 | pmid = 29248842 | doi-access = free }}</ref> Recent research demonstrates the superiority of hybrid mixture models<ref>{{cite conference | url = https://www.miccai2018.org/en/ | title = Intraoperative brain shift compensation using a hybrid mixture model | last1 = Bayer | first1 = Siming | last2 = Ravikumar | first2 = Nishant | last3 = Strumia | first3 = Maddalena | last4 = Tong | first4 = Xiaoguang | last5 = Gao | first5 = Ying | last6 = Ostermeier | first6 = Martin | last7 = Fahrig | first7 = Rebecca | last8 = Maier | first8 = Andreas | date = 2018 | publisher = Springer, Cham | book-title = Medical Image Computing and Computer Assisted Intervention – MICCAI 2018 | pages = 116–124 | location = Granada, Spain | doi = 10.1007/978-3-030-00937-3_14 }}</ref> (e.g., combining a Student's t-distribution and a Watson distribution/[[Bingham distribution]] to model spatial positions and axis orientations separately) over CPD and TMM in terms of inherent robustness, accuracy and discriminative capacity.
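To make the mixture view of registration concrete, the toy sketch below (not CPD itself; all data and parameter values are invented) treats one point set as the centroids of an equal-weight isotropic GMM and recovers the translation of the other set by maximizing its likelihood:

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(t, X, Y, sigma=0.2):
    """Negative log-likelihood of the translated points Y + t under an
    equal-weight isotropic Gaussian mixture centered on the points of X."""
    diff = (Y + t)[None, :, :] - X[:, None, :]        # shape (|X|, |Y|, dim)
    log_comp = -(diff ** 2).sum(-1) / (2 * sigma**2)  # log-density up to a constant
    m = log_comp.max(axis=0)
    # log-sum-exp over mixture components, summed over the data points
    return -(m + np.log(np.exp(log_comp - m).mean(axis=0))).sum()

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 2))                # "model" set = GMM centroids
true_shift = np.array([0.3, -0.2])
Y = X - true_shift + rng.normal(0, 0.01, X.shape)  # observed, displaced set

res = minimize(neg_log_likelihood, np.zeros(2), args=(X, Y),
               method="Nelder-Mead")
print(res.x)  # recovered translation, close to true_shift
</syntaxhighlight>

Estimating a full rigid or non-rigid transform, as CPD does, replaces this direct search with EM updates over point correspondences, but the underlying mixture likelihood is the same.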