Mixture distribution
== Properties ==

=== Convexity ===
A general [[linear combination]] of probability density functions is not necessarily a probability density, since it may be negative or it may integrate to something other than 1. However, a [[convex combination]] of probability density functions preserves both of these properties (non-negativity and integrating to 1), and thus mixture densities are themselves probability density functions.

=== Moments ===
Let {{math|''X''<sub>1</sub>}}, ..., {{math|''X''<sub>''n''</sub>}} denote random variables from the {{mvar|n}} component distributions, and let {{mvar|X}} denote a random variable from the mixture distribution. Then, for any function {{math|''H''(·)}} for which <math>\operatorname{E}[H(X_i)]</math> exists, and assuming that the component densities {{math|''p<sub>i</sub>''(''x'')}} exist,
<math display="block">\begin{align} \operatorname{E}[H(X)] & = \int_{-\infty}^\infty H(x) \sum_{i = 1}^n w_i p_i(x) \, dx \\ & = \sum_{i = 1}^n w_i \int_{-\infty}^\infty p_i(x) H(x) \, dx = \sum_{i = 1}^n w_i \operatorname{E}[H(X_i)]. \end{align}</math>
The {{mvar|j}}-th moment about zero (i.e. choosing {{math|1=''H''(''x'') = ''x''{{i sup|''j''}}}}) is simply a weighted average of the {{mvar|j}}-th moments of the components. Moments about the mean {{math|1=''H''(''x'') = (''x − μ''){{i sup|''j''}}}} involve a binomial expansion:<ref>{{harvtxt|Frühwirth-Schnatter|2006|at=Ch.1.2.4}}</ref>
<math display="block">\begin{align} \operatorname{E}\left[{\left(X - \mu\right)}^j\right] & = \sum_{i=1}^n w_i \operatorname{E}\left[{\left(X_i - \mu_i + \mu_i - \mu\right)}^j\right] \\ & = \sum_{i=1}^n w_i \sum_{k=0}^j \binom{j}{k} {\left(\mu_i - \mu\right)}^{j-k} \operatorname{E}\left[{\left(X_i - \mu_i\right)}^k\right], \end{align}</math>
where {{math|''μ<sub>i</sub>''}} denotes the mean of the {{mvar|i}}-th component.
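The identity <math>\operatorname{E}[H(X)] = \textstyle\sum_i w_i \operatorname{E}[H(X_i)]</math> can be checked numerically: choosing {{math|1=''H''(''x'') = ''x''}} and {{math|1=''H''(''x'') = ''x''<sup>2</sup>}} gives the mixture mean and second moment as weighted averages of the component moments. A minimal Monte Carlo sketch for a two-component normal mixture, with illustrative parameters not taken from the article:

```python
# Sketch: verify E[H(X)] = sum_i w_i E[H(X_i)] for a two-component
# normal mixture by Monte Carlo. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.3, 0.7])      # mixing weights (sum to 1)
mu = np.array([-1.0, 2.0])    # component means
sigma = np.array([0.5, 1.5])  # component standard deviations

# Sample from the mixture: first pick a component, then draw from it.
n = 1_000_000
comp = rng.choice(2, size=n, p=w)
x = rng.normal(mu[comp], sigma[comp])

# Weighted averages of component moments (H(x) = x and H(x) = x^2).
m1 = np.dot(w, mu)                # E[X]
m2 = np.dot(w, sigma**2 + mu**2)  # E[X^2], using E[X_i^2] = sigma_i^2 + mu_i^2

print(m1, np.mean(x))     # theoretical vs. empirical mean
print(m2, np.mean(x**2))  # theoretical vs. empirical second moment
```

The empirical moments of the sample agree with the weighted averages up to Monte Carlo error.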
In the case of a mixture of one-dimensional distributions with weights {{math|''w<sub>i</sub>''}}, means {{math|''μ<sub>i</sub>''}} and variances {{math|''σ''<sub>''i''</sub><sup>2</sup>}}, the total mean and variance are:
<math display="block"> \operatorname{E}[X] = \mu = \sum_{i = 1}^n w_i \mu_i ,</math>
<math display="block"> \begin{align} \operatorname{E}\left[(X - \mu)^2\right] & = \sigma^2 \\ & = \operatorname{E}[X^2] - \mu^2 & (\text{standard variance reformulation})\\ & = \left(\sum_{i=1}^n w_i \operatorname{E}\left[X_i^2\right]\right) - \mu^{2} \\ & = \sum_{i=1}^n w_i(\sigma_i^2 + \mu_i^2)- \mu^2 & ( \sigma_i^2 = \operatorname{E}[X_i^2] - \mu_i^2 \implies \operatorname{E}[X_i^2] = \sigma_i^2 + \mu_i^2) \end{align}</math>
These relations highlight the potential of mixture distributions to display non-trivial higher-order moments such as [[skewness]] and [[kurtosis]] ([[fat tail]]s) and multimodality, even in the absence of such features within the components themselves. Marron and Wand (1992) give an illustrative account of the flexibility of this framework.<ref name="Marron92">{{Cite journal |title=Exact Mean Integrated Squared Error |first1=J. S. |last1=Marron |first2=M. P. |last2=Wand |journal=[[The Annals of Statistics]] |volume=20 |year=1992 |pages=712–736 |issue=2 |doi=10.1214/aos/1176348653 |doi-access=free}}</ref>

=== Modes ===
The question of [[Multimodal distribution|multimodality]] is simple in some cases, such as mixtures of [[exponential distribution]]s: all such mixtures are [[Unimodality|unimodal]].<ref>Frühwirth-Schnatter (2006, Ch.1)</ref> For mixtures of [[normal distribution]]s, however, the question is complex. Conditions for the number of modes in a multivariate normal mixture are explored by Ray & Lindsay,<ref name="RayLindsay">{{citation | title = The topography of multivariate normal mixtures | last1 = Ray | first1 = R. | last2 = Lindsay | first2 = B. | year = 2005 | journal = The Annals of Statistics | volume = 33 | number = 5 | pages = 2042–2065 | doi = 10.1214/009053605000000417 | arxiv = math/0602238}}</ref> extending earlier work on univariate<ref name=Robertson1969>Robertson CA, Fryer JG (1969) Some descriptive properties of normal mixtures. Skand Aktuarietidskr 137–146</ref><ref name=Behboodian1970>{{cite journal | last1 = Behboodian | first1 = J | year = 1970 | title = On the modes of a mixture of two normal distributions | journal = Technometrics | volume = 12 | pages = 131–139 | doi = 10.2307/1267357 | jstor = 1267357}}</ref> and multivariate<ref>{{cite book | last1 = Carreira-Perpiñán | first1 = M Á | last2 = Williams | first2 = C | year = 2003 | title = On the modes of a Gaussian mixture | series = Published as: Lecture Notes in Computer Science 2695 | publisher = [[Springer-Verlag]] | pages = 625–640 | doi = 10.1007/3-540-44935-3_44 | issn = 0302-9743 | url = http://faculty2.ucmerced.edu/mcarreira-perpinan/papers/EDI-INF-RR-0159.pdf}}</ref> distributions. Here the problem of evaluating the modes of an {{mvar|n}}-component mixture in a {{mvar|D}}-dimensional space is reduced to the identification of critical points (local minima, maxima and [[saddle point]]s) on a [[manifold]] referred to as the [[Ridge (differential geometry)|ridgeline surface]], which is the image of the ridgeline function
<math display="block"> x^{*}(\alpha) = \left[ \sum_{i=1}^{n} \alpha_i \Sigma_i^{-1} \right]^{-1} \times \left[ \sum_{i=1}^{n} \alpha_i \Sigma_i^{-1} \mu_i \right], </math>
where <math>\alpha</math> belongs to the <math>(n-1)</math>-dimensional standard [[simplex]]:
<math display="block"> \mathcal{S}_n = \left\{ \alpha \in \mathbb{R}^n: \alpha_i \in [0,1], \sum_{i=1}^n \alpha_i = 1 \right\} </math>
and <math>\Sigma_i \in \Reals^{D\times D},\, \mu_i \in \Reals^D</math> correspond to the covariance and mean of the {{mvar|i}}-th component.
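The ridgeline function is straightforward to evaluate: each point <math>x^*(\alpha)</math> solves a linear system built from the precision matrices. A minimal numpy sketch for two bivariate components ({{math|1=''n'' = 2}}, {{math|1=''D'' = 2}}), with illustrative parameters; note that the endpoints of the simplex recover the component means:

```python
# Sketch: evaluate the ridgeline function x*(alpha) for a mixture of
# n = 2 bivariate normals (D = 2). Parameters are illustrative.
import numpy as np

mu = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]        # component means
Sigma = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]  # component covariances
Sigma_inv = [np.linalg.inv(S) for S in Sigma]

def ridgeline(alpha):
    """x*(alpha) for alpha in the standard simplex (here alpha = (a, 1 - a))."""
    A = sum(a * Si for a, Si in zip(alpha, Sigma_inv))            # sum_i alpha_i Sigma_i^{-1}
    b = sum(a * Si @ m for a, Si, m in zip(alpha, Sigma_inv, mu)) # sum_i alpha_i Sigma_i^{-1} mu_i
    return np.linalg.solve(A, b)

# Vertices of the simplex map to the component means.
print(ridgeline([1.0, 0.0]))  # -> mu_1
print(ridgeline([0.0, 1.0]))  # -> mu_2
```

All critical points of the mixture density lie on the one-dimensional curve traced out as the weight vector sweeps the simplex, which is what makes the reduction useful.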
Ray & Lindsay<ref name="RayLindsay" /> consider the case in which <math>n-1 < D</math>, showing a one-to-one correspondence between modes of the mixture and critical points of the '''ridge elevation function''' <math>h(\alpha) = q(x^*(\alpha))</math>, where <math>q</math> denotes the mixture density; one may therefore identify the modes by solving <math> \frac{d h(\alpha)}{d \alpha} = 0 </math> with respect to <math>\alpha</math> and determining the value <math>x^*(\alpha)</math>. Using graphical tools, they demonstrate the potential multimodality of mixtures with <math>n \in \{2,3\}</math> components; in particular, they show that the number of modes may exceed <math>n</math> and that the modes need not coincide with the component means. For two components they develop a graphical tool for analysis by instead solving the aforementioned derivative with respect to the first mixing weight <math>w_1</math> (which also determines the second mixing weight through <math>w_2 = 1-w_1</math>) and expressing the solutions as a function <math>\Pi(\alpha), \,\alpha \in [0,1]</math>, so that the number and location of modes for a given value of <math>w_1</math> correspond to the number of intersections of the graph with the line <math>\Pi(\alpha) = w_1</math>. This in turn can be related to the number of oscillations of the graph and therefore to solutions of <math> \frac{d \Pi(\alpha)}{d \alpha} = 0 </math>, leading to an explicit solution for the case of a two-component mixture with <math>\Sigma_1 = \Sigma_2 = \Sigma </math> (sometimes called a [[homoscedastic]] mixture), given by <math display="block"> 1 - \alpha(1-\alpha) d_M(\mu_1, \mu_2, \Sigma)^2, </math> where <math display="inline"> d_M(\mu_1,\mu_2,\Sigma) = \sqrt{(\mu_2-\mu_1)^\mathsf{T} \Sigma^{-1} (\mu_2-\mu_1)} </math> is the [[Mahalanobis distance]] between <math>\mu_1</math> and <math>\mu_2</math>. Since this expression is quadratic in <math>\alpha</math>, it follows that in this instance there are at most two modes, irrespective of the dimension or the weights.
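Because <math>\alpha(1-\alpha) \le 1/4</math> on <math>[0,1]</math>, the quadratic <math>1 - \alpha(1-\alpha) d_M^2</math> can vanish for some <math>\alpha \in (0,1)</math> only when <math>d_M \ge 2</math>, so a homoscedastic two-component mixture can be bimodal only when its component means are at least two Mahalanobis units apart. A short sketch with illustrative parameters (not from the article) computing <math>d_M</math> and this quadratic:

```python
# Sketch: Mahalanobis distance for a homoscedastic two-component normal
# mixture, and the quadratic 1 - alpha(1-alpha) d_M^2 from the text.
# Since alpha(1-alpha) <= 1/4 on [0,1], the quadratic can vanish on (0,1)
# only when d_M >= 2. Parameters are illustrative.
import numpy as np

mu1 = np.array([0.0, 0.0])
mu2 = np.array([3.0, 0.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])  # shared covariance

diff = mu2 - mu1
d_M = np.sqrt(diff @ np.linalg.solve(Sigma, diff))  # Mahalanobis distance

def quadratic(alpha):
    return 1.0 - alpha * (1.0 - alpha) * d_M**2

print(d_M)         # about 2.17 here
print(d_M >= 2.0)  # True: two modes are possible for some choice of weights
```

For these parameters the quadratic is negative near <math>\alpha = 1/2</math>, consistent with the possibility of a second mode; shrinking the separation of the means below two Mahalanobis units makes it positive everywhere.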
For normal mixtures with general <math>n>2</math> and <math>D>1</math>, a lower bound for the maximum number of possible modes, and{{snd}}conditionally on the assumption that the maximum number is finite{{snd}}an upper bound are known. For those combinations of <math>n</math> and <math>D</math> for which the maximum number is known, it matches the lower bound.<ref>{{citation | title = Maximum number of modes of Gaussian mixtures | last1 = Améndola | first1 = C. | last2 = Engström | first2 = A. | last3 = Haase | first3 = C. | year = 2020 | journal = Information and Inference: A Journal of the IMA | volume = 9 | number = 3 | pages = 587–600 | doi = 10.1093/imaiai/iaz013 | arxiv = 1702.05066}}</ref>