== Related distributions ==

=== Central limit theorem ===
[[File:De moivre-laplace.gif|right|thumb|250px|As the number of discrete events increases, the function begins to resemble a normal distribution.]]
[[File:Dice sum central limit theorem.svg|thumb|250px|Comparison of probability density functions {{math|''p''(''k'')}} for the sum of {{mvar|n}} fair 6-sided dice, showing their convergence to a normal distribution with increasing {{mvar|n}}, in accordance with the central limit theorem. In the bottom-right graph, smoothed profiles of the previous graphs are rescaled, superimposed and compared with a normal distribution (black curve).]]
{{Main|Central limit theorem}}
The central limit theorem states that under certain (fairly common) conditions, the sum of many random variables will have an approximately normal distribution. More specifically, if <math display=inline>X_1,\ldots ,X_n</math> are [[independent and identically distributed]] random variables, each with zero mean and variance <math display=inline>\sigma^2</math>, and {{tmath|Z}} is their mean scaled by <math display=inline>\sqrt{n}</math>,
<math display=block>Z = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n X_i\right),</math>
then, as {{tmath|n}} increases, the probability distribution of {{tmath|Z}} will tend to the normal distribution with zero mean and variance {{tmath|\sigma^2}}.

The theorem can be extended to variables <math display=inline>(X_i)</math> that are not independent and/or not identically distributed if certain constraints are placed on the degree of dependence and the moments of the distributions.

Many [[test statistic]]s, [[score (statistics)|scores]], and [[estimator]]s encountered in practice contain sums of certain random variables in them, and even more estimators can be represented as sums of random variables through the use of [[influence function (statistics)|influence functions]]. The central limit theorem implies that those statistical parameters will have asymptotically normal distributions.

The central limit theorem also implies that certain distributions can be approximated by the normal distribution, for example:
* The [[binomial distribution]] <math display=inline>B(n,p)</math> is [[De Moivre–Laplace theorem|approximately normal]] with mean <math display=inline>np</math> and variance <math display=inline>np(1-p)</math> for large {{tmath|n}} and for {{tmath|p}} not too close to 0 or 1.
* The [[Poisson distribution]] with parameter {{tmath|\lambda}} is approximately normal with mean {{tmath|\lambda}} and variance {{tmath|\lambda}}, for large values of {{tmath|\lambda}}.<ref>{{cite web|url=http://www.stat.ucla.edu/~dinov/courses_students.dir/Applets.dir/NormalApprox2PoissonApplet.html|title=Normal Approximation to Poisson Distribution|website=Stat.ucla.edu|access-date=2017-03-03}}</ref>
* The [[chi-squared distribution]] <math display=inline>\chi^2(k)</math> is approximately normal with mean {{tmath|k}} and variance <math display=inline>2k</math>, for large {{tmath|k}}.
* The [[Student's t-distribution]] <math display=inline>t(\nu)</math> is approximately normal with mean 0 and variance 1 when {{tmath|\nu}} is large.

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.
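The convergence described by the theorem can be checked empirically. The following is a minimal Monte Carlo sketch in Python with NumPy (the uniform base distribution, the sample sizes, and the variable names are illustrative choices, not part of the theorem's statement):

<syntaxhighlight lang="python">
# Sketch of the central limit theorem: the scaled mean
# Z = sqrt(n) * (1/n) * sum(X_i) of n i.i.d. zero-mean variables
# approaches N(0, sigma^2) as n grows.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 500, 20_000

# Zero-mean uniform on [-1, 1]; its variance is sigma^2 = 1/3.
x = rng.uniform(-1.0, 1.0, size=(trials, n))
z = np.sqrt(n) * x.mean(axis=1)

print(f"mean of Z:     {z.mean():+.4f}  (theory: 0)")
print(f"variance of Z: {z.var():.4f}  (theory: {1/3:.4f})")
# A histogram of z would closely track the N(0, 1/3) density.
</syntaxhighlight>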
A general upper bound for the approximation error in the central limit theorem is given by the [[Berry–Esseen theorem]]; improvements of the approximation are given by the [[Edgeworth expansion]]s. This theorem can also be used to justify modeling the sum of many uniform noise sources as [[Gaussian noise]]. See [[AWGN]].

=== Operations and functions of normal variables ===
[[File:Probabilities of functions of normal vectors.png|thumb|right|'''a:''' Probability density of a function {{math|cos ''x''{{sup|2}}}} of a normal variable {{mvar|x}} with {{math|1= ''μ'' = −2}} and {{math|1= ''σ'' = 3}}. '''b:''' Probability density of a function {{mvar|x{{sup|y}}}} of two normal variables {{mvar|x}} and {{mvar|y}}, where {{math|1= ''μ{{sub|x}}'' = 1}}, {{math|1= ''μ{{sub|y}}'' = 2}}, {{math|1= ''σ{{sub|x}}'' = 0.1}}, {{math|1= ''σ{{sub|y}}'' = 0.2}}, and {{math|1= ''ρ{{sub|xy}}'' = 0.8}}. '''c:''' Heat map of the joint probability density of two functions of two correlated normal variables {{mvar|x}} and {{mvar|y}}, where {{math|1= ''μ{{sub|x}}'' = −2}}, {{math|1= ''μ{{sub|y}}'' = 5}}, {{math|1= {{subsup|σ|s=0|''x''|2}} = 10}}, {{math|1= {{subsup|σ|s=0|''y''|2}} = 20}}, and {{math|1= ''ρ{{sub|xy}}'' = 0.495}}. '''d:''' Probability density of a function {{math|{{abs|''x''{{sub|1}}}} + ... + {{abs|''x''{{sub|4}}}}}} of four [[iid]] standard normal variables. These are computed by the numerical method of ray-tracing.<ref name="Das-2021" />]]
The [[probability density]], [[cumulative distribution function|cumulative distribution]], and [[inverse cumulative distribution function|inverse cumulative distribution]] of any function of one or more independent or correlated normal variables can be computed with the numerical method of ray-tracing<ref name="Das-2021">{{cite journal | last=Das|first=Abhranil| arxiv=2012.14331| title=A method to integrate and classify normal distributions|journal=Journal of Vision |date=2021|volume=21 |issue=10 |page=1 |doi=10.1167/jov.21.10.1 |pmid=34468706 |pmc=8419883 }}</ref> ([https://www.mathworks.com/matlabcentral/fileexchange/84973-integrate-and-classify-normal-distributions Matlab code]). In the following sections we look at some special cases.

==== Operations on a single normal variable ====
If {{tmath|X}} is distributed normally with mean {{tmath|\mu}} and variance <math display=inline>\sigma^2</math>, then
* <math display=inline>aX+b</math>, for any real numbers {{tmath|a}} and {{tmath|b}}, is also normally distributed, with mean <math display=inline>a\mu+b</math> and variance <math display=inline>a^2\sigma^2</math>. That is, the family of normal distributions is closed under [[linear transformations]].
* The exponential of {{tmath|X}} is distributed [[Log-normal distribution|log-normally]]: <math display=inline>e^X \sim \ln(N(\mu, \sigma^2))</math>.
* The standard [[logistic function|sigmoid]] of {{tmath|X}} is [[Logit-normal distribution|logit-normally distributed]]: <math display=inline>\sigma(X) \sim P( \mathcal{N}(\mu,\,\sigma^2) )</math>.
* The absolute value of {{tmath|X}} has a [[folded normal distribution]]: <math display=inline>\left| X \right| \sim N_f(\mu, \sigma^2)</math>. If <math display=inline>\mu = 0</math>, this is known as the [[half-normal distribution]].
* The absolute value of normalized residuals, <math display=inline>|X - \mu| / \sigma</math>, has a [[chi distribution]] with one degree of freedom: <math display=inline>|X - \mu| / \sigma \sim \chi_1</math>.
* The square of <math display=inline>X/\sigma</math> has the [[noncentral chi-squared distribution]] with one degree of freedom: <math display=inline>X^2 / \sigma^2 \sim \chi_1^2(\mu^2 / \sigma^2)</math>. If <math display=inline>\mu = 0</math>, the distribution is called simply [[chi-squared distribution|chi-squared]].
* The log-likelihood of a normal variable {{tmath|x}} is simply the log of its [[probability density function]]: <math display=block>\ln p(x)= -\frac{1}{2} \left(\frac{x-\mu}{\sigma} \right)^2 -\ln \left(\sigma \sqrt{2\pi} \right).</math> Since this is a scaled and shifted square of a standard normal variable, it is distributed as a scaled and shifted [[chi-squared distribution|chi-squared]] variable.
* The distribution of the variable {{tmath|X}} restricted to an interval <math display=inline>[a, b]</math> is called the [[truncated normal distribution]].
* <math display=inline>(X - \mu)^{-2}</math> has a [[Lévy distribution]] with location 0 and scale <math display=inline>\sigma^{-2}</math>.

===== Operations on two independent normal variables =====
* If <math display=inline>X_1</math> and <math display=inline>X_2</math> are two [[independence (probability theory)|independent]] normal random variables, with means <math display=inline>\mu_1</math>, <math display=inline>\mu_2</math> and variances <math display=inline>\sigma_1^2</math>, <math display=inline>\sigma_2^2</math>, then their sum <math display=inline>X_1 + X_2</math> will also be normally distributed,<sup>[[sum of normally distributed random variables|[proof]]]</sup> with mean <math display=inline>\mu_1 + \mu_2</math> and variance <math display=inline>\sigma_1^2 + \sigma_2^2</math>.
* In particular, if {{tmath|X}} and {{tmath|Y}} are independent normal deviates with zero mean and variance <math display=inline>\sigma^2</math>, then <math display=inline>X + Y</math> and <math display=inline>X - Y</math> are also independent and normally distributed, with zero mean and variance <math display=inline>2\sigma^2</math>. This is a special case of the [[polarization identity]].<ref>{{harvtxt |Bryc |1995 |p=27 }}</ref>
* If <math display=inline>X_1</math>, <math display=inline>X_2</math> are two independent normal deviates with mean {{tmath|\mu}} and variance <math display=inline>\sigma^2</math>, and {{tmath|a}}, {{tmath|b}} are arbitrary real numbers, then the variable <math display=block> X_3 = \frac{aX_1 + bX_2 - (a+b)\mu}{\sqrt{a^2+b^2}} + \mu </math> is also normally distributed with mean {{tmath|\mu}} and variance <math display=inline>\sigma^2</math>. It follows that the normal distribution is [[stable distribution|stable]] (with exponent <math display=inline>\alpha=2</math>).
* If <math display=inline>X_k \sim \mathcal N(m_k, \sigma_k^2)</math>, <math display=inline>k \in \{ 0, 1 \}</math> are normal distributions, then their normalized [[geometric mean]] <math display=inline>\frac{1}{\int_{\R^n} X_0^{\alpha}(x) X_1^{1 - \alpha}(x) \, \text{d}x} X_0^{\alpha} X_1^{1 - \alpha}</math> is a normal distribution <math display=inline>\mathcal N(m_{\alpha}, \sigma_{\alpha}^2)</math> with <math display=inline>m_{\alpha} = \frac{\alpha m_0 \sigma_1^2 + (1 - \alpha) m_1 \sigma_0^2}{\alpha \sigma_1^2 + (1 - \alpha) \sigma_0^2}</math> and <math display=inline>\sigma_{\alpha}^2 = \frac{\sigma_0^2 \sigma_1^2}{\alpha \sigma_1^2 + (1 - \alpha) \sigma_0^2}</math>.
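Several of the closure properties listed above can be verified by direct simulation. A minimal NumPy sketch (the parameter values are arbitrary illustrations):

<syntaxhighlight lang="python">
# Monte Carlo check of two closure properties of the normal family.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, a, b = 2.0, 1.5, 0.7, -1.3
size = 1_000_000

x1 = rng.normal(mu, sigma, size)
x2 = rng.normal(mu, sigma, size)

# Sum of independent normals: mean mu1 + mu2, variance sigma1^2 + sigma2^2.
s = x1 + x2
print(s.mean(), s.var())    # approximately 4.0 and 4.5

# Stability: X3 = (a*X1 + b*X2 - (a+b)*mu) / sqrt(a^2 + b^2) + mu
# should again be N(mu, sigma^2).
x3 = (a * x1 + b * x2 - (a + b) * mu) / np.hypot(a, b) + mu
print(x3.mean(), x3.var())  # approximately 2.0 and 2.25
</syntaxhighlight>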
===== Operations on two independent standard normal variables =====
If <math display=inline>X_1</math> and <math display=inline>X_2</math> are two independent standard normal random variables with mean 0 and variance 1, then
* Their sum and difference are each distributed normally with mean zero and variance two: <math display=inline>X_1 \pm X_2 \sim \mathcal{N}(0, 2)</math>.
* Their product <math display=inline>Z = X_1 X_2</math> follows the [[product distribution#Independent central-normal distributions|product distribution]]<ref>{{cite web|url = http://mathworld.wolfram.com/NormalProductDistribution.html |title = Normal Product Distribution|work = MathWorld |publisher =wolfram.com| first = Eric W. |last = Weisstein}}</ref> with density function <math display=inline>f_Z(z) = \pi^{-1} K_0(|z|)</math> where <math display=inline>K_0</math> is the [[Macdonald function|modified Bessel function of the second kind]]. This distribution is symmetric around zero, unbounded at <math display=inline>z = 0</math>, and has the [[characteristic function (probability theory)|characteristic function]] <math display=inline>\phi_Z(t) = (1 + t^2)^{-1/2}</math>.
* Their ratio follows the standard [[Cauchy distribution]]: <math display=inline>X_1/ X_2 \sim \operatorname{Cauchy}(0, 1)</math>.
* Their Euclidean norm <math display=inline>\sqrt{X_1^2 + X_2^2}</math> has the [[Rayleigh distribution]].

==== Operations on multiple independent normal variables ====
* Any [[linear combination]] of independent normal deviates is a normal deviate.
* If <math display=inline>X_1, X_2, \ldots, X_n</math> are independent standard normal random variables, then the sum of their squares has the [[chi-squared distribution]] with {{tmath|n}} degrees of freedom <math display=block>X_1^2 + \cdots + X_n^2 \sim \chi_n^2.</math>
* If <math display=inline>X_1, X_2, \ldots, X_n</math> are independent normally distributed random variables, each with mean {{tmath|\mu}} and variance <math display=inline>\sigma^2</math>, then their [[sample mean]] is independent of the sample [[standard deviation]],<ref>{{cite journal|title=A Characterization of the Normal Distribution |last=Lukacs |first=Eugene |journal=[[The Annals of Mathematical Statistics]] |issn=0003-4851 |volume=13|issue=1 |year=1942 |pages=91–3 |jstor=2236166 |doi=10.1214/aoms/1177731647 |doi-access=free}}</ref> which can be demonstrated using [[Basu's theorem]] or [[Cochran's theorem]].<ref>{{cite journal |title=On Some Characterizations of the Normal Distribution | last1=Basu|first1=D. |last2=Laha|first2=R. G.|journal=[[Sankhyā (journal)|Sankhyā]]|issn=0036-4452| volume=13|issue=4|year=1954|pages=359–62| jstor=25048183}}</ref> The ratio of these two quantities will have the [[Student's t-distribution]] with <math display=inline>n-1</math> degrees of freedom: <math display=block>t = \frac{\overline X - \mu}{S/\sqrt{n}} = \frac{\frac{1}{n}(X_1+\cdots+X_n) - \mu}{\sqrt{\frac{1}{n(n-1)}\left[(X_1-\overline X)^2 + \cdots+(X_n-\overline X)^2\right]}} \sim t_{n-1}.</math>
* If <math display=inline>X_1, X_2, \ldots, X_n</math>, <math display=inline>Y_1, Y_2, \ldots, Y_m</math> are independent standard normal random variables, then the ratio of their normalized sums of squares will have the [[F-distribution]] with {{math|(''n'', ''m'')}} degrees of freedom:<ref>{{cite book |title=Testing Statistical Hypotheses |edition=2nd | first=E. L.
| last=Lehmann | publisher=Springer |year=1997 | isbn=978-0-387-94919-2| page=199}}</ref> <math display=block>F = \frac{\left(X_1^2+X_2^2+\cdots+X_n^2\right)/n}{\left(Y_1^2+Y_2^2+\cdots+Y_m^2\right)/m} \sim F_{n,m}.</math>

==== Operations on multiple correlated normal variables ====
* A [[quadratic form]] of a normal vector, i.e. a quadratic function <math display=inline>q = \sum x_i^2 + \sum x_j + c</math> of multiple independent or correlated normal variables, is a [[generalized chi-square distribution|generalized chi-square]] variable.

=== Operations on the density function ===
The [[split normal distribution]] is most directly defined in terms of joining scaled sections of the density functions of different normal distributions and rescaling the density to integrate to one. The [[truncated normal distribution]] results from rescaling a section of a single density function.

=== Infinite divisibility and Cramér's theorem ===
For any positive integer {{mvar|n}}, any normal distribution with mean {{tmath|\mu}} and variance <math display=inline>\sigma^2</math> is the distribution of the sum of {{mvar|n}} independent normal deviates, each with mean <math display=inline>\frac{\mu}{n}</math> and variance <math display=inline>\frac{\sigma^2}{n}</math>. This property is called [[infinite divisibility (probability)|infinite divisibility]].<ref>{{harvtxt |Patel |Read |1996 |loc=[2.3.6] }}</ref>

Conversely, if <math display=inline>X_1</math> and <math display=inline>X_2</math> are independent random variables and their sum <math display=inline>X_1+X_2</math> has a normal distribution, then both <math display=inline>X_1</math> and <math display=inline>X_2</math> must be normal deviates.<ref>{{harvtxt |Galambos |Simonelli |2004 |loc=Theorem 3.5 }}</ref> This result is known as [[Cramér's decomposition theorem]], and is equivalent to saying that the [[convolution]] of two distributions is normal if and only if both are normal. Cramér's theorem implies that a linear combination of independent non-Gaussian variables will never have an exactly normal distribution, although it may approach it arbitrarily closely.<ref name="Bryc 1995 35">{{harvtxt |Bryc |1995 |p=35 }}</ref>

=== The Kac–Bernstein theorem ===
The [[Kac–Bernstein theorem]] states that if <math display="inline">X</math> and {{tmath|Y}} are independent and <math display=inline>X + Y</math> and <math display=inline>X - Y</math> are also independent, then both ''X'' and ''Y'' must necessarily have normal distributions.<ref name="Lukacs">{{harvtxt |Lukacs |King |1954 }}</ref><ref>{{cite journal| last1=Quine| first1=M.P.
|year=1993|title=On three characterisations of the normal distribution |url=http://www.math.uni.wroc.pl/~pms/publicationsArticle.php?nr=14.2&nrA=8&ppB=257&ppE=263 |journal=Probability and Mathematical Statistics|volume=14 |issue=2 |pages=257–263}}</ref>

More generally, if <math display=inline>X_1, \ldots, X_n</math> are independent random variables, then two distinct linear combinations <math display=inline>\sum{a_kX_k}</math> and <math display=inline>\sum{b_kX_k}</math> will be independent if and only if all <math display=inline>X_k</math> are normal and <math display=inline>\sum{a_kb_k\sigma_k^2=0}</math>, where <math display=inline>\sigma_k^2</math> denotes the variance of <math display=inline>X_k</math>.<ref name="Lukacs" />

=== Extensions ===
The notion of normal distribution, being one of the most important distributions in probability theory, has been extended far beyond the standard framework of the univariate (that is, one-dimensional) case. All these extensions are also called ''normal'' or ''Gaussian'' laws, so a certain ambiguity in names exists.
* The [[multivariate normal distribution]] describes the Gaussian law in the {{mvar|k}}-dimensional [[Euclidean space]]. A vector {{math|''X'' ∈ '''R'''<sup>''k''</sup>}} is multivariate-normally distributed if any linear combination of its components {{math|Σ{{su|p=''k''|b=''j''=1}}''a<sub>j</sub> X<sub>j</sub>''}} has a (univariate) normal distribution. The variance of {{mvar|X}} is a {{math|{{thinsp|''k''|×|''k''}}}} symmetric positive-definite matrix {{mvar|V}}. The multivariate normal distribution is a special case of the [[elliptical distribution]]s. As such, its iso-density loci in the {{math|1=''k'' = 2}} case are [[ellipse]]s and in the case of arbitrary {{mvar|k}} are [[ellipsoid]]s.
* The [[rectified Gaussian distribution]] is a rectified version of the normal distribution with all the negative elements reset to 0.
* The [[complex normal distribution]] deals with complex normal vectors. A complex vector {{math|''X'' ∈ '''C'''<sup>''k''</sup>}} is said to be normal if both its real and imaginary components jointly possess a {{math|2''k''}}-dimensional multivariate normal distribution. The variance-covariance structure of {{mvar|X}} is described by two matrices: the ''{{dfn|variance}}'' matrix {{math|Γ}}, and the ''{{dfn|relation}}'' matrix {{mvar|C}}.
* The [[matrix normal distribution]] describes the case of normally distributed matrices.
* [[Gaussian process]]es are the normally distributed [[stochastic process]]es. These can be viewed as elements of some infinite-dimensional [[Hilbert space]] {{mvar|H}}, and thus are the analogues of multivariate normal vectors for the case {{math|''k'' {{=}} ∞}}. A random element {{math|''h'' ∈ ''H''}} is said to be normal if for any constant {{math|''a'' ∈ ''H''}} the [[scalar product]] {{math|(''a'', ''h'')}} has a (univariate) normal distribution. The variance structure of such a Gaussian random element can be described in terms of the linear ''covariance operator'' {{math|''K'': ''H'' → ''H''}}. Several Gaussian processes have become popular enough to have their own names:
** [[Wiener process|Brownian motion]];
** [[Brownian bridge]]; and
** [[Ornstein–Uhlenbeck process]].
* The [[Gaussian q-distribution]] is an abstract mathematical construction that represents a [[q-analogue]] of the normal distribution.
* The [[q-Gaussian]] is an analogue of the Gaussian distribution, in the sense that it maximises the [[Tsallis entropy]], and is one type of [[Tsallis distribution]]. This distribution is different from the [[Gaussian q-distribution]] above.
* The [[Kaniadakis Gaussian distribution|Kaniadakis {{mvar|κ}}-Gaussian distribution]] is a generalization of the Gaussian distribution which arises from the [[Kaniadakis statistics]], being one of the [[Kaniadakis distribution]]s.

A random variable {{mvar|X}} has a two-piece normal distribution if it has a distribution
<math display=block>f_X( x ) = \begin{cases} N( \mu, \sigma_1^2 ),& \text{ if } x \le \mu \\ N( \mu, \sigma_2^2 ),& \text{ if } x \ge \mu \end{cases}</math>
where {{mvar|μ}} is the mean and {{math|{{subsup|''σ''|s=0|1|2}}}} and {{math|{{subsup|''σ''|s=0|2|2}}}} are the variances of the distribution to the left and right of the mean respectively.

The mean {{math|E(''X'')}}, variance {{math|V(''X'')}}, and third central moment {{math|T(''X'')}} of this distribution have been determined<ref name="John-1982">{{cite journal|last1=John|first1=S|year=1982|title=The three parameter two-piece normal family of distributions and its fitting|journal=Communications in Statistics – Theory and Methods|volume=11|issue=8|pages=879–885|doi=10.1080/03610928208828279}}</ref>
<math display=block>\begin{align} \operatorname{E}( X ) &= \mu + \sqrt{\frac 2 \pi } ( \sigma_2 - \sigma_1 ), \\ \operatorname{V}( X ) &= \left( 1 - \frac 2 \pi\right)( \sigma_2 - \sigma_1 )^2 + \sigma_1 \sigma_2, \\ \operatorname{T}( X ) &= \sqrt{ \frac 2 \pi}( \sigma_2 - \sigma_1 ) \left[ \left( \frac 4 \pi - 1 \right) ( \sigma_2 - \sigma_1)^2 + \sigma_1 \sigma_2 \right]. \end{align}</math>
These moment formulas are checked numerically in the sketch at the end of this section.

One of the main practical uses of the Gaussian law is to model the empirical distributions of many different random variables encountered in practice. In such cases, a possible extension would be a richer family of distributions, having more than two parameters and therefore being able to fit the empirical distribution more accurately. Examples of such extensions are:
* [[Pearson distribution]] — a four-parameter family of probability distributions that extend the normal law to include different skewness and kurtosis values.
* The [[generalized normal distribution]], also known as the exponential power distribution, allows for distribution tails with thicker or thinner asymptotic behaviors.
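The moment formulas for the two-piece normal distribution above can be checked by simulation. A minimal sketch that samples the distribution by combining half-normal draws on either side of {{mvar|μ}} (the sampling construction and the parameter values are illustrative assumptions, not taken from the cited reference):

<syntaxhighlight lang="python">
# Sample a two-piece normal and compare empirical moments with the
# closed-form mean and variance quoted in the text.
import numpy as np

rng = np.random.default_rng(2)
mu, s1, s2 = 0.0, 1.0, 3.0   # illustrative parameters
size = 1_000_000

# A draw lands left of mu with probability s1 / (s1 + s2), and each side
# is a half-normal with the corresponding scale.
left = rng.random(size) < s1 / (s1 + s2)
z = np.abs(rng.standard_normal(size))
x = np.where(left, mu - s1 * z, mu + s2 * z)

mean_theory = mu + np.sqrt(2 / np.pi) * (s2 - s1)
var_theory = (1 - 2 / np.pi) * (s2 - s1) ** 2 + s1 * s2
print(x.mean(), mean_theory)  # both approximately 1.596
print(x.var(), var_theory)    # both approximately 4.454
</syntaxhighlight>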