==Population variance and sample variance== {{anchor|Estimation}}
{{see also|Unbiased estimation of standard deviation}}

Real-world observations such as the measurements of yesterday's rain throughout the day typically cannot be complete sets of all possible observations that could be made. As such, the variance calculated from the finite set will in general not match the variance that would have been calculated from the full population of possible observations. This means that one [[Estimation theory|estimates]] the mean and variance from a limited set of observations by using an [[estimator]] equation. The estimator is a function of the [[Sample (statistics)|sample]] of ''n'' [[observations]] drawn without observational bias from the whole [[Statistical population|population]] of potential observations. In this example, the sample would be the set of actual measurements of yesterday's rainfall from available rain gauges within the geography of interest.

The simplest estimators for the population mean and population variance are simply the mean and variance of the sample, the '''sample mean''' and '''(uncorrected) sample variance''' – these are [[consistent estimator]]s (they converge to the values for the whole population as the number of samples increases), but they can be improved. Most simply, the sample variance is computed as the sum of [[squared deviations]] about the (sample) mean, divided by ''n'', the number of samples. However, using values other than ''n'' improves the estimator in various ways. Four common values for the denominator are ''n'', ''n'' − 1, ''n'' + 1, and ''n'' − 1.5: ''n'' is the simplest (the variance of the sample), ''n'' − 1 eliminates bias,<ref name=bessel /> ''n'' + 1 minimizes [[mean squared error]] for the normal distribution,<ref name=Kourouklis/> and ''n'' − 1.5 mostly eliminates bias in [[unbiased estimation of standard deviation]] for the normal distribution.<ref>{{cite journal | last = Brugger | first = R. M. | year = 1969 | title = A Note on Unbiased Estimation of the Standard Deviation | journal = The American Statistician | volume = 23 | issue = 4 | pages = 32 | doi = 10.1080/00031305.1969.10481865 }}</ref>

Firstly, if the true population mean is unknown, then the sample variance (which uses the sample mean in place of the true mean) is a [[biased estimator]]: it underestimates the variance by a factor of (''n'' − 1) / ''n''; correcting this factor, resulting in the sum of squared deviations about the sample mean divided by ''n'' − 1 instead of ''n'', is called ''[[Bessel's correction]]''.<ref name=bessel>{{cite book | last = Reichmann | first = W. J. | title = Use and Abuse of Statistics | publisher = Methuen | year = 1961 | edition = Reprinted 1964–1970 by Pelican | location = London | chapter = Appendix 8 }}</ref> The resulting estimator is unbiased and is called the '''(corrected) sample variance''' or '''unbiased sample variance'''. If the mean is determined in some other way than from the same samples used to estimate the variance, then this bias does not arise, and the variance can safely be estimated as that of the samples about the (independently known) mean.

Secondly, the sample variance does not generally minimize [[mean squared error]] between sample variance and population variance.
Correcting for bias often makes this worse: one can always choose a scale factor that performs better than the corrected sample variance, though the optimal scale factor depends on the [[excess kurtosis]] of the population (see [[Mean squared error#Variance|mean squared error: variance]]) and introduces bias. This always consists of scaling down the unbiased estimator (dividing by a number larger than ''n'' − 1) and is a simple example of a [[shrinkage estimator]]: one "shrinks" the unbiased estimator towards zero. For the normal distribution, dividing by ''n'' + 1 (instead of ''n'' − 1 or ''n'') minimizes mean squared error.<ref name=Kourouklis>{{Cite journal |last=Kourouklis |first=Stavros |date=2012 |title=A New Estimator of the Variance Based on Minimizing Mean Squared Error |url=https://www.jstor.org/stable/23339501 |journal=The American Statistician |volume=66 |issue=4 |pages=234–236 |doi=10.1080/00031305.2012.735209 |jstor=23339501 |issn=0003-1305 |url-access=subscription }}</ref> The resulting estimator is biased, however, and is known as the '''biased sample variance'''.

===Population variance===

In general, the '''''population variance''''' of a ''finite'' [[statistical population|population]] of size {{mvar|N}} with values {{math|''x''<sub>''i''</sub>}} is given by
<math display="block">\begin{align} \sigma^2 &= \frac{1}{N} \sum_{i=1}^N {\left(x_i - \mu\right)}^2 = \frac{1}{N} \sum_{i=1}^N \left(x_i^2 - 2 \mu x_i + \mu^2 \right) \\[5pt] &= \left(\frac{1}{N} \sum_{i=1}^N x_i^2\right) - 2\mu \left(\frac{1}{N} \sum_{i=1}^N x_i\right) + \mu^2 \\[5pt] &= \operatorname{E}[x_i^2] - \mu^2 \end{align}</math>
where the population mean is <math display="inline">\mu = \operatorname{E}[x_i] = \frac 1N \sum_{i=1}^N x_i </math> and <math display="inline">\operatorname{E}[x_i^2] = \left(\frac{1}{N} \sum_{i=1}^N x_i^2\right) </math>, where <math display="inline">\operatorname{E} </math> is the [[Expected value|expectation value]] operator.

The population variance can also be computed using<ref>{{cite conference |author=Yuli Zhang |author2=Huaiyu Wu |author3=Lei Cheng |date=June 2012 |title=Some new deformation formulas about variance and covariance |conference=Proceedings of 4th International Conference on Modelling, Identification and Control (ICMIC 2012) |pages=987–992}}</ref>
<math display="block">\sigma^2 = \frac {1} {N^2}\sum_{i<j}\left( x_i-x_j \right)^2 = \frac{1}{2N^2} \sum_{i, j=1}^N\left( x_i-x_j \right)^2.</math>
(The right-hand sum runs over all ordered pairs and therefore counts each difference twice, while the middle sum contains each unordered pair only once.) This is true because
<math display="block">\begin{align} &\frac{1}{2N^2} \sum_{i, j=1}^N {\left( x_i - x_j \right)}^2 \\[5pt] ={} &\frac{1}{2N^2} \sum_{i, j=1}^N \left( x_i^2 - 2x_i x_j + x_j^2 \right) \\[5pt] ={} &\frac{1}{2N} \sum_{j=1}^N \left(\frac{1}{N} \sum_{i=1}^N x_i^2\right) - \left(\frac{1}{N} \sum_{i=1}^N x_i\right) \left(\frac{1}{N} \sum_{j=1}^N x_j\right) + \frac{1}{2N} \sum_{i=1}^N \left(\frac{1}{N} \sum_{j=1}^N x_j^2\right) \\[5pt] ={} &\frac{1}{2} \left( \sigma^2 + \mu^2 \right) - \mu^2 + \frac{1}{2} \left( \sigma^2 + \mu^2 \right) \\[5pt] ={} &\sigma^2. \end{align}</math>

The population variance matches the variance of the generating probability distribution. In this sense, the concept of a population can be extended to continuous random variables with infinite populations.
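The equivalence of the two formulas above can be checked numerically. The following minimal Python sketch (an illustration only, not taken from the cited sources; the data set is arbitrary) computes the population variance both as the mean squared deviation and via the pairwise-difference form:

<syntaxhighlight lang="python">
# Minimal illustration (not from the cited sources): the population variance
# computed as the mean squared deviation equals the pairwise-difference form.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # arbitrary example data
N = len(xs)
mu = sum(xs) / N

# sigma^2 = (1/N) * sum_i (x_i - mu)^2
var_direct = sum((x - mu) ** 2 for x in xs) / N

# sigma^2 = (1/(2 N^2)) * sum over all ordered pairs (i, j) of (x_i - x_j)^2
var_pairwise = sum((xi - xj) ** 2 for xi in xs for xj in xs) / (2 * N ** 2)

print(var_direct, var_pairwise)   # both print 4.0 for this data set
</syntaxhighlight>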
===Sample variance===
{{see also|Sample standard deviation}}

===={{visible anchor|Biased sample variance}}====

In many practical situations, the true variance of a population is not known ''a priori'' and must be computed somehow. When dealing with extremely large populations, it is not possible to count every object in the population, so the computation must be performed on a [[sample (statistics)|sample]] of the population.<ref>{{cite book | last = Navidi | first = William | year = 2006 | title = Statistics for Engineers and Scientists | publisher = McGraw-Hill | page = 14 }}</ref> This is generally referred to as '''sample variance''' or '''empirical variance'''. Sample variance can also be applied to the estimation of the variance of a continuous distribution from a sample of that distribution.

We take a [[statistical sample|sample with replacement]] of {{mvar|n}} values {{math|''Y''<sub>1</sub>, ..., ''Y''<sub>''n''</sub>}} from the population of size {{mvar|N}}, where {{math|''n'' < ''N''}}, and estimate the variance on the basis of this sample.<ref>Montgomery, D. C. and Runger, G. C. (1994) ''Applied Statistics and Probability for Engineers'', page 201. John Wiley & Sons, New York.</ref> Directly taking the variance of the sample data gives the average of the [[squared deviations]]:<ref>{{cite conference | author1 = Yuli Zhang | author2 = Huaiyu Wu | author3 = Lei Cheng | date = June 2012 | title = Some new deformation formulas about variance and covariance | conference = Proceedings of 4th International Conference on Modelling, Identification and Control (ICMIC 2012) | pages = 987–992 }}</ref>
<math display="block">\tilde{S}_Y^2 = \frac{1}{n} \sum_{i=1}^n \left(Y_i - \overline{Y}\right)^2 = \left(\frac 1n \sum_{i=1}^n Y_i^2\right) - \overline{Y}^2 = \frac{1}{n^2} \sum_{i,j\,:\,i<j}\left(Y_i - Y_j\right)^2. </math>
(See the section [[Variance#Population variance|Population variance]] for the derivation of this formula.) Here, <math>\overline{Y}</math> denotes the [[sample mean]]:
<math display="block">\overline{Y} = \frac{1}{n} \sum_{i=1}^n Y_i .</math>

Since the {{math|''Y''<sub>''i''</sub>}} are selected randomly, both <math>\overline{Y}</math> and <math>\tilde{S}_Y^2</math> are [[Random variable|random variables]]. Their expected values can be evaluated by averaging over the ensemble of all possible samples {{math|{''Y''<sub>''i''</sub>}<nowiki/>}} of size {{mvar|n}} from the population.
For <math>\tilde{S}_Y^2</math> this gives:
<math display="block">\begin{align} \operatorname{E}[\tilde{S}_Y^2] &= \operatorname{E}\left[ \frac{1}{n} \sum_{i=1}^n {\left(Y_i - \frac{1}{n} \sum_{j=1}^n Y_j \right)}^2 \right] \\[5pt] &= \frac 1n \sum_{i=1}^n \operatorname{E}\left[ Y_i^2 - \frac{2}{n} Y_i \sum_{j=1}^n Y_j + \frac{1}{n^2} \sum_{j=1}^n Y_j \sum_{k=1}^n Y_k \right] \\[5pt] &= \frac 1n \sum_{i=1}^n \left( \operatorname{E}\left[Y_i^2\right] - \frac{2}{n} \left( \sum_{j \neq i} \operatorname{E}\left[Y_i Y_j\right] + \operatorname{E}\left[Y_i^2\right] \right) + \frac{1}{n^2} \sum_{j=1}^n \sum_{k \neq j}^n \operatorname{E}\left[Y_j Y_k\right] +\frac{1}{n^2} \sum_{j=1}^n \operatorname{E}\left[Y_j^2\right] \right) \\[5pt] &= \frac 1n \sum_{i=1}^n \left( \frac{n - 2}{n} \operatorname{E}\left[Y_i^2\right] - \frac{2}{n} \sum_{j \neq i} \operatorname{E}\left[Y_i Y_j\right] + \frac{1}{n^2} \sum_{j=1}^n \sum_{k \neq j}^n \operatorname{E}\left[Y_j Y_k\right] +\frac{1}{n^2} \sum_{j=1}^n \operatorname{E}\left[Y_j^2\right] \right) \\[5pt] &= \frac 1n \sum_{i=1}^n \left[ \frac{n - 2}{n} \left(\sigma^2 + \mu^2\right) - \frac{2}{n} (n - 1)\mu^2 + \frac{1}{n^2} n(n - 1)\mu^2 + \frac{1}{n} \left(\sigma^2 + \mu^2\right) \right] \\[5pt] &= \frac{n - 1}{n} \sigma^2. \end{align}</math>
Here, <math display="inline">\sigma^2 = \operatorname{E}[Y_i^2] - \mu^2 </math> is the [[Variance#Population variance|population variance]], as derived in the section above, and <math display="inline">\operatorname{E}[Y_i Y_j] = \operatorname{E}[Y_i] \operatorname{E}[Y_j] = \mu^2</math> because <math display="inline">Y_i</math> and <math display="inline">Y_j</math> are independent. Hence <math display="inline">\tilde{S}_Y^2</math> gives an estimate of the population variance <math display="inline">\sigma^2</math> that is biased by a factor of <math display="inline">\frac{n - 1}{n}</math>, because its expectation value is smaller than the population variance (the true variance) by that factor. For this reason, <math display="inline">\tilde{S}_Y^2</math> is referred to as the ''biased sample variance''.

===={{visible anchor|Unbiased sample variance}}====

Correcting for this bias yields the ''unbiased sample variance'', denoted <math>S^2</math>:
<math display="block">S^2 = \frac{n}{n - 1} \tilde{S}_Y^2 = \frac{n}{n - 1} \left[ \frac{1}{n} \sum_{i=1}^n \left(Y_i - \overline{Y}\right)^2 \right] = \frac{1}{n - 1} \sum_{i=1}^n \left(Y_i - \overline{Y} \right)^2</math>
Either estimator may be simply referred to as the ''sample variance'' when the version can be determined by context. The same proof also applies to samples taken from a continuous probability distribution.

The use of the term {{math|''n'' − 1}} is called [[Bessel's correction]], and it is also used in [[sample covariance]] and the [[sample standard deviation]] (the square root of variance). The square root is a [[concave function]] and thus introduces negative bias (by [[Jensen's inequality]]), which depends on the distribution, and thus the corrected sample standard deviation (using Bessel's correction) is biased. The [[unbiased estimation of standard deviation]] is a technically involved problem, though for the normal distribution using the term {{math|''n'' − 1.5}} yields an almost unbiased estimator.
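The <math display="inline">\frac{n - 1}{n}</math> bias factor derived above can also be observed by simulation. The following minimal Python sketch (an illustration only, not taken from the cited sources; the population, sample size, and number of trials are arbitrary choices) averages the biased and unbiased sample variances over many samples drawn with replacement:

<syntaxhighlight lang="python">
# Minimal simulation sketch (not from the cited sources): averaging the biased
# and unbiased sample variances over many samples drawn with replacement
# illustrates the (n - 1)/n bias factor derived above.
import random
import statistics

random.seed(0)
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # arbitrary example population
sigma2 = statistics.pvariance(population)               # true population variance (4.0)

n = 5              # sample size (assumed for illustration)
trials = 100_000   # number of simulated samples

biased_sum = 0.0
unbiased_sum = 0.0
for _ in range(trials):
    sample = random.choices(population, k=n)            # sampling with replacement
    biased_sum += statistics.pvariance(sample)          # divides by n
    unbiased_sum += statistics.variance(sample)         # divides by n - 1 (Bessel)

print(sigma2)                  # 4.0
print(biased_sum / trials)     # close to (n - 1)/n * sigma2 = 3.2
print(unbiased_sum / trials)   # close to sigma2 = 4.0
</syntaxhighlight>

With this particular population (<math display="inline">\sigma^2 = 4</math>) and <math display="inline">n = 5</math>, the averaged biased estimate settles near 3.2, matching the factor <math display="inline">\frac{n - 1}{n} = 0.8</math>, while the Bessel-corrected average stays near 4.0.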
The unbiased sample variance is a [[U-statistic]] for the function {{math|1=''f''(''y''<sub>1</sub>, ''y''<sub>2</sub>) = (''y''<sub>1</sub> − ''y''<sub>2</sub>)<sup>2</sup>/2}}, meaning that it is obtained by averaging a 2-sample statistic over 2-element subsets of the population.

===== Example =====

For the set of numbers {10, 15, 30, 45, 57, 52, 63, 72, 81, 93, 102, 105}, if this set is the whole data population for some measurement, then its variance is the population variance, 932.743, computed as the sum of the squared deviations about the mean of the set divided by 12, the number of members. If the set is instead a sample from the whole population, then the unbiased sample variance is 1017.538, the sum of the squared deviations about the mean of the sample divided by 11 instead of 12. In [[Microsoft Excel]], the function VAR.S gives the unbiased sample variance, while VAR.P gives the population variance.

====Distribution of the sample variance====
{{multiple image
 <!-- Essential parameters -->
 | align = right <!-- left/right/center/none -->
 | direction = vertical <!-- horizontal/vertical -->
 | width = 250 <!-- Digits only; no "px" suffix, please -->
 <!-- Image 1 -->
 | image1 = Scaled chi squared.svg <!-- Filename only; no "File:" or "Image:" prefix, please -->
 | width1 =
 | alt1 =
 | caption1 =
 <!-- Image 2 -->
 | image2 = Scaled chi squared cdf.svg <!-- Filename only; no "File:" or "Image:" prefix, please -->
 | width2 =
 | alt2 =
 | caption2 = Distribution and cumulative distribution of ''S''<sup>2</sup>/σ<sup>2</sup>, for various values of ''ν'' = ''n'' − 1, when the ''y<sub>i</sub>'' are independent and normally distributed.
}}

Being a function of [[random variable]]s, the sample variance is itself a random variable, and it is natural to study its distribution. In the case that the ''Y''<sub>''i''</sub> are independent observations from a [[normal distribution]], [[Cochran's theorem]] shows that the [[Variance#Unbiased sample variance|unbiased sample variance]] ''S''<sup>2</sup> follows a scaled [[chi-squared distribution]] (see also: [[chi-squared distribution#Asymptotic properties|asymptotic properties]] and an [[Chi-squared distribution#Cochran's_theorem|elementary proof]]):<ref>{{cite book | last = Knight | first = K. | year = 2000 | title = Mathematical Statistics | publisher = Chapman and Hall | location = New York | at = proposition 2.11 }}</ref>
<math display="block"> (n - 1) \frac{S^2}{\sigma^2} \sim \chi^2_{n-1} </math>
where {{math|''σ''<sup>2</sup>}} is the [[Variance#Population variance|population variance]]. As a direct consequence, it follows that
<math display="block"> \operatorname{E}\left(S^2\right) = \operatorname{E}\left(\frac{\sigma^2}{n - 1} \chi^2_{n-1}\right) = \sigma^2 , </math>
and<ref>{{cite book | author1-last = Casella | author1-first = George | author2-last = Berger | author2-first = Roger L. | year = 2002 | title = Statistical Inference | at = Example 7.3.3, p. 331 | edition = 2nd | isbn = 0-534-24312-6 }}</ref>
<math display="block"> \operatorname{Var}\left[S^2\right] = \operatorname{Var}\left(\frac{\sigma^2}{n - 1} \chi^2_{n-1}\right) = \frac{\sigma^4}{{\left(n - 1\right)}^2} \operatorname{Var}\left(\chi^2_{n-1}\right) = \frac{2\sigma^4}{n - 1}. </math>
If the ''Y''<sub>''i''</sub> are independent and identically distributed, but not necessarily normally distributed, then<ref>Mood, A. M., Graybill, F. A., and Boes, D. C. (1974) ''Introduction to the Theory of Statistics'', 3rd Edition, McGraw-Hill, New York, p. 229.</ref>
<math display="block"> \operatorname{E}\left[S^2\right] = \sigma^2, \quad \operatorname{Var}\left[S^2\right] = \frac{\sigma^4}{n} \left(\kappa - 1 + \frac{2}{n - 1} \right) = \frac{1}{n} \left(\mu_4 - \frac{n - 3}{n - 1}\sigma^4\right), </math>
where ''κ'' is the [[kurtosis]] of the distribution and ''μ''<sub>4</sub> is the fourth [[central moment]].

If the conditions of the [[law of large numbers]] hold for the squared observations, ''S''<sup>2</sup> is a [[consistent estimator]] of ''σ''<sup>2</sup>; indeed, the variance of the estimator tends asymptotically to zero. An asymptotically equivalent formula was given in Kenney and Keeping (1951:164), Rose and Smith (2002:264), and Weisstein (n.d.).<ref>{{cite book |last1=Kenney |first1=John F. |last2=Keeping |first2=E. S. |date=1951 |title=Mathematics of Statistics. Part Two |edition=2nd |publisher=D. Van Nostrand Company, Inc. |location=Princeton, New Jersey |url=http://krishikosh.egranth.ac.in/bitstream/1/2025521/1/G2257.pdf |via=KrishiKosh |url-status=dead |archive-url=https://web.archive.org/web/20181117022434/http://krishikosh.egranth.ac.in/bitstream/1/2025521/1/G2257.pdf |archive-date=Nov 17, 2018 }}</ref><ref>Rose, Colin; Smith, Murray D. (2002). "[http://www.mathstatica.com/book/Mathematical_Statistics_with_Mathematica.pdf Mathematical Statistics with Mathematica]". Springer-Verlag, New York.</ref><ref>Weisstein, Eric W. "[http://mathworld.wolfram.com/SampleVarianceDistribution.html Sample Variance Distribution]". MathWorld, Wolfram.</ref>

====Samuelson's inequality====

[[Samuelson's inequality]] is a result that states bounds on the values that individual observations in a sample can take, given that the sample mean and (biased) variance have been calculated.<ref>{{cite journal |last=Samuelson |first=Paul |title=How Deviant Can You Be? |journal=[[Journal of the American Statistical Association]] |volume=63 |issue=324 |year=1968 |pages=1522–1525 |jstor=2285901 |doi=10.1080/01621459.1968.10480944}}</ref> Values must lie within the limits <math>\bar y \pm \sigma_Y (n-1)^{1/2}.</math>

===Relations with the harmonic and arithmetic means===

It has been shown<ref>{{cite journal |first=A. McD. |last=Mercer |title=Bounds for A–G, A–H, G–H, and a family of inequalities of Ky Fan's type, using a general method |journal=J. Math. Anal. Appl. |volume=243 |issue=1 |pages=163–173 |year=2000 |doi=10.1006/jmaa.1999.6688 |doi-access=free }}</ref> that for a sample {''y''<sub>''i''</sub>} of positive real numbers,
<math display="block"> \sigma_y^2 \le 2y_{\max} (A - H), </math>
where {{math|''y''<sub>max</sub>}} is the maximum of the sample, {{mvar|A}} is the arithmetic mean, {{mvar|H}} is the [[harmonic mean]] of the sample and <math>\sigma_y^2</math> is the (biased) variance of the sample.

This bound has been improved, and it is known that the variance is bounded by
<math display="block">\begin{align} \sigma_y^2 &\le \frac{y_{\max} (A - H)(y_{\max} - A)}{y_{\max} - H}, \\[1ex] \sigma_y^2 &\ge \frac{y_{\min} (A - H)(A - y_{\min})}{H - y_{\min}}, \end{align} </math>
where {{math|''y''<sub>min</sub>}} is the minimum of the sample.<ref name=Sharma2008>{{cite journal |first=R. |last=Sharma |title=Some more inequalities for arithmetic mean, harmonic mean and variance |journal=Journal of Mathematical Inequalities |volume=2 |issue=1 |pages=109–114 |year=2008 |doi=10.7153/jmi-02-11 |citeseerx=10.1.1.551.9397 }}</ref>
==Tests of equality of variances==

The [[F-test of equality of variances]] and the [[chi square test]]s are adequate when the sample is normally distributed. Non-normality makes testing for the equality of two or more variances more difficult.

Several non-parametric tests have been proposed: these include the Barton–David–Ansari–Freund–Siegel–Tukey test, the [[Capon test]], the [[Mood test]], the [[Klotz test]] and the [[Sukhatme test]]. The Sukhatme test applies to two variances and requires that both [[median]]s be known and equal to zero. The Mood, Klotz, Capon and Barton–David–Ansari–Freund–Siegel–Tukey tests also apply to two variances. They allow the median to be unknown but do require that the two medians are equal.

The [[Lehmann test]] is a parametric test of two variances; several variants of this test are known. Other tests of the equality of variances include the [[Box test]], the [[Box–Anderson test]] and the [[Moses test]]. Resampling methods, which include the [[Bootstrapping (statistics)|bootstrap]] and the [[Resampling (statistics)|jackknife]], may be used to test the equality of variances.

==Moment of inertia==
{{see also|Moment (physics)#Examples}}

The variance of a probability distribution is analogous to the [[moment of inertia]] in [[classical mechanics]] of a corresponding mass distribution along a line, with respect to rotation about its center of mass.<ref name=pearson>{{Cite web |last=Magnello |first=M. Eileen |title=Karl Pearson and the Origins of Modern Statistics: An Elastician becomes a Statistician |url=https://rutherfordjournal.org/article010107.html |website=The Rutherford Journal}}</ref> It is because of this analogy that such things as the variance are called ''[[moment (mathematics)|moment]]s'' of [[probability distribution]]s.<ref name=pearson/> The covariance matrix is related to the [[moment of inertia tensor]] for multivariate distributions. The moment of inertia of a cloud of ''n'' points with a covariance matrix of <math>\Sigma</math> is given by{{Citation needed|date=February 2012}}
<math display="block">I = n\left(\mathbf{1}_{3\times 3} \operatorname{tr}(\Sigma) - \Sigma\right).</math>
This difference between moment of inertia in physics and in statistics is clear for points that are gathered along a line. Suppose many points are close to the ''x'' axis and distributed along it. The covariance matrix might look like
<math display="block">\Sigma = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.1 \end{bmatrix}.</math>
That is, there is the most variance in the ''x'' direction. Physicists would consider this to have a low moment ''about'' the ''x'' axis, so the moment-of-inertia tensor is
<math display="block">I = n\begin{bmatrix} 0.2 & 0 & 0 \\ 0 & 10.1 & 0 \\ 0 & 0 & 10.1 \end{bmatrix}.</math>

==Semivariance==

The ''semivariance'' is calculated in the same manner as the variance, but only those observations that fall below the mean are included in the calculation:
<math display="block">\text{Semivariance} = \frac{1}{n} \sum_{i:x_i < \mu} {\left(x_i - \mu\right)}^2</math>
It is also described as a specific measure in different fields of application. For skewed distributions, the semivariance can provide additional information that the variance does not.<ref>{{Cite web |url=https://famafrench.dimensional.com/questions-answers/qa-semi-variance-a-better-risk-measure.aspx |title=Q&A: Semi-Variance: A Better Risk Measure? |last1=Fama |first1=Eugene F. |last2=French |first2=Kenneth R. |date=2010-04-21 |website=Fama/French Forum}}</ref> For inequalities associated with the semivariance, see {{Section link|Chebyshev's inequality|Semivariances}}.
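The following minimal Python sketch (an illustration only, not taken from the cited sources; the data set is arbitrary) computes the semivariance exactly as defined above, dividing by the total number of observations ''n'':

<syntaxhighlight lang="python">
# Minimal sketch (not from the cited sources): semivariance as defined above,
# i.e. the squared deviations of the below-mean observations, divided by n.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # arbitrary example data
n = len(xs)
mu = sum(xs) / n                                 # mean is 5.0

semivariance = sum((x - mu) ** 2 for x in xs if x < mu) / n
print(semivariance)   # (9 + 1 + 1 + 1) / 8 = 1.5 for this data set
</syntaxhighlight>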
==Etymology==

The term ''variance'' was first introduced by [[Ronald Fisher]] in his 1918 paper ''[[The Correlation Between Relatives on the Supposition of Mendelian Inheritance]]'':<ref>[[Ronald Fisher]] (1918) [http://digital.library.adelaide.edu.au/dspace/bitstream/2440/15097/1/9.pdf The correlation between relatives on the supposition of Mendelian Inheritance]</ref>

<blockquote>The great body of available statistics show us that the deviations of a [[biometry|human measurement]] from its mean follow very closely the [[Normal distribution|Normal Law of Errors]], and, therefore, that the variability may be uniformly measured by the [[standard deviation]] corresponding to the [[square root]] of the [[mean square error]]. When there are two independent causes of variability capable of producing in an otherwise uniform population distributions with standard deviations <math>\sigma_1</math> and <math>\sigma_2</math>, it is found that the distribution, when both causes act together, has a standard deviation <math>\sqrt{\sigma_1^2 + \sigma_2^2}</math>. It is therefore desirable in analysing the causes of variability to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance...</blockquote>

==Generalizations==

===For complex variables===

If <math>x</math> is a scalar [[complex number|complex]]-valued random variable, with values in <math>\mathbb{C},</math> then its variance is <math>\operatorname{E}\left[(x - \mu)(x - \mu)^*\right],</math> where <math>x^*</math> is the [[complex conjugate]] of <math>x.</math> This variance is a real scalar.

===For vector-valued random variables===

====As a matrix====

If <math>X</math> is a [[vector space|vector]]-valued random variable, with values in <math>\mathbb{R}^n,</math> and thought of as a column vector, then a natural generalization of variance is <math>\operatorname{E}\left[(X - \mu) {(X - \mu)}^{\mathsf{T}}\right],</math> where <math>\mu = \operatorname{E}(X)</math> and <math>X^{\mathsf{T}}</math> is the transpose of {{mvar|X}}, and so is a row vector. The result is a [[positive definite matrix|positive semi-definite square matrix]], commonly referred to as the [[variance-covariance matrix]] (or simply as the ''covariance matrix'').

If <math>X</math> is a vector- and complex-valued random variable, with values in <math>\mathbb{C}^n,</math> then the [[Covariance matrix#Complex random vectors|covariance matrix is]] <math>\operatorname{E}\left[(X - \mu){(X - \mu)}^\dagger\right],</math> where <math>X^\dagger</math> is the [[conjugate transpose]] of <math>X.</math>{{Citation needed|date=September 2016}} This matrix is also positive semi-definite and square.

====As a scalar====

Another generalization of variance for vector-valued random variables <math>X</math>, which results in a scalar value rather than in a matrix, is the [[generalized variance]] <math>\det(C)</math>, the [[determinant]] of the covariance matrix.
The generalized variance can be shown to be related to the multidimensional scatter of points around their mean.<ref>{{cite book |last1=Kocherlakota |first1=S. |last2=Kocherlakota |first2=K. |chapter=Generalized Variance |title=Encyclopedia of Statistical Sciences |publisher=Wiley Online Library |doi=10.1002/0471667196.ess0869 |year=2004 |isbn=0-471-66719-6 }}</ref>

A different generalization is obtained by considering the equation for the scalar variance, <math> \operatorname{Var}(X) = \operatorname{E}\left[(X - \mu)^2 \right] </math>, and reinterpreting <math>(X - \mu)^2</math> as the squared [[Euclidean distance]] between the random variable and its mean, or simply as the scalar product of the vector <math>X - \mu</math> with itself. This results in <math>\operatorname{E}\left[(X - \mu)^{\mathsf{T}}(X - \mu)\right] = \operatorname{tr}(C),</math> which is the [[Trace (linear algebra)|trace]] of the covariance matrix.
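The following minimal Python sketch (an illustration only, not taken from the cited sources; it assumes NumPy and reuses the diagonal covariance from the moment-of-inertia example above) estimates the covariance matrix of a sample of random vectors and computes both scalar generalizations, the generalized variance <math>\det(C)</math> and the trace <math>\operatorname{tr}(C)</math>:

<syntaxhighlight lang="python">
# Minimal sketch (not from the cited sources): scalar generalizations of variance
# for a vector-valued random variable, computed from a data matrix with NumPy.
import numpy as np

rng = np.random.default_rng(0)
# 1000 draws of a 3-dimensional random vector (assumed example distribution)
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[10.0, 0.0, 0.0],
                                 [0.0, 0.1, 0.0],
                                 [0.0, 0.0, 0.1]],
                            size=1000)

C = np.cov(X, rowvar=False)               # sample covariance matrix (Bessel-corrected)
generalized_variance = np.linalg.det(C)   # determinant of the covariance matrix
total_variance = np.trace(C)              # E[(X - mu)^T (X - mu)] = tr(C)

print(C)
print(generalized_variance)   # close to 10 * 0.1 * 0.1 = 0.1
print(total_variance)         # close to 10 + 0.1 + 0.1 = 10.2
</syntaxhighlight>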