===Sample variance===
{{see also|Sample standard deviation}}

===={{visible anchor|Biased sample variance}}====
In many practical situations, the true variance of a population is not known ''a priori'' and must be computed somehow. When dealing with extremely large populations, it is not possible to count every object in the population, so the computation must be performed on a [[sample (statistics)|sample]] of the population.<ref>{{cite book | last = Navidi | first = William | year = 2006 | title = Statistics for Engineers and Scientists | publisher = McGraw-Hill | page = 14 }}</ref> This is generally referred to as '''sample variance''' or '''empirical variance'''. Sample variance can also be applied to the estimation of the variance of a continuous distribution from a sample of that distribution.

We take a [[statistical sample|sample with replacement]] of {{mvar|n}} values {{math|''Y''<sub>1</sub>, ..., ''Y''<sub>''n''</sub>}} from the population of size {{mvar|N}}, where {{math|''n'' < ''N''}}, and estimate the variance on the basis of this sample.<ref>Montgomery, D. C. and Runger, G. C. (1994) ''Applied statistics and probability for engineers'', page 201. John Wiley & Sons, New York</ref> Directly taking the variance of the sample data gives the average of the [[squared deviations]]:<ref>{{cite conference | author1 = Yuli Zhang | author2 = Huaiyu Wu | author3 = Lei Cheng | date = June 2012 | title = Some new deformation formulas about variance and covariance | conference = Proceedings of 4th International Conference on Modelling, Identification and Control (ICMIC2012) | pages = 987–992 }}</ref>
<math display="block">\tilde{S}_Y^2 = \frac{1}{n} \sum_{i=1}^n \left(Y_i - \overline{Y}\right)^2 = \left(\frac 1n \sum_{i=1}^n Y_i^2\right) - \overline{Y}^2 = \frac{1}{n^2} \sum_{i,j\,:\,i<j}\left(Y_i - Y_j\right)^2. </math>
(See the section [[Variance#Population variance|Population variance]] for the derivation of this formula.) Here, <math>\overline{Y}</math> denotes the [[sample mean]]:
<math display="block">\overline{Y} = \frac{1}{n} \sum_{i=1}^n Y_i .</math>

Since the {{math|''Y''<sub>''i''</sub>}} are selected randomly, both <math>\overline{Y}</math> and <math>\tilde{S}_Y^2</math> are [[Random variable|random variables]]. Their expected values can be evaluated by averaging over the ensemble of all possible samples {{math|{''Y''<sub>''i''</sub>}<nowiki/>}} of size {{mvar|n}} from the population.
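The three equivalent expressions above can be checked numerically. The following is a minimal sketch using NumPy; the sample values and the random seed are arbitrary assumptions made only for illustration.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=10)  # made-up sample of n = 10 values
n = len(y)
y_bar = y.mean()

# (1/n) * sum of squared deviations from the sample mean
v1 = np.mean((y - y_bar) ** 2)
# mean of the squares minus the square of the mean
v2 = np.mean(y ** 2) - y_bar ** 2
# pairwise form: (1/n^2) * sum over i < j of (y_i - y_j)^2
v3 = sum((y[i] - y[j]) ** 2 for i in range(n) for j in range(i + 1, n)) / n ** 2

print(v1, v2, v3)  # all three agree up to floating-point rounding
</syntaxhighlight>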
For <math>\tilde{S}_Y^2</math> this gives:
<math display="block">\begin{align} \operatorname{E}[\tilde{S}_Y^2] &= \operatorname{E}\left[ \frac{1}{n} \sum_{i=1}^n {\left(Y_i - \frac{1}{n} \sum_{j=1}^n Y_j \right)}^2 \right] \\[5pt] &= \frac 1n \sum_{i=1}^n \operatorname{E}\left[ Y_i^2 - \frac{2}{n} Y_i \sum_{j=1}^n Y_j + \frac{1}{n^2} \sum_{j=1}^n Y_j \sum_{k=1}^n Y_k \right] \\[5pt] &= \frac 1n \sum_{i=1}^n \left( \operatorname{E}\left[Y_i^2\right] - \frac{2}{n} \left( \sum_{j \neq i} \operatorname{E}\left[Y_i Y_j\right] + \operatorname{E}\left[Y_i^2\right] \right) + \frac{1}{n^2} \sum_{j=1}^n \sum_{k \neq j}^n \operatorname{E}\left[Y_j Y_k\right] +\frac{1}{n^2} \sum_{j=1}^n \operatorname{E}\left[Y_j^2\right] \right) \\[5pt] &= \frac 1n \sum_{i=1}^n \left( \frac{n - 2}{n} \operatorname{E}\left[Y_i^2\right] - \frac{2}{n} \sum_{j \neq i} \operatorname{E}\left[Y_i Y_j\right] + \frac{1}{n^2} \sum_{j=1}^n \sum_{k \neq j}^n \operatorname{E}\left[Y_j Y_k\right] +\frac{1}{n^2} \sum_{j=1}^n \operatorname{E}\left[Y_j^2\right] \right) \\[5pt] &= \frac 1n \sum_{i=1}^n \left[ \frac{n - 2}{n} \left(\sigma^2 + \mu^2\right) - \frac{2}{n} (n - 1)\mu^2 + \frac{1}{n^2} n(n - 1)\mu^2 + \frac{1}{n} \left(\sigma^2 + \mu^2\right) \right] \\[5pt] &= \frac{n - 1}{n} \sigma^2. \end{align}</math>
Here <math display="inline">\sigma^2 = \operatorname{E}[Y_i^2] - \mu^2</math> is the [[Variance#Population variance|population variance]] derived in that section, and <math display="inline">\operatorname{E}[Y_i Y_j] = \operatorname{E}[Y_i] \operatorname{E}[Y_j] = \mu^2</math> because <math display="inline">Y_i</math> and <math display="inline">Y_j</math> are independent. Hence <math display="inline">\tilde{S}_Y^2</math> gives an estimate of the population variance <math display="inline">\sigma^2</math> that is biased by a factor of <math display="inline">\frac{n - 1}{n}</math>, because its expected value is smaller than the population variance (true variance) by that factor. For this reason, <math display="inline">\tilde{S}_Y^2</math> is referred to as the ''biased sample variance''.

===={{visible anchor|Unbiased sample variance}}====
Correcting for this bias yields the ''unbiased sample variance'', denoted <math>S^2</math>:
<math display="block">S^2 = \frac{n}{n - 1} \tilde{S}_Y^2 = \frac{n}{n - 1} \left[ \frac{1}{n} \sum_{i=1}^n \left(Y_i - \overline{Y}\right)^2 \right] = \frac{1}{n - 1} \sum_{i=1}^n \left(Y_i - \overline{Y} \right)^2</math>
Either estimator may simply be referred to as the ''sample variance'' when the version can be determined from context. The same proof also applies to samples taken from a continuous probability distribution.

The use of the term {{math|''n'' − 1}} is called [[Bessel's correction]], and it is also used in [[sample covariance]] and the [[sample standard deviation]] (the square root of variance). The square root is a [[concave function]] and thus introduces negative bias (by [[Jensen's inequality]]), which depends on the distribution, so the corrected sample standard deviation (using Bessel's correction) is biased. The [[unbiased estimation of standard deviation]] is a technically involved problem, though for the normal distribution using the term {{math|''n'' − 1.5}} yields an almost unbiased estimator.
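The effect of Bessel's correction can be illustrated by a small simulation. The following is a minimal sketch using NumPy; the population distribution (normal with variance 4), the sample size, and the number of trials are arbitrary choices made for illustration.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 4.0            # true population variance (assumed for the example)
n, trials = 5, 200_000  # small samples, many repetitions

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
biased = samples.var(axis=1, ddof=0)    # divides by n: the biased sample variance
unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1: Bessel's correction

print(biased.mean())    # close to (n - 1)/n * sigma2 = 3.2
print(unbiased.mean())  # close to sigma2 = 4.0
</syntaxhighlight>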
The unbiased sample variance is a [[U-statistic]] for the function {{math|1=''f''(''y''<sub>1</sub>, ''y''<sub>2</sub>) = (''y''<sub>1</sub> − ''y''<sub>2</sub>)<sup>2</sup>/2}}, meaning that it is obtained by averaging a 2-sample statistic over 2-element subsets of the population.

===== Example =====
For the set of numbers {10, 15, 30, 45, 57, 52, 63, 72, 81, 93, 102, 105}: if this set is the whole data population for some measurement, then its variance is the population variance, 932.743, computed as the sum of the squared deviations about the mean of the set divided by 12, the number of members of the set. If the set is a sample from the whole population, then the unbiased sample variance is 1017.538, which is the sum of the squared deviations about the mean of the sample divided by 11 instead of 12. The VAR.S function in [[Microsoft Excel]] gives the unbiased sample variance, while VAR.P gives the population variance.

====Distribution of the sample variance====
{{multiple image
 | align     = right
 | direction = vertical
 | width     = 250
 | image1    = Scaled chi squared.svg
 | width1    =
 | alt1      =
 | caption1  =
 | image2    = Scaled chi squared cdf.svg
 | width2    =
 | alt2      =
 | caption2  = Distribution and cumulative distribution of ''S''<sup>2</sup>/σ<sup>2</sup>, for various values of ''ν'' = ''n'' − 1, when the ''y<sub>i</sub>'' are independent and normally distributed.
}}
Being a function of [[random variable]]s, the sample variance is itself a random variable, and it is natural to study its distribution. In the case that ''Y''<sub>''i''</sub> are independent observations from a [[normal distribution]], [[Cochran's theorem]] shows that the [[Variance#Unbiased sample variance|unbiased sample variance]] ''S''<sup>2</sup> follows a scaled [[chi-squared distribution]] (see also: [[chi-squared distribution#Asymptotic properties|asymptotic properties]] and an [[Chi-squared distribution#Cochran's_theorem|elementary proof]]):<ref>{{cite book | last = Knight | first = K. | year = 2000 | title = Mathematical Statistics | publisher = Chapman and Hall | location = New York | at = proposition 2.11 }}</ref>
<math display="block"> (n - 1) \frac{S^2}{\sigma^2} \sim \chi^2_{n-1} </math>
where {{math|''σ''<sup>2</sup>}} is the [[Variance#Population variance|population variance]]. As a direct consequence, it follows that
<math display="block"> \operatorname{E}\left(S^2\right) = \operatorname{E}\left(\frac{\sigma^2}{n - 1} \chi^2_{n-1}\right) = \sigma^2 , </math>
and<ref>{{cite book | author1-last = Casella | author1-first = George | author2-last = Berger | author2-first = Roger L. | year = 2002 | title = Statistical Inference | at = Example 7.3.3, p. 331 | edition = 2nd | isbn = 0-534-24312-6 }}</ref>
<math display="block"> \operatorname{Var}\left[S^2\right] = \operatorname{Var}\left(\frac{\sigma^2}{n - 1} \chi^2_{n-1}\right) = \frac{\sigma^4}{{\left(n - 1\right)}^2} \operatorname{Var}\left(\chi^2_{n-1}\right) = \frac{2\sigma^4}{n - 1}. </math>

If ''Y''<sub>''i''</sub> are independent and identically distributed, but not necessarily normally distributed, then<ref>Mood, A. M., Graybill, F. A., and Boes, D.C. (1974) ''Introduction to the Theory of Statistics'', 3rd Edition, McGraw-Hill, New York, p. 229</ref>
<math display="block"> \operatorname{E}\left[S^2\right] = \sigma^2, \quad \operatorname{Var}\left[S^2\right] = \frac{\sigma^4}{n} \left(\kappa - 1 + \frac{2}{n - 1} \right) = \frac{1}{n} \left(\mu_4 - \frac{n - 3}{n - 1}\sigma^4\right), </math>
where ''κ'' is the [[kurtosis]] of the distribution and ''μ''<sub>4</sub> is the fourth [[central moment]].

If the conditions of the [[law of large numbers]] hold for the squared observations, ''S''<sup>2</sup> is a [[consistent estimator]] of ''σ''<sup>2</sup>; indeed, the variance of the estimator tends asymptotically to zero. An asymptotically equivalent formula was given in Kenney and Keeping (1951:164), Rose and Smith (2002:264), and Weisstein (n.d.).<ref>{{cite book |last1=Kenney |first1=John F. |last2=Keeping |first2=E.S. |date=1951 |title=Mathematics of Statistics. Part Two. |edition=2nd |publisher=D. Van Nostrand Company, Inc. |location=Princeton, New Jersey |url=http://krishikosh.egranth.ac.in/bitstream/1/2025521/1/G2257.pdf |via=KrishiKosh |url-status=dead |archive-url=https://web.archive.org/web/20181117022434/http://krishikosh.egranth.ac.in/bitstream/1/2025521/1/G2257.pdf |archive-date=Nov 17, 2018 }}</ref><ref>Rose, Colin; Smith, Murray D. (2002). "[http://www.mathstatica.com/book/Mathematical_Statistics_with_Mathematica.pdf Mathematical Statistics with Mathematica]". Springer-Verlag, New York.</ref><ref>Weisstein, Eric W. "[http://mathworld.wolfram.com/SampleVarianceDistribution.html Sample Variance Distribution]". MathWorld Wolfram.</ref>

====Samuelson's inequality====
[[Samuelson's inequality]] is a result that states bounds on the values that individual observations in a sample can take, given that the sample mean and (biased) variance have been calculated.<ref>{{cite journal |last=Samuelson |first=Paul |title=How Deviant Can You Be? |journal=[[Journal of the American Statistical Association]] |volume=63 |issue=324 |year=1968 |pages=1522–1525 |jstor=2285901 |doi=10.1080/01621459.1968.10480944}}</ref> Values must lie within the limits <math>\bar y \pm \sigma_Y (n-1)^{1/2}.</math>

===Relations with the harmonic and arithmetic means===
It has been shown<ref>{{cite journal |first=A. McD. |last=Mercer |title=Bounds for A–G, A–H, G–H, and a family of inequalities of Ky Fan's type, using a general method |journal=J. Math. Anal. Appl. |volume=243 |issue=1 |pages=163–173 |year=2000 |doi=10.1006/jmaa.1999.6688 |doi-access=free }}</ref> that for a sample {''y''<sub>''i''</sub>} of positive real numbers,
<math display="block"> \sigma_y^2 \le 2y_{\max} (A - H), </math>
where {{math|''y''<sub>max</sub>}} is the maximum of the sample, {{mvar|A}} is the arithmetic mean, {{mvar|H}} is the [[harmonic mean]] of the sample and <math>\sigma_y^2</math> is the (biased) variance of the sample.

This bound has been improved, and it is known that variance is bounded by
<math display="block">\begin{align} \sigma_y^2 &\le \frac{y_{\max} (A - H)(y_{\max} - A)}{y_{\max} - H}, \\[1ex] \sigma_y^2 &\ge \frac{y_{\min} (A - H)(A - y_{\min})}{H - y_{\min}}, \end{align} </math>
where {{math|''y''<sub>min</sub>}} is the minimum of the sample.<ref name=Sharma2008>{{cite journal |first=R.
|last=Sharma |title=Some more inequalities for arithmetic mean, harmonic mean and variance |journal=Journal of Mathematical Inequalities |volume=2 |issue=1 |pages=109–114 |year=2008 |doi=10.7153/jmi-02-11 |citeseerx=10.1.1.551.9397 }}</ref>

==Tests of equality of variances==
The [[F-test of equality of variances]] and the [[chi square test]]s are adequate when the sample is normally distributed. Non-normality makes testing for the equality of two or more variances more difficult.

Several non-parametric tests have been proposed: these include the Barton–David–Ansari–Freund–Siegel–Tukey test, the [[Capon test]], the [[Mood test]], the [[Klotz test]] and the [[Sukhatme test]]. The Sukhatme test applies to two variances and requires that both [[median]]s be known and equal to zero. The Mood, Klotz, Capon and Barton–David–Ansari–Freund–Siegel–Tukey tests also apply to two variances. They allow the median to be unknown but do require that the two medians are equal.

The [[Lehmann test]] is a parametric test of two variances. Several variants of this test are known. Other tests of the equality of variances include the [[Box test]], the [[Box–Anderson test]] and the [[Moses test]].

Resampling methods, which include the [[Bootstrapping (statistics)|bootstrap]] and the [[Resampling (statistics)|jackknife]], may be used to test the equality of variances.

==Moment of inertia==
{{see also|Moment (physics)#Examples}}
The variance of a probability distribution is analogous to the [[moment of inertia]] in [[classical mechanics]] of a corresponding mass distribution along a line, with respect to rotation about its center of mass.<ref name=pearson>{{Cite web |last=Magnello |first=M. Eileen |title=Karl Pearson and the Origins of Modern Statistics: An Elastician becomes a Statistician |url=https://rutherfordjournal.org/article010107.html |website=The Rutherford Journal}}</ref> It is because of this analogy that such things as the variance are called ''[[moment (mathematics)|moment]]s'' of [[probability distribution]]s.<ref name=pearson/> The covariance matrix is related to the [[moment of inertia tensor]] for multivariate distributions. The moment of inertia of a cloud of ''n'' points with a covariance matrix of <math>\Sigma</math> is given by{{Citation needed|date=February 2012}}
<math display="block">I = n\left(\mathbf{1}_{3\times 3} \operatorname{tr}(\Sigma) - \Sigma\right).</math>
This difference between the moment of inertia in physics and in statistics is clear for points that are gathered along a line. Suppose many points are close to the ''x'' axis and distributed along it. The covariance matrix might look like
<math display="block">\Sigma = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.1 \end{bmatrix}.</math>
That is, there is the most variance in the ''x'' direction. Physicists would consider this to have a low moment ''about'' the ''x'' axis, so the moment-of-inertia tensor is
<math display="block">I = n\begin{bmatrix} 0.2 & 0 & 0 \\ 0 & 10.1 & 0 \\ 0 & 0 & 10.1 \end{bmatrix}.</math>

==Semivariance==
The ''semivariance'' is calculated in the same manner as the variance, but only those observations that fall below the mean are included in the calculation:
<math display="block">\text{Semivariance} = \frac{1}{n} \sum_{i:x_i < \mu} {\left(x_i - \mu\right)}^2</math>
It is also described as a specific measure in different fields of application.
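The formula above can be sketched in a few lines of code. The following minimal example uses NumPy on made-up data and substitutes the sample mean for ''μ'', an assumption made only for illustration.

<syntaxhighlight lang="python">
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up observations
mu = x.mean()        # sample mean used as a stand-in for the population mean
below = x[x < mu]    # keep only observations that fall below the mean

semivariance = np.sum((below - mu) ** 2) / len(x)  # divide by n, as in the formula above
variance = np.mean((x - mu) ** 2)                  # ordinary (biased) variance, for comparison

print(semivariance, variance)  # 1.5 and 4.0 for this data set
</syntaxhighlight>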
For skewed distributions, the semivariance can provide additional information that a variance does not.<ref>{{Cite web |url=https://famafrench.dimensional.com/questions-answers/qa-semi-variance-a-better-risk-measure.aspx |title=Q&A: Semi-Variance: A Better Risk Measure? |last1=Fama |first1=Eugene F. |last2=French |first2=Kenneth R. |date=2010-04-21 |website=Fama/French Forum}}</ref>

For inequalities associated with the semivariance, see {{Section link|Chebyshev's inequality|Semivariances}}.

==Etymology==
The term ''variance'' was first introduced by [[Ronald Fisher]] in his 1918 paper ''[[The Correlation Between Relatives on the Supposition of Mendelian Inheritance]]'':<ref>[[Ronald Fisher]] (1918) [http://digital.library.adelaide.edu.au/dspace/bitstream/2440/15097/1/9.pdf The correlation between relatives on the supposition of Mendelian Inheritance]</ref>

<blockquote>The great body of available statistics show us that the deviations of a [[biometry|human measurement]] from its mean follow very closely the [[Normal distribution|Normal Law of Errors]], and, therefore, that the variability may be uniformly measured by the [[standard deviation]] corresponding to the [[square root]] of the [[mean square error]]. When there are two independent causes of variability capable of producing in an otherwise uniform population distributions with standard deviations <math>\sigma_1</math> and <math>\sigma_2</math>, it is found that the distribution, when both causes act together, has a standard deviation <math>\sqrt{\sigma_1^2 + \sigma_2^2}</math>. It is therefore desirable in analysing the causes of variability to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance...</blockquote>

==Generalizations==
===For complex variables===
If <math>x</math> is a scalar [[complex number|complex]]-valued random variable, with values in <math>\mathbb{C},</math> then its variance is <math>\operatorname{E}\left[(x - \mu)(x - \mu)^*\right],</math> where <math>x^*</math> is the [[complex conjugate]] of <math>x.</math> This variance is a real scalar.

===For vector-valued random variables===
====As a matrix====
If <math>X</math> is a [[vector space|vector]]-valued random variable, with values in <math>\mathbb{R}^n,</math> and thought of as a column vector, then a natural generalization of variance is <math>\operatorname{E}\left[(X - \mu) {(X - \mu)}^{\mathsf{T}}\right],</math> where <math>\mu = \operatorname{E}(X)</math> and <math>X^{\mathsf{T}}</math> is the transpose of {{mvar|X}}, and so is a row vector. The result is a [[positive definite matrix|positive semi-definite square matrix]], commonly referred to as the [[variance-covariance matrix]] (or simply as the ''covariance matrix'').

If <math>X</math> is a vector- and complex-valued random variable, with values in <math>\mathbb{C}^n,</math> then the [[Covariance matrix#Complex random vectors|covariance matrix is]] <math>\operatorname{E}\left[(X - \mu){(X - \mu)}^\dagger\right],</math> where <math>X^\dagger</math> is the [[conjugate transpose]] of <math>X.</math>{{Citation needed|date=September 2016}} This matrix is also positive semi-definite and square.

====As a scalar====
Another generalization of variance for vector-valued random variables <math>X</math>, which results in a scalar value rather than in a matrix, is the [[generalized variance]] <math>\det(C)</math>, the [[determinant]] of the covariance matrix.
The generalized variance can be shown to be related to the multidimensional scatter of points around their mean.<ref>{{cite book |last1=Kocherlakota |first1=S. |title=Encyclopedia of Statistical Sciences |last2=Kocherlakota |first2=K. |chapter=Generalized Variance |publisher=Wiley Online Library |doi=10.1002/0471667196.ess0869 |year=2004 |isbn=0-471-66719-6 }}</ref>

A different generalization is obtained by considering the equation for the scalar variance, <math> \operatorname{Var}(X) = \operatorname{E}\left[(X - \mu)^2 \right] </math>, and reinterpreting <math>(X - \mu)^2</math> as the squared [[Euclidean distance]] between the random variable and its mean, or simply as the scalar product of the vector <math>X - \mu</math> with itself. This results in <math>\operatorname{E}\left[(X - \mu)^{\mathsf{T}}(X - \mu)\right] = \operatorname{tr}(C),</math> which is the [[Trace (linear algebra)|trace]] of the covariance matrix.
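Both scalar summaries can be computed directly from an estimated covariance matrix. The following is a minimal sketch using NumPy; the bivariate normal population and its parameters are arbitrary choices made for illustration.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
# Made-up sample of 1,000 two-dimensional observations with correlated components.
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[2.0, 0.8], [0.8, 1.0]], size=1000)

C = np.cov(X, rowvar=False)  # sample covariance matrix (rows of X are observations)

generalized_variance = np.linalg.det(C)  # determinant of the covariance matrix
total_variance = np.trace(C)             # trace of the covariance matrix

print(C)
print(generalized_variance, total_variance)
</syntaxhighlight>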