==Properties==

===Basic properties===
Variance is non-negative because the squared deviations are non-negative:
<math display="block">\operatorname{Var}(X)\ge 0.</math>

The variance of a constant is zero:
<math display="block">\operatorname{Var}(a) = 0.</math>

Conversely, if the variance of a random variable is 0, then it is [[almost surely]] a constant; that is, it always has the same value:
<math display="block">\operatorname{Var}(X)= 0 \iff \exists a : P(X=a) = 1.</math>

===Issues of finiteness===
If a distribution does not have a finite expected value, as is the case for the [[Cauchy distribution]], then the variance cannot be finite either. However, some distributions have a finite expected value but no finite variance. An example is a [[Pareto distribution]] whose [[Pareto index|index]] <math>k</math> satisfies <math>1 < k \leq 2.</math>

===Decomposition===
The general formula for variance decomposition, or the [[law of total variance]], is: if <math>X</math> and <math>Y</math> are two random variables and the variance of <math>X</math> exists, then
<math display="block">\operatorname{Var}[X] = \operatorname{E}(\operatorname{Var}[X\mid Y]) + \operatorname{Var}(\operatorname{E}[X\mid Y]).</math>

The [[conditional expectation]] <math>\operatorname E(X\mid Y)</math> of <math>X</math> given <math>Y</math> and the [[conditional variance]] <math>\operatorname{Var}(X\mid Y)</math> may be understood as follows. Given any particular value ''y'' of the random variable ''Y'', there is a conditional expectation <math>\operatorname E(X\mid Y=y)</math> given the event ''Y'' = ''y''. This quantity depends on the particular value ''y''; it is a function <math> g(y) = \operatorname E(X\mid Y=y)</math>. That same function evaluated at the random variable ''Y'' is the conditional expectation <math>\operatorname E(X\mid Y) = g(Y).</math>

In particular, if <math>Y</math> is a discrete random variable assuming possible values <math>y_1, y_2, y_3, \ldots</math> with corresponding probabilities <math>p_1, p_2, p_3, \ldots,</math> then in the formula for total variance the first term on the right-hand side becomes
<math display="block">\operatorname{E}(\operatorname{Var}[X \mid Y]) = \sum_i p_i \sigma^2_i,</math>
where <math>\sigma^2_i = \operatorname{Var}[X \mid Y = y_i]</math>. Similarly, the second term on the right-hand side becomes
<math display="block">\operatorname{Var}(\operatorname{E}[X \mid Y]) = \sum_i p_i \mu_i^2 - \left(\sum_i p_i \mu_i\right)^2 = \sum_i p_i \mu_i^2 - \mu^2,</math>
where <math>\mu_i = \operatorname{E}[X \mid Y = y_i]</math> and <math>\mu = \sum_i p_i \mu_i</math>. Thus the total variance is given by
<math display="block">\operatorname{Var}[X] = \sum_i p_i \sigma^2_i + \left( \sum_i p_i \mu_i^2 - \mu^2 \right).</math>

A similar formula is applied in [[analysis of variance]], where the corresponding formula is
<math display="block">\mathit{MS}_\text{total} = \mathit{MS}_\text{between} + \mathit{MS}_\text{within};</math>
here <math>\mathit{MS}</math> refers to the mean of the squares. In [[linear regression]] analysis the corresponding formula is
<math display="block">\mathit{MS}_\text{total} = \mathit{MS}_\text{regression} + \mathit{MS}_\text{residual}.</math>
This can also be derived from the additivity of variances, since the total (observed) score is the sum of the predicted score and the error score, where the latter two are uncorrelated.
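The decomposition can be checked numerically. The following Python sketch is an illustration only: the probabilities <code>p</code>, conditional means <code>mu</code>, and conditional variances <code>sigma2</code> are arbitrary choices, not values taken from the article. It computes the variance of a two-component mixture both directly from its moments and via the law of total variance:

<syntaxhighlight lang="python">
# Law of total variance for a discrete Y taking two values.
# Given Y = y_i, X has mean mu[i] and variance sigma2[i].
p      = [0.3, 0.7]   # P(Y = y_i) -- arbitrary illustrative values
mu     = [1.0, 4.0]   # E[X | Y = y_i]
sigma2 = [2.0, 0.5]   # Var[X | Y = y_i]

# Direct computation from the mixture moments E[X] and E[X^2].
EX  = sum(pi * mi for pi, mi in zip(p, mu))
EX2 = sum(pi * (s2 + mi**2) for pi, mi, s2 in zip(p, mu, sigma2))
var_direct = EX2 - EX**2

# Decomposition: E(Var[X|Y]) + Var(E[X|Y]).
within  = sum(pi * s2 for pi, s2 in zip(p, sigma2))
between = sum(pi * mi**2 for pi, mi in zip(p, mu)) - EX**2

print(var_direct, within + between)   # both print ~2.84
</syntaxhighlight>

The two printed values agree, as the identity requires.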
Similar decompositions are possible for the sum of squared deviations (sum of squares, <math>\mathit{SS}</math>):
<math display="block">\mathit{SS}_\text{total} = \mathit{SS}_\text{between} + \mathit{SS}_\text{within},</math>
<math display="block">\mathit{SS}_\text{total} = \mathit{SS}_\text{regression} + \mathit{SS}_\text{residual}.</math>

===Calculation from the CDF===
The population variance for a non-negative random variable can be expressed in terms of the [[cumulative distribution function]] ''F'' using
<math display="block">2\int_0^\infty u(1 - F(u))\,du - {\left[\int_0^\infty (1 - F(u))\,du\right]}^2.</math>
This expression can be used to calculate the variance in situations where the CDF, but not the [[probability density function|density]], can be conveniently expressed.

===Characteristic property===
The second [[moment (mathematics)|moment]] of a random variable attains its minimum value when taken around the first moment (i.e., the mean) of the random variable, i.e. <math>\mathrm{argmin}_m \, \mathrm{E}\left(\left(X - m\right)^2\right) = \mathrm{E}(X)</math>. Conversely, if a continuous function <math>\varphi</math> satisfies <math>\mathrm{argmin}_m\,\mathrm{E}(\varphi(X - m)) = \mathrm{E}(X)</math> for all random variables ''X'', then it is necessarily of the form <math>\varphi(x) = a x^2 + b</math>, where {{nowrap|''a'' > 0}}. This also holds in the multidimensional case.<ref>{{Cite journal | last1 = Kagan | first1 = A. | last2 = Shepp | first2 = L. A. | doi = 10.1016/S0167-7152(98)00041-8 | title = Why the variance? | journal = Statistics & Probability Letters | volume = 38 | issue = 4 | pages = 329–333 | year = 1998 }}</ref>

===Units of measurement===
Unlike the [[Average absolute deviation|expected absolute deviation]], the variance of a variable has units that are the square of the units of the variable itself. For example, a variable measured in meters will have a variance measured in meters squared. For this reason, describing data sets via their [[standard deviation]] or [[root mean square deviation]] is often preferred over using the variance. In the dice example the standard deviation is {{math|{{sqrt|2.9}} ≈ 1.7}}, slightly larger than the expected absolute deviation of 1.5.

The standard deviation and the expected absolute deviation can both be used as an indicator of the "spread" of a distribution. The standard deviation is more amenable to algebraic manipulation than the expected absolute deviation, and, together with variance and its generalization [[covariance]], is used frequently in theoretical statistics; however, the expected absolute deviation tends to be more [[Robust statistics|robust]], as it is less sensitive to [[outlier]]s arising from [[measurement error|measurement anomalies]] or an unduly [[heavy-tailed distribution]].
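The figures quoted in the dice example can be computed directly. The following Python sketch assumes the dice example refers to a single roll of a fair six-sided die (consistent with the values 2.9, 1.7, and 1.5 above):

<syntaxhighlight lang="python">
# Variance, standard deviation, and expected absolute deviation
# for a single roll of a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
mean = sum(faces) / len(faces)                                # 3.5

variance = sum((x - mean)**2 for x in faces) / len(faces)     # 35/12, ~2.92
std_dev  = variance**0.5                                      # ~1.71
abs_dev  = sum(abs(x - mean) for x in faces) / len(faces)     # 1.5

print(variance, std_dev, abs_dev)
</syntaxhighlight>

The variance (about 2.92) is in squared pips, while the standard deviation (about 1.71) and the expected absolute deviation (1.5) are in the same units as the roll itself.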