===Variance===
{{further|Sample variance}}
The usual estimator for the variance is the ''corrected [[sample variance]]'':

:<math>S^2_{n-1} = \frac{1}{n-1}\sum_{i=1}^n\left(X_i-\overline{X} \right)^2 =\frac{1}{n-1}\left(\sum_{i=1}^n X_i^2-n\overline{X}^2\right).</math>

This is unbiased (its expected value is <math>\sigma^2</math>), hence also called the ''unbiased sample variance,'' and its MSE is<ref>{{cite book | last = Mood |first=A. | last2 = Graybill |first2=F. |last3=Boes |first3=D. | title = Introduction to the Theory of Statistics | url = https://archive.org/details/introductiontoth00mood_706 | url-access = limited |page=[https://archive.org/details/introductiontoth00mood_706/page/n241 229] | edition = 3rd | publisher = McGraw-Hill | year = 1974}}</ref>

:<math>\operatorname{MSE}(S^2_{n-1})= \frac{1}{n} \left(\mu_4-\frac{n-3}{n-1}\sigma^4\right) =\frac{1}{n} \left(\gamma_2+\frac{2n}{n-1}\right)\sigma^4,</math>

where <math>\mu_4</math> is the fourth [[central moment]] of the distribution or population, and <math>\gamma_2=\mu_4/\sigma^4-3</math> is the [[excess kurtosis]].
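The two forms of this expression can be checked against each other numerically. The following is a minimal sketch in Python using exact rational arithmetic; the function names are illustrative and are not taken from the cited source.

```python
from fractions import Fraction

def mse_unbiased_variance(n, sigma2=1, gamma2=0):
    """MSE of S^2_{n-1}, written in terms of the excess kurtosis gamma2."""
    n = Fraction(n)
    sigma4 = Fraction(sigma2) ** 2
    return (Fraction(gamma2) + 2 * n / (n - 1)) * sigma4 / n

def mse_unbiased_variance_mu4(n, sigma2=1, mu4=3):
    """The same MSE, written in terms of the fourth central moment mu4."""
    n = Fraction(n)
    sigma4 = Fraction(sigma2) ** 2
    return (Fraction(mu4) - (n - 3) / (n - 1) * sigma4) / n

# Standard Gaussian: sigma^2 = 1, mu4 = 3, gamma2 = 0.
print(mse_unbiased_variance(10), mse_unbiased_variance_mu4(10))
```

For a Gaussian distribution the expression reduces to <math>2\sigma^4/(n-1)</math>, so both calls above give 2/9 for <math>n=10</math> and <math>\sigma^2=1</math>.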

However, one can use other estimators for <math>\sigma^2</math> which are proportional to <math>S^2_{n-1}</math>, and an appropriate choice can always give a lower mean squared error. If we define

:<math>S^2_a = \frac{n-1}{a}S^2_{n-1}= \frac{1}{a}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2</math>

then we calculate:

:<math>\begin{align}
\operatorname{MSE}(S^2_a)
&=\operatorname{E}\left[\left(\frac{n-1}{a} S^2_{n-1}-\sigma^2\right)^2 \right] \\
&= \operatorname{E}\left[ \frac{(n-1)^2}{a^2} S^4_{n-1} -2 \left ( \frac{n-1}{a} S^2_{n-1} \right ) \sigma^2 + \sigma^4 \right ] \\
&= \frac{(n-1)^2}{a^2} \operatorname{E}\left[ S^4_{n-1} \right ] - 2 \left ( \frac{n-1}{a}\right )  \operatorname{E}\left[ S^2_{n-1} \right ] \sigma^2 + \sigma^4 \\
&= \frac{(n-1)^2}{a^2} \operatorname{E}\left[ S^4_{n-1} \right ] - 2 \left ( \frac{n-1}{a}\right )  \sigma^4 + \sigma^4 && \text{since } \operatorname{E}\left[ S^2_{n-1} \right ]  = \sigma^2 \\
&= \frac{(n-1)^2}{a^2} \left ( \frac{\gamma_2}{n} + \frac{n+1}{n-1} \right ) \sigma^4- 2 \left ( \frac{n-1}{a}\right )  \sigma^4+\sigma^4 && \text{since } \operatorname{E}\left[ S^4_{n-1} \right ] = \operatorname{MSE}(S^2_{n-1}) + \sigma^4 \\
&=\frac{n-1}{n a^2} \left ((n-1)\gamma_2+n^2+n \right ) \sigma^4- 2 \left ( \frac{n-1}{a}\right )  \sigma^4+\sigma^4   
\end{align}</math>

This is minimized when

:<math>a=\frac{(n-1)\gamma_2+n^2+n}{n} = n+1+\frac{n-1}{n}\gamma_2.</math>
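The minimizer can be obtained by differentiating with respect to <math>a</math>. As a brief sketch, write <math>c=\tfrac{1}{n}\left((n-1)\gamma_2+n^2+n\right)</math> (a shorthand introduced here only for this step), so that the last line of the calculation becomes

:<math>\frac{\operatorname{MSE}(S^2_a)}{\sigma^4} = \frac{(n-1)\,c}{a^2} - \frac{2(n-1)}{a} + 1, \qquad \frac{d}{da}\,\frac{\operatorname{MSE}(S^2_a)}{\sigma^4} = -\frac{2(n-1)\,c}{a^3} + \frac{2(n-1)}{a^2} = \frac{2(n-1)}{a^3}\,(a-c).</math>

For <math>n>1</math> and <math>a>0</math> the derivative is negative when <math>a<c</math> and positive when <math>a>c</math>, so the MSE attains its minimum at <math>a=c</math>, where it equals <math>\left(1-\tfrac{n-1}{c}\right)\sigma^4</math>.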

For a [[Gaussian distribution]], where <math>\gamma_2=0</math>, the MSE is therefore minimized by dividing the sum by <math>a=n+1</math>. The minimum possible excess kurtosis is <math>\gamma_2=-2</math>,{{efn|1=This follows from [[Jensen's inequality]]: the fourth [[central moment]] is at least the square of the variance, so their ratio is at least one and the [[excess kurtosis]] is at least −2. The bound is attained, for instance, by a Bernoulli distribution with ''p''=1/2.}} which is attained by a [[Bernoulli distribution]] with ''p''&nbsp;=&nbsp;1/2 (a coin flip); in that case the MSE is minimized for <math>a=n-1+\tfrac{2}{n}.</math> Hence, regardless of the kurtosis, the optimal divisor exceeds <math>n-1</math>, so scaling the unbiased estimator down slightly gives a "better" estimate (in the sense of having a lower MSE). This is a simple example of a [[shrinkage estimator]]: one "shrinks" the estimator towards zero.
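The effect of the divisor can be illustrated by evaluating the closed-form MSE of <math>S^2_a</math> for a few choices of <math>a</math>. The following Python sketch uses exact rational arithmetic and the last line of the derivation above; the function names <code>optimal_divisor</code> and <code>mse_scaled</code> are illustrative assumptions, not part of any library.

```python
from fractions import Fraction

def optimal_divisor(n, gamma2=0):
    """Divisor a that minimizes MSE(S^2_a): a = n + 1 + (n - 1) * gamma2 / n."""
    n = Fraction(n)
    return n + 1 + (n - 1) * Fraction(gamma2) / n

def mse_scaled(n, a, gamma2=0, sigma2=1):
    """MSE of S^2_a, taken from the last line of the derivation."""
    n, a = Fraction(n), Fraction(a)
    sigma4 = Fraction(sigma2) ** 2
    lead = (n - 1) / (n * a**2) * ((n - 1) * Fraction(gamma2) + n**2 + n)
    return (lead - 2 * (n - 1) / a + 1) * sigma4

# Gaussian case (gamma2 = 0, sigma^2 = 1) with n = 10 observations:
# compare the divisors n - 1, n, and the MSE-optimal divisor n + 1.
n = 10
for a in (n - 1, n, optimal_divisor(n)):
    print(a, mse_scaled(n, a))
```

For <math>n=10</math> this gives an MSE of 2/9 for the divisor 9, 19/100 for the divisor 10, and 2/11 for the divisor 11, so each step towards the optimal divisor lowers the MSE at the cost of introducing bias.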

Further, while the corrected sample variance is the [[best unbiased estimator]] of the variance (the one with minimum mean squared error among unbiased estimators) for Gaussian distributions, for non-Gaussian distributions the best unbiased estimator of the variance may not be <math>S^2_{n-1}.</math>