Mean squared error
===Variance===
{{further|Sample variance}}
The usual estimator for the variance is the ''corrected [[sample variance]]:''

:<math>S^2_{n-1} = \frac{1}{n-1}\sum_{i=1}^n\left(X_i-\overline{X} \right)^2 =\frac{1}{n-1}\left(\sum_{i=1}^n X_i^2-n\overline{X}^2\right).</math>

This is unbiased (its expected value is <math>\sigma^2</math>), hence also called the ''unbiased sample variance,'' and its MSE is<ref>{{cite book | last = Mood |first=A. | last2 = Graybill |first2=F. |last3=Boes |first3=D. | title = Introduction to the Theory of Statistics | url = https://archive.org/details/introductiontoth00mood_706 | url-access = limited |page=[https://archive.org/details/introductiontoth00mood_706/page/n241 229] | edition = 3rd | publisher = McGraw-Hill | year = 1974}}</ref>

:<math>\operatorname{MSE}(S^2_{n-1})= \frac{1}{n} \left(\mu_4-\frac{n-3}{n-1}\sigma^4\right) =\frac{1}{n} \left(\gamma_2+\frac{2n}{n-1}\right)\sigma^4,</math>

where <math>\mu_4</math> is the fourth [[central moment]] of the distribution or population, and <math>\gamma_2=\mu_4/\sigma^4-3</math> is the [[excess kurtosis]].

However, one can use other estimators for <math>\sigma^2</math> which are proportional to <math>S^2_{n-1}</math>, and an appropriate choice can always give a lower mean squared error.
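As a quick numerical check that the two forms of <math>\operatorname{MSE}(S^2_{n-1})</math> above agree, the following Python sketch evaluates both from the same inputs. The function name <code>mse_unbiased_variance</code> and its parameter names are illustrative assumptions, not taken from the cited source.

```python
def mse_unbiased_variance(n, sigma2, gamma2):
    """Return both forms of MSE(S^2_{n-1}) for sample size n, variance sigma2
    and excess kurtosis gamma2 (hypothetical helper, for illustration only)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    sigma4 = sigma2 ** 2
    # Fourth central moment recovered from gamma_2 = mu_4 / sigma^4 - 3.
    mu4 = (gamma2 + 3.0) * sigma4
    # First form: (1/n) * (mu_4 - (n-3)/(n-1) * sigma^4)
    via_mu4 = (mu4 - (n - 3) / (n - 1) * sigma4) / n
    # Second form: (1/n) * (gamma_2 + 2n/(n-1)) * sigma^4
    via_gamma2 = (gamma2 + 2.0 * n / (n - 1)) * sigma4 / n
    return via_mu4, via_gamma2


# Gaussian case (gamma2 = 0): both forms reduce to 2 * sigma^4 / (n - 1).
via_mu4, via_gamma2 = mse_unbiased_variance(n=10, sigma2=1.0, gamma2=0.0)
print(via_mu4, via_gamma2)
```

The two returned values should coincide up to floating-point rounding for any admissible inputs (<math>n \ge 2</math>, <math>\gamma_2 \ge -2</math>).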
If we define

:<math>S^2_a = \frac{n-1}{a}S^2_{n-1}= \frac{1}{a}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2</math>

then we calculate:

:<math>\begin{align}
\operatorname{MSE}(S^2_a) &=\operatorname{E}\left[\left(\frac{n-1}{a} S^2_{n-1}-\sigma^2\right)^2 \right] \\
&= \operatorname{E}\left[ \frac{(n-1)^2}{a^2} S^4_{n-1} -2 \left ( \frac{n-1}{a} S^2_{n-1} \right ) \sigma^2 + \sigma^4 \right ] \\
&= \frac{(n-1)^2}{a^2} \operatorname{E}\left[ S^4_{n-1} \right ] - 2 \left ( \frac{n-1}{a}\right ) \operatorname{E}\left[ S^2_{n-1} \right ] \sigma^2 + \sigma^4 \\
&= \frac{(n-1)^2}{a^2} \operatorname{E}\left[ S^4_{n-1} \right ] - 2 \left ( \frac{n-1}{a}\right ) \sigma^4 + \sigma^4 && \operatorname{E}\left[ S^2_{n-1} \right ] = \sigma^2 \\
&= \frac{(n-1)^2}{a^2} \left ( \frac{\gamma_2}{n} + \frac{n+1}{n-1} \right ) \sigma^4- 2 \left ( \frac{n-1}{a}\right ) \sigma^4+\sigma^4 && \operatorname{E}\left[ S^4_{n-1} \right ] = \operatorname{MSE}(S^2_{n-1}) + \sigma^4 \\
&=\frac{n-1}{n a^2} \left ((n-1)\gamma_2+n^2+n \right ) \sigma^4- 2 \left ( \frac{n-1}{a}\right ) \sigma^4+\sigma^4
\end{align}</math>

This is minimized when

:<math>a=\frac{(n-1)\gamma_2+n^2+n}{n} = n+1+\frac{n-1}{n}\gamma_2.</math>

For a [[Gaussian distribution]], where <math>\gamma_2=0</math>, this means that the MSE is minimized when dividing the sum by <math>a=n+1</math>. The minimum excess kurtosis is <math>\gamma_2=-2</math>,{{efn|1=This can be proved by [[Jensen's inequality]] as follows.
The fourth [[central moment]] is an upper bound for the square of the variance, so the least value of their ratio is one; therefore, the least value of the [[excess kurtosis]] is −2, achieved, for instance, by a Bernoulli distribution with ''p'' = 1/2.}} which is achieved by a [[Bernoulli distribution]] with ''p'' = 1/2 (a coin flip), and the MSE is minimized for <math>a=n-1+\tfrac{2}{n}.</math> Hence, regardless of the kurtosis, the optimal divisor is larger than <math>n-1</math>, and we get a "better" estimate (in the sense of having a lower MSE) by scaling down the unbiased estimator slightly; this is a simple example of a [[shrinkage estimator]]: one "shrinks" the estimator towards zero.

Further, while the corrected sample variance is the [[best unbiased estimator]] (minimum mean squared error among unbiased estimators) of variance for Gaussian distributions, if the distribution is not Gaussian, then even among unbiased estimators, the best unbiased estimator of the variance may not be <math>S^2_{n-1}.</math>
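The effect of the divisor can be illustrated numerically. The following Python sketch evaluates the final line of the derivation of <math>\operatorname{MSE}(S^2_a)</math> for the divisors <math>n-1</math>, <math>n</math> and the MSE-minimizing value of <math>a</math>. The function names <code>optimal_divisor</code> and <code>mse_scaled_variance</code> are hypothetical, chosen only for this illustration, and <math>\sigma^2 = 1</math> is assumed by default.

```python
def optimal_divisor(n, gamma2):
    """MSE-minimizing divisor a = n + 1 + (n - 1) / n * gamma2."""
    return n + 1 + (n - 1) / n * gamma2


def mse_scaled_variance(a, n, gamma2, sigma2=1.0):
    """MSE of S^2_a = (1/a) * sum (X_i - mean)^2, taken from the last line
    of the derivation; sigma2 = 1 is an assumed default."""
    sigma4 = sigma2 ** 2
    quadratic_term = (n - 1) / (n * a ** 2) * ((n - 1) * gamma2 + n ** 2 + n)
    linear_term = 2.0 * (n - 1) / a
    return (quadratic_term - linear_term + 1.0) * sigma4


n = 10
for label, gamma2 in [("Gaussian", 0.0), ("Bernoulli p=1/2", -2.0)]:
    a_star = optimal_divisor(n, gamma2)
    # Compare the unbiased divisor, the plain divisor n, and the optimum.
    for a in (n - 1, n, a_star):
        print(label, a, mse_scaled_variance(a, n, gamma2))
```

In both cases the MSE at the optimal divisor should be the smallest of the three, with the gap over <math>a = n-1</math> being narrower in the Bernoulli case, where the optimum <math>n-1+\tfrac{2}{n}</math> lies close to <math>n-1</math>.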