===Estimator===

The MSE of an estimator <math>\hat{\theta}</math> with respect to an unknown parameter <math>\theta</math> is defined as<ref name=":1" />

:<math>\operatorname{MSE}(\hat{\theta})=\operatorname{E}_{\theta}\left[(\hat{\theta}-\theta)^2\right].</math>

This definition depends on the unknown parameter, so the MSE is an ''a priori'' property of an estimator. The MSE may itself be a function of unknown parameters, in which case any ''estimator'' of the MSE based on estimates of these parameters is a function of the data (and thus a random variable). If the estimator <math>\hat{\theta}</math> is derived as a sample statistic and is used to estimate some population parameter, then the expectation is taken with respect to the [[sampling distribution]] of the sample statistic.

The MSE can be written as the sum of the [[variance]] of the estimator and the squared [[Bias_of_an_estimator|bias]] of the estimator. This provides a useful way to calculate the MSE and implies that, for unbiased estimators, the MSE and the variance are equal.<ref name="wackerly">{{cite book |first1=Dennis |last1=Wackerly |first2=William|last2=Mendenhall |first3=Richard L.|last3=Scheaffer |title=Mathematical Statistics with Applications |publisher=Thomson Higher Education|location=Belmont, CA, USA |year=2008 |edition=7 |isbn=978-0-495-38508-0}}</ref>

:<math>\operatorname{MSE}(\hat{\theta})=\operatorname{Var}_{\theta}(\hat{\theta})+ \operatorname{Bias}(\hat{\theta},\theta)^2.</math>
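
The decomposition can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name <code>mse_decomposition</code>, the shrinkage factor 0.9, and the simulation settings are illustrative choices, not part of any standard API. It simulates a deliberately biased estimator of a normal mean and compares the simulated MSE with the sum of the simulated variance and squared bias.

```python
import numpy as np

def mse_decomposition(estimates, theta):
    """Return (mse, variance, bias) computed from simulated estimates.

    Hypothetical helper for illustration: `estimates` holds one value of
    the estimator per simulated data set, `theta` is the true parameter.
    """
    estimates = np.asarray(estimates, dtype=float)
    mse = np.mean((estimates - theta) ** 2)   # average squared error
    variance = np.var(estimates)              # spread around the mean (ddof=0)
    bias = np.mean(estimates) - theta         # average estimate minus truth
    return mse, variance, bias

rng = np.random.default_rng(0)
theta = 2.0                # true mean, known only because this is a simulation
n, trials = 10, 100_000    # sample size and number of simulated data sets
samples = rng.normal(loc=theta, scale=1.0, size=(trials, n))

# Shrinking the sample mean towards zero makes the estimator biased.
estimates = 0.9 * samples.mean(axis=1)

mse, variance, bias = mse_decomposition(estimates, theta)
print("MSE:            ", mse)
print("Var + Bias^2:   ", variance + bias ** 2)
```

Under these assumptions the theoretical values are a variance of 0.081, a bias of −0.2, and an MSE of 0.121; the two printed numbers agree up to floating-point rounding because the identity also holds for the empirical distribution of the simulated estimates.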

====Proof of variance and bias relationship====

<math>\begin{align}
\operatorname{MSE}(\hat{\theta})
&= \operatorname{E}_\theta \left [(\hat{\theta}-\theta)^2 \right ] \\
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta [\hat\theta]+\operatorname{E}_\theta[\hat\theta]-\theta\right)^2\right]\\ 
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta[\hat\theta]\right)^2 +2\left (\hat{\theta}-\operatorname{E}_\theta[\hat\theta] \right ) \left (\operatorname{E}_\theta[\hat\theta]-\theta \right )+\left( \operatorname{E}_\theta[\hat\theta]-\theta \right)^2\right] \\ 
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta[\hat\theta]\right)^2\right]+\operatorname{E}_\theta\left[2 \left (\hat{\theta}-\operatorname{E}_\theta[\hat\theta] \right ) \left (\operatorname{E}_\theta[\hat\theta]-\theta \right ) \right] + \operatorname{E}_\theta\left [ \left(\operatorname{E}_\theta[\hat\theta]-\theta\right)^2 \right] \\
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta[\hat\theta]\right)^2\right]+ 2 \left(\operatorname{E}_\theta[\hat\theta]-\theta\right) \operatorname{E}_\theta\left[\hat{\theta}-\operatorname{E}_\theta[\hat\theta] \right] +  \left(\operatorname{E}_\theta[\hat\theta]-\theta\right)^2 && \operatorname{E}_\theta[\hat\theta]-\theta = \text{constant} \\
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta[\hat\theta]\right)^2\right]+ 2 \left(\operatorname{E}_\theta [\hat\theta]-\theta\right) \left ( \operatorname{E}_\theta[\hat{\theta}]-\operatorname{E}_\theta[\hat\theta] \right )+  \left(\operatorname{E}_\theta[\hat\theta]-\theta\right)^2 && \operatorname{E}_\theta[\hat\theta] = \text{constant} \\
&= \operatorname{E}_\theta\left[\left(\hat\theta-\operatorname{E}_\theta[\hat\theta]\right)^2\right]+\left(\operatorname{E}_\theta [\hat\theta]-\theta\right)^2\\ 
&= \operatorname{Var}_\theta(\hat\theta)+ \operatorname{Bias}_\theta(\hat\theta,\theta)^2
\end{align}</math>

A shorter proof uses the well-known identity that, for a random variable <math display="inline">X</math>, <math display="inline">\mathbb{E}(X^2) = \operatorname{Var}(X) + (\mathbb{E}(X))^2</math>. Substituting <math display="inline">\hat\theta-\theta</math> for <math display="inline">X</math>, and noting that subtracting the constant <math display="inline">\theta</math> does not change the variance, we have
:<math display="block">\begin{aligned}
\operatorname{MSE}(\hat{\theta}) &= \mathbb{E}[(\hat\theta-\theta)^2] \\
&= \operatorname{Var}(\hat{\theta} - \theta) + (\mathbb{E}[\hat\theta - \theta])^2 \\
&= \operatorname{Var}(\hat\theta) + \operatorname{Bias}^2(\hat\theta,\theta)
\end{aligned}</math>
In practical modeling settings, the MSE can be described as the sum of the model variance, the squared model bias, and the irreducible uncertainty (see [[Bias–variance tradeoff]]). Because the MSE accounts for both the variance and the bias of an estimator, it can be used to compare the [[Efficiency (statistics)|efficiency]] of estimators. This is called the MSE criterion.
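
As an illustration of the MSE criterion, the sketch below compares two estimators of a population variance by simulation. It assumes normally distributed data and NumPy; the helper name <code>monte_carlo_mse</code> and all simulation settings are hypothetical choices made for this example. The two estimators divide the sum of squared deviations by <math>n-1</math> (unbiased) and by <math>n</math> (biased).

```python
import numpy as np

def monte_carlo_mse(estimates, theta):
    """Average squared error of simulated estimates around the true value."""
    return float(np.mean((np.asarray(estimates, dtype=float) - theta) ** 2))

rng = np.random.default_rng(1)
sigma2 = 1.0               # true variance, known only inside the simulation
n, trials = 10, 200_000    # sample size and number of simulated data sets
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

# Two candidate estimators of the variance, one value per simulated data set.
unbiased = samples.var(axis=1, ddof=1)   # divisor n - 1
biased = samples.var(axis=1, ddof=0)     # divisor n

mse_unbiased = monte_carlo_mse(unbiased, sigma2)
mse_biased = monte_carlo_mse(biased, sigma2)
print("MSE with divisor n-1:", mse_unbiased)
print("MSE with divisor n:  ", mse_biased)
```

For normal data the estimator with divisor <math>n</math> has the smaller MSE even though it is biased, so under the MSE criterion it would be preferred in this setting; the conclusion may differ for other distributions.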