==Quantified properties==
The following definitions and attributes are relevant.<ref>Jaynes (2007), p.172.</ref>

===Error===
For a given sample <math> x </math>, the "[[Errors and residuals in statistics|error]]" of the estimator <math>\widehat{\theta}</math> is defined as
:<math>e(x)=\widehat{\theta}(x) - \theta,</math>
where <math>\theta </math> is the parameter being estimated. The error, ''e'', depends not only on the estimator (the estimation formula or procedure), but also on the sample.

===Mean squared error===
The [[mean squared error]] of <math>\widehat{\theta}</math> is defined as the expected value (probability-weighted average, over all samples) of the squared errors; that is,
:<math>\operatorname{MSE}(\widehat{\theta}) = \operatorname{E}[(\widehat{\theta}(X) - \theta)^2].</math>
It indicates how far, on average, the collection of estimates is from the single parameter being estimated, measured as squared distance. Consider the following analogy. Suppose the parameter is the [[Bullseye (target)|bull's-eye of a target]], the estimator is the process of shooting arrows at the target, and the individual arrows are estimates (one per sample). Then a high MSE means the average squared distance of the arrows from the bull's-eye is high, and a low MSE means it is low. The arrows may or may not be clustered. For example, even if all arrows hit the same point, yet grossly miss the target, the MSE is still relatively large. However, if the MSE is relatively low, then the arrows are likely more tightly clustered around the target than widely dispersed.
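
As a minimal sketch of the probability-weighted average in this definition, the Python snippet below computes the MSE exactly for an illustrative case that is not taken from the text above: estimating the success probability <math>p</math> of a binomial distribution by the sample proportion <math>N/n</math>. The helper names <code>binomial_pmf</code>, <code>mse</code> and <code>sample_proportion</code> are hypothetical and chosen only for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mse(estimator, n, p):
    """Probability-weighted average of the squared error of estimator(k, n)."""
    return sum(binomial_pmf(k, n, p) * (estimator(k, n) - p) ** 2
               for k in range(n + 1))

def sample_proportion(k, n):
    return k / n

n, p = 20, 0.3          # assumed values, for illustration only
mse_value = mse(sample_proportion, n, p)
print(mse_value)        # agrees with p * (1 - p) / n up to rounding
```

For this particular estimator the sum should match the closed form <math>p(1-p)/n</math>, which gives a quick check on the enumeration.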

===Sampling deviation===
For a given sample <math> x </math>, the ''sampling deviation'' of the estimator <math>\widehat{\theta}</math> is defined as
:<math>d(x) =\widehat{\theta}(x) - \operatorname{E}( \widehat{\theta}(X) ) =\widehat{\theta}(x) - \operatorname{E}( \widehat{\theta} ),</math>
where <math> \operatorname{E}( \widehat{\theta}(X) ) </math> is the [[expected value]] of the estimator. The sampling deviation, ''d'', depends not only on the estimator, but also on the sample.

===Variance===
The [[variance]] of <math>\widehat{\theta}</math> is the expected value of the squared sampling deviations; that is, <math>\operatorname{Var}(\widehat{\theta}) = \operatorname{E}[(\widehat{\theta} - \operatorname{E}[\widehat{\theta}]) ^2]</math>. It indicates how far, on average, the collection of estimates is from the ''expected value'' of the estimates. (Note the difference between MSE and variance.) If the parameter is the bull's-eye of a target, and the arrows are estimates, then a relatively high variance means the arrows are dispersed, and a relatively low variance means the arrows are clustered. Even if the variance is low, the cluster of arrows may still be far off-target, and even if the variance is high, the diffuse collection of arrows may still be unbiased. Finally, even if all arrows grossly miss the target, if they nevertheless all hit the same point, the variance is zero.
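
The contrast between variance and MSE in the arrow analogy can be sketched numerically. The snippet below treats a small, made-up list of estimates as equally likely outcomes; the numbers, the true value <code>theta</code> and the helper <code>summarize</code> are assumptions for illustration, not data from any source.

```python
from statistics import fmean, pvariance

theta = 0.0  # assumed true parameter (the bull's-eye)

def summarize(estimates, theta):
    """Return (variance, mse) of a list of equally likely estimates."""
    variance = pvariance(estimates)  # spread around the mean estimate
    mse = fmean([(e - theta) ** 2 for e in estimates])  # spread around theta
    return variance, mse

clustered_off_target = [3.0, 3.0, 3.0, 3.0]
dispersed_on_target = [-2.0, 2.0, -1.0, 1.0]

print(summarize(clustered_off_target, theta))
print(summarize(dispersed_on_target, theta))
```

The first collection has zero variance but a large MSE, while the second has equal variance and MSE because its estimates average to the true value.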

===Bias===
The [[bias of an estimator|bias]] of <math>\widehat{\theta}</math> is defined as <math>B(\widehat{\theta}) = \operatorname{E}(\widehat{\theta}) - \theta</math>. It is the signed difference between the average of the collection of estimates and the single parameter being estimated. The bias of <math>\widehat{\theta}</math> is a function of the true value of <math>\theta</math>, so saying that the bias of <math>\widehat{\theta}</math> is <math>b</math> means that for every <math>\theta</math> the bias of <math>\widehat{\theta}</math> is <math>b</math>.

There are two kinds of estimators: biased estimators and unbiased estimators. Whether an estimator is biased or not can be identified by the relationship between <math>\operatorname{E}(\widehat{\theta}) - \theta</math> and 0:
* If <math>\operatorname{E}(\widehat{\theta}) - \theta\neq0</math>, <math>\widehat{\theta}</math> is biased.
* If <math>\operatorname{E}(\widehat{\theta}) - \theta=0</math>, <math>\widehat{\theta}</math> is unbiased.

The bias is also the expected value of the error, since <math> \operatorname{E}(\widehat{\theta}) - \theta = \operatorname{E}(\widehat{\theta} - \theta ) </math>. If the parameter is the bull's-eye of a target and the arrows are estimates, then a relatively high absolute value for the bias means the average position of the arrows is off-target, and a relatively low absolute bias means the average position of the arrows is on target. In either case, the arrows may be dispersed or clustered. The relationship between bias and variance is analogous to the relationship between [[accuracy and precision]].
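
To illustrate that bias is in general a function of the true parameter, the following sketch computes the bias exactly under an assumed binomial model for two estimators of the success probability <math>p</math>: the sample proportion and a hypothetical estimator <code>shrunk</code> that pulls the proportion toward 1/2. Both the model and the second estimator are chosen for illustration only.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def bias(estimator, n, p):
    """Exact bias E[estimator] - p under a Bin(n, p) model."""
    expected = sum(binomial_pmf(k, n, p) * estimator(k, n)
                   for k in range(n + 1))
    return expected - p

def proportion(k, n):
    return k / n

def shrunk(k, n):
    # Hypothetical estimator that shrinks the proportion toward 1/2.
    return (k + 1) / (n + 2)

n = 10
for p in (0.2, 0.5, 0.8):
    print(p, bias(proportion, n, p), bias(shrunk, n, p))
```

The sample proportion has zero bias for every <math>p</math>, whereas the bias of the second estimator works out to <math>(1-2p)/(n+2)</math>, which changes sign as <math>p</math> crosses 1/2.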

The estimator <math>\widehat{\theta}</math> is an [[estimator bias|unbiased estimator]] of <math>\theta</math> [[if and only if]] <math>B(\widehat{\theta}) = 0</math>. Bias is a property of the estimator, not of the estimate. Often, people refer to a "biased estimate" or an "unbiased estimate", but they really are talking about an "estimate from a biased estimator", or an "estimate from an unbiased estimator". Also, people often confuse the "error" of a single estimate with the "bias" of an estimator. That the error for one estimate is large, does not mean the estimator is biased. In fact, even if all estimates have astronomical absolute values for their errors, if the expected value of the error is zero, the estimator is unbiased. Also, an estimator's being biased does not preclude the error of an estimate from being zero in a particular instance. The ideal situation is to have an unbiased estimator with low variance, and also try to limit the number of samples where the error is extreme (that is, to have few [[Outlier|outliers]]). Yet unbiasedness is not essential. Often, if just a little bias is permitted, then an estimator can be found with lower mean squared error and/or fewer outlier sample estimates.
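
As a hedged illustration of trading a little bias for lower mean squared error, consider estimating the variance <math>\sigma^2</math> from <math>n</math> independent normal observations by dividing the sum of squared deviations from the sample mean by a constant <math>c</math>. The sketch below assumes normality, uses the standard chi-squared moments of that sum, and adds variance and squared bias to obtain the MSE; the function name <code>mse_scaled_variance</code> is hypothetical.

```python
def mse_scaled_variance(n, c, sigma2=1.0):
    """MSE of SS / c as an estimator of sigma2, for n i.i.d. normal draws.

    SS is the sum of squared deviations from the sample mean. Under the
    normality assumption, E[SS] = (n - 1) * sigma2 and
    Var[SS] = 2 * (n - 1) * sigma2**2.
    """
    bias = ((n - 1) / c - 1) * sigma2
    variance = 2 * (n - 1) * sigma2**2 / c**2
    return variance + bias**2  # MSE = variance + squared bias

n = 10
for c in (n - 1, n, n + 1):
    print(c, mse_scaled_variance(n, c))
```

Under these assumptions the unbiased choice <math>c = n - 1</math> gives the largest MSE of the three, and the biased choice <math>c = n + 1</math> gives the smallest.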

An alternative to the version of "unbiased" above, is "median-unbiased", where the [[median]] of the distribution of estimates agrees with the true value; thus, in the long run half the estimates will be too low and half too high. While this applies immediately only to scalar-valued estimators, it can be extended to any measure of [[central tendency]] of a distribution: see [[Bias of an estimator#Median-unbiased estimators, and bias with respect to other loss functions|median-unbiased estimators]].

In a practical problem, <math>\widehat{\theta}</math> is often constructed from a functional relationship between <math>\theta</math> and the model. For example, suppose a genetic theory states that a type of leaf (starchy green) occurs with probability <math>p_1=1/4\cdot(\theta + 2)</math>, with <math>0<\theta<1</math>.
Then, for <math>n</math> leaves, the random variable <math>N_1</math>, the number of starchy green leaves, can be modeled with a <math>\operatorname{Bin}(n,p_1)</math> distribution. This count can be used to define the following estimator for <math>\theta</math>: <math>\widehat{\theta}=4/n\cdot N_1-2</math>. One can show that <math>\widehat{\theta}</math> is an unbiased estimator for <math>\theta</math>:
:<math>\begin{align}
\operatorname{E}[\widehat{\theta}] &= \operatorname{E}[4/n\cdot N_1-2] \\
&= 4/n\cdot \operatorname{E}[N_1]-2 \\
&= 4/n\cdot np_1-2 \\
&= 4\cdot p_1-2 \\
&= 4\cdot1/4\cdot(\theta+2)-2 \\
&= \theta+2-2 \\
&= \theta.
\end{align}</math>
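
The unbiasedness of the leaf estimator can also be checked numerically. The sketch below sums over the binomial distribution to obtain the exact expectation and then draws a single simulated sample; the values of <code>theta</code> and <code>n</code>, the random seed, and the helper names are assumptions made for this example.

```python
import random
from math import comb

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def theta_hat(n1, n):
    """Estimator from the text: 4/n * N_1 - 2."""
    return 4 / n * n1 - 2

theta = 0.4                  # assumed true value, with 0 < theta < 1
n = 50                       # assumed number of leaves
p1 = 0.25 * (theta + 2)      # probability of a starchy green leaf

# Exact expectation: probability-weighted average over all possible counts.
expected = sum(binomial_pmf(k, n, p1) * theta_hat(k, n) for k in range(n + 1))

# One simulated sample: its estimate generally differs from theta.
rng = random.Random(0)
n1 = sum(rng.random() < p1 for _ in range(n))
estimate = theta_hat(n1, n)

print(expected, estimate)
```

The expectation agrees with <code>theta</code> up to rounding, while the single estimate typically does not, which reflects that unbiasedness is a property of the estimator rather than of any one estimate.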

===Unbiased===
[[File:Wiki Snipet Unbiased.png|right|thumb|upright|Difference between estimators: an unbiased estimator <math>\theta_2</math> is centered around <math>\theta</math> vs. a biased estimator <math>\theta_1</math>.]]

A desirable property of estimators is unbiasedness: an unbiased estimator has no systematic tendency to produce estimates larger or smaller than the true parameter. Additionally, among unbiased estimators, those with smaller variance are preferred, because their estimates tend to be closer to the "true" value of the parameter. The unbiased estimator with the smallest variance is known as the [[minimum-variance unbiased estimator]] (MVUE).

To check whether an estimator is unbiased, verify the equation <math>\operatorname E(\widehat{\theta}) - \theta=0</math>. That is, an estimator ''T'' of a parameter of interest <math>\theta</math> is unbiased if <math>\operatorname E[T] = \theta</math>. In the figure to the right, <math>\hat{\theta}_2</math> is the only unbiased estimator; however, if the two distributions overlapped and were both centered around <math>\theta</math>, then <math>\hat{\theta}_1</math> would actually be the preferred unbiased estimator.

'''Expectation'''
When the quantity of interest is the expectation <math>\mu</math> of the model distribution, the sample mean is an unbiased estimator; it is defined by the first equation below and satisfies the second.
:<math>1. \quad \overline X_n = \frac{X_1 + X_2+ \cdots + X_n} n</math>
:<math>2. \quad \operatorname E\left[\overline X_n \right] = \mu</math>
'''Variance'''
Similarly, when the quantity of interest is the variance <math>\sigma^2</math> of the model distribution, the sample variance is an unbiased estimator; it is defined by the first equation below and satisfies the second.
:<math>1. \quad S^2_n = \frac{1}{n-1}\sum_{i = 1}^n (X_i - \overline X_n)^2</math>
:<math>  2. \quad \operatorname E\left[S^2_n\right] = \sigma^2</math>
Note that the sum is divided by ''n''&nbsp;−&nbsp;1: dividing by ''n'' instead would give an estimator with a negative bias, which would tend to produce estimates that are too small for <math>\sigma^2</math>. It should also be mentioned that even though <math>S^2_n</math> is unbiased for <math>\sigma^2</math>, its square root <math>S_n</math> is not an unbiased estimator for <math>\sigma</math>.<ref name=Dekker2005>{{Cite book|last1=Dekking|first1=Frederik Michel|last2=Kraaikamp|first2=Cornelis|last3=Lopuhaä|first3=Hendrik Paul|last4=Meester|first4=Ludolf Erwin|date=2005|title=A Modern Introduction to Probability and Statistics|url=https://archive.org/details/modernintroducti0000unse_h6a1 | series=Springer Texts in Statistics|language=en-gb|isbn=978-1-85233-896-1}}</ref>
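
The effect of the divisor can be checked exactly on a small example. The sketch below assumes samples of size 3 from a fair six-sided die (an illustrative choice, not from the cited text), enumerates all 216 equally likely samples, and compares the expected value of the sum of squared deviations divided by ''n''&nbsp;−&nbsp;1 and by ''n'' with the true variance. It also computes the expected square root of the ''n''&nbsp;−&nbsp;1 version as an estimator of the standard deviation. The helper <code>sum_sq_dev</code> is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean, as an exact fraction."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)

faces = range(1, 7)
n = 3
samples = list(product(faces, repeat=n))   # all 6**3 equally likely samples
prob = Fraction(1, len(samples))

mu = Fraction(sum(faces), 6)
sigma2 = sum((x - mu) ** 2 for x in faces) / 6   # true variance of one roll

e_div_n_minus_1 = sum(prob * sum_sq_dev(s) / (n - 1) for s in samples)
e_div_n = sum(prob * sum_sq_dev(s) / n for s in samples)
e_sqrt = sum(float(prob) * sqrt(sum_sq_dev(s) / (n - 1)) for s in samples)

print(sigma2, e_div_n_minus_1, e_div_n)
print(e_sqrt, sqrt(sigma2))
```

In this example, dividing by ''n''&nbsp;−&nbsp;1 reproduces the true variance exactly, dividing by ''n'' gives a smaller expected value, and the expected square root falls below the true standard deviation.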

===Relationships among the quantities===
*The mean squared error, variance, and bias are related: <math>\operatorname{MSE}(\widehat{\theta}) = \operatorname{Var}(\widehat\theta) + (B(\widehat{\theta}))^2,</math> i.e. mean squared error = variance + square of bias. In particular, for an unbiased estimator, the variance equals the mean squared error.
*The [[standard deviation]] of an estimator <math>\widehat{\theta}</math> of <math>\theta</math> (the [[square root]] of the variance), or an estimate of the standard deviation of an estimator <math>\widehat{\theta}</math> of <math>\theta</math>, is called the ''[[Standard error (statistics)|standard error]]'' of <math>\widehat{\theta}</math>.
*The bias-variance tradeoff arises in the study of model complexity, over-fitting and under-fitting. It is mainly used in the fields of [[supervised learning]] and [[predictive modelling]] to diagnose the performance of algorithms.
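
The decomposition of the mean squared error into variance and squared bias can be verified by adding and subtracting <math>\operatorname{E}[\widehat{\theta}]</math> inside the square. The following is a standard derivation, written in the notation of this section:
:<math>\begin{align}
\operatorname{MSE}(\widehat{\theta}) &= \operatorname{E}[(\widehat{\theta} - \theta)^2] \\
&= \operatorname{E}[(\widehat{\theta} - \operatorname{E}[\widehat{\theta}] + \operatorname{E}[\widehat{\theta}] - \theta)^2] \\
&= \operatorname{E}[(\widehat{\theta} - \operatorname{E}[\widehat{\theta}])^2] + 2\,(\operatorname{E}[\widehat{\theta}] - \theta)\,\operatorname{E}[\widehat{\theta} - \operatorname{E}[\widehat{\theta}]] + (\operatorname{E}[\widehat{\theta}] - \theta)^2 \\
&= \operatorname{Var}(\widehat{\theta}) + (B(\widehat{\theta}))^2,
\end{align}</math>
where the cross term vanishes because <math>\operatorname{E}[\widehat{\theta} - \operatorname{E}[\widehat{\theta}]] = 0</math>.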