=== Estimation of parameters ===
{{See also|Maximum likelihood#Continuous distribution, continuous parameter space|Gaussian function#Estimation of parameters}}

It is often the case that we do not know the parameters of the normal distribution, but instead want to [[Estimation theory|estimate]] them. That is, having a sample <math display=inline>(x_1, \ldots, x_n)</math> from a normal <math display=inline>\mathcal{N}(\mu, \sigma^2)</math> population we would like to learn the approximate values of parameters {{tmath|\mu}} and <math display=inline>\sigma^2</math>. The standard approach to this problem is the [[maximum likelihood]] method, which requires maximization of the ''[[log-likelihood function]]'':{{anchor|Log-likelihood}}
<math display=block>
    \ln\mathcal{L}(\mu,\sigma^2)
     = \sum_{i=1}^n \ln f(x_i\mid\mu,\sigma^2)
     = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2.
</math>
Taking derivatives with respect to {{tmath|\mu}} and <math display=inline>\sigma^2</math> and solving the resulting system of first order conditions yields the ''maximum likelihood estimates'':
<math display=block>
    \hat{\mu} = \overline{x} \equiv \frac{1}{n}\sum_{i=1}^n x_i, \qquad
    \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \overline{x})^2.
</math>
Then <math display=inline>\ln\mathcal{L}(\hat{\mu},\hat{\sigma}^2)</math> is as follows:
<math display=block>\ln\mathcal{L}(\hat{\mu},\hat{\sigma}^2) = (-n/2) [\ln(2 \pi \hat{\sigma}^2)+1]</math>

==== Sample mean ====
{{See also|Standard error of the mean}}

Estimator <math style="vertical-align:-.3em">\textstyle\hat\mu</math> is called the ''[[sample mean]]'', since it is the arithmetic mean of all observations. The statistic <math style="vertical-align:0">\textstyle\overline{x}</math> is [[complete statistic|complete]] and [[sufficient statistic|sufficient]] for {{tmath|\mu}}, and therefore by the [[Lehmann–Scheffé theorem]], <math style="vertical-align:-.3em">\textstyle\hat\mu</math> is the [[uniformly minimum variance unbiased]] (UMVU) estimator.<ref name="Krishnamoorthy">{{harvtxt |Krishnamoorthy |2006 |p=127 }}</ref> In finite samples it is distributed normally:
<math display=block>
    \hat\mu \sim \mathcal{N}(\mu,\sigma^2/n).
</math>
The variance of this estimator is equal to the ''μμ''-element of the inverse [[Fisher information matrix]] <math style="vertical-align:0">\textstyle\mathcal{I}^{-1}</math>. This implies that the estimator is [[efficient estimator|finite-sample efficient]]. Of practical importance is the fact that the [[standard error]] of <math style="vertical-align:-.3em">\textstyle\hat\mu</math> is proportional to <math style="vertical-align:-.3em">\textstyle1/\sqrt{n}</math>, that is, if one wishes to decrease the standard error by a factor of 10, one must increase the number of points in the sample by a factor of 100. This fact is widely used in determining sample sizes for opinion polls and the number of trials in [[Monte Carlo simulation]]s.

From the standpoint of the [[asymptotic theory (statistics)|asymptotic theory]], <math style="vertical-align:-.3em">\textstyle\hat\mu</math> is [[consistent estimator|consistent]], that is, it [[converges in probability]] to {{tmath|\mu}} as <math display=inline>n\rightarrow\infty</math>. The estimator is also [[asymptotic normality|asymptotically normal]], which is a simple corollary of the fact that it is normal in finite samples:
<math display=block>
    \sqrt{n}(\hat\mu-\mu) \,\xrightarrow{d}\, \mathcal{N}(0,\sigma^2).
</math>
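The estimates above, and the <math display=inline>1/\sqrt{n}</math> behaviour of the standard error, are easy to check numerically. The following is a minimal illustrative sketch in Python with NumPy (not drawn from the article's sources); the values of {{tmath|\mu}}, {{tmath|\sigma}}, the sample sizes, and the number of replications are arbitrary choices made only for the illustration.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0              # true parameters (arbitrary illustration values)

for n in (100, 10_000):           # increase n by a factor of 100 ...
    # 2,000 replicated samples of size n from N(mu, sigma^2)
    samples = rng.normal(mu, sigma, size=(2_000, n))
    mu_hat = samples.mean(axis=1)                 # MLE of mu for each sample
    sigma2_hat = samples.var(axis=1, ddof=0)      # MLE of sigma^2 (divisor n)
    # empirical standard error of mu_hat versus the theoretical value sigma/sqrt(n)
    print(n, mu_hat.std(), sigma / np.sqrt(n), sigma2_hat.mean())
</syntaxhighlight>

The printed standard error of <math style="vertical-align:-.3em">\textstyle\hat\mu</math> shrinks by roughly a factor of 10 when the sample size grows by a factor of 100, matching <math display=inline>\sigma/\sqrt{n}</math>.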
==== Sample variance ====
{{See also|Standard deviation#Estimation|Variance#Estimation}}

The estimator <math style="vertical-align:0">\textstyle\hat\sigma^2</math> is called the ''[[sample variance]]'', since it is the variance of the sample <math display=inline>(x_1, \ldots, x_n)</math>. In practice, another estimator is often used instead of <math style="vertical-align:0">\textstyle\hat\sigma^2</math>. This other estimator is denoted <math display=inline>s^2</math>, and is also called the ''sample variance'', which represents a certain ambiguity in terminology; its square root {{tmath|s}} is called the ''sample standard deviation''. The estimator <math display=inline>s^2</math> differs from <math style="vertical-align:0">\textstyle\hat\sigma^2</math> by having {{math|(''n'' − 1)}} instead of ''n'' in the denominator (the so-called [[Bessel's correction]]):
<math display=block>
    s^2 = \frac{n}{n-1} \hat\sigma^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2.
</math>
The difference between <math display=inline>s^2</math> and <math style="vertical-align:0">\textstyle\hat\sigma^2</math> becomes negligibly small for large ''n''. In finite samples, however, the motivation behind the use of <math display=inline>s^2</math> is that it is an [[unbiased estimator]] of the underlying parameter <math display=inline>\sigma^2</math>, whereas <math style="vertical-align:0">\textstyle\hat\sigma^2</math> is biased. Also, by the Lehmann–Scheffé theorem the estimator <math display=inline>s^2</math> is uniformly minimum variance unbiased ([[UMVU]]),<ref name="Krishnamoorthy" /> which makes it the "best" estimator among all unbiased ones. However, it can be shown that the biased estimator <math style="vertical-align:0">\textstyle\hat\sigma^2</math> is better than <math display=inline>s^2</math> in terms of the [[mean squared error]] (MSE) criterion. In finite samples both <math display=inline>s^2</math> and <math style="vertical-align:0">\textstyle\hat\sigma^2</math> have a scaled [[chi-squared distribution]] with {{math|(''n'' − 1)}} degrees of freedom:
<math display=block>
    s^2 \sim \frac{\sigma^2}{n-1} \cdot \chi^2_{n-1}, \qquad
    \hat\sigma^2 \sim \frac{\sigma^2}{n} \cdot \chi^2_{n-1}.
</math>
The first of these expressions shows that the variance of <math display=inline>s^2</math> is equal to <math display=inline>2\sigma^4/(n-1)</math>, which is slightly greater than the ''σσ''-element of the inverse Fisher information matrix <math style="vertical-align:0">\textstyle\mathcal{I}^{-1}</math>, which is <math display=inline>2\sigma^4/n</math>. Thus, <math display=inline>s^2</math> is not an efficient estimator for <math display=inline>\sigma^2</math>, and moreover, since <math display=inline>s^2</math> is UMVU, we can conclude that a finite-sample efficient estimator for <math display=inline>\sigma^2</math> does not exist.

Applying the asymptotic theory, both estimators <math display=inline>s^2</math> and <math style="vertical-align:0">\textstyle\hat\sigma^2</math> are consistent, that is, they converge in probability to <math display=inline>\sigma^2</math> as the sample size <math display=inline>n\rightarrow\infty</math>. The two estimators are also both asymptotically normal:
<math display=block>
    \sqrt{n}(\hat\sigma^2 - \sigma^2) \simeq \sqrt{n}(s^2-\sigma^2) \,\xrightarrow{d}\, \mathcal{N}(0,2\sigma^4).
</math>
In particular, both estimators are asymptotically efficient for <math display=inline>\sigma^2</math>.
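The bias of <math style="vertical-align:0">\textstyle\hat\sigma^2</math>, the unbiasedness of <math display=inline>s^2</math>, and the fact that <math style="vertical-align:0">\textstyle\hat\sigma^2</math> nevertheless has the smaller mean squared error can likewise be checked by simulation. The sketch below follows the same illustrative Python/NumPy setup as above, again with arbitrary parameter values and sample size chosen only for the example.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n = 0.0, 4.0, 10                 # arbitrary illustration values
samples = rng.normal(mu, np.sqrt(sigma2), size=(200_000, n))

sigma2_hat = samples.var(axis=1, ddof=0)     # MLE: divisor n (biased)
s2 = samples.var(axis=1, ddof=1)             # Bessel-corrected: divisor n - 1 (unbiased)

# Averages over replications: E[s^2] ~ sigma^2, while E[sigma2_hat] ~ (n-1)/n * sigma^2
print(s2.mean(), sigma2_hat.mean(), (n - 1) / n * sigma2)

# Mean squared errors: the biased estimator has the smaller MSE
print(((s2 - sigma2) ** 2).mean(), ((sigma2_hat - sigma2) ** 2).mean())
</syntaxhighlight>

For this choice of ''n'' the empirical MSEs are close to the theoretical values <math display=inline>2\sigma^4/(n-1)</math> for <math display=inline>s^2</math> and <math display=inline>(2n-1)\sigma^4/n^2</math> for <math style="vertical-align:0">\textstyle\hat\sigma^2</math>, with the latter the smaller of the two.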