==Estimation==
{{anchor|Estimation}}{{anchor|Sample standard deviation}}
{{see also|Sample variance}}
{{main|Unbiased estimation of standard deviation}}

One can find the standard deviation of an entire population in cases (such as [[Standardized testing (statistics)|standardized testing]]) where every member of a population is sampled. In cases where that cannot be done, the standard deviation ''σ'' is estimated by examining a random sample taken from the population and computing a [[statistic]] of the sample, which is used as an estimate of the population standard deviation. Such a statistic is called an [[estimator]], and the estimator (or the value of the estimator, namely the estimate) is called a sample standard deviation, and is denoted by ''s'' (possibly with modifiers).

Unlike in the case of estimating the population mean of a normal distribution, for which the [[sample mean]] is a simple estimator with many desirable properties ([[unbiased estimator|unbiased]], [[Efficient estimator|efficient]], maximum likelihood), there is no single estimator for the standard deviation with all these properties, and [[unbiased estimation of standard deviation]] is a very technically involved problem. Most often, the standard deviation is estimated using the ''[[#Corrected sample standard deviation|corrected sample standard deviation]]'' (using ''N'' − 1), defined below, and this is often referred to as the "sample standard deviation", without qualifiers. However, other estimators are better in other respects: the uncorrected estimator (using ''N'') yields lower mean squared error, while using ''N'' − 1.5 (for the normal distribution) almost completely eliminates bias.

===Uncorrected sample standard deviation===
The formula for the ''population'' standard deviation (of a finite population) can be applied to the sample, using the size of the sample as the size of the population (though the actual population size from which the sample is drawn may be much larger). This estimator, denoted by ''s''<sub>''N''</sub>, is known as the ''uncorrected sample standard deviation'', or sometimes the ''standard deviation of the sample'' (considered as the entire population), and is defined as follows:<ref name=":1">{{Cite web |last=Weisstein |first=Eric W. |title=Standard Deviation |url=https://mathworld.wolfram.com/StandardDeviation.html |access-date=21 August 2020 |website=mathworld.wolfram.com |language=en}}</ref>
<math display="block">s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2},</math>
where <math>\{x_1, \, x_2, \, \ldots, \, x_N\}</math> are the observed values of the sample items, and <math>\bar{x}</math> is the mean value of these observations, while the denominator ''N'' stands for the size of the sample: this is the square root of the sample variance, which is the average of the [[squared deviations]] about the sample mean.

This is a [[consistent estimator]] (it converges in probability to the population value as the number of samples goes to infinity), and is the [[maximum likelihood|maximum-likelihood estimate]] when the population is normally distributed.<ref>{{Cite web |title=Consistent estimator |url=https://www.statlect.com/glossary/consistent-estimator |access-date=10 October 2022 |website=www.statlect.com}}</ref> However, this is a [[biased estimator]], as the estimates are generally too low. The bias decreases as sample size grows, dropping off as 1/''N'', and thus is most significant for small or moderate sample sizes; for <math>N > 75</math> the bias is below 1%. Thus for very large sample sizes, the uncorrected sample standard deviation is generally acceptable. This estimator also has a uniformly smaller [[mean squared error]] than the corrected sample standard deviation.
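The downward bias of ''s''<sub>''N''</sub> is easy to check numerically. The following is a minimal sketch, not drawn from the cited sources, assuming NumPy is available; <code>numpy.std</code> with its default <code>ddof=0</code> divides by ''N'' and therefore computes ''s''<sub>''N''</sub>.

<syntaxhighlight lang="python">
# Illustrative sketch: draw many small normal samples with true sigma = 1 and
# average the uncorrected sample standard deviation s_N over the replications.
import numpy as np

rng = np.random.default_rng(0)
N = 5                                             # small sample, where the bias is largest
samples = rng.normal(0.0, 1.0, size=(100_000, N))

s_N = samples.std(axis=1, ddof=0)                 # ddof=0 divides by N (uncorrected)
print(s_N.mean())                                 # about 0.84, well below the true 1.0
</syntaxhighlight>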
===Corrected sample standard deviation===
If the ''biased [[sample variance]]'' (the second [[central moment]] of the sample, which is a downward-biased estimate of the population variance) is used to compute an estimate of the population's standard deviation, the result is
<math display="block">s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2}.</math>
Here taking the square root introduces further downward bias, by [[Jensen's inequality]], due to the square root's being a [[concave function]]. The bias in the variance is easily corrected, but the bias from the square root is more difficult to correct, and depends on the distribution in question.

An unbiased estimator for the ''variance'' is given by applying [[Bessel's correction]], using ''N'' − 1 instead of ''N'' to yield the ''unbiased sample variance'', denoted ''s''<sup>2</sup>:
<math display="block">s^2 = \frac{1}{N - 1} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2.</math>
This estimator is unbiased if the variance exists and the sample values are drawn independently with replacement. ''N'' − 1 corresponds to the number of [[Degrees of freedom (statistics)|degrees of freedom]] in the vector of deviations from the mean, <math>\textstyle(x_1 - \bar{x},\; \dots,\; x_N - \bar{x}).</math>

Taking square roots reintroduces bias (because the square root is a nonlinear function which does not [[Commutative property|commute]] with the expectation, i.e. often <math display="inline">E[\sqrt{X}] \neq \sqrt{E[X]}</math>), yielding the ''corrected sample standard deviation'', denoted by ''s'':
<math display="block">s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2}.</math>
As explained above, while ''s''<sup>2</sup> is an unbiased estimator for the population variance, ''s'' is still a biased estimator for the population standard deviation, though markedly less biased than the uncorrected sample standard deviation. This estimator is commonly used and generally known simply as the "sample standard deviation". The bias may still be large for small samples (''N'' less than 10). As sample size increases, the amount of bias decreases: we obtain more information, and the difference between <math>\frac{1}{N}</math> and <math>\frac{1}{N-1}</math> becomes smaller.

===Unbiased sample standard deviation===
For [[unbiased estimation of standard deviation]], there is no formula that works across all distributions, unlike for mean and variance. Instead, {{mvar|s}} is used as a basis, and is scaled by a correction factor to produce an unbiased estimate. For the normal distribution, an unbiased estimator is given by {{math|{{sfrac|{{var|s}}|{{var|c}}{{sub|4}}}}}}, where the correction factor (which depends on {{mvar|N}}) is given in terms of the [[Gamma function]], and equals:
<math display="block">c_4(N) = \sqrt{\frac{2}{N-1}}\,\frac{\Gamma\left(\frac{N}{2}\right)}{\Gamma\left(\frac{N-1}{2}\right)}.</math>
This arises because the sampling distribution of the sample standard deviation follows a (scaled) [[chi distribution]], and the correction factor is the mean of the chi distribution.
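As a minimal sketch (the data values are invented for illustration, and SciPy is assumed), the corrected estimate ''s'' and the normal-theory unbiased estimate ''s''/''c''<sub>4</sub>(''N'') can be computed as follows; <code>gammaln</code> keeps the Gamma-function ratio numerically stable for large ''N''.

<syntaxhighlight lang="python">
# Illustrative sketch: corrected sample standard deviation s and the unbiased
# (under normality) estimate s / c_4(N).
import numpy as np
from scipy.special import gammaln

def c4(N):
    # c_4(N) = sqrt(2 / (N - 1)) * Gamma(N / 2) / Gamma((N - 1) / 2)
    return np.sqrt(2.0 / (N - 1)) * np.exp(gammaln(N / 2) - gammaln((N - 1) / 2))

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # made-up sample
s = x.std(ddof=1)                 # ddof=1 divides by N - 1 (corrected)
print(s, s / c4(len(x)))          # c4(8) is about 0.965, so s / c4 is slightly larger
</syntaxhighlight>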
An approximation can be given by replacing {{math|{{var|N}} − 1}} with {{math|{{var|N}} − 1.5}}, yielding:
<math display="block">\hat\sigma = \sqrt{\frac{1}{N - 1.5} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2}.</math>
The error in this approximation decays quadratically (as {{math|{{sfrac|1|{{var|N}}{{sup|2}}}}}}), and it is suited for all but the smallest samples or highest precision: for {{math|1={{var|N}} = 3}} the bias is equal to 1.3%, and for {{math|1={{var|N}} = 9}} the bias is already less than 0.1%.

A more accurate approximation is to replace {{math|{{var|N}} − 1.5}} above with {{math|{{var|N}} − 1.5 + {{sfrac|1|8({{var|N}} − 1)}}}}.<ref>{{Citation |first1=John |last1=Gurland |first2=Ram C. |last2=Tripathi |title=A Simple Approximation for Unbiased Estimation of the Standard Deviation |journal=The American Statistician |volume=25 |issue=4 |year=1971 |pages=30–32 |doi=10.2307/2682923 |jstor=2682923}}</ref>

For other distributions, the correct formula depends on the distribution, but a rule of thumb is to use the further refinement of the approximation:
<math display="block">\hat\sigma = \sqrt{\frac{1}{N - 1.5 - \frac{1}{4}\gamma_2} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2},</math>
where {{math|{{var|γ}}{{sub|2}}}} denotes the population [[excess kurtosis]]. The excess kurtosis may be either known beforehand for certain distributions, or estimated from the data.<ref>{{Cite web |date=11 July 2021 |title=Standard Deviation Calculator |url=https://purecalculators.com/standard-deviation-calculator |access-date=14 September 2021 |website=PureCalculators |language=en}}</ref>

===Confidence interval of a sampled standard deviation===
{{see also|Margin of error|Variance#Distribution of the sample variance|Student's t-distribution#Robust parametric modeling}}
The standard deviation we obtain by sampling a distribution is itself not absolutely accurate, both for mathematical reasons (explained here by the confidence interval) and for practical reasons of measurement (measurement error). The mathematical effect can be described by the [[confidence interval]] or CI.

To show how a larger sample will make the confidence interval narrower, consider the following examples: A small population of {{math|{{var|N}} {{=}} 2}} has only one degree of freedom for estimating the standard deviation. The result is that a 95% CI of the SD runs from 0.45 × SD to 31.9 × SD; [[Confidence interval#Statistical theory|the factors here are as follows]]:
<math display="block">\Pr\left(q_\frac{\alpha}{2} < k \frac{s^2}{\sigma^2} < q_{1 - \frac{\alpha}{2}}\right) = 1 - \alpha,</math>
where <math>q_p</math> is the {{mvar|p}}-th quantile of the chi-square distribution with {{mvar|k}} degrees of freedom, and {{math|1 − {{var|α}}}} is the confidence level. This is equivalent to the following:
<math display="block">\Pr\left(k\frac{s^2}{q_{1 - \frac{\alpha}{2}}} < \sigma^2 < k\frac{s^2}{q_{\frac{\alpha}{2}}}\right) = 1 - \alpha.</math>
With {{math|{{var|k}} {{=}} 1}}, {{math|{{var|q}}{{sub|0.025}} {{=}} 0.000982}} and {{math|{{var|q}}{{sub|0.975}} {{=}} 5.024}}. The reciprocals of the square roots of these two numbers give us the factors 0.45 and 31.9 given above.

A larger population of {{math|{{var|N}} {{=}} 10}} has 9 degrees of freedom for estimating the standard deviation. The same computations as above give us in this case a 95% CI running from 0.69 × SD to 1.83 × SD. So even with a sample population of 10, the actual SD can still be almost a factor 2 higher than the sampled SD. For a sample population of {{math|{{var|N}} {{=}} 100}}, this is down to 0.88 × SD to 1.16 × SD. To be more certain that the sampled SD is close to the actual SD, we need to sample a large number of points.
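The factors quoted in these examples can be reproduced directly from chi-square quantiles; a minimal sketch, assuming SciPy, follows. The 95% interval for ''σ'' runs from ''s''√(''k''/''q''<sub>0.975</sub>) to ''s''√(''k''/''q''<sub>0.025</sub>).

<syntaxhighlight lang="python">
# Illustrative sketch: confidence-interval factors for the SD from chi-square
# quantiles with k degrees of freedom (k = N - 1).
from scipy.stats import chi2

for k in (1, 9, 99):                             # N = 2, 10, 100
    q_lo, q_hi = chi2.ppf([0.025, 0.975], df=k)
    print(k, (k / q_hi) ** 0.5, (k / q_lo) ** 0.5)
# k = 1  -> about 0.45 and 31.9
# k = 9  -> about 0.69 and 1.83
# k = 99 -> about 0.88 and 1.16
</syntaxhighlight>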
These same formulae can be used to obtain confidence intervals on the variance of residuals from a [[least squares]] fit under standard normal theory, where {{mvar|k}} is now the number of [[Degrees of freedom (statistics)|degrees of freedom]] for error.

===Bounds on standard deviation===
For a set of {{math|{{var|N}} > 4}} data spanning a range of values {{mvar|R}}, an upper bound on the standard deviation {{mvar|s}} is given by {{math|{{var|s}} {{=}} 0.6{{var|R}}}}.<ref>{{Cite journal |doi=10.1111/j.1467-9639.1980.tb00398.x |title=Upper and Lower Bounds for the Sample Standard Deviation |journal=Teaching Statistics |volume=2 |issue=3 |pages=84–86 |year=1980 |last1=Shiffler |first1=Ronald E. |last2=Harsha |first2=Phillip D.}}</ref> An estimate of the standard deviation for {{math|{{var|N}} > 100}} data taken to be approximately normal follows from the heuristic that 95% of the area under the normal curve lies roughly two standard deviations to either side of the mean, so that, with 95% probability, the total range of values {{mvar|R}} represents four standard deviations, so that {{math|{{var|s}} ≈ {{var|R}}/4}}.

This so-called range rule is useful in [[sample size]] estimation, as the range of possible values is easier to estimate than the standard deviation. Other divisors {{math|{{var|K}}({{var|N}})}} of the range such that {{math|{{var|s}} ≈ {{var|R}}/{{var|K}}({{var|N}})}} are available for other values of {{mvar|N}} and for non-normal distributions.<ref>{{Cite journal |jstor=2685690 |title=Using the Sample Range as a Basis for Calculating Sample Size in Power Calculations |journal=The American Statistician |volume=55 |issue=4 |pages=293–298 |last1=Browne |first1=Richard H. |year=2001 |doi=10.1198/000313001753272420 |s2cid=122328846}}</ref>
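As a minimal sketch of the range rule (the sample below is simulated rather than taken from any cited example), {{math|{{var|R}}/4}} gives a quick, if rough, first estimate of {{mvar|s}}:

<syntaxhighlight lang="python">
# Illustrative sketch: the range rule s ≈ R/4 on simulated, roughly normal data.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(50.0, 10.0, size=150)    # true sigma = 10, N > 100

R = x.max() - x.min()                   # range of the data
print(R / 4, x.std(ddof=1))             # R/4 is only a rough estimate of the sample SD
</syntaxhighlight>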