Weighted arithmetic mean
===Weighted sample variance===
{{see also|#Correcting for over- or under-dispersion}}

Typically when a mean is calculated it is important to know the [[variance]] and [[standard deviation]] about that mean. When a weighted mean <math>\mu^*</math> is used, the variance of the weighted sample is different from the variance of the unweighted sample. The ''biased'' weighted [[sample variance]] <math>\hat \sigma^2_\mathrm{w}</math> is defined similarly to the normal ''biased'' sample variance <math>\hat \sigma^2</math>:

:<math>
\begin{align}
\hat \sigma^2\ &= \frac{\sum\limits_{i=1}^N \left(x_i - \mu\right)^2} N \\
\hat \sigma^2_\mathrm{w} &= \frac{\sum\limits_{i=1}^N w_i \left(x_i - \mu^{*}\right)^2 }{\sum_{i=1}^N w_i}
\end{align}
</math>

where <math>\sum_{i=1}^N w_i = 1</math> for normalized weights. If the weights are ''frequency weights'' (and thus are random variables), it can be shown{{Citation needed|date=March 2022}} that <math>\hat \sigma^2_\mathrm{w}</math> is the maximum likelihood estimator of <math>\sigma^2</math> for [[Independent and identically distributed random variables|iid]] Gaussian observations.

For small samples, it is customary to use an [[unbiased estimator]] for the population variance. In normal unweighted samples, the ''N'' in the denominator (corresponding to the sample size) is changed to ''N'' − 1 (see [[Bessel's correction]]). In the weighted setting, there are actually two different unbiased estimators, one for the case of ''frequency weights'' and another for the case of ''reliability weights''.

====Frequency weights====
If the weights are ''frequency weights'' (where a weight equals the number of occurrences), then the unbiased estimator is:

:<math>
s^2\ = \frac {\sum\limits_{i=1}^N w_i \left(x_i - \mu^*\right)^2} {\sum_{i=1}^N w_i - 1}
</math>

This effectively applies Bessel's correction for frequency weights.
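As an illustrative sketch (not part of the source text), the frequency-weight estimator above can be checked numerically against the equivalent fully expanded sample, here using NumPy:

```python
import numpy as np

# Weighted sample: values with frequency weights (number of occurrences)
x = np.array([2.0, 4.0, 5.0])
w = np.array([2.0, 1.0, 3.0])

# Weighted mean mu*
mu = np.sum(w * x) / np.sum(w)

# Biased weighted variance: sum w_i (x_i - mu*)^2 / sum w_i
var_biased = np.sum(w * (x - mu) ** 2) / np.sum(w)

# Unbiased (frequency weights): denominator becomes sum(w) - 1
var_unbiased = np.sum(w * (x - mu) ** 2) / (np.sum(w) - 1)

# Equivalent unweighted sample with each value repeated w_i times
x_full = np.repeat(x, w.astype(int))  # [2, 2, 4, 5, 5, 5]
assert np.isclose(var_biased, np.var(x_full))           # population form
assert np.isclose(var_unbiased, np.var(x_full, ddof=1)) # Bessel-corrected
```

The assertions confirm that weighting by occurrence counts and literally repeating the observations give identical variances, which is exactly what makes the `sum(w) − 1` denominator the right Bessel correction for frequency weights.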
For example, if values <math>\{2, 2, 4, 5, 5, 5\}</math> are drawn from the same distribution, then we can treat this set as an unweighted sample, or we can treat it as the weighted sample <math>\{2, 4, 5\}</math> with corresponding weights <math>\{2, 1, 3\}</math>, and we get the same result either way.

If the frequency weights <math>\{w_i\}</math> are normalized to 1, then the correct expression after Bessel's correction becomes

:<math>s^2\ = \frac {\sum_{i=1}^N w_i} {\sum_{i=1}^N w_i - 1}\sum_{i=1}^N w_i \left(x_i - \mu^*\right)^2</math>

where the total number of samples is <math>\sum_{i=1}^N w_i</math> (not <math>N</math>). In any case, the information on the total number of samples is necessary in order to obtain an unbiased correction, even if <math>w_i</math> has a meaning other than frequency weight. The estimator can be unbiased only if the weights are neither [[Standard score|standardized]] nor [[Normalization (statistics)|normalized]]: these processes change the data's mean and variance and thus lead to a [[Base rate fallacy|loss of the base rate]] (the population count, which is a requirement for Bessel's correction).

====Reliability weights====
If the weights are instead ''reliability weights'' (non-random values reflecting the sample's relative trustworthiness, often derived from sample variance), we can determine a correction factor to yield an unbiased estimator.
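A small sketch (not from the source) of why the base rate matters: once frequency weights are normalized to sum to 1, the original population count must be carried separately to apply Bessel's correction. The variable names below are illustrative assumptions:

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0])
w = np.array([2.0, 1.0, 3.0])   # frequency weights; total sample count = 6
n_total = w.sum()                # the base rate that normalization discards

# Normalize the weights to sum to 1
w_norm = w / n_total

mu = np.sum(w_norm * x)
var_biased = np.sum(w_norm * (x - mu) ** 2)

# Bessel's correction still needs the population count n_total,
# which the normalized weights alone no longer carry
var_unbiased = n_total / (n_total - 1) * var_biased

x_full = np.repeat(x, w.astype(int))
assert np.isclose(var_unbiased, np.var(x_full, ddof=1))
```

Dropping `n_total` after normalization leaves no way to recover the `n/(n−1)` factor, which is the "loss of the base rate" the text refers to.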
Assuming each random variable is sampled from the same distribution with mean <math>\mu</math> and actual variance <math>\sigma_{\text{actual}}^2</math>, taking expectations we have

:<math>
\begin{align}
\operatorname{E} [\hat \sigma^2] &= \frac{ \sum\limits_{i=1}^N \operatorname{E} [(x_i - \mu)^2]} N \\
&= \operatorname{E} [(X - \operatorname{E}[X])^2] - \frac{1}{N} \operatorname{E} [(X - \operatorname{E}[X])^2] \\
&= \left( \frac{N - 1} N \right) \sigma_{\text{actual}}^2 \\
\operatorname{E} [\hat \sigma^2_\mathrm{w}] &= \frac{\sum\limits_{i=1}^N w_i \operatorname{E} [(x_i - \mu^*)^2] }{V_1} \\
&= \operatorname{E}[(X - \operatorname{E}[X])^2] - \frac{V_2}{V_1^2} \operatorname{E}[(X - \operatorname{E}[X])^2] \\
&= \left(1 - \frac{V_2 }{ V_1^2}\right) \sigma_{\text{actual}}^2
\end{align}
</math>

where <math>V_1 = \sum_{i=1}^N w_i</math> and <math>V_2 = \sum_{i=1}^N w_i^2</math>. Therefore, the bias in our estimator is <math>\left(1 - \frac{V_2 }{ V_1^2}\right) </math>, analogous to the <math> \left( \frac{N - 1} {N} \right)</math> bias in the unweighted estimator (also notice that <math>\ V_1^2 / V_2 = N_{eff} </math> is the [[effective sample size#weighted samples|effective sample size]]). This means that to unbias our estimator we need to divide by <math>1 - \left(V_2 / V_1^2\right) </math>, ensuring that the expected value of the estimated variance equals the actual variance of the sampling distribution. The final unbiased estimate of sample variance is:

:<math>
\begin{align}
s^2_{\mathrm{w}}\ &= \frac{\hat \sigma^2_\mathrm{w}} {1 - (V_2 / V_1^2)} \\[4pt]
&= \frac {\sum\limits_{i=1}^N w_i (x_i - \mu^*)^2} {V_1 - (V_2 / V_1)},
\end{align}
</math><ref>{{cite web|url=https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html|title=GNU Scientific Library – Reference Manual: Weighted Samples|website=Gnu.org|access-date=22 December 2017}}</ref>

where <math>\operatorname{E}[s^2_{\mathrm{w}}] = \sigma_{\text{actual}}^2</math>.
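A minimal sketch (not from the source; the sample values and weights below are arbitrary assumptions) of the reliability-weight correction, checking that the two algebraically equivalent forms of <math>s^2_{\mathrm{w}}</math> agree:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=8)
w = np.array([0.1, 0.4, 0.2, 0.8, 0.5, 0.3, 0.9, 0.6])  # reliability weights

V1 = w.sum()            # V_1 = sum of weights
V2 = np.sum(w ** 2)     # V_2 = sum of squared weights

mu_star = np.sum(w * x) / V1
var_biased = np.sum(w * (x - mu_star) ** 2) / V1

# Unbias by dividing out the bias factor 1 - V2/V1^2
var_unbiased = var_biased / (1.0 - V2 / V1 ** 2)

# Equivalent closed form: sum w (x - mu*)^2 / (V1 - V2/V1)
alt = np.sum(w * (x - mu_star) ** 2) / (V1 - V2 / V1)
assert np.isclose(var_unbiased, alt)
```

Since <math>0 < V_2/V_1^2 < 1</math> for positive weights, the correction always inflates the biased estimate, mirroring the unweighted case where dividing by <math>N-1</math> instead of <math>N</math> enlarges the variance estimate.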
The degrees of freedom of this weighted, unbiased sample variance vary accordingly from ''N'' − 1 down to 0. The standard deviation is simply the square root of the variance above. As a side note, other approaches have been described to compute the weighted sample variance.<ref>{{cite web |url=http://www.analyticalgroup.com/download/WEIGHTED_MEAN.pdf |title=Weighted Standard Error and its Impact on Significance Testing (WinCross vs. Quantum & SPSS), Dr. Albert Madansky| website=Analyticalgroup.com| access-date=22 December 2017}}</ref>