Weighted arithmetic mean
===Weighted sample variance===
{{see also|#Correcting for over- or under-dispersion}}

Typically when a mean is calculated it is important to know the [[variance]] and [[standard deviation]] about that mean. When a weighted mean <math>\mu^*</math> is used, the variance of the weighted sample is different from the variance of the unweighted sample. The ''biased'' weighted [[sample variance]] <math>\hat \sigma^2_\mathrm{w}</math> is defined similarly to the normal ''biased'' sample variance <math>\hat \sigma^2</math>:

:<math>
\begin{align}
\hat \sigma^2\ &= \frac{\sum\limits_{i=1}^N \left(x_i - \mu\right)^2} N \\
\hat \sigma^2_\mathrm{w} &= \frac{\sum\limits_{i=1}^N w_i \left(x_i - \mu^{*}\right)^2 }{\sum_{i=1}^N w_i}
\end{align}
</math>

where <math>\sum_{i=1}^N w_i = 1</math> for normalized weights. If the weights are ''frequency weights'' (and thus are random variables), it can be shown{{Citation needed|date=March 2022}} that <math>\hat \sigma^2_\mathrm{w}</math> is the maximum likelihood estimator of <math>\sigma^2</math> for [[Independent and identically distributed random variables|iid]] Gaussian observations.

For small samples, it is customary to use an [[unbiased estimator]] for the population variance. In normal unweighted samples, the ''N'' in the denominator (corresponding to the sample size) is changed to ''N'' − 1 (see [[Bessel's correction]]). In the weighted setting, there are actually two different unbiased estimators, one for the case of ''frequency weights'' and another for the case of ''reliability weights''.

====Frequency weights====
If the weights are ''frequency weights'' (where a weight equals the number of occurrences), then the unbiased estimator is:

:<math>
s^2\ = \frac {\sum\limits_{i=1}^N w_i \left(x_i - \mu^*\right)^2} {\sum_{i=1}^N w_i - 1}
</math>

This effectively applies Bessel's correction for frequency weights.
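As an illustrative sketch (not part of the source text), the frequency-weight estimator above can be checked numerically against the equivalent fully expanded sample, here using NumPy:

```python
import numpy as np

# Weighted sample: values with frequency weights (number of occurrences)
x = np.array([2.0, 4.0, 5.0])
w = np.array([2.0, 1.0, 3.0])

# Weighted mean mu*
mu = np.sum(w * x) / np.sum(w)

# Biased weighted variance: sum w_i (x_i - mu*)^2 / sum w_i
var_biased = np.sum(w * (x - mu) ** 2) / np.sum(w)

# Unbiased (frequency weights): denominator becomes sum(w) - 1
var_unbiased = np.sum(w * (x - mu) ** 2) / (np.sum(w) - 1)

# Equivalent unweighted sample with each value repeated w_i times
x_full = np.repeat(x, w.astype(int))  # [2, 2, 4, 5, 5, 5]
assert np.isclose(var_biased, np.var(x_full))           # population form
assert np.isclose(var_unbiased, np.var(x_full, ddof=1)) # Bessel-corrected
```

The assertions confirm that weighting by occurrence counts and literally repeating the observations give identical variances, which is exactly what makes the `sum(w) − 1` denominator the right Bessel correction for frequency weights.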
For example, if values <math>\{2, 2, 4, 5, 5, 5\}</math> are drawn from the same distribution, then we can treat this set as an unweighted sample, or we can treat it as the weighted sample <math>\{2, 4, 5\}</math> with corresponding weights <math>\{2, 1, 3\}</math>, and we get the same result either way.

If the frequency weights <math>\{w_i\}</math> are normalized to 1, then the correct expression after Bessel's correction becomes

:<math>s^2\ = \frac {\sum_{i=1}^N w_i} {\sum_{i=1}^N w_i - 1}\sum_{i=1}^N w_i \left(x_i - \mu^*\right)^2</math>

where the total number of samples is <math>\sum_{i=1}^N w_i</math> (not <math>N</math>). In any case, the information on the total number of samples is necessary in order to obtain an unbiased correction, even if <math>w_i</math> has a meaning other than frequency weight. The estimator can be unbiased only if the weights are neither [[Standard score|standardized]] nor [[Normalization (statistics)|normalized]]: these processes change the data's mean and variance and thus lead to a [[Base rate fallacy|loss of the base rate]] (the population count, which is a requirement for Bessel's correction).

====Reliability weights====
If the weights are instead ''reliability weights'' (non-random values reflecting the sample's relative trustworthiness, often derived from sample variance), we can determine a correction factor to yield an unbiased estimator.
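A small sketch (not from the source) of why the base rate matters: once frequency weights are normalized to sum to 1, the original population count must be carried separately to apply Bessel's correction. The variable names below are illustrative assumptions:

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0])
w = np.array([2.0, 1.0, 3.0])   # frequency weights; total sample count = 6
n_total = w.sum()                # the base rate that normalization discards

# Normalize the weights to sum to 1
w_norm = w / n_total

mu = np.sum(w_norm * x)
var_biased = np.sum(w_norm * (x - mu) ** 2)

# Bessel's correction still needs the population count n_total,
# which the normalized weights alone no longer carry
var_unbiased = n_total / (n_total - 1) * var_biased

x_full = np.repeat(x, w.astype(int))
assert np.isclose(var_unbiased, np.var(x_full, ddof=1))
```

Dropping `n_total` after normalization leaves no way to recover the `n/(n−1)` factor, which is the "loss of the base rate" the text refers to.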
Assuming each random variable is sampled from the same distribution with mean <math>\mu</math> and actual variance <math>\sigma_{\text{actual}}^2</math>, taking expectations we have

:<math>
\begin{align}
\operatorname{E} [\hat \sigma^2] &= \frac{ \sum\limits_{i=1}^N \operatorname{E} [(x_i - \mu)^2]} N \\
&= \operatorname{E} [(X - \operatorname{E}[X])^2] - \frac{1}{N} \operatorname{E} [(X - \operatorname{E}[X])^2] \\
&= \left( \frac{N - 1} N \right) \sigma_{\text{actual}}^2 \\
\operatorname{E} [\hat \sigma^2_\mathrm{w}] &= \frac{\sum\limits_{i=1}^N w_i \operatorname{E} [(x_i - \mu^*)^2] }{V_1} \\
&= \operatorname{E}[(X - \operatorname{E}[X])^2] - \frac{V_2}{V_1^2} \operatorname{E}[(X - \operatorname{E}[X])^2] \\
&= \left(1 - \frac{V_2 }{ V_1^2}\right) \sigma_{\text{actual}}^2
\end{align}
</math>

where <math>V_1 = \sum_{i=1}^N w_i</math> and <math>V_2 = \sum_{i=1}^N w_i^2</math>. Therefore, the bias in our estimator is <math>\left(1 - \frac{V_2 }{ V_1^2}\right) </math>, analogous to the <math> \left( \frac{N - 1} {N} \right)</math> bias in the unweighted estimator (also notice that <math>\ V_1^2 / V_2 = N_{eff} </math> is the [[effective sample size#weighted samples|effective sample size]]). This means that to unbias our estimator we need to divide by <math>1 - \left(V_2 / V_1^2\right) </math>, ensuring that the expected value of the estimated variance equals the actual variance of the sampling distribution. The final unbiased estimate of sample variance is:

:<math>
\begin{align}
s^2_{\mathrm{w}}\ &= \frac{\hat \sigma^2_\mathrm{w}} {1 - (V_2 / V_1^2)} \\[4pt]
&= \frac {\sum\limits_{i=1}^N w_i (x_i - \mu^*)^2} {V_1 - (V_2 / V_1)},
\end{align}
</math><ref>{{cite web|url=https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html|title=GNU Scientific Library – Reference Manual: Weighted Samples|website=Gnu.org|access-date=22 December 2017}}</ref>

where <math>\operatorname{E}[s^2_{\mathrm{w}}] = \sigma_{\text{actual}}^2</math>.
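A minimal sketch (not from the source; the sample values and weights below are arbitrary assumptions) of the reliability-weight correction, checking that the two algebraically equivalent forms of <math>s^2_{\mathrm{w}}</math> agree:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=8)
w = np.array([0.1, 0.4, 0.2, 0.8, 0.5, 0.3, 0.9, 0.6])  # reliability weights

V1 = w.sum()            # V_1 = sum of weights
V2 = np.sum(w ** 2)     # V_2 = sum of squared weights

mu_star = np.sum(w * x) / V1
var_biased = np.sum(w * (x - mu_star) ** 2) / V1

# Unbias by dividing out the bias factor 1 - V2/V1^2
var_unbiased = var_biased / (1.0 - V2 / V1 ** 2)

# Equivalent closed form: sum w (x - mu*)^2 / (V1 - V2/V1)
alt = np.sum(w * (x - mu_star) ** 2) / (V1 - V2 / V1)
assert np.isclose(var_unbiased, alt)
```

Since <math>0 < V_2/V_1^2 < 1</math> for positive weights, the correction always inflates the biased estimate, mirroring the unweighted case where dividing by <math>N-1</math> instead of <math>N</math> enlarges the variance estimate.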
The degrees of freedom of this weighted, unbiased sample variance vary accordingly from ''N'' − 1 down to 0. The standard deviation is simply the square root of the variance above. As a side note, other approaches have been described to compute the weighted sample variance.<ref>{{cite web |url=http://www.analyticalgroup.com/download/WEIGHTED_MEAN.pdf |title=Weighted Standard Error and its Impact on Significance Testing (WinCross vs. Quantum & SPSS), Dr. Albert Madansky| website=Analyticalgroup.com| access-date=22 December 2017}}</ref>