
====Sum of correlated variables====

=====Sum of correlated variables with fixed sample size=====
{{main article|Bienaymé's identity}}
In general, the variance of the sum of {{math|n}} random variables is the sum of the [[covariance]]s of all pairs of those variables:

<math display="block">\operatorname{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \sum_{j=1}^n \operatorname{Cov}\left(X_i, X_j\right) = \sum_{i=1}^n \operatorname{Var}\left(X_i\right) + 2 \sum_{1 \leq i < j\leq n} \operatorname{Cov}\left(X_i, X_j\right).</math>

(Note: The second equality comes from the fact that {{math|1=Cov(''X''<sub>''i''</sub>,''X''<sub>''i''</sub>) = Var(''X''<sub>''i''</sub>)}}.)

Here, <math>\operatorname{Cov}(\cdot,\cdot)</math> is the [[covariance]], which, when it exists, is zero for independent random variables. The first expression states that the variance of a sum equals the sum of all elements of the covariance matrix of the components. The second expression states equivalently that the variance of the sum is the sum of the diagonal of the covariance matrix plus twice the sum of its upper triangular elements (or, equally, its lower triangular elements), which reflects the symmetry of the covariance matrix. This formula is used in the theory of [[Cronbach's alpha]] in [[classical test theory]].
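
The two forms of the identity can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the data set and the helper names (<code>covariance</code>, <code>variance_of_sum_full</code>, <code>variance_of_sum_triangular</code>) are hypothetical, and population covariances (denominator {{math|n}}) are assumed throughout.

```python
from itertools import combinations


def covariance(xs, ys):
    """Population covariance (denominator n) of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covariance_matrix(columns):
    """Covariance matrix of several variables, each given as a list of values."""
    return [[covariance(a, b) for b in columns] for a in columns]


def variance_of_sum_full(cov):
    """First form: the sum of all elements of the covariance matrix."""
    return sum(sum(row) for row in cov)


def variance_of_sum_triangular(cov):
    """Second form: the diagonal plus twice the upper triangular elements."""
    n = len(cov)
    diagonal = sum(cov[i][i] for i in range(n))
    upper = sum(cov[i][j] for i, j in combinations(range(n), 2))
    return diagonal + 2 * upper


# Hypothetical data: three variables observed jointly four times.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
    [0.0, 1.0, 0.0, 5.0],
]
cov = covariance_matrix(columns)

# Variance of the per-observation sums, computed directly for comparison.
row_sums = [sum(observation) for observation in zip(*columns)]
direct = covariance(row_sums, row_sums)

print(direct, variance_of_sum_full(cov), variance_of_sum_triangular(cov))
```

All three printed values should agree up to floating-point rounding.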

So, if the variables have equal variance ''σ''<sup>2</sup> and the average [[correlation]] of distinct variables is ''ρ'', then the variance of their mean is

<math display="block">\operatorname{Var}\left(\overline{X}\right) = \frac{\sigma^2}{n} + \frac{n - 1}{n}\rho\sigma^2.</math>
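
One way to see this, sketched under the stated assumptions: the covariance sum has {{math|n}} diagonal terms, each equal to ''σ''<sup>2</sup>, and {{math|n(n − 1)}} off-diagonal terms whose average is ''ρσ''<sup>2</sup>, so dividing the variance of the sum by ''n''<sup>2</sup> gives

<math display="block">\operatorname{Var}\left(\overline{X}\right) = \frac{1}{n^2}\operatorname{Var}\left(\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\left[n\sigma^2 + n(n - 1)\rho\sigma^2\right] = \frac{\sigma^2}{n} + \frac{n - 1}{n}\rho\sigma^2.</math>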

This implies that the variance of the mean increases with the average of the correlations. In other words, additional correlated observations are not as effective as additional independent observations at reducing the [[standard error|uncertainty of the mean]]. Moreover, if the variables have unit variance, for example if they are standardized, then this simplifies to

<math display="block">\operatorname{Var}\left(\overline{X}\right) = \frac{1}{n} + \frac{n - 1}{n}\rho.</math>

This formula is used in the [[Spearman–Brown prediction formula]] of classical test theory. It converges to ''ρ'' as ''n'' goes to infinity, provided that the average correlation remains constant or itself converges. So, for the variance of the mean of standardized variables with equal correlations or a converging average correlation, we have

<math display="block">\lim_{n \to \infty} \operatorname{Var}\left(\overline{X}\right) = \rho.</math>

Therefore, the variance of the mean of a large number of standardized variables is approximately equal to their average correlation. This makes clear that the sample mean of correlated variables does not generally converge to the population mean, even though the [[law of large numbers]] states that the sample mean will converge for independent variables.
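
The limiting behaviour can be illustrated by evaluating the formula for growing {{math|n}}. This is a minimal sketch: the function name <code>variance_of_mean</code> and the value ''ρ'' = 0.3 are chosen here purely for illustration.

```python
def variance_of_mean(n, rho, sigma2=1.0):
    """Variance of the mean of n variables with common variance sigma2
    and average pairwise correlation rho."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma2 / n + (n - 1) / n * rho * sigma2


# With standardized variables (sigma2 = 1) the values approach rho as n grows.
for n in (1, 10, 100, 10_000):
    print(n, variance_of_mean(n, rho=0.3))
```

With ''ρ'' = 0 the same function reduces to the familiar ''σ''<sup>2</sup>/''n'' for uncorrelated variables.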

=====Sum of uncorrelated variables with random sample size=====
There are cases when a sample is taken without knowing in advance how many observations will be acceptable according to some criterion. In such cases, the sample size {{math|N}} is a random variable whose variation adds to that of the observations {{math|X}}. If the observations are uncorrelated with a common mean and variance, and {{math|N}} is independent of them, then<ref>Cornell, J R, and Benjamin, C A, ''Probability, Statistics, and Decisions for Civil Engineers,'' McGraw-Hill, NY, 1970, pp.178-9.</ref>
<math display="block">\operatorname{Var}\left(\sum_{i=1}^{N}X_i\right)=\operatorname{E}\left[N\right]\operatorname{Var}(X)+\operatorname{Var}(N)(\operatorname{E}\left[X\right])^2</math>
which follows from the [[law of total variance]].
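
The identity can be checked exactly on small discrete distributions. The sketch below assumes that {{math|N}} is independent of independent, identically distributed {{math|X<sub>i</sub>}}; the two probability mass functions and the helper names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mean_and_variance(pmf):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(value * p for value, p in pmf.items())
    variance = sum((value - mean) ** 2 * p for value, p in pmf.items())
    return mean, variance


def random_sum_pmf(n_pmf, x_pmf):
    """Exact distribution of X_1 + ... + X_N, with N independent of the i.i.d. X_i."""
    total = defaultdict(float)
    partial = {0: 1.0}  # distribution of a sum of k terms, starting with k = 0
    for k in range(max(n_pmf) + 1):
        weight = n_pmf.get(k, 0.0)
        if weight:
            for s, p in partial.items():
                total[s] += weight * p
        # Convolve with one more X to obtain the distribution of k + 1 terms.
        extended = defaultdict(float)
        for s, p in partial.items():
            for x, q in x_pmf.items():
                extended[s + x] += p * q
        partial = extended
    return dict(total)


# Hypothetical distributions for the sample size N and for a single observation X.
n_pmf = {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}
x_pmf = {1: 0.5, 2: 0.25, 5: 0.25}

e_n, var_n = mean_and_variance(n_pmf)
e_x, var_x = mean_and_variance(x_pmf)

formula = e_n * var_x + var_n * e_x ** 2
_, exact = mean_and_variance(random_sum_pmf(n_pmf, x_pmf))

print(formula, exact)
```

The two printed values should agree up to floating-point rounding.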

If {{math|N}} has a [[Poisson distribution]], then <math>\operatorname{E}[N]=\operatorname{Var}(N)</math>, and both can be estimated by the observed sample size {{math|n}}. So, with <math>{S_x}^2</math> the sample variance and <math>\bar{X}</math> the sample mean, the estimator of <math>\operatorname{Var}\left(\sum_{i=1}^{n}X_i\right)</math> becomes <math>n{S_x}^2+n\bar{X}^2</math>, giving <math>\operatorname{SE}(\bar{X})=\sqrt{\frac{{S_x}^2+\bar{X}^2}{n}}</math>
(see [[Standard error#Standard_error_of_the_sample_mean|standard error of the sample mean]]).
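
As a small worked illustration of this standard error, the sketch below evaluates it for a hypothetical sample. It assumes that <code>S_x</code><sup>2</sup> is the usual sample variance with denominator {{math|n − 1}}, which the text above does not specify; the function name is likewise invented for this example.

```python
import math
import statistics


def poisson_size_standard_error(sample):
    """Standard error of the mean when the sample size is itself Poisson distributed.

    Assumes the sample variance uses the n - 1 denominator.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = statistics.fmean(sample)
    sample_variance = statistics.variance(sample)
    return math.sqrt((sample_variance + mean ** 2) / n)


# Hypothetical sample of three observations.
sample = [3.0, 4.0, 5.0]
se_random_n = poisson_size_standard_error(sample)
se_fixed_n = math.sqrt(statistics.variance(sample) / len(sample))

print(se_random_n, se_fixed_n)
```

The first value is never smaller than the second, because the random sample size contributes the extra term involving the squared sample mean.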