Law of total variance
The law of total variance is a fundamental result in probability theory that expresses the variance of a random variable <math>Y</math> in terms of its conditional variances and conditional means given another random variable <math>X</math>. Informally, it states that the overall variability of <math>Y</math> can be split into an “unexplained” component (the average of within-group variances) and an “explained” component (the variance of group means).
Formally, if <math>X</math> and <math>Y</math> are random variables on the same probability space, and <math>Y</math> has finite variance, then:
<math display="block">\operatorname{Var}(Y) \;=\; \operatorname{E}\bigl[\operatorname{Var}(Y \mid X)\bigr] \;+\; \operatorname{Var}\!\bigl(\operatorname{E}[Y \mid X]\bigr).\!</math>
This identity is also known as the variance decomposition formula, the conditional variance formula, the law of iterated variances, or colloquially as Eve’s law,<ref>Joe Blitzstein and Jessica Hwang, Introduction to Probability, Final Review Notes.</ref> in parallel to the “Adam’s law” naming for the law of total expectation.
In actuarial science (particularly in credibility theory), the two terms <math>\operatorname{E}[\operatorname{Var}(Y \mid X)]</math> and <math>\operatorname{Var}(\operatorname{E}[Y \mid X])</math> are called the expected value of the process variance (EVPV) and the variance of the hypothetical means (VHM) respectively.<ref name="FCAS4ed">Mahler, Howard C.; Dean, Curtis Gary (2001). "Credibility". Foundations of Casualty Actuarial Science (4th ed.). Casualty Actuarial Society.</ref>
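As a quick numerical sanity check, both sides of the identity can be estimated by simulation. The following sketch (Python with NumPy; the three-group mixture is an arbitrary choice for illustration) compares <math>\operatorname{Var}(Y)</math> with the sum of the two components:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# X is a group label; given X, Y is normal with a group-specific mean and sd.
means = np.array([0.0, 3.0, -1.0])
sds = np.array([1.0, 2.0, 0.5])
probs = np.array([0.5, 0.3, 0.2])

x = rng.choice(3, size=n, p=probs)
y = rng.normal(means[x], sds[x])

lhs = y.var()                                    # Var(Y), estimated
within = probs @ sds**2                          # E[Var(Y | X)], exact
between = probs @ means**2 - (probs @ means)**2  # Var(E[Y | X]), exact

print(lhs, within + between)  # the two numbers should nearly agree
</syntaxhighlight>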
Explanation
Let <math>Y</math> be a random variable and <math>X</math> another random variable on the same probability space. The law of total variance can be understood by noting:
- <math>\operatorname{Var}(Y \mid X)</math> measures how much <math>Y</math> varies around its conditional mean <math>\operatorname{E}[Y\mid X].</math>
- Taking the expectation of this conditional variance across all values of <math>X</math> gives <math>\operatorname{E}[\operatorname{Var}(Y \mid X)]</math>, often termed the “unexplained” or within-group part.
- The variance of the conditional mean, <math>\operatorname{Var}(\operatorname{E}[Y\mid X])</math>, measures how much these conditional means differ (i.e. the “explained” or between-group part).
Adding these components yields the total variance <math>\operatorname{Var}(Y)</math>, mirroring how analysis of variance partitions variation.
Examples
Example 1 (Exam Scores)
Suppose five students take an exam scored 0–100. Let <math>Y</math> be the student’s score and let <math>X</math> indicate whether the student is international or domestic:
Student | <math>Y</math> (Score) | <math>X</math> (Group)
---|---|---
1 | 20 | International |
2 | 30 | International |
3 | 100 | International |
4 | 40 | Domestic |
5 | 60 | Domestic |
- Mean and variance for international: <math>\operatorname{E}[Y\mid X=\text{Intl}] = 50,\; \operatorname{Var}(Y\mid X=\text{Intl}) \approx 1266.7.</math>
- Mean and variance for domestic: <math>\operatorname{E}[Y\mid X=\text{Dom}] = 50,\; \operatorname{Var}(Y\mid X=\text{Dom}) = 100.</math>
Both groups share the same mean (50), so the explained component <math>\operatorname{Var}(\operatorname{E}[Y\mid X])</math> is 0, and the total variance equals the average of the within-group variances weighted by group size: <math>\tfrac{3}{5}(1266.7) + \tfrac{2}{5}(100) = 800.</math>
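These numbers can be reproduced with a few lines of Python (population variances throughout, matching the computation above):
<syntaxhighlight lang="python">
import numpy as np

scores = np.array([20, 30, 100, 40, 60])
groups = np.array(["Intl", "Intl", "Intl", "Dom", "Dom"])
labels = ["Intl", "Dom"]

# Within-group part: size-weighted average of the group variances.
within = sum((groups == g).mean() * scores[groups == g].var() for g in labels)

# Between-group part: variance of the group means, weighted by group size.
group_means = np.array([scores[groups == g].mean() for g in labels])
weights = np.array([(groups == g).mean() for g in labels])
between = weights @ group_means**2 - (weights @ group_means)**2

print(within, between, scores.var())  # 800.0, 0.0, 800.0
</syntaxhighlight>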
Example 2 (Mixture of Two Gaussians)
Let <math>X</math> be a coin flip taking the value heads with probability <math>h</math> and tails with probability <math>1-h</math>. Given heads, <math>Y \sim \operatorname{Normal}(\mu_h,\sigma_h^2)</math>; given tails, <math>Y \sim \operatorname{Normal}(\mu_t,\sigma_t^2)</math>. Then <math>\operatorname{E}[\operatorname{Var}(Y\mid X)] = h\,\sigma_h^2 + (1 - h)\,\sigma_t^2,</math> <math>\operatorname{Var}(\operatorname{E}[Y\mid X]) = h\,(1 - h)\,(\mu_h - \mu_t)^2,</math> so <math>\operatorname{Var}(Y) = h\,\sigma_h^2 + (1 - h)\,\sigma_t^2 \;+\; h\,(1 - h)\,(\mu_h-\mu_t)^2.</math>
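A simulation sketch of this mixture, with parameter values chosen arbitrarily for illustration:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
h, mu_h, sd_h, mu_t, sd_t = 0.3, 2.0, 1.0, -1.0, 0.5
n = 1_000_000

# Flip the coin, then draw Y from the corresponding normal distribution.
heads = rng.random(n) < h
y = np.where(heads, rng.normal(mu_h, sd_h, n), rng.normal(mu_t, sd_t, n))

closed_form = h*sd_h**2 + (1 - h)*sd_t**2 + h*(1 - h)*(mu_h - mu_t)**2
print(y.var(), closed_form)  # should nearly agree
</syntaxhighlight>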
Example 3 (Dice and Coins)
Consider a two-stage experiment:
- Roll a fair die (values 1–6) to choose one of six biased coins.
- Flip that chosen coin; let <math>Y=1</math> if heads, <math>Y=0</math> if tails.
Then <math>\operatorname{E}[Y\mid X=i] = p_i, \; \operatorname{Var}(Y\mid X=i)=p_i(1-p_i).</math> The overall variance of Template:Mvar becomes <math>\operatorname{Var}(Y) = \operatorname{E}\bigl[p_X(1 - p_X)\bigr] + \operatorname{Var}\bigl(p_X\bigr),</math> with <math>p_X</math> uniform on <math>\{p_1,\dots,p_6\}.</math>
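For concreteness, here is a short sketch using hypothetical biases <math>p_i = i/7</math> (any values in <math>[0,1]</math> would do):
<syntaxhighlight lang="python">
import numpy as np

p = np.arange(1, 7) / 7                # hypothetical coin biases p_1..p_6
evpv = np.mean(p * (1 - p))            # E[Var(Y | X)] = E[p_X (1 - p_X)]
vhm = np.var(p)                        # Var(E[Y | X]) = Var(p_X)
total = np.mean(p) * (1 - np.mean(p))  # Var(Y): Y is Bernoulli with mean E[p_X]

print(evpv + vhm, total)  # equal: both are Var(Y)
</syntaxhighlight>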
Proof
Discrete/Finite Proof
Let <math>(X_i,Y_i)</math>, <math>i=1,\ldots,n</math>, be observed pairs, each equally likely. Write <math>\overline{Y} = \frac{1}{n}\sum_{i=1}^n Y_i = \operatorname{E}[Y]</math> and <math>\overline{Y}_{X_i}=\operatorname{E}[Y\mid X=X_i]</math> for the mean of the group containing observation <math>i</math>. Then <math>\operatorname{Var}(Y) = \frac{1}{n}\sum_{i=1}^n \bigl(Y_i - \overline{Y}\bigr)^2 = \frac{1}{n}\sum_{i=1}^n \Bigl[(Y_i - \overline{Y}_{X_i}) + (\overline{Y}_{X_i} - \overline{Y})\Bigr]^2.</math> Expanding the square gives a within-group term, a between-group term, and the cross term <math>\tfrac{2}{n}\sum_i (Y_i - \overline{Y}_{X_i})(\overline{Y}_{X_i} - \overline{Y})</math>. Summed within each group of equal <math>X</math> values, the factor <math>(\overline{Y}_{X_i} - \overline{Y})</math> is constant while <math>\sum (Y_i - \overline{Y}_{X_i}) = 0</math> by definition of the group mean, so the cross term vanishes and <math>\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y\mid X)\bigr] \;+\; \operatorname{Var}\!\bigl(\operatorname{E}[Y\mid X]\bigr).</math>
General Case
Using <math>\operatorname{Var}(Y) = \operatorname{E}[Y^2] - \operatorname{E}[Y]^2</math> and the law of total expectation: <math>\operatorname{E}[Y^2] = \operatorname{E}\bigl[\operatorname{E}(Y^2 \mid X)\bigr] = \operatorname{E}\bigl[\operatorname{Var}(Y\mid X) + \operatorname{E}[Y\mid X]^2\bigr].</math> Subtracting <math>\operatorname{E}[Y]^2 = \bigl(\operatorname{E}[\operatorname{E}(Y\mid X)]\bigr)^2</math> gives <math>\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y\mid X)\bigr] + \Bigl(\operatorname{E}\bigl[\operatorname{E}[Y\mid X]^2\bigr] - \bigl(\operatorname{E}\bigl[\operatorname{E}[Y\mid X]\bigr]\bigr)^2\Bigr),</math> and the bracketed difference is exactly <math>\operatorname{Var}\!\bigl(\operatorname{E}[Y\mid X]\bigr)</math>, which completes the proof.
Applications
Analysis of Variance (ANOVA)
In a one-way analysis of variance, the total sum of squares (proportional to <math>\operatorname{Var}(Y)</math>) is split into a “between-group” sum of squares (<math>\operatorname{Var}(\operatorname{E}[Y\mid X])</math>) plus a “within-group” sum of squares (<math>\operatorname{E}[\operatorname{Var}(Y\mid X)]</math>). The F-test examines whether the explained component is sufficiently large to indicate that <math>X</math> has a significant effect on <math>Y</math>.<ref>Analysis of variance — R.A. Fisher’s 1920s development.</ref>
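A minimal sketch of this partition in plain NumPy (a made-up one-way layout; the sum-of-squares identity holds exactly):
<syntaxhighlight lang="python">
import numpy as np

# Hypothetical one-way layout: three groups of observations.
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 9.0, 8.0]),
          np.array([1.0, 2.0, 3.0])]

y = np.concatenate(groups)
grand_mean = y.mean()

ss_total = ((y - grand_mean) ** 2).sum()
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(ss_total, ss_within + ss_between)  # identical partition
</syntaxhighlight>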
Regression and R²
In linear regression and related models, if <math>\hat{Y}=\operatorname{E}[Y\mid X],</math> the fraction of variance explained is <math>R^2 = \frac{\operatorname{Var}(\hat{Y})}{\operatorname{Var}(Y)} = \frac{\operatorname{Var}(\operatorname{E}[Y\mid X])}{\operatorname{Var}(Y)} = 1 - \frac{\operatorname{E}[\operatorname{Var}(Y\mid X)]}{\operatorname{Var}(Y)}.</math> In the simple linear case (one predictor), <math>R^2</math> also equals the square of the Pearson correlation coefficient between <math>X</math> and <math>Y</math>.
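An illustrative check in the simple linear case, on simulated data (the true conditional mean is linear by construction):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)
y = 1.5 * x + rng.normal(scale=2.0, size=x.size)  # true E[Y|X] = 1.5 X

# Least-squares fit; np.polyfit returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

r_squared = y_hat.var() / y.var()  # Var(Y-hat) / Var(Y)
corr = np.corrcoef(x, y)[0, 1]
print(r_squared, corr**2)          # nearly equal
</syntaxhighlight>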
Machine Learning and Bayesian Inference
In many Bayesian and ensemble methods, one decomposes prediction uncertainty via the law of total variance. For a Bayesian neural network with random parameters <math>\theta</math>: <math>\operatorname{Var}(Y)=\operatorname{E}\bigl[\operatorname{Var}(Y\mid \theta)\bigr] + \operatorname{Var}\bigl(\operatorname{E}[Y\mid \theta]\bigr),</math> often referred to as “aleatoric” (within-model) vs. “epistemic” (between-model) uncertainty.<ref>See for instance AWS ML quantifying uncertainty guidance.</ref>
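As a hedged sketch, consider a deep-ensemble approximation, where each ensemble member plays the role of one draw of <math>\theta</math>. The per-model means and variances below are placeholders, not outputs of any real network:
<syntaxhighlight lang="python">
import numpy as np

# Hypothetical outputs of a 5-member ensemble for one test input:
# each model predicts a Gaussian with its own mean and variance.
means = np.array([2.1, 1.9, 2.4, 2.0, 2.2])       # E[Y | theta_k]
vars_ = np.array([0.30, 0.28, 0.35, 0.31, 0.29])  # Var(Y | theta_k)

aleatoric = vars_.mean()  # E[Var(Y | theta)]: noise all models deem irreducible
epistemic = means.var()   # Var(E[Y | theta]): disagreement between models
total = aleatoric + epistemic

print(aleatoric, epistemic, total)
</syntaxhighlight>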
Actuarial Science
Credibility theory uses the same partitioning: the expected value of process variance (EVPV), <math>\operatorname{E}[\operatorname{Var}(Y\mid X)],</math> and the variance of hypothetical means (VHM), <math>\operatorname{Var}(\operatorname{E}[Y\mid X]).</math> The ratio of explained to total variance determines how much “credibility” to give to individual risk classifications.<ref name="FCAS4ed" />
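For instance, in the Bühlmann credibility model (a standard setup, not derived in this article), the credibility weight for <math>n</math> observations of a single risk is <math>Z = n/(n+k)</math> with <math>k = \mathrm{EVPV}/\mathrm{VHM}</math>. A toy computation with made-up values:
<syntaxhighlight lang="python">
# Bühlmann credibility factor: Z = n / (n + k), where k = EVPV / VHM.
evpv = 400.0  # expected value of the process variance (hypothetical)
vhm = 50.0    # variance of the hypothetical means (hypothetical)
n = 10        # years of experience observed for this risk

k = evpv / vhm
z = n / (n + k)
print(z)  # weight on the risk's own experience vs. the overall mean
</syntaxhighlight>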
Information Theory
For jointly Gaussian <math>(X,Y)</math>, the fraction <math>\operatorname{Var}(\operatorname{E}[Y\mid X])/\operatorname{Var}(Y)</math> relates directly to the mutual information <math>I(Y;X).</math><ref>C. G. Bowsher & P. S. Swain (2012). "Identifying sources of variation and the flow of information in biochemical networks," PNAS 109 (20): E1320–E1328.</ref> In non-Gaussian settings, a high explained-variance ratio can still indicate that <math>X</math> carries substantial information about <math>Y</math>, though the correspondence is no longer exact.
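Concretely, for a bivariate Gaussian pair with correlation coefficient <math>\rho</math>, the explained-variance fraction equals <math>\rho^2</math>, and the mutual information (in nats) is
<math display="block">I(Y;X) \;=\; -\tfrac{1}{2}\ln\bigl(1-\rho^2\bigr) \;=\; -\tfrac{1}{2}\ln\!\left(1 - \frac{\operatorname{Var}(\operatorname{E}[Y\mid X])}{\operatorname{Var}(Y)}\right).</math>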
Generalizations
The law of total variance generalizes to multiple or nested conditionings. For example, with two conditioning variables <math>X_1</math> and <math>X_2</math>: <math>\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y\mid X_1,X_2)\bigr] + \operatorname{E}\bigl[\operatorname{Var}(\operatorname{E}[Y\mid X_1,X_2]\mid X_1)\bigr] + \operatorname{Var}(\operatorname{E}[Y\mid X_1]).</math> More generally, the law of total cumulance extends this approach to higher moments.
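A simulation sketch of this three-term decomposition, using a simple hierarchical model invented for illustration:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Hierarchical model: X1 -> X2 -> Y.
x1 = rng.normal(size=n)
x2 = rng.normal(loc=x1, scale=1.0)    # X2 | X1 ~ N(X1, 1)
y = rng.normal(loc=2 * x2, scale=0.5)  # Y | X1, X2 ~ N(2 X2, 0.25)

# E[Y | X1, X2] = 2 X2 and Var(Y | X1, X2) = 0.25, so:
term1 = 0.25               # E[Var(Y | X1, X2)]
term2 = 4.0                # E[Var(E[Y | X1, X2] | X1)] = Var(2 X2 | X1) = 4
term3 = np.var(2 * x1)     # Var(E[Y | X1]) = 4 Var(X1), estimated

print(y.var(), term1 + term2 + term3)  # both approximately 8.25
</syntaxhighlight>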
See also
- Law of total expectation (Adam’s law)
- Law of total covariance
- Law of total cumulance
- Analysis of variance
- Conditional expectation
- R-squared
- Fraction of variance unexplained
- Variance decomposition
References
- Blitzstein, Joe; Hwang, Jessica. Introduction to Probability. Final Review Notes.
- Mahler, Howard C.; Dean, Curtis Gary (2001). "Credibility". Foundations of Casualty Actuarial Science (4th ed.). Casualty Actuarial Society.
- Bowsher, C. G.; Swain, P. S. (2012). "Identifying sources of variation and the flow of information in biochemical networks". PNAS 109 (20): E1320–E1328.