Law of total variance
The law of total variance is a fundamental result in probability theory that expresses the variance of a random variable <math>Y</math> in terms of its conditional variances and conditional means given another random variable <math>X</math>. Informally, it states that the overall variability of <math>Y</math> can be split into an “unexplained” component (the average of within-group variances) and an “explained” component (the variance of group means).
Formally, if <math>X</math> and <math>Y</math> are random variables on the same probability space, and <math>Y</math> has finite variance, then:
<math display="block">\operatorname{Var}(Y) \;=\; \operatorname{E}\bigl[\operatorname{Var}(Y \mid X)\bigr] \;+\; \operatorname{Var}\!\bigl(\operatorname{E}[Y \mid X]\bigr).\!</math>
This identity is also known as the variance decomposition formula, the conditional variance formula, the law of iterated variances, or colloquially as Eve’s law,<ref>Joe Blitzstein and Jessica Hwang, Introduction to Probability, Final Review Notes.</ref> in parallel to the “Adam’s law” naming for the law of total expectation.
In actuarial science (particularly in credibility theory), the two terms <math>\operatorname{E}[\operatorname{Var}(Y \mid X)]</math> and <math>\operatorname{Var}(\operatorname{E}[Y \mid X])</math> are called the expected value of the process variance (EVPV) and the variance of the hypothetical means (VHM) respectively.<ref name="FCAS4ed">Mahler, Howard C.; Dean, Curtis Gary (2001). "Credibility". Foundations of Casualty Actuarial Science (4th ed.). Casualty Actuarial Society.</ref>
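As a quick numerical sanity check, both sides of the identity can be estimated by simulation. The following sketch (Python with NumPy; the three-group mixture is an arbitrary choice for illustration) compares <math>\operatorname{Var}(Y)</math> with the sum of the two components:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# X is a group label; given X, Y is normal with a group-specific mean and sd.
means = np.array([0.0, 3.0, -1.0])
sds = np.array([1.0, 2.0, 0.5])
probs = np.array([0.5, 0.3, 0.2])

x = rng.choice(3, size=n, p=probs)
y = rng.normal(means[x], sds[x])

lhs = y.var()                                    # Var(Y), estimated
within = probs @ sds**2                          # E[Var(Y | X)], exact
between = probs @ means**2 - (probs @ means)**2  # Var(E[Y | X]), exact

print(lhs, within + between)  # the two numbers should nearly agree
</syntaxhighlight>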
Explanation
Let <math>Y</math> be a random variable and <math>X</math> another random variable on the same probability space. The law of total variance can be understood by noting:
- <math>\operatorname{Var}(Y \mid X)</math> measures how much <math>Y</math> varies around its conditional mean <math>\operatorname{E}[Y\mid X].</math>
- Taking the expectation of this conditional variance across all values of <math>X</math> gives <math>\operatorname{E}[\operatorname{Var}(Y \mid X)]</math>, often termed the “unexplained” or within-group part.
- The variance of the conditional mean, <math>\operatorname{Var}(\operatorname{E}[Y\mid X])</math>, measures how much these conditional means differ (i.e. the “explained” or between-group part).
Adding these components yields the total variance <math>\operatorname{Var}(Y)</math>, mirroring how analysis of variance partitions variation.
Examples
Example 1 (Exam Scores)
Suppose five students take an exam scored 0–100. Let <math>Y</math> be the student’s score and let <math>X</math> indicate whether the student is international or domestic:
Student | <math>Y</math> (Score) | <math>X</math> (Group)
---|---|---
1 | 20 | International |
2 | 30 | International |
3 | 100 | International |
4 | 40 | Domestic |
5 | 60 | Domestic |
- Mean and variance for international: <math>\operatorname{E}[Y\mid X=\text{Intl}] = 50,\; \operatorname{Var}(Y\mid X=\text{Intl}) \approx 1266.7.</math>
- Mean and variance for domestic: <math>\operatorname{E}[Y\mid X=\text{Dom}] = 50,\; \operatorname{Var}(Y\mid X=\text{Dom}) = 100.</math>
Both groups share the same mean (50), so the explained component <math>\operatorname{Var}(\operatorname{E}[Y\mid X])</math> is 0, and the total variance equals the average of the within-group variances weighted by group size: <math>\tfrac{3}{5}(1266.7) + \tfrac{2}{5}(100) = 800.</math>
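These numbers can be reproduced with a few lines of Python (population variances throughout, matching the computation above):
<syntaxhighlight lang="python">
import numpy as np

scores = np.array([20, 30, 100, 40, 60])
groups = np.array(["Intl", "Intl", "Intl", "Dom", "Dom"])
labels = ["Intl", "Dom"]

# Within-group part: size-weighted average of the group variances.
within = sum((groups == g).mean() * scores[groups == g].var() for g in labels)

# Between-group part: variance of the group means, weighted by group size.
group_means = np.array([scores[groups == g].mean() for g in labels])
weights = np.array([(groups == g).mean() for g in labels])
between = weights @ group_means**2 - (weights @ group_means)**2

print(within, between, scores.var())  # 800.0, 0.0, 800.0
</syntaxhighlight>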
Example 2 (Mixture of Two Gaussians)
Let <math>X</math> be a coin flip taking the value heads with probability <math>h</math> and tails with probability <math>1-h</math>. Given heads, <math>Y \sim \operatorname{Normal}(\mu_h,\sigma_h^2)</math>; given tails, <math>Y \sim \operatorname{Normal}(\mu_t,\sigma_t^2)</math>. Then <math>\operatorname{E}[\operatorname{Var}(Y\mid X)] = h\,\sigma_h^2 + (1 - h)\,\sigma_t^2,</math> <math>\operatorname{Var}(\operatorname{E}[Y\mid X]) = h\,(1 - h)\,(\mu_h - \mu_t)^2,</math> so <math>\operatorname{Var}(Y) = h\,\sigma_h^2 + (1 - h)\,\sigma_t^2 \;+\; h\,(1 - h)\,(\mu_h-\mu_t)^2.</math>
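A simulation sketch of this mixture, with parameter values chosen arbitrarily for illustration:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
h, mu_h, sd_h, mu_t, sd_t = 0.3, 2.0, 1.0, -1.0, 0.5
n = 1_000_000

# Flip the coin, then draw Y from the corresponding normal distribution.
heads = rng.random(n) < h
y = np.where(heads, rng.normal(mu_h, sd_h, n), rng.normal(mu_t, sd_t, n))

closed_form = h*sd_h**2 + (1 - h)*sd_t**2 + h*(1 - h)*(mu_h - mu_t)**2
print(y.var(), closed_form)  # should nearly agree
</syntaxhighlight>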
Example 3 (Dice and Coins)
Consider a two-stage experiment:
- Roll a fair die (values 1–6) to choose one of six biased coins.
- Flip that chosen coin; let <math>Y=1</math> if heads, <math>Y=0</math> if tails.
Then <math>\operatorname{E}[Y\mid X=i] = p_i, \; \operatorname{Var}(Y\mid X=i)=p_i(1-p_i).</math> The overall variance of Template:Mvar becomes <math>\operatorname{Var}(Y) = \operatorname{E}\bigl[p_X(1 - p_X)\bigr] + \operatorname{Var}\bigl(p_X\bigr),</math> with <math>p_X</math> uniform on <math>\{p_1,\dots,p_6\}.</math>
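For concreteness, here is a short sketch using hypothetical biases <math>p_i = i/7</math> (any values in <math>[0,1]</math> would do):
<syntaxhighlight lang="python">
import numpy as np

p = np.arange(1, 7) / 7                # hypothetical coin biases p_1..p_6
evpv = np.mean(p * (1 - p))            # E[Var(Y | X)] = E[p_X (1 - p_X)]
vhm = np.var(p)                        # Var(E[Y | X]) = Var(p_X)
total = np.mean(p) * (1 - np.mean(p))  # Var(Y): Y is Bernoulli with mean E[p_X]

print(evpv + vhm, total)  # equal: both are Var(Y)
</syntaxhighlight>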
Proof
Discrete/Finite Proof
Let <math>(X_i,Y_i)</math>, <math>i=1,\ldots,n</math>, be observed pairs, each equally likely. Write <math>\overline{Y} = \frac{1}{n}\sum_{i=1}^n Y_i = \operatorname{E}[Y]</math> and <math>\overline{Y}_{X_i}=\operatorname{E}[Y\mid X=X_i]</math> for the mean of the group containing observation <math>i</math>. Then <math>\operatorname{Var}(Y) = \frac{1}{n}\sum_{i=1}^n \bigl(Y_i - \overline{Y}\bigr)^2 = \frac{1}{n}\sum_{i=1}^n \Bigl[(Y_i - \overline{Y}_{X_i}) + (\overline{Y}_{X_i} - \overline{Y})\Bigr]^2.</math> Expanding the square gives a within-group term, a between-group term, and the cross term <math>\tfrac{2}{n}\sum_i (Y_i - \overline{Y}_{X_i})(\overline{Y}_{X_i} - \overline{Y})</math>. Summed within each group of equal <math>X</math> values, the factor <math>(\overline{Y}_{X_i} - \overline{Y})</math> is constant while <math>\sum (Y_i - \overline{Y}_{X_i}) = 0</math> by definition of the group mean, so the cross term vanishes and <math>\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y\mid X)\bigr] \;+\; \operatorname{Var}\!\bigl(\operatorname{E}[Y\mid X]\bigr).</math>
General Case
Using <math>\operatorname{Var}(Y) = \operatorname{E}[Y^2] - \operatorname{E}[Y]^2</math> and the law of total expectation: <math>\operatorname{E}[Y^2] = \operatorname{E}\bigl[\operatorname{E}(Y^2 \mid X)\bigr] = \operatorname{E}\bigl[\operatorname{Var}(Y\mid X) + \operatorname{E}[Y\mid X]^2\bigr].</math> Subtracting <math>\operatorname{E}[Y]^2 = \bigl(\operatorname{E}[\operatorname{E}(Y\mid X)]\bigr)^2</math> gives <math>\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y\mid X)\bigr] + \Bigl(\operatorname{E}\bigl[\operatorname{E}[Y\mid X]^2\bigr] - \bigl(\operatorname{E}\bigl[\operatorname{E}[Y\mid X]\bigr]\bigr)^2\Bigr),</math> and the bracketed difference is exactly <math>\operatorname{Var}\!\bigl(\operatorname{E}[Y\mid X]\bigr)</math>, which completes the proof.
Applications
Analysis of Variance (ANOVA)
In a one-way analysis of variance, the total sum of squares (proportional to <math>\operatorname{Var}(Y)</math>) is split into a “between-group” sum of squares (<math>\operatorname{Var}(\operatorname{E}[Y\mid X])</math>) plus a “within-group” sum of squares (<math>\operatorname{E}[\operatorname{Var}(Y\mid X)]</math>). The F-test examines whether the explained component is sufficiently large to indicate that <math>X</math> has a significant effect on <math>Y</math>.<ref>Analysis of variance — R.A. Fisher’s 1920s development.</ref>
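A minimal sketch of this partition in plain NumPy (a made-up one-way layout; the sum-of-squares identity holds exactly):
<syntaxhighlight lang="python">
import numpy as np

# Hypothetical one-way layout: three groups of observations.
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 9.0, 8.0]),
          np.array([1.0, 2.0, 3.0])]

y = np.concatenate(groups)
grand_mean = y.mean()

ss_total = ((y - grand_mean) ** 2).sum()
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(ss_total, ss_within + ss_between)  # identical partition
</syntaxhighlight>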
Regression and R²
In linear regression and related models, if <math>\hat{Y}=\operatorname{E}[Y\mid X],</math> the fraction of variance explained is <math>R^2 = \frac{\operatorname{Var}(\hat{Y})}{\operatorname{Var}(Y)} = \frac{\operatorname{Var}(\operatorname{E}[Y\mid X])}{\operatorname{Var}(Y)} = 1 - \frac{\operatorname{E}[\operatorname{Var}(Y\mid X)]}{\operatorname{Var}(Y)}.</math> In the simple linear case (one predictor), <math>R^2</math> also equals the square of the Pearson correlation coefficient between <math>X</math> and <math>Y</math>.
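An illustrative check in the simple linear case, on simulated data (the true conditional mean is linear by construction):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)
y = 1.5 * x + rng.normal(scale=2.0, size=x.size)  # true E[Y|X] = 1.5 X

# Least-squares fit; np.polyfit returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

r_squared = y_hat.var() / y.var()  # Var(Y-hat) / Var(Y)
corr = np.corrcoef(x, y)[0, 1]
print(r_squared, corr**2)          # nearly equal
</syntaxhighlight>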
Machine Learning and Bayesian Inference
In many Bayesian and ensemble methods, one decomposes prediction uncertainty via the law of total variance. For a Bayesian neural network with random parameters <math>\theta</math>: <math>\operatorname{Var}(Y)=\operatorname{E}\bigl[\operatorname{Var}(Y\mid \theta)\bigr] + \operatorname{Var}\bigl(\operatorname{E}[Y\mid \theta]\bigr),</math> often referred to as “aleatoric” (within-model) vs. “epistemic” (between-model) uncertainty.<ref>See for instance AWS ML quantifying uncertainty guidance.</ref>
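As a hedged sketch, consider a deep-ensemble approximation, where each ensemble member plays the role of one draw of <math>\theta</math>. The per-model means and variances below are placeholders, not outputs of any real network:
<syntaxhighlight lang="python">
import numpy as np

# Hypothetical outputs of a 5-member ensemble for one test input:
# each model predicts a Gaussian with its own mean and variance.
means = np.array([2.1, 1.9, 2.4, 2.0, 2.2])       # E[Y | theta_k]
vars_ = np.array([0.30, 0.28, 0.35, 0.31, 0.29])  # Var(Y | theta_k)

aleatoric = vars_.mean()  # E[Var(Y | theta)]: noise all models deem irreducible
epistemic = means.var()   # Var(E[Y | theta]): disagreement between models
total = aleatoric + epistemic

print(aleatoric, epistemic, total)
</syntaxhighlight>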
Actuarial Science
Credibility theory uses the same partitioning: the expected value of process variance (EVPV), <math>\operatorname{E}[\operatorname{Var}(Y\mid X)],</math> and the variance of hypothetical means (VHM), <math>\operatorname{Var}(\operatorname{E}[Y\mid X]).</math> The ratio of explained to total variance determines how much “credibility” to give to individual risk classifications.<ref name="FCAS4ed" />
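For instance, in the Bühlmann credibility model (a standard setup, not derived in this article), the credibility weight for <math>n</math> observations of a single risk is <math>Z = n/(n+k)</math> with <math>k = \mathrm{EVPV}/\mathrm{VHM}</math>. A toy computation with made-up values:
<syntaxhighlight lang="python">
# Bühlmann credibility factor: Z = n / (n + k), where k = EVPV / VHM.
evpv = 400.0  # expected value of the process variance (hypothetical)
vhm = 50.0    # variance of the hypothetical means (hypothetical)
n = 10        # years of experience observed for this risk

k = evpv / vhm
z = n / (n + k)
print(z)  # weight on the risk's own experience vs. the overall mean
</syntaxhighlight>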
Information Theory
For jointly Gaussian <math>(X,Y)</math>, the fraction <math>\operatorname{Var}(\operatorname{E}[Y\mid X])/\operatorname{Var}(Y)</math> relates directly to the mutual information <math>I(Y;X).</math><ref>C. G. Bowsher & P. S. Swain (2012). "Identifying sources of variation and the flow of information in biochemical networks," PNAS 109 (20): E1320–E1328.</ref> In non-Gaussian settings, a high explained-variance ratio can still indicate that <math>X</math> carries substantial information about <math>Y</math>, though the correspondence is no longer exact.
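Concretely, for a bivariate Gaussian pair with correlation coefficient <math>\rho</math>, the explained-variance fraction equals <math>\rho^2</math>, and the mutual information (in nats) is
<math display="block">I(Y;X) \;=\; -\tfrac{1}{2}\ln\bigl(1-\rho^2\bigr) \;=\; -\tfrac{1}{2}\ln\!\left(1 - \frac{\operatorname{Var}(\operatorname{E}[Y\mid X])}{\operatorname{Var}(Y)}\right).</math>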
Generalizations
The law of total variance generalizes to multiple or nested conditionings. For example, with two conditioning variables <math>X_1</math> and <math>X_2</math>: <math>\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y\mid X_1,X_2)\bigr] + \operatorname{E}\bigl[\operatorname{Var}(\operatorname{E}[Y\mid X_1,X_2]\mid X_1)\bigr] + \operatorname{Var}(\operatorname{E}[Y\mid X_1]).</math> More generally, the law of total cumulance extends this approach to higher moments.
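A simulation sketch of this three-term decomposition, using a simple hierarchical model invented for illustration:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Hierarchical model: X1 -> X2 -> Y.
x1 = rng.normal(size=n)
x2 = rng.normal(loc=x1, scale=1.0)    # X2 | X1 ~ N(X1, 1)
y = rng.normal(loc=2 * x2, scale=0.5)  # Y | X1, X2 ~ N(2 X2, 0.25)

# E[Y | X1, X2] = 2 X2 and Var(Y | X1, X2) = 0.25, so:
term1 = 0.25               # E[Var(Y | X1, X2)]
term2 = 4.0                # E[Var(E[Y | X1, X2] | X1)] = Var(2 X2 | X1) = 4
term3 = np.var(2 * x1)     # Var(E[Y | X1]) = 4 Var(X1), estimated

print(y.var(), term1 + term2 + term3)  # both approximately 8.25
</syntaxhighlight>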
See also
- Law of total expectation (Adam’s law)
- Law of total covariance
- Law of total cumulance
- Analysis of variance
- Conditional expectation
- R-squared
- Fraction of variance unexplained
- Variance decomposition
References
- Blitzstein, Joe; Hwang, Jessica. Introduction to Probability. Final Review Notes.
- Mahler, Howard C.; Dean, Curtis Gary (2001). "Credibility". Foundations of Casualty Actuarial Science (4th ed.). Casualty Actuarial Society.
- Bowsher, C. G.; Swain, P. S. (2012). "Identifying sources of variation and the flow of information in biochemical networks". PNAS 109 (20): E1320–E1328.