Law of total expectation
The proposition in probability theory known as the law of total expectation, the law of iterated expectations (LIE), Adam's law, the tower rule, and the smoothing property of conditional expectation, among other names, states that if <math>X</math> is a random variable whose expected value <math>\operatorname{E}(X)</math> is defined, and <math>Y</math> is any random variable on the same probability space, then
- <math>\operatorname{E} (X) = \operatorname{E} ( \operatorname{E} ( X \mid Y)),</math>
i.e., the expected value of the conditional expected value of <math>X</math> given <math>Y</math> is the same as the expected value of <math>X</math>.
The conditional expected value <math>\operatorname{E}( X \mid Y )</math>, with <math>Y</math> a random variable, is not a simple number; it is a random variable whose value depends on the value of <math>Y</math>. That is, the conditional expected value of <math>X</math> given the event <math>Y = y</math> is a number that depends on <math>y</math>. If we write <math>g(y)</math> for the value of <math>\operatorname{E} ( X \mid Y = y) </math>, then the random variable <math>\operatorname{E}( X \mid Y )</math> is <math> g( Y ) </math>.
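For example, if <math>X</math> is the outcome of a fair die roll and <math>Y</math> indicates whether the roll is even or odd, then <math>\operatorname{E}(X \mid Y = \text{even}) = 4</math> and <math>\operatorname{E}(X \mid Y = \text{odd}) = 3</math>, so <math>\operatorname{E}(X \mid Y)</math> is a random variable equal to 4 or 3, each with probability <math>\tfrac{1}{2}</math>; its expected value is <math>\tfrac{1}{2}(4) + \tfrac{1}{2}(3) = 3.5 = \operatorname{E}(X)</math>.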
One special case states that if <math>{\left\{A_i\right\}}</math> is a finite or countable partition of the sample space, then
- <math>\operatorname{E} (X) = \sum_i{\operatorname{E}(X \mid A_i) \operatorname{P}(A_i)}.</math>
Example
Suppose that only two factories supply light bulbs to the market. Factory <math>X</math>'s bulbs work for an average of 5000 hours, whereas factory <math>Y</math>'s bulbs work for an average of 4000 hours. It is known that factory <math>X</math> supplies 60% of the total bulbs available. What is the expected lifetime of a purchased bulb?
Applying the law of total expectation, we have:
- <math>\begin{align}
\operatorname{E} (L) &= \operatorname{E}(L \mid X) \operatorname{P}(X)+\operatorname{E}(L \mid Y) \operatorname{P}(Y) \\[3pt] &= 5000(0.6)+4000(0.4)\\[2pt] &=4600
\end{align}</math>
where
- <math>\operatorname{E} (L)</math> is the expected life of the bulb;
- <math>\operatorname{P}(X)={6 \over 10}</math> is the probability that the purchased bulb was manufactured by factory <math>X</math>;
- <math>\operatorname{P}(Y)={4 \over 10}</math> is the probability that the purchased bulb was manufactured by factory <math>Y</math>;
- <math>\operatorname{E}(L \mid X)=5000</math> is the expected lifetime of a bulb manufactured by <math>X</math>;
- <math>\operatorname{E}(L \mid Y)=4000</math> is the expected lifetime of a bulb manufactured by <math>Y</math>.
Thus each purchased light bulb has an expected lifetime of 4600 hours.
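As a quick numerical check of this computation, here is a minimal Monte Carlo sketch. The law only uses the two conditional means, so the exponential lifetime distributions assumed below are purely illustrative and not part of the problem statement.

```python
import random

random.seed(42)
n = 1_000_000
total = 0.0
for _ in range(n):
    # A purchased bulb comes from factory X with probability 0.6.
    if random.random() < 0.6:
        total += random.expovariate(1 / 5000)  # assumed lifetime model, mean 5000 h
    else:
        total += random.expovariate(1 / 4000)  # assumed lifetime model, mean 4000 h

print(total / n)  # close to 4600 hours, matching the law's prediction
```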
Informal proof
When a joint probability density function is well defined and the expectations are integrable, the general case can be derived as follows: <math display="block">\begin{align} \operatorname E(X) &= \int x \Pr[X=x] ~dx \\ \operatorname E(X\mid Y=y) &= \int x \Pr[X=x\mid Y=y] ~dx \\ \operatorname E( \operatorname E(X\mid Y)) &= \int \left(\int x \Pr[X=x\mid Y=y] ~dx \right) \Pr[Y=y] ~dy \\ &= \int \int x \Pr[X = x, Y= y] ~dx ~dy \\ &= \int x \left( \int \Pr[X = x, Y = y] ~dy \right) ~dx \\ &= \int x \Pr[X = x] ~dx \\ &= \operatorname E(X)\,.\end{align}</math> A similar derivation works for discrete distributions, using summation in place of integration (spelled out below). For the specific case of a partition, give each cell of the partition a unique label and let the random variable <math>Y</math> be the function of the sample space that assigns a cell's label to each point in that cell.
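For instance, in the discrete case the same interchange of the order of summation gives <math display="block">\begin{align} \operatorname E( \operatorname E(X\mid Y)) &= \sum_y \left( \sum_x x \Pr[X=x\mid Y=y] \right) \Pr[Y=y] \\ &= \sum_x x \sum_y \Pr[X = x, Y = y] \\ &= \sum_x x \Pr[X = x] \\ &= \operatorname E(X)\,.\end{align}</math>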
Proof in the general case
Let <math> (\Omega,\mathcal{F},\operatorname{P}) </math> be a probability space on which two sub σ-algebras <math> \mathcal{G}_1 \subseteq \mathcal{G}_2 \subseteq \mathcal{F} </math> are defined. For a random variable <math> X </math> on such a space, the smoothing law states that if <math>\operatorname{E}[X]</math> is defined, i.e. <math>\min(\operatorname{E}[X_+], \operatorname{E}[X_-])<\infty</math>, then
- <math> \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] = \operatorname{E}[X \mid \mathcal{G}_1]\quad\text{(a.s.)}.</math>
Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:
- <math> \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1]</math> is <math>\mathcal{G}_1</math>-measurable
- <math> \int_{G_1} \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P},</math> for all <math>G_1 \in \mathcal{G}_1.</math>
The first of these properties holds by definition of the conditional expectation. To prove the second one,
- <math>
\begin{align} \min\left(\int_{G_1}X_+\, d\operatorname{P}, \int_{G_1}X_-\, d\operatorname{P} \right) &\leq \min\left(\int_\Omega X_+\, d\operatorname{P}, \int_\Omega X_-\, d\operatorname{P}\right)\\[4pt] &=\min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty, \end{align} </math>
so the integral <math>\textstyle \int_{G_1}X\, d\operatorname{P}</math> is defined (i.e., it is not of the indeterminate form <math>\infty - \infty</math>).
The second property thus holds since <math>G_1 \in \mathcal{G}_1 \subseteq \mathcal{G}_2 </math> implies
- <math>
\int_{G_1} \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P}
= \int_{G_1} \operatorname{E}[X \mid \mathcal{G}_2] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P}, </math>
where the first equality holds by the defining property of conditional expectation with respect to <math>\mathcal{G}_1</math> (as <math>G_1 \in \mathcal{G}_1</math>) and the second by the defining property with respect to <math>\mathcal{G}_2</math> (as <math>G_1 \in \mathcal{G}_2</math>).
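As a numerical illustration of the smoothing law, here is a minimal sketch with assumed toy variables: <math>\mathcal{G}_2 = \sigma(Z)</math> for a four-valued <math>Z</math>, and <math>\mathcal{G}_1</math> generated by the coarser grouping <math>\lfloor Z/2 \rfloor</math>. Replacing conditional expectations by within-cell sample means, the two sides of the smoothing law agree on every <math>\mathcal{G}_1</math>-cell.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Assumed toy setup: Z takes values 0..3, G2 = sigma(Z),
# and G1 = sigma(Z // 2) is a coarser sub-sigma-algebra (two cells).
z = rng.integers(0, 4, size=n)
x = z + rng.normal(0.0, 1.0, size=n)  # X depends on Z plus independent noise

# Empirical E[X | G2]: replace each sample by the mean of X over its Z-cell.
e_x_g2 = np.empty(n)
for v in range(4):
    cell = z == v
    e_x_g2[cell] = x[cell].mean()

# On each G1-cell, averaging E[X | G2] reproduces the average of X itself:
# the empirical version of E[E[X | G2] | G1] = E[X | G1].
w = z // 2
for v in range(2):
    cell = w == v
    print(v, e_x_g2[cell].mean(), x[cell].mean())  # the two means coincide
```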
Corollary. In the special case when <math>\mathcal{G}_1 = \{\emptyset,\Omega \}</math> and <math>\mathcal{G}_2 = \sigma(Y)</math>, the smoothing law reduces to
- <math>
\operatorname{E}[ \operatorname{E}[X \mid Y]] = \operatorname{E}[X].
</math>
Alternative proof for <math> \operatorname{E}[ \operatorname{E}[X \mid Y]] = \operatorname{E}[X].</math>
This is a simple consequence of the measure-theoretic definition of conditional expectation. By definition, <math> \operatorname{E}[X \mid Y] := \operatorname{E}[X \mid \sigma(Y)] </math> is a <math>\sigma(Y)</math>-measurable random variable that satisfies
- <math>
\int_A \operatorname{E}[X \mid Y] \, d\operatorname{P} = \int_A X \, d\operatorname{P},
</math> for every measurable set <math> A \in \sigma(Y) </math>. Taking <math> A = \Omega </math> proves the claim.
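Explicitly, with <math>A = \Omega</math> the defining identity becomes <math display="block">\operatorname{E}[\operatorname{E}[X \mid Y]] = \int_\Omega \operatorname{E}[X \mid Y] \, d\operatorname{P} = \int_\Omega X \, d\operatorname{P} = \operatorname{E}[X].</math>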
See also
- The fundamental theorem of poker for one practical application.
- Law of total probability
- Law of total variance
- Law of total covariance
- Law of total cumulance
- Product distribution#expectation (an application of the law to prove that the expectation of a product of independent random variables equals the product of their expectations)
References
- Billingsley, Patrick (1995). Probability and Measure. New York: John Wiley & Sons. (Theorem 34.4)
- Christopher Sims, "Notes on Random Variables, Expectations, Probability Densities, and Martingales", especially equations (16) through (18)