== Basic properties ==
All the following formulas are to be understood in an almost sure sense. The ''σ''-algebra <math>\mathcal{H}</math> could be replaced by a random variable <math>Z</math>, i.e. <math>\mathcal{H}=\sigma(Z)</math>.

* Pulling out independent factors:
** If <math>X</math> is [[Independence (probability theory)|independent]] of <math>\mathcal{H}</math>, then <math>E(X\mid\mathcal{H}) = E(X)</math>.
{{hidden begin|toggle=left|title=Proof}}
Let <math>B \in \mathcal{H}</math>. Then <math>X</math> is independent of <math>1_B</math>, so we get that
:<math>\int_B X\,dP = E(X1_B) = E(X)E(1_B) = E(X)P(B) = \int_B E(X)\,dP.</math>
Thus the definition of conditional expectation is satisfied by the constant random variable <math>E(X)</math>, as desired. <math>\square</math>
{{hidden end}}
** If <math>X</math> is independent of <math>\sigma(Y, \mathcal{H})</math>, then <math>E(XY\mid \mathcal{H}) = E(X) \, E(Y\mid\mathcal{H})</math>. Note that this is not necessarily the case if <math>X</math> is only independent of <math>\mathcal{H}</math> and of <math>Y</math>.
** If <math>X,Y</math> are independent, <math>\mathcal{G},\mathcal{H}</math> are independent, <math>X</math> is independent of <math>\mathcal{H}</math> and <math>Y</math> is independent of <math>\mathcal{G}</math>, then <math>E(E(XY\mid\mathcal{G})\mid\mathcal{H}) = E(X) E(Y) = E(E(XY\mid\mathcal{H})\mid\mathcal{G})</math>.
* Stability:
** If <math>X</math> is <math>\mathcal{H}</math>-measurable, then <math>E(X\mid\mathcal{H}) = X</math>.
{{hidden begin|toggle=left|title=Proof}}
For each <math>H\in \mathcal{H}</math> we have <math>\int_H E(X\mid\mathcal{H}) \, dP = \int_H X \, dP</math>, or equivalently
:<math> \int_H \big( E(X\mid\mathcal{H}) - X \big) \, dP = 0. </math>
Since this is true for each <math>H \in \mathcal{H}</math>, and both <math>E(X\mid\mathcal{H})</math> and <math>X</math> are <math>\mathcal{H}</math>-measurable (the former property holds by definition; the latter property is key here), from this one can show
:<math> \int_H \big| E(X\mid\mathcal{H}) - X \big| \, dP = 0, </math>
and this implies <math> E(X\mid\mathcal{H}) = X</math> almost everywhere. <math>\square</math>
{{hidden end}}
** In particular, for sub-''σ''-algebras <math>\mathcal{H}_1\subset\mathcal{H}_2 \subset\mathcal{F}</math> we have <math>E(E(X\mid\mathcal{H}_1)\mid\mathcal{H}_2) = E(X\mid\mathcal{H}_1)</math>. (Note this is different from the tower property below.)
** If ''Z'' is a random variable, then <math>\operatorname{E}(f(Z) \mid Z)=f(Z)</math>. In its simplest form, this says <math>\operatorname{E}(Z \mid Z)=Z</math>.
* Pulling out known factors:
** If <math>X</math> is <math>\mathcal{H}</math>-measurable, then <math>E(XY\mid\mathcal{H}) = X \, E(Y\mid\mathcal{H})</math>.
{{hidden begin|toggle=left|title=Proof}}
All random variables here are assumed without loss of generality to be non-negative. The general case can be treated with <math>X = X^+ - X^-</math>.

Fix <math>A \in \mathcal{H}</math> and let <math>X = 1_A</math>. Then for any <math>H \in \mathcal{H}</math>
:<math>\int_H E(1_A Y \mid \mathcal{H}) \, dP = \int_H 1_A Y \, dP = \int_{A \cap H} Y \, dP = \int_{A\cap H} E(Y\mid\mathcal{H}) \, dP = \int_H 1_A E(Y \mid \mathcal{H}) \, dP. </math>
Hence <math> E(1_A Y \mid \mathcal{H}) = 1_A E(Y\mid\mathcal{H})</math> almost everywhere. Any simple function is a finite linear combination of indicator functions.
By linearity the above property holds for simple functions: if <math>X_n</math> is a simple function, then <math>E(X_n Y \mid \mathcal{H}) = X_n \, E(Y\mid \mathcal{H})</math>.

Now let <math>X</math> be <math>\mathcal{H}</math>-measurable. Then there exists a sequence of simple functions <math>\{ X_n \}_{n\geq 1}</math> converging monotonically (here meaning <math>X_n \leq X_{n+1}</math>) and pointwise to <math>X</math>. Consequently, for <math>Y \geq 0</math>, the sequence <math>\{ X_n Y \}_{n\geq 1}</math> converges monotonically and pointwise to <math>XY</math>. Also, since <math>E(Y\mid\mathcal{H}) \geq 0</math>, the sequence <math>\{ X_n E(Y\mid\mathcal{H}) \}_{n\geq 1}</math> converges monotonically and pointwise to <math>X \, E(Y\mid\mathcal{H})</math>.

Combining the special case proved for simple functions with the definition of conditional expectation and the monotone convergence theorem:
:<math> \int_H X \, E(Y\mid\mathcal{H}) \, dP = \int_H \lim_{n \to \infty} X_n \, E(Y\mid\mathcal{H}) \, dP = \lim_{n \to \infty} \int_H X_n E(Y\mid\mathcal{H}) \, dP = \lim_{n \to \infty} \int_H E(X_n Y\mid\mathcal{H}) \, dP = \lim_{n \to \infty} \int_H X_n Y \, dP = \int_H \lim_{n\to \infty} X_n Y \, dP = \int_H XY \, dP = \int_H E(XY\mid\mathcal{H}) \, dP.</math>
This holds for all <math>H\in \mathcal{H}</math>, whence <math>X \, E(Y\mid\mathcal{H}) = E(XY\mid\mathcal{H})</math> almost everywhere. <math>\square</math>
{{hidden end}}
** If ''Z'' is a random variable, then <math>\operatorname{E}(f(Z) Y \mid Z)=f(Z)\operatorname{E}(Y \mid Z)</math>.
* [[Law of total expectation]]: <math>E(E(X \mid \mathcal{H})) = E(X)</math>.<ref>{{Cite web|title=Conditional expectation|url=https://www.statlect.com/fundamentals-of-probability/conditional-expectation|access-date=2020-09-11|website=www.statlect.com}}</ref>
* Tower property (see the worked example following this list):
** For sub-''σ''-algebras <math>\mathcal{H}_1\subset\mathcal{H}_2 \subset\mathcal{F}</math> we have <math>E(E(X\mid\mathcal{H}_2)\mid\mathcal{H}_1) = E(X\mid\mathcal{H}_1)</math>.
*** The special case <math>\mathcal{H}_1=\{\emptyset, \Omega\}</math> recovers the law of total expectation: <math>E(E(X\mid\mathcal{H}_2)) = E(X)</math>.
*** Another special case is when ''Z'' is an <math>\mathcal{H}</math>-measurable random variable. Then <math>\sigma(Z) \subset \mathcal{H}</math> and thus <math>E(E(X \mid \mathcal{H}) \mid Z) = E(X \mid Z)</math>.
*** [[Doob martingale]] property: the above with <math>Z = E(X \mid \mathcal{H})</math> (which is <math>\mathcal{H}</math>-measurable), and using also <math>\operatorname{E}(Z \mid Z)=Z</math>, gives <math>E(X \mid E(X \mid \mathcal{H})) = E(X \mid \mathcal{H})</math>.
** For random variables <math>X,Y</math> we have <math>E(E(X\mid Y)\mid f(Y)) = E(X\mid f(Y))</math>.
** For random variables <math>X,Y,Z</math> we have <math>E(E(X\mid Y,Z)\mid Y) = E(X\mid Y)</math>.
* Linearity: we have <math>E(X_1 + X_2 \mid \mathcal{H}) = E(X_1 \mid \mathcal{H}) + E(X_2 \mid \mathcal{H})</math> and <math>E(a X \mid \mathcal{H}) = a\,E(X \mid \mathcal{H})</math> for <math>a\in\R</math>.
* Positivity: If <math>X \ge 0</math> then <math>E(X \mid \mathcal{H}) \ge 0</math>.
* Monotonicity: If <math>X_1 \le X_2</math> then <math>E(X_1 \mid \mathcal{H}) \le E(X_2 \mid \mathcal{H})</math>.
* [[Monotone convergence theorem|Monotone convergence]]: If <math>0\leq X_n \uparrow X</math> then <math>E(X_n \mid \mathcal{H}) \uparrow E(X \mid \mathcal{H})</math>.
* [[Dominated convergence theorem|Dominated convergence]]: If <math>X_n \to X</math> and <math>|X_n| \le Y</math> with <math>Y \in L^1</math>, then <math>E(X_n \mid \mathcal{H}) \to E(X \mid \mathcal{H})</math>.
* [[Fatou's lemma]]: If <math>\textstyle E(\inf_n X_n \mid \mathcal{H}) > -\infty</math> then <math>\textstyle E(\liminf_{n\to\infty} X_n \mid \mathcal{H}) \le \liminf_{n\to\infty} E(X_n \mid \mathcal{H})</math>.
* [[Jensen's inequality]]: If <math>f \colon \mathbb{R} \rightarrow \mathbb{R}</math> is a [[convex function]], then <math>f(E(X\mid \mathcal{H})) \le E(f(X)\mid\mathcal{H})</math>.
* [[Conditional variance]]: Using the conditional expectation we can define, by analogy with the definition of the [[variance]] as the mean square deviation from the average, the conditional variance:
** Definition: <math>\operatorname{Var}(X \mid \mathcal{H}) = \operatorname{E}\bigl( (X - \operatorname{E}(X \mid \mathcal{H}))^2 \mid \mathcal{H} \bigr)</math>
** Algebraic formula for the variance: <math>\operatorname{Var}(X \mid \mathcal{H}) = \operatorname{E}(X^2 \mid \mathcal{H}) - \bigl(\operatorname{E}(X \mid \mathcal{H})\bigr)^2</math> (a short derivation is given after this list)
** [[Law of total variance]]: <math>\operatorname{Var}(X) = \operatorname{E}(\operatorname{Var}(X \mid \mathcal{H})) + \operatorname{Var}(\operatorname{E}(X \mid \mathcal{H}))</math> (see the simulation sketch after this list).
* [[Martingale convergence theorem|Martingale convergence]]: For a random variable <math>X</math> with finite expectation, we have <math>E(X\mid\mathcal{H}_n) \to E(X\mid\mathcal{H})</math> if either <math>\mathcal{H}_1 \subset \mathcal{H}_2 \subset \dotsb</math> is an increasing sequence of sub-''σ''-algebras and <math>\textstyle \mathcal{H} = \sigma(\bigcup_{n=1}^\infty \mathcal{H}_n)</math>, or if <math>\mathcal{H}_1 \supset \mathcal{H}_2 \supset \dotsb</math> is a decreasing sequence of sub-''σ''-algebras and <math>\textstyle \mathcal{H} = \bigcap_{n=1}^\infty \mathcal{H}_n</math>.
* Conditional expectation as <math>L^2</math>-projection: If <math>X,Y</math> are in the [[Hilbert space]] of [[square-integrable]] real random variables (real random variables with finite second moment), then
** for <math>\mathcal{H}</math>-measurable <math>Y</math>, we have <math>E(Y(X - E(X\mid\mathcal{H}))) = 0</math>, i.e. the conditional expectation <math>E(X\mid\mathcal{H})</math> is, in the sense of the [[Lp space|''L''<sup>2</sup>(''P'')]] scalar product, the [[orthogonal projection]] of <math>X</math> onto the [[linear subspace]] of <math>\mathcal{H}</math>-measurable functions. (This allows one to define and prove the existence of the conditional expectation based on the [[Hilbert projection theorem]].)
** the mapping <math>X \mapsto \operatorname{E}(X\mid\mathcal{H})</math> is [[self-adjoint operator|self-adjoint]]: <math>\operatorname E(X \operatorname E(Y \mid \mathcal{H})) = \operatorname E\left(\operatorname E(X \mid \mathcal{H}) \operatorname E(Y \mid \mathcal{H})\right) = \operatorname E(\operatorname E(X \mid \mathcal{H}) Y)</math>
* Conditioning is a [[Contraction (operator theory)|contractive]] projection of [[Lp space|''L''<sup>p</sup>]] spaces <math>L^p(\Omega, \mathcal{F}, P) \rightarrow L^p(\Omega, \mathcal{H}, P)</math>, that is, <math>\operatorname{E}\big(|\operatorname{E}(X \mid\mathcal{H})|^p \big) \le \operatorname{E}\big(|X|^p\big)</math> for any ''p'' ≥ 1.
* Doob's conditional independence property:<ref>{{Cite book|title=Foundations of Modern Probability|last=Kallenberg|first=Olav|publisher=Springer|year=2001|isbn=0-387-95313-2|edition=2nd|location=York, PA, USA|pages=110}}</ref> If <math>X,Y</math> are [[conditionally independent]] given <math>Z</math>, then <math>P(X \in B\mid Y,Z) = P(X \in B\mid Z)</math> (equivalently, <math>E(1_{\{X \in B\}}\mid Y,Z) = E(1_{\{X \in B\}} \mid Z)</math>).
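
For a concrete illustration of the tower property and the law of total expectation, let <math>Y</math> be the result of one roll of a fair die, let <math>X = Y^2</math>, and take the nested sub-''σ''-algebras <math>\mathcal{H}_1 = \sigma(\{Y \le 3\}) \subset \mathcal{H}_2 = \sigma(Y)</math>. By stability, <math>E(X \mid \mathcal{H}_2) = Y^2</math>, while
:<math>E(X \mid \mathcal{H}_1) = \frac{1+4+9}{3}\,1_{\{Y \le 3\}} + \frac{16+25+36}{3}\,1_{\{Y > 3\}} = \frac{14}{3}\,1_{\{Y \le 3\}} + \frac{77}{3}\,1_{\{Y > 3\}}.</math>
Conditioning <math>E(X \mid \mathcal{H}_2) = Y^2</math> further on <math>\mathcal{H}_1</math> yields exactly the same random variable, as the tower property requires, and the choice <math>\mathcal{H}_1 = \{\emptyset, \Omega\}</math> recovers the law of total expectation: <math>E(E(X \mid \mathcal{H}_2)) = \tfrac{1}{2}\cdot\tfrac{14}{3} + \tfrac{1}{2}\cdot\tfrac{77}{3} = \tfrac{91}{6} = E(Y^2)</math>.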
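The algebraic formula for the conditional variance follows from linearity, pulling out known factors, and stability: writing <math>m = \operatorname{E}(X \mid \mathcal{H})</math>, which is <math>\mathcal{H}</math>-measurable,
:<math>\operatorname{Var}(X \mid \mathcal{H}) = \operatorname{E}\bigl( X^2 - 2mX + m^2 \mid \mathcal{H}\bigr) = \operatorname{E}(X^2 \mid \mathcal{H}) - 2m\operatorname{E}(X \mid \mathcal{H}) + m^2 = \operatorname{E}(X^2 \mid \mathcal{H}) - \bigl(\operatorname{E}(X \mid \mathcal{H})\bigr)^2.</math>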
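The law of total expectation and the law of total variance can also be checked numerically. The following minimal sketch, assuming Python with NumPy, simulates a two-stage experiment in which <math>Z \sim N(0,1)</math> and <math>X \mid Z \sim N(Z,1)</math>, so that <math>\operatorname{E}(X \mid Z) = Z</math> and <math>\operatorname{Var}(X \mid Z) = 1</math>, and compares the sample moments of <math>X</math> with the two identities.
<syntaxhighlight lang="python">
import numpy as np

# Two-stage experiment: Z ~ N(0, 1), then X | Z ~ N(Z, 1),
# so E(X | Z) = Z and Var(X | Z) = 1 exactly.
rng = np.random.default_rng(0)
n = 1_000_000

z = rng.standard_normal(n)          # samples of Z
x = z + rng.standard_normal(n)      # samples of X given Z

# Law of total expectation: E(X) = E(E(X | Z)) = E(Z) = 0.
print(x.mean(), z.mean())

# Law of total variance: Var(X) = E(Var(X | Z)) + Var(E(X | Z)) = 1 + Var(Z) = 2.
print(x.var(), 1.0 + z.var())
</syntaxhighlight>
Both printed pairs should agree up to Monte Carlo error of order <math>n^{-1/2}</math>.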