
==Rules of conditional independence==

A set of rules governing statements of conditional independence has been derived from the basic definition.<ref>{{cite journal
 |first=A. P. |last=Dawid |authorlink=Philip Dawid
 |title=Conditional Independence in Statistical Theory
 |journal=[[Journal of the Royal Statistical Society, Series B]]
 |year=1979
 |volume=41 |issue=1 |pages=1–31
 |mr=0535541
 |jstor=2984718
}}</ref><ref name=pearl:2000>J Pearl, Causality: Models, Reasoning, and Inference, 2000, Cambridge University Press</ref>

These rules were termed "[[Graphoid]] Axioms"
by Pearl and Paz,<ref name=pearl:paz85>{{cite conference
 | last1 = Pearl | first1 = Judea | author1-link = Judea Pearl
 | last2 = Paz | first2 = Azaria
 | editor1-last = du Boulay | editor1-first = Benedict
 | editor2-last = Hogg | editor2-first = David C.
 | editor3-last = Steels | editor3-first = Luc
 | contribution = Graphoids: Graph-Based Logic for Reasoning about Relevance Relations or When would x tell you more about y if you already know z?
 | pages = 357–363
 | publisher = North-Holland
 | title = Advances in Artificial Intelligence II, Seventh European Conference on Artificial Intelligence, ECAI 1986, Brighton, UK, July 20–25, 1986, Proceedings
 | url = https://ftp.cs.ucla.edu/pub/stat_ser/r53-L.pdf
 | year = 1986}}</ref> because they hold in graphs, where <math>X \perp\!\!\!\perp A\mid B</math> is interpreted to mean: "All paths from ''X'' to ''A'' are intercepted by the set ''B''".<ref name=pearl:88>{{cite book|last1=Pearl|first1=Judea|title=Probabilistic reasoning in intelligent systems: networks of plausible inference|url=https://archive.org/details/probabilisticrea00pear|url-access=registration|date=1988|publisher=Morgan Kaufmann|isbn=9780934613736}}</ref>

===Symmetry===
: <math>
X \perp\!\!\!\perp Y \mid Z \quad
\Leftrightarrow
\quad
Y \perp\!\!\!\perp X \mid Z
</math>
'''Proof:'''

From the definition of conditional independence,
: <math>
X \perp\!\!\!\perp Y \mid Z \quad
\Leftrightarrow
\quad P(X, Y \mid Z) = P(X \mid Z) P(Y \mid Z) \quad
\Leftrightarrow
\quad
Y \perp\!\!\!\perp X \mid Z
</math>

===Decomposition===
: <math>
X \perp\!\!\!\perp Y \mid Z
\quad \Rightarrow \quad
h(X) \perp\!\!\!\perp Y \mid Z
</math>

'''Proof:'''

From the definition of conditional independence, we seek to show that:
: <math>
X \perp\!\!\!\perp Y \mid Z
\quad \Rightarrow \quad
P(h(X), Y \mid Z) = P(h(X) \mid Z) P(Y \mid Z)
</math>
The left side of this equality is:
: <math>
P(h(X)=a, Y=y \mid Z=z) = \sum_{x \colon h(x)=a} P(X=x, Y=y \mid Z=z)
</math>
where the right side sums the conditional probability of <math>X=x, Y=y</math> given <math>Z=z</math> over all values <math>x</math> such that <math>h(x)=a</math>.
Applying the assumption <math>X \perp\!\!\!\perp Y \mid Z</math> to each term of the sum,
: <math>
\begin{align}
\sum_{x \colon h(x)=a} P(X=x, Y=y \mid Z=z) =& \sum_{x \colon h(x)=a} P(X=x \mid Z=z) P(Y=y \mid Z=z) \\
=& P(Y=y \mid Z=z) \sum_{x \colon h(x)=a} P(X=x \mid Z=z) \\
=& P(Y=y \mid Z=z) P(h(X)=a \mid Z=z)
\end{align}
</math>
which is the required factorization. Special cases of this property include:
* <math>
(X, W) \perp\!\!\!\perp Y \mid Z
\quad \Rightarrow \quad
X \perp\!\!\!\perp Y \mid Z
</math>
** '''Proof:''' Let <math> A = (X,W) </math> and let <math> h(\cdot) </math> be the 'extraction' function <math> h(X,W) = X</math>. Then:
: <math>
\begin{align}
(X,W) \perp\!\!\!\perp Y \mid Z
\quad &\Leftrightarrow \quad
A \perp\!\!\!\perp Y \mid Z \\
&\Rightarrow \quad
h(A) \perp\!\!\!\perp Y \mid Z \quad &\text{Decomposition} \\
&\Leftrightarrow \quad
X \perp\!\!\!\perp Y \mid Z
\end{align}
</math>
* <math>
X \perp\!\!\!\perp (Y, W) \mid Z
\quad \Rightarrow \quad
X \perp\!\!\!\perp Y \mid Z
</math>
** '''Proof:''' Let <math> V = (Y,W) </math> and let <math> h(\cdot) </math> again be an 'extraction' function, <math> h(Y,W) = Y</math>. Then:
: <math>
\begin{align}
X \perp\!\!\!\perp (Y,W) \mid Z
\quad &\Leftrightarrow \quad 
X \perp\!\!\!\perp V \mid Z \\
&\Leftrightarrow \quad
V \perp\!\!\!\perp X \mid Z \quad &\text{Symmetry} \\
&\Rightarrow \quad
h(V) \perp\!\!\!\perp X \mid Z \quad &\text{Decomposition} \\
&\Leftrightarrow \quad
Y \perp\!\!\!\perp X \mid Z \\
&\Leftrightarrow \quad
X \perp\!\!\!\perp Y \mid Z \quad &\text{Symmetry}
\end{align}
</math>
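
The decomposition rule can also be checked numerically on a small discrete example. The following Python sketch is illustrative only: the probability tables, the helper <code>cond_indep</code>, and the choice <math>h(x) = x \bmod 2</math> are assumptions made for this example and are not taken from the cited sources. It builds a joint distribution in which <math>X \perp\!\!\!\perp Y \mid Z</math> holds by construction and then tests whether <math>h(X) \perp\!\!\!\perp Y \mid Z</math> holds as well.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product


def cond_indep(joint, a, b, c):
    """True if a is independent of b given c under the pmf `joint`.

    `joint` maps outcomes to probabilities; a, b, c map an outcome to the
    value of a variable. The test is P(a,b,c) * P(c) == P(a,c) * P(b,c),
    which matches P(a,b | c) == P(a | c) * P(b | c) wherever P(c) > 0.
    """
    def marginal(f):
        pmf = defaultdict(F)
        for outcome, p in joint.items():
            pmf[f(outcome)] += p
        return pmf

    p_abc = marginal(lambda o: (a(o), b(o), c(o)))
    p_ac = marginal(lambda o: (a(o), c(o)))
    p_bc = marginal(lambda o: (b(o), c(o)))
    p_c = marginal(c)
    return all(
        p_abc.get((av, bv, cv), 0) * p_c[cv]
        == p_ac.get((av, cv), 0) * p_bc.get((bv, cv), 0)
        for av in {a(o) for o in joint}
        for bv in {b(o) for o in joint}
        for cv in list(p_c)
    )


# Hypothetical tables: P(z), P(x | z), P(y | z); X takes the values 0, 1, 2.
p_z = {0: F(1, 3), 1: F(2, 3)}
p_x_given_z = {0: (F(1, 2), F(1, 4), F(1, 4)), 1: (F(1, 5), F(2, 5), F(2, 5))}
p_y_given_z = {0: (F(1, 3), F(2, 3)), 1: (F(3, 4), F(1, 4))}

# Outcomes are tuples (x, y, z); the product form makes X indep. of Y given Z.
joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in product(range(3), range(2), range(2))
}

X = lambda o: o[0]
Y = lambda o: o[1]
Z = lambda o: o[2]
h_X = lambda o: o[0] % 2  # h merges the values x = 0 and x = 2

print(cond_indep(joint, X, Y, Z))    # premise
print(cond_indep(joint, h_X, Y, Z))  # conclusion of decomposition
print(cond_indep(joint, X, Z, Y))    # sanity check: X depends on Z given Y
```

Exact rational arithmetic is used so that the equality test is not affected by floating-point rounding. With these tables the first two checks succeed and the third fails, which shows that the helper does not accept every statement.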

===Weak union===

: <math>
X \perp\!\!\!\perp Y \mid Z
\quad \Rightarrow \quad
X \perp\!\!\!\perp Y \mid (Z, h(X))
</math>

'''Proof:'''

Given <math> X \perp\!\!\!\perp Y \mid Z </math>, we aim to show
: <math>
\begin{align}
X \perp\!\!\!\perp Y \mid (Z, h(X))
\quad &\Leftrightarrow \quad
X \perp\!\!\!\perp Y \mid U \quad &\text{where} \quad U = (Z, h(X)) \\
&\Leftrightarrow \quad
Y \perp\!\!\!\perp X \mid U \quad &\text{Symmetry} \\
&\Leftrightarrow \quad
P(Y\mid X, U) = P(Y\mid U) \\
&\Leftrightarrow \quad
P(Y \mid X, Z, h(X)) = P(Y \mid Z, h(X))
\end{align}
</math>
We begin with the left side of the last equation:
: <math>
\begin{align}
P(Y \mid X, Z, h(X)) &= P(Y \mid X, Z) &\text{Since } h(X) \text{ is a function of } X \\
&= P(Y \mid Z) &\text{Since by symmetry } Y \perp\!\!\!\perp X \mid Z
\end{align}
</math>
Next, from the given condition:
: <math>
\begin{align}
X \perp\!\!\!\perp Y \mid Z
\quad &\Rightarrow \quad
h(X) \perp\!\!\!\perp Y \mid Z
\quad &\text{Decomposition} \\
&\Leftrightarrow \quad
Y \perp\!\!\!\perp h(X) \mid Z
\quad &\text{Symmetry} \\
&\Rightarrow \quad
P(Y \mid Z, h(X)) = P(Y \mid Z)
\end{align}
</math>
Both sides therefore equal <math> P(Y \mid Z) </math>. Thus <math> P(Y \mid X, Z, h(X)) = P(Y \mid Z, h(X))
</math>, so we have shown that <math>
X \perp\!\!\!\perp Y \mid (Z, h(X))
</math>.

'''Special Cases:'''

Some textbooks present the property as
* <math> X \perp\!\!\!\perp (Y, W) \mid Z
\quad \Rightarrow \quad
X \perp\!\!\!\perp Y \mid (Z, W) </math>.<ref name="Koller">{{cite book |last1=Koller |first1=Daphne |last2=Friedman |first2=Nir |title=Probabilistic Graphical Models |date=2009 |publisher=The MIT Press |location=Cambridge, MA |isbn=9780262013192}}</ref>

* <math> (X,W)  \perp\!\!\!\perp Y \mid Z
\quad \Rightarrow \quad
X \perp\!\!\!\perp Y \mid (Z,W) </math>.
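Both versions follow from the weak union property stated at the start of this section: take <math>h</math> to be an 'extraction' function that returns <math>W</math>, as in the decomposition section, then apply decomposition (and symmetry where needed).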
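
As a numerical illustration of weak union, the following Python sketch checks both the textbook form and the form with <math>h(X)</math> on one small distribution. All probability tables, the helper <code>cond_indep</code>, and the choice <math>h(x) = x \bmod 2</math> are hypothetical and chosen only for this example; <math>Y</math> and <math>W</math> are deliberately made dependent given <math>Z</math>.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product


def cond_indep(joint, a, b, c):
    """True if a is independent of b given c under the pmf `joint`.

    Uses P(a,b,c) * P(c) == P(a,c) * P(b,c), which avoids dividing by P(c).
    """
    def marginal(f):
        pmf = defaultdict(F)
        for outcome, p in joint.items():
            pmf[f(outcome)] += p
        return pmf

    p_abc = marginal(lambda o: (a(o), b(o), c(o)))
    p_ac = marginal(lambda o: (a(o), c(o)))
    p_bc = marginal(lambda o: (b(o), c(o)))
    p_c = marginal(c)
    return all(
        p_abc.get((av, bv, cv), 0) * p_c[cv]
        == p_ac.get((av, cv), 0) * p_bc.get((bv, cv), 0)
        for av in {a(o) for o in joint}
        for bv in {b(o) for o in joint}
        for cv in list(p_c)
    )


# Hypothetical tables: P(z), P(x | z), and a joint P(y, w | z) in which
# Y and W are dependent given Z.
p_z = {0: F(1, 3), 1: F(2, 3)}
p_x_given_z = {0: (F(1, 2), F(1, 4), F(1, 4)), 1: (F(1, 5), F(2, 5), F(2, 5))}
p_yw_given_z = {
    0: {(0, 0): F(1, 2), (0, 1): F(1, 6), (1, 0): F(1, 6), (1, 1): F(1, 6)},
    1: {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(1, 4), (1, 1): F(1, 4)},
}

# Outcomes are tuples (x, y, w, z).
joint = {
    (x, y, w, z): p_z[z] * p_x_given_z[z][x] * p_yw_given_z[z][(y, w)]
    for x, y, w, z in product(range(3), range(2), range(2), range(2))
}

X = lambda o: o[0]
Y = lambda o: o[1]
YW = lambda o: (o[1], o[2])
Z = lambda o: o[3]
ZW = lambda o: (o[3], o[2])
Z_hX = lambda o: (o[3], o[0] % 2)  # conditions on Z and h(X) = X mod 2

print(cond_indep(joint, X, YW, Z))    # premise: X indep. of (Y, W) given Z
print(cond_indep(joint, X, Y, ZW))    # textbook form of weak union
print(cond_indep(joint, X, Y, Z_hX))  # form with h(X) in the conditioning set
```

All three checks succeed for this distribution. This does not prove the rule, but it gives a concrete instance in which moving <math>W</math> (or <math>h(X)</math>) into the conditioning set preserves the independence.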

===Contraction===

: <math>
\left.\begin{align}
  X \perp\!\!\!\perp A \mid B \\
  X \perp\!\!\!\perp B
\end{align}\right\}
\quad \Rightarrow \quad
X \perp\!\!\!\perp (A,B)
</math>

'''Proof'''

This property can be proved by noticing that <math>\Pr(X\mid A,B) = \Pr(X\mid B) = \Pr(X)</math>: the first equality is asserted by <math>X \perp\!\!\!\perp A \mid B</math> and the second by <math>X \perp\!\!\!\perp B</math>.
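
Because contraction holds for every distribution, a brute-force search for a counterexample should come back empty. The Python sketch below enumerates all distributions of three binary variables whose probabilities are multiples of 1/6; the grid size, the helper names, and the search strategy are assumptions made for this illustration, not part of the cited sources. For each distribution that satisfies both premises, it tests the conclusion.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product
from operator import itemgetter


def cond_indep(joint, a, b, c):
    """True if a is independent of b given c under the pmf `joint`.

    Uses P(a,b,c) * P(c) == P(a,c) * P(b,c), which avoids dividing by P(c).
    """
    def marginal(f):
        pmf = defaultdict(F)
        for outcome, p in joint.items():
            pmf[f(outcome)] += p
        return pmf

    p_abc = marginal(lambda o: (a(o), b(o), c(o)))
    p_ac = marginal(lambda o: (a(o), c(o)))
    p_bc = marginal(lambda o: (b(o), c(o)))
    p_c = marginal(c)
    return all(
        p_abc.get((av, bv, cv), 0) * p_c[cv]
        == p_ac.get((av, cv), 0) * p_bc.get((bv, cv), 0)
        for av in {a(o) for o in joint}
        for bv in {b(o) for o in joint}
        for cv in list(p_c)
    )


def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


X, A, B, AB = itemgetter(0), itemgetter(1), itemgetter(2), itemgetter(1, 2)
TRIVIAL = lambda o: ()  # conditioning on a constant gives plain independence

GRID = 6  # assumed grid: every probability is a multiple of 1/6
outcomes = list(product((0, 1), repeat=3))  # tuples (x, a, b)

premises_hold = 0
counterexamples = []
for counts in compositions(GRID, len(outcomes)):
    joint = {o: F(k, GRID) for o, k in zip(outcomes, counts)}
    if cond_indep(joint, X, A, B) and cond_indep(joint, X, B, TRIVIAL):
        premises_hold += 1
        if not cond_indep(joint, X, AB, TRIVIAL):
            counterexamples.append(joint)

print(premises_hold, len(counterexamples))
```

The second printed number is zero, in agreement with the proof. Zero-probability cells are included in the search on purpose, since contraction, unlike intersection, does not require strict positivity.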

===Intersection===

For strictly positive probability distributions,<ref name=pearl:2000 /> the following also holds:

: <math>
\left.\begin{align}
  X \perp\!\!\!\perp Y \mid (Z, W)\\
  X \perp\!\!\!\perp W \mid (Z, Y)
\end{align}\right\}
\quad \Rightarrow \quad
X \perp\!\!\!\perp (W, Y) \mid Z
</math>

'''Proof'''

By assumption:

: <math>P(X \mid Z, W, Y) = P(X \mid Z, W) \land P(X \mid Z, W, Y) = P(X \mid Z, Y) \implies P(X \mid Z, Y) = P(X \mid Z, W)</math>

Using this equality, together with the [[Law of total probability]] applied to <math>P(X \mid Z)</math>:

: <math>\begin{align}
P(X \mid Z) &= \sum_{w} P(X \mid Z, W=w)P(W=w \mid Z) \\[4pt]
&= \sum_{w} P(X \mid Z, Y)P(W=w \mid Z) \\[4pt]
&= P(X \mid Z, Y) \sum_{w} P(W=w \mid Z) \\[4pt]
&= P(X \mid Z, Y)
\end{align}</math>

Here the sums run over all values <math>w</math> of <math>W</math>, and strict positivity ensures that every conditional probability involved is defined. Since <math>P(X \mid Z, W, Y) = P(X \mid Z, Y)</math> and <math>P(X \mid Z, Y) = P(X \mid Z)</math>, it follows that <math>P(X \mid Z, W, Y) = P(X \mid Z)</math>, which is equivalent to <math>X \perp\!\!\!\perp (Y,W) \mid Z</math>.
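
The positivity requirement is not a technicality. As a minimal sketch, the Python example below uses a hypothetical distribution in which <math>X</math>, <math>Y</math> and <math>W</math> are identical fair coin flips and <math>Z</math> is constant (and therefore omitted). Both premises of the intersection rule hold, yet the conclusion fails. The helper <code>cond_indep</code> and the mixing weight of 1/10 are assumptions made for this illustration.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product


def cond_indep(joint, a, b, c):
    """True if a is independent of b given c under the pmf `joint`.

    Uses P(a,b,c) * P(c) == P(a,c) * P(b,c), which avoids dividing by P(c).
    """
    def marginal(f):
        pmf = defaultdict(F)
        for outcome, p in joint.items():
            pmf[f(outcome)] += p
        return pmf

    p_abc = marginal(lambda o: (a(o), b(o), c(o)))
    p_ac = marginal(lambda o: (a(o), c(o)))
    p_bc = marginal(lambda o: (b(o), c(o)))
    p_c = marginal(c)
    return all(
        p_abc.get((av, bv, cv), 0) * p_c[cv]
        == p_ac.get((av, cv), 0) * p_bc.get((bv, cv), 0)
        for av in {a(o) for o in joint}
        for bv in {b(o) for o in joint}
        for cv in list(p_c)
    )


X = lambda o: o[0]
Y = lambda o: o[1]
W = lambda o: o[2]
WY = lambda o: (o[2], o[1])
TRIVIAL = lambda o: ()  # Z is constant, so conditioning on it changes nothing

# Outcomes are tuples (x, y, w). X = Y = W with probability one, so the
# distribution is NOT strictly positive.
degenerate = {(0, 0, 0): F(1, 2), (1, 1, 1): F(1, 2)}

print(cond_indep(degenerate, X, Y, W))         # premise 1 holds
print(cond_indep(degenerate, X, W, Y))         # premise 2 holds
print(cond_indep(degenerate, X, WY, TRIVIAL))  # conclusion fails

# Mixing in a little uniform noise gives a strictly positive distribution.
positive = {
    o: F(9, 10) * degenerate.get(o, 0) + F(1, 10) * F(1, 8)
    for o in product((0, 1), repeat=3)
}
print(cond_indep(positive, X, Y, W))           # premise 1 no longer holds
```

In the degenerate case, knowing <math>W</math> already determines <math>X</math>, so <math>Y</math> adds nothing, and vice versa; but <math>X</math> is clearly not independent of the pair <math>(W, Y)</math>. Once every outcome has positive probability, the first premise fails for this example, so the rule is no longer contradicted.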

Technical note: since these implications hold for any probability space, they will still hold if one considers a sub-universe by conditioning everything on another variable, say&nbsp;''K''. For example, <math>X \perp\!\!\!\perp Y \Rightarrow Y \perp\!\!\!\perp X</math> would also mean that <math>X \perp\!\!\!\perp Y \mid K  \Rightarrow Y \perp\!\!\!\perp X \mid K</math>.