{{Short description|When the occurrence of one event does not affect the likelihood of another}}
{{Probability fundamentals}}

'''Independence''' is a fundamental notion in [[probability theory]], as in [[statistics]] and the theory of [[stochastic processes]]. Two [[event (probability theory)|event]]s are '''independent''', '''statistically independent''', or '''stochastically independent'''<ref name="Artificial Intelligence">{{cite book | last1 = Russell| first1 =Stuart| last2 = Norvig | first2 = Peter | title = Artificial Intelligence: A Modern Approach | url = https://archive.org/details/artificialintell00russ_726| url-access = limited| page = [https://archive.org/details/artificialintell00russ_726/page/n506 478] | publisher = [[Prentice Hall]] | year = 2002 | isbn = 0-13-790395-2}}</ref> if, informally speaking, the occurrence of one does not affect the probability of occurrence of the other or, equivalently, does not affect the [[odds]]. Similarly, two [[random variable]]s are independent if the realization of one does not affect the [[probability distribution]] of the other.

When dealing with collections of more than two events, two notions of independence need to be distinguished. The events are called [[Pairwise independence|pairwise independent]] if any two events in the collection are independent of each other, while '''mutual independence''' (or '''collective independence''') of events means, informally speaking, that each event is independent of any combination of other events in the collection. A similar notion exists for collections of random variables. Mutual independence implies pairwise independence, but not the other way around. In the standard literature of probability theory, statistics, and stochastic processes, '''independence''' without further qualification usually refers to mutual independence.

==Definition==

===For events===

====Two events====
Two events <math>A</math> and <math>B</math> are independent (often written as <math>A \perp B</math> or <math>A \perp\!\!\!\perp B</math>, where the latter symbol is also often used for [[conditional independence]]) if and only if their [[joint probability]] equals the product of their probabilities:<ref name=Florescu>{{cite book | author=Florescu, Ionut| title=Probability and Stochastic Processes| publisher=Wiley| year=2014 | isbn=978-0-470-62455-5}}</ref>{{rp|p. 29}}<ref name=Gallager/>{{rp|p. 10}}

{{Equation box 1 |indent = |title= |equation = {{NumBlk||<math>\mathrm{P}(A \cap B) = \mathrm{P}(A)\mathrm{P}(B)</math>|{{EquationRef|Eq.1}}}} |cellpadding= 6 |border |border colour = #0073CF |background colour=#F5FFFA}}

Here <math>A \cap B \neq \emptyset</math> indicates that two independent events <math>A</math> and <math>B</math> (each of positive probability) have common elements in their [[sample space]], so they are not [[Mutual exclusivity|mutually exclusive]] (mutually exclusive iff <math>A \cap B = \emptyset</math>).
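The product rule in {{EquationNote|Eq.1}} can be illustrated numerically. The following Python sketch is only an illustration (the choice of events, random seed, and sample size are arbitrary): it simulates two die rolls and estimates both sides of {{EquationNote|Eq.1}} for the events "first roll shows a 6" and "second roll shows a 6".

<syntaxhighlight lang="python">
import random

random.seed(0)
N = 100_000  # number of simulated trials (arbitrary choice)

count_A = count_B = count_AB = 0
for _ in range(N):
    first = random.randint(1, 6)    # first die roll
    second = random.randint(1, 6)   # second die roll
    A = (first == 6)                # event A: first roll shows a 6
    B = (second == 6)               # event B: second roll shows a 6
    count_A += A
    count_B += B
    count_AB += (A and B)

p_A, p_B, p_AB = count_A / N, count_B / N, count_AB / N
# For independent events, the two printed values agree up to sampling noise,
# both being close to 1/36 ≈ 0.0278.
print(p_AB, p_A * p_B)
</syntaxhighlight>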
Why this defines independence is made clear by rewriting with [[Conditional probability|conditional probabilities]] <math>P(A \mid B) = \frac{P(A \cap B)}{P(B)}</math> as the probability that the event <math>A</math> occurs given that the event <math>B</math> has or is assumed to have occurred:

:<math>\mathrm{P}(A \cap B) = \mathrm{P}(A)\mathrm{P}(B) \iff \mathrm{P}(A\mid B) = \frac{\mathrm{P}(A \cap B)}{\mathrm{P}(B)} = \mathrm{P}(A),</math>

and similarly

:<math>\mathrm{P}(A \cap B) = \mathrm{P}(A)\mathrm{P}(B) \iff \mathrm{P}(B\mid A) = \frac{\mathrm{P}(A \cap B)}{\mathrm{P}(A)} = \mathrm{P}(B).</math>

Thus, the occurrence of <math>B</math> does not affect the probability of <math>A</math>, and vice versa. In other words, <math>A</math> and <math>B</math> are independent of each other. Although the derived expressions may seem more intuitive, they are not the preferred definition, as the conditional probabilities may be undefined if <math>\mathrm{P}(A)</math> or <math>\mathrm{P}(B)</math> is 0. Furthermore, the preferred definition makes clear by symmetry that when <math>A</math> is independent of <math>B</math>, <math>B</math> is also independent of <math>A</math>.

====Odds====
Stated in terms of [[odds]], two events are independent if and only if the [[odds ratio]] of {{tmath|A}} and {{tmath|B}} is unity (1). Analogously with probability, this is equivalent to the conditional odds being equal to the unconditional odds:

:<math>O(A \mid B) = O(A) \text{ and } O(B \mid A) = O(B),</math>

or to the odds of one event, given the other event, being the same as the odds of the event, given the other event not occurring:

:<math>O(A \mid B) = O(A \mid \neg B) \text{ and } O(B \mid A) = O(B \mid \neg A).</math>

The odds ratio can be defined as

:<math>O(A \mid B) : O(A \mid \neg B),</math>

or symmetrically for odds of {{tmath|B}} given {{tmath|A}}, and thus is 1 if and only if the events are independent.

====More than two events====
A finite set of events <math>\{ A_i \} _{i=1}^{n}</math> is [[Pairwise independence|pairwise independent]] if every pair of events is independent<ref name ="Feller">{{cite book | last = Feller | first = W | year = 1971 | title = An Introduction to Probability Theory and Its Applications | publisher = [[John Wiley & Sons|Wiley]] | chapter = Stochastic Independence}}</ref>—that is, if and only if for all distinct pairs of indices <math>m,k</math>,

{{Equation box 1 |indent = |title= |equation = {{NumBlk||<math>\mathrm{P}(A_m \cap A_k) = \mathrm{P}(A_m)\mathrm{P}(A_k)</math>|{{EquationRef|Eq.2}}}} |cellpadding= 6 |border |border colour = #0073CF |background colour=#F5FFFA}}

A finite set of events is '''mutually independent''' if every event is independent of any intersection of the other events<ref name="Feller" /><ref name=Gallager/>{{rp|p. 11}}—that is, if and only if for every <math>k \leq n</math> and every <math>k</math> indices <math>1\le i_1 < \dots < i_k \le n</math>,

{{Equation box 1 |indent = |title= |equation = {{NumBlk||<math>\mathrm{P}\left(\bigcap_{j=1}^k A_{i_j} \right)=\prod_{j=1}^k \mathrm{P}(A_{i_j} )</math>|{{EquationRef|Eq.3}}}} |cellpadding= 6 |border |border colour = #0073CF |background colour=#F5FFFA}}

This is called the ''multiplication rule'' for independent events. It is [[#Triple-independence but no pairwise-independence|not a single condition]] involving only the product of all the probabilities of all single events; it must hold true for all subsets of events.
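On a finite sample space, the multiplication rule can be checked directly by enumerating all subsets of events. The following Python sketch is illustrative only (the helper names and the tolerance are arbitrary choices); it tests {{EquationNote|Eq.3}} for every subset of two or more events, which is exactly the requirement for mutual independence.

<syntaxhighlight lang="python">
from itertools import combinations

def prob(event, p):
    """Probability of an event (a set of outcomes) under the distribution p."""
    return sum(p[w] for w in event)

def mutually_independent(events, p, tol=1e-12):
    """Check Eq.3 for every subset of two or more of the given events."""
    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            intersection = set.intersection(*subset)
            lhs = prob(intersection, p)
            rhs = 1.0
            for event in subset:
                rhs *= prob(event, p)
            if abs(lhs - rhs) > tol:
                return False
    return True

# Two fair coin tosses: A = "first toss heads", B = "second toss heads",
# C = "both tosses agree".
p = {('H', 'H'): 0.25, ('H', 'T'): 0.25, ('T', 'H'): 0.25, ('T', 'T'): 0.25}
A = {o for o in p if o[0] == 'H'}
B = {o for o in p if o[1] == 'H'}
C = {o for o in p if o[0] == o[1]}

print(mutually_independent([A, B], p))     # True
print(mutually_independent([A, B, C], p))  # False: Eq.3 fails for the triple
</syntaxhighlight>

Here every pairwise check succeeds while the three-fold product fails, matching the distinction drawn above.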
For more than two events, a mutually independent set of events is (by definition) pairwise independent; but the converse is [[#Pairwise and mutual independence|not necessarily true]].<ref name=Florescu/>{{rp|p. 30}}

====Log probability and information content====
Stated in terms of [[log probability]], two events are independent if and only if the log probability of the joint event is the sum of the log probabilities of the individual events:

:<math>\log \mathrm{P}(A \cap B) = \log \mathrm{P}(A) + \log \mathrm{P}(B)</math>

In [[information theory]], negative log probability is interpreted as [[information content]], and thus two events are independent if and only if the information content of the combined event equals the sum of the information contents of the individual events:

:<math>\mathrm{I}(A \cap B) = \mathrm{I}(A) + \mathrm{I}(B)</math>

See ''{{slink|Information content|Additivity of independent events}}'' for details.

===For real valued random variables===

====Two random variables====
Two random variables <math>X</math> and <math>Y</math> are independent [[if and only if]] (iff) the elements of the [[Pi system|{{pi}}-system]] generated by them are independent; that is to say, for every <math>x</math> and <math>y</math>, the events <math>\{ X \le x\}</math> and <math>\{ Y \le y\}</math> are independent events (as defined above in {{EquationNote|Eq.1}}). That is, <math>X</math> and <math>Y</math>, with [[cumulative distribution function]]s <math>F_X(x)</math> and <math>F_Y(y)</math>, are independent [[if and only if|iff]] the combined random variable <math>(X,Y)</math> has a [[joint distribution|joint]] cumulative distribution function<ref name=Gallager>{{cite book | first=Robert G. | last=Gallager| title=Stochastic Processes Theory for Applications| publisher=Cambridge University Press| year=2013 | isbn=978-1-107-03975-9}}</ref>{{rp|p. 15}}

{{Equation box 1 |indent = |title= |equation = {{NumBlk||<math>F_{X,Y}(x,y) = F_X(x) F_Y(y) \quad \text{for all } x,y</math>|{{EquationRef|Eq.4}}}} |cellpadding= 6 |border |border colour = #0073CF |background colour=#F5FFFA}}

or equivalently, if the [[probability density function|probability densities]] <math>f_X(x)</math> and <math>f_Y(y)</math> and the joint probability density <math>f_{X,Y}(x,y)</math> exist,

:<math>f_{X,Y}(x,y) = f_X(x) f_Y(y) \quad \text{for all } x,y.</math>
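The factorization in {{EquationNote|Eq.4}} can also be examined empirically. The following Python sketch is purely illustrative (the distributions, evaluation point, and sample size are arbitrary choices): it draws independent samples of <math>X</math> and <math>Y</math> and compares the empirical joint CDF with the product of the empirical marginal CDFs at one point.

<syntaxhighlight lang="python">
import random

random.seed(1)
N = 100_000
xs = [random.gauss(0, 1) for _ in range(N)]  # samples of X
ys = [random.gauss(0, 1) for _ in range(N)]  # samples of Y, drawn independently of X

def ecdf(samples, t):
    """Empirical cumulative distribution function evaluated at t."""
    return sum(s <= t for s in samples) / len(samples)

x0, y0 = 0.5, -0.3  # an arbitrary evaluation point (x, y)
joint = sum((x <= x0) and (y <= y0) for x, y in zip(xs, ys)) / N

# For independent X and Y, both printed values agree up to sampling error (Eq.4).
print(joint, ecdf(xs, x0) * ecdf(ys, y0))
</syntaxhighlight>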
====More than two random variables====
A finite set of <math>n</math> random variables <math>\{X_1,\ldots,X_n\}</math> is [[pairwise independent]] if and only if every pair of random variables is independent. Even if the set of random variables is pairwise independent, it is not necessarily ''mutually independent'' as defined next.

A finite set of <math>n</math> random variables <math>\{X_1,\ldots,X_n\}</math> is '''mutually independent''' if and only if for any sequence of numbers <math>\{x_1, \ldots, x_n\}</math>, the events <math>\{X_1 \le x_1\}, \ldots, \{X_n \le x_n \}</math> are mutually independent events (as defined above in {{EquationNote|Eq.3}}). This is equivalent to the following condition on the joint cumulative distribution function {{nowrap|<math>F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)</math>.}} A finite set of <math>n</math> random variables <math>\{X_1,\ldots,X_n\}</math> is mutually independent if and only if<ref name=Gallager/>{{rp|p. 16}}

{{Equation box 1 |indent = |title= |equation = {{NumBlk||<math>F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = F_{X_1}(x_1) \cdot \ldots \cdot F_{X_n}(x_n) \quad \text{for all } x_1,\ldots,x_n</math>|{{EquationRef|Eq.5}}}} |cellpadding= 6 |border |border colour = #0073CF |background colour=#F5FFFA}}

It is not necessary here to require that the probability distribution factorizes for all possible {{nowrap|<math>k</math>-element}} subsets as in the case for <math>n</math> events, because, for example, <math>F_{X_1,X_2,X_3}(x_1,x_2,x_3) = F_{X_1}(x_1) \cdot F_{X_2}(x_2) \cdot F_{X_3}(x_3)</math> implies <math>F_{X_1,X_3}(x_1,x_3) = F_{X_1}(x_1) \cdot F_{X_3}(x_3)</math> (let <math>x_2 \to +\infty</math>, so that <math>F_{X_2}(x_2) \to 1</math>).

The measure-theoretically inclined reader may prefer to substitute events <math>\{ X \in A \}</math> for events <math>\{ X \leq x \}</math> in the above definition, where <math>A</math> is any [[Borel algebra|Borel set]]. That definition is exactly equivalent to the one above when the values of the random variables are [[real number]]s. It has the advantage of working also for complex-valued random variables or for random variables taking values in any [[measurable space]] (which includes [[topological space]]s endowed with appropriate σ-algebras).

===For real valued random vectors===
Two random vectors <math>\mathbf{X}=(X_1,\ldots,X_m)^\mathrm{T}</math> and <math>\mathbf{Y}=(Y_1,\ldots,Y_n)^\mathrm{T}</math> are called independent if<ref name="Papoulis">{{cite book | last = Papoulis| first =Athanasios| title = Probability, Random Variables and Stochastic Processes | publisher = McGraw-Hill | year = 1991| isbn = 0-07-048477-5}}</ref>{{rp|p. 187}}

{{Equation box 1 |indent = |title= |equation = {{NumBlk||<math>F_{\mathbf{X,Y}}(\mathbf{x,y}) = F_{\mathbf{X}}(\mathbf{x}) \cdot F_{\mathbf{Y}}(\mathbf{y}) \quad \text{for all } \mathbf{x},\mathbf{y}</math>|{{EquationRef|Eq.6}}}} |cellpadding= 6 |border |border colour = #0073CF |background colour=#F5FFFA}}

where <math>F_{\mathbf{X}}(\mathbf{x})</math> and <math>F_{\mathbf{Y}}(\mathbf{y})</math> denote the cumulative distribution functions of <math>\mathbf{X}</math> and <math>\mathbf{Y}</math> and <math>F_{\mathbf{X,Y}}(\mathbf{x,y})</math> denotes their joint cumulative distribution function. Independence of <math>\mathbf{X}</math> and <math>\mathbf{Y}</math> is often denoted by <math>\mathbf{X} \perp\!\!\!\perp \mathbf{Y}</math>. Written component-wise, <math>\mathbf{X}</math> and <math>\mathbf{Y}</math> are called independent if

:<math>F_{X_1,\ldots,X_m,Y_1,\ldots,Y_n}(x_1,\ldots,x_m,y_1,\ldots,y_n) = F_{X_1,\ldots,X_m}(x_1,\ldots,x_m) \cdot F_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n) \quad \text{for all } x_1,\ldots,x_m,y_1,\ldots,y_n.</math>

===For stochastic processes===

====For one stochastic process====
The definition of independence may be extended from random vectors to a [[stochastic process]]. For an independent stochastic process, it is required that the random variables obtained by sampling the process at any <math>n</math> times <math>t_1,\ldots,t_n</math> are independent random variables for any <math>n</math>.<ref name=HweiHsu>{{cite book| last1=Hsu| first1=Hwei P.| title=Theory and Problems of Probability, Random Variables, and Random Processes| publisher=McGraw-Hill| year=1997| isbn=0-07-030644-3| url-access=registration| url=https://archive.org/details/schaumsoutlineof00hsuh}}</ref>{{rp|p. 163}}
Formally, a stochastic process <math>\left\{ X_t \right\}_{t\in\mathcal{T}}</math> is called independent if and only if for all <math>n\in \mathbb{N}</math> and for all <math>t_1,\ldots,t_n\in\mathcal{T}</math>

{{Equation box 1 |indent = |title= |equation = {{NumBlk||<math>F_{X_{t_1},\ldots,X_{t_n}}(x_1,\ldots,x_n) = F_{X_{t_1}}(x_1) \cdot \ldots \cdot F_{X_{t_n}}(x_n) \quad \text{for all } x_1,\ldots,x_n</math>|{{EquationRef|Eq.7}}}} |cellpadding= 6 |border |border colour = #0073CF |background colour=#F5FFFA}}

where {{nowrap|<math>F_{X_{t_1},\ldots,X_{t_n}}(x_1,\ldots,x_n) = \mathrm{P}(X(t_1) \leq x_1,\ldots,X(t_n) \leq x_n)</math>.}} Independence of a stochastic process is a property ''within'' a stochastic process, not between two stochastic processes.

====For two stochastic processes====
Independence of two stochastic processes is a property between two stochastic processes <math>\left\{ X_t \right\}_{t\in\mathcal{T}}</math> and <math>\left\{ Y_t \right\}_{t\in\mathcal{T}}</math> that are defined on the same probability space <math>(\Omega,\mathcal{F},P)</math>. Formally, two stochastic processes <math>\left\{ X_t \right\}_{t\in\mathcal{T}}</math> and <math>\left\{ Y_t \right\}_{t\in\mathcal{T}}</math> are said to be independent if for all <math>n\in \mathbb{N}</math> and for all <math>t_1,\ldots,t_n\in\mathcal{T}</math>, the random vectors <math>(X(t_1),\ldots,X(t_n))</math> and <math>(Y(t_1),\ldots,Y(t_n))</math> are independent,<ref name="Lapidoth2017">{{cite book|author=Amos Lapidoth|title=A Foundation in Digital Communication|url=https://books.google.com/books?id=6oTuDQAAQBAJ&q=independence|date=8 February 2017|publisher=Cambridge University Press|isbn=978-1-107-17732-1}}</ref>{{rp|p. 515}} i.e. if

{{Equation box 1 |indent = |title= |equation = {{NumBlk||<math>F_{X_{t_1},\ldots,X_{t_n},Y_{t_1},\ldots,Y_{t_n}}(x_1,\ldots,x_n,y_1,\ldots,y_n) = F_{X_{t_1},\ldots,X_{t_n}}(x_1,\ldots,x_n) \cdot F_{Y_{t_1},\ldots,Y_{t_n}}(y_1,\ldots,y_n) \quad \text{for all } x_1,\ldots,x_n,y_1,\ldots,y_n</math>|{{EquationRef|Eq.8}}}} |cellpadding= 6 |border |border colour = #0073CF |background colour=#F5FFFA}}

===Independent σ-algebras===
The definitions above ({{EquationNote|Eq.1}} and {{EquationNote|Eq.2}}) are both generalized by the following definition of independence for [[sigma algebra|σ-algebras]]. Let <math>(\Omega, \Sigma, \mathrm{P})</math> be a probability space and let <math>\mathcal{A}</math> and <math>\mathcal{B}</math> be two sub-σ-algebras of <math>\Sigma</math>. <math>\mathcal{A}</math> and <math>\mathcal{B}</math> are said to be independent if, whenever <math>A \in \mathcal{A}</math> and <math>B \in \mathcal{B}</math>,

:<math>\mathrm{P}(A \cap B) = \mathrm{P}(A) \mathrm{P}(B).</math>

Likewise, a finite family of σ-algebras <math>(\tau_i)_{i\in I}</math>, where <math>I</math> is an [[index set]], is said to be independent if and only if

:<math>\forall \left(A_i\right)_{i\in I} \in \prod\nolimits_{i\in I}\tau_i \ : \ \mathrm{P}\left(\bigcap\nolimits_{i\in I}A_i\right) = \prod\nolimits_{i\in I}\mathrm{P}\left(A_i\right)</math>

and an infinite family of σ-algebras is said to be independent if all its finite subfamilies are independent.

The new definition relates to the previous ones very directly:
* Two events are independent (in the old sense) [[if and only if]] the σ-algebras that they generate are independent (in the new sense).
:The σ-algebra generated by an event <math>E \in \Sigma</math> is, by definition,
::<math>\sigma(\{E\}) = \{ \emptyset, E, \Omega \setminus E, \Omega \}.</math>
* Two random variables <math>X</math> and <math>Y</math> defined over <math>\Omega</math> are independent (in the old sense) if and only if the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by a random variable <math>X</math> taking values in some [[measurable space]] <math>S</math> consists, by definition, of all subsets of <math>\Omega</math> of the form <math>X^{-1}(U)</math>, where <math>U</math> is any measurable subset of <math>S</math>.

Using this definition, it is easy to show that if <math>X</math> and <math>Y</math> are random variables and <math>Y</math> is constant, then <math>X</math> and <math>Y</math> are independent, since the σ-algebra generated by a constant random variable is the trivial σ-algebra <math>\{ \varnothing, \Omega \}</math>. Probability-zero events cannot affect independence, so independence also holds if <math>Y</math> is only Pr-[[almost surely]] constant.

==Properties==

===Self-independence===
Note that an event is independent of itself if and only if

:<math>\mathrm{P}(A) = \mathrm{P}(A \cap A) = \mathrm{P}(A) \cdot \mathrm{P}(A) \iff \mathrm{P}(A) = 0 \text{ or } \mathrm{P}(A) = 1.</math>

Thus an event is independent of itself if and only if it [[almost surely]] occurs or its [[Complement (set theory)|complement]] almost surely occurs; this fact is useful when proving [[zero–one law]]s.<ref>{{cite book|last=Durrett|first=Richard|author-link=Rick Durrett|title=Probability: theory and examples|edition=Second|year=1996}} page 62</ref>

===Expectation and covariance===
{{main|Correlation and dependence}}
If <math>X</math> and <math>Y</math> are statistically independent random variables, then the [[expected value|expectation operator]] <math>\operatorname{E}</math> has the property

:<math>\operatorname{E}[X^n Y^m] = \operatorname{E}[X^n] \operatorname{E}[Y^m],</math><ref name=JakemanBook>{{cite book | author=E Jakeman| title=Modeling Fluctuations in Scattered Waves| isbn=978-0-7503-1005-5}}</ref>{{rp|p. 10}}

and the [[covariance]] <math>\operatorname{cov}[X,Y]</math> is zero, as follows from

:<math>\operatorname{cov}[X,Y] = \operatorname{E}[X Y] - \operatorname{E}[X] \operatorname{E}[Y].</math>

The converse does not hold: if two random variables have a covariance of 0 they may nevertheless fail to be independent.

{{See also|Uncorrelatedness (probability theory)}}

Similarly for two stochastic processes <math>\left\{ X_t \right\}_{t\in\mathcal{T}}</math> and <math>\left\{ Y_t \right\}_{t\in\mathcal{T}}</math>: If they are independent, then they are [[Uncorrelatedness (probability theory)|uncorrelated]].<ref name=KunIlPark>{{cite book | author=Park, Kun Il| title=Fundamentals of Probability and Stochastic Processes with Applications to Communications| publisher=Springer | year=2018 | isbn=978-3-319-68074-3}}</ref>{{rp|p. 151}}
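A standard counterexample is a variable together with its square. The following Python sketch (an exact enumeration over a small, arbitrarily chosen joint distribution) shows a pair with zero covariance that is nonetheless not independent.

<syntaxhighlight lang="python">
from fractions import Fraction

# X is uniform on {-1, 0, 1} and Y = X**2; the joint distribution is enumerated exactly.
joint = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

E_X  = sum(x * p for (x, y), p in joint.items())
E_Y  = sum(y * p for (x, y), p in joint.items())
E_XY = sum(x * y * p for (x, y), p in joint.items())
print("cov[X,Y] =", E_XY - E_X * E_Y)          # 0: X and Y are uncorrelated

p_x0   = sum(p for (x, y), p in joint.items() if x == 0)  # P(X = 0) = 1/3
p_y1   = sum(p for (x, y), p in joint.items() if y == 1)  # P(Y = 1) = 2/3
p_x0y1 = joint.get((0, 1), Fraction(0))                   # P(X = 0, Y = 1) = 0
print(p_x0y1, p_x0 * p_y1)                     # 0 versus 2/9: not independent
</syntaxhighlight>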
===Characteristic function===
Two random variables <math>X</math> and <math>Y</math> are independent if and only if the [[characteristic function (probability theory)|characteristic function]] of the random vector <math>(X,Y)</math> satisfies

:<math>\varphi_{(X,Y)}(t,s) = \varphi_{X}(t)\cdot \varphi_{Y}(s).</math>

In particular the characteristic function of their sum is the product of their marginal characteristic functions:

:<math>\varphi_{X+Y}(t) = \varphi_X(t)\cdot\varphi_Y(t),</math>

though the reverse implication is not true. Random variables that satisfy the latter condition are called [[subindependence|subindependent]].

==Examples==

===Rolling dice===
The event of getting a 6 the first time a die is rolled and the event of getting a 6 the second time are ''independent''. By contrast, the event of getting a 6 the first time a die is rolled and the event that the sum of the numbers seen on the first and second trial is 8 are ''not'' independent.

===Drawing cards===
If two cards are drawn ''with'' replacement from a deck of cards, the event of drawing a red card on the first trial and that of drawing a red card on the second trial are ''independent''. By contrast, if two cards are drawn ''without'' replacement from a deck of cards, the event of drawing a red card on the first trial and that of drawing a red card on the second trial are ''not'' independent, because a deck that has had a red card removed has proportionately fewer red cards.

===Pairwise and mutual independence===
[[File:Pairwise independent.svg|thumb|Pairwise independent, but not mutually independent, events]]
[[File:Mutually independent.svg|thumb|Mutually independent events]]
Consider the two probability spaces shown. In both cases, <math>\mathrm{P}(A) = \mathrm{P}(B) = 1/2</math> and <math>\mathrm{P}(C) = 1/4</math>. The events in the first space are pairwise independent because <math>\mathrm{P}(A|B) = \mathrm{P}(A|C)=1/2=\mathrm{P}(A)</math>, <math>\mathrm{P}(B|A) = \mathrm{P}(B|C)=1/2=\mathrm{P}(B)</math>, and <math>\mathrm{P}(C|A) = \mathrm{P}(C|B)=1/4=\mathrm{P}(C)</math>; but the three events are not mutually independent. The events in the second space are both pairwise independent and mutually independent. To illustrate the difference, consider conditioning on two events. In the pairwise independent case, although any one event is independent of each of the other two individually, it is not independent of the intersection of the other two:

:<math>\mathrm{P}(A|BC) = \frac{\frac{4}{40}}{\frac{4}{40} + \frac{1}{40}} = \tfrac{4}{5} \ne \mathrm{P}(A)</math>
:<math>\mathrm{P}(B|AC) = \frac{\frac{4}{40}}{\frac{4}{40} + \frac{1}{40}} = \tfrac{4}{5} \ne \mathrm{P}(B)</math>
:<math>\mathrm{P}(C|AB) = \frac{\frac{4}{40}}{\frac{4}{40} + \frac{6}{40}} = \tfrac{2}{5} \ne \mathrm{P}(C)</math>

In the mutually independent case, however,

:<math>\mathrm{P}(A|BC) = \frac{\frac{1}{16}}{\frac{1}{16} + \frac{1}{16}} = \tfrac{1}{2} = \mathrm{P}(A)</math>
:<math>\mathrm{P}(B|AC) = \frac{\frac{1}{16}}{\frac{1}{16} + \frac{1}{16}} = \tfrac{1}{2} = \mathrm{P}(B)</math>
:<math>\mathrm{P}(C|AB) = \frac{\frac{1}{16}}{\frac{1}{16} + \frac{3}{16}} = \tfrac{1}{4} = \mathrm{P}(C)</math>

===Triple-independence but no pairwise-independence===
It is possible to create a three-event example in which

:<math>\mathrm{P}(A \cap B \cap C) = \mathrm{P}(A)\mathrm{P}(B)\mathrm{P}(C),</math>

and yet no two of the three events are pairwise independent (and hence the set of events is not mutually independent).<ref>George, Glyn, "Testing for the independence of three events," ''Mathematical Gazette'' 88, November 2004, 568. [http://www.engr.mun.ca/~ggeorge/MathGaz04.pdf PDF]</ref> This example shows that mutual independence involves requirements on the products of probabilities of all combinations of events, not just the product of the probabilities of all single events, which is the only condition satisfied in this example.
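One concrete construction of this kind (an illustrative choice, not the example from the cited reference) uses eight equally likely outcomes; the following Python sketch verifies it exactly.

<syntaxhighlight lang="python">
from fractions import Fraction

# Eight equally likely outcomes 1..8 (an arbitrary illustrative construction).
p = Fraction(1, 8)
A = {1, 2, 3, 4}
B = {1, 2, 3, 5}
C = {1, 6, 7, 8}

def P(event):
    return p * len(event)

# The three-fold product condition holds ...
print(P(A & B & C), P(A) * P(B) * P(C))   # 1/8 == 1/8

# ... yet no pair of the events is independent.
for E, F in [(A, B), (A, C), (B, C)]:
    print(P(E & F), P(E) * P(F))          # 3/8, 1/8, 1/8 versus 1/4 in each case
</syntaxhighlight>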
==Conditional independence==
{{main|Conditional independence}}

===For events===
The events <math>A</math> and <math>B</math> are conditionally independent given an event <math>C</math> when <math>\mathrm{P}(A \cap B \mid C) = \mathrm{P}(A \mid C) \cdot \mathrm{P}(B \mid C)</math>.

===For random variables===
Intuitively, two random variables <math>X</math> and <math>Y</math> are conditionally independent given <math>Z</math> if, once <math>Z</math> is known, the value of <math>Y</math> does not add any further information about <math>X</math>. For instance, two measurements <math>X</math> and <math>Y</math> of the same underlying quantity <math>Z</math> are not independent, but they are conditionally independent given <math>Z</math> (unless the errors in the two measurements are somehow connected).

The formal definition of conditional independence is based on the idea of [[conditional distribution]]s. If <math>X</math>, <math>Y</math>, and <math>Z</math> are [[discrete random variable]]s, then we define <math>X</math> and <math>Y</math> to be conditionally independent given <math>Z</math> if

:<math>\mathrm{P}(X \le x, Y \le y\;|\;Z = z) = \mathrm{P}(X \le x\;|\;Z = z) \cdot \mathrm{P}(Y \le y\;|\;Z = z)</math>

for all <math>x</math>, <math>y</math> and <math>z</math> such that <math>\mathrm{P}(Z=z)>0</math>. On the other hand, if the random variables are [[Continuous random variable|continuous]] and have a joint [[probability density function]] <math>f_{XYZ}(x,y,z)</math>, then <math>X</math> and <math>Y</math> are conditionally independent given <math>Z</math> if

:<math>f_{XY|Z}(x, y | z) = f_{X|Z}(x | z) \cdot f_{Y|Z}(y | z)</math>

for all real numbers <math>x</math>, <math>y</math> and <math>z</math> such that <math>f_Z(z)>0</math>.

If discrete <math>X</math> and <math>Y</math> are conditionally independent given <math>Z</math>, then

:<math>\mathrm{P}(X = x | Y = y , Z = z) = \mathrm{P}(X = x | Z = z)</math>

for any <math>x</math>, <math>y</math> and <math>z</math> with <math>\mathrm{P}(Z=z)>0</math>. That is, the conditional distribution for <math>X</math> given <math>Y</math> and <math>Z</math> is the same as that given <math>Z</math> alone. A similar equation holds for the conditional probability density functions in the continuous case.

Independence can be seen as a special kind of conditional independence, since probability can be seen as a kind of conditional probability given no events.
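The "two noisy measurements of the same quantity" intuition can be checked by exact enumeration. The following Python sketch is illustrative only (the flip probability 0.25 and the variable names are arbitrary): <math>X</math> and <math>Y</math> are independently corrupted copies of a fair bit <math>Z</math>, and the code verifies that they are conditionally independent given <math>Z</math> but not unconditionally independent.

<syntaxhighlight lang="python">
from itertools import product
from collections import defaultdict

# Z is a fair bit; X and Y are copies of Z, each flipped independently with probability 0.25.
joint = defaultdict(float)
for z, n1, n2 in product([0, 1], repeat=3):
    p = 0.5 * (0.25 if n1 else 0.75) * (0.25 if n2 else 0.75)
    joint[(z, z ^ n1, z ^ n2)] += p

def P(pred):
    """Probability of the set of (z, x, y) triples satisfying pred."""
    return sum(p for (z, x, y), p in joint.items() if pred(z, x, y))

# Conditional independence given Z = 0:
pz0 = P(lambda z, x, y: z == 0)
p_x1_given_z0   = P(lambda z, x, y: z == 0 and x == 1) / pz0
p_y1_given_z0   = P(lambda z, x, y: z == 0 and y == 1) / pz0
p_x1y1_given_z0 = P(lambda z, x, y: z == 0 and x == 1 and y == 1) / pz0
print(p_x1y1_given_z0, p_x1_given_z0 * p_y1_given_z0)   # 0.0625 and 0.0625: equal

# ...but X and Y are not unconditionally independent:
print(P(lambda z, x, y: x == 1 and y == 1),
      P(lambda z, x, y: x == 1) * P(lambda z, x, y: y == 1))  # 0.3125 versus 0.25
</syntaxhighlight>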
== History ==
Before 1933, independence in probability theory was defined in a verbal manner. For example, [[de Moivre]] gave the following definition: “Two events are independent, when they have no connexion one with the other, and that the happening of one neither forwards nor obstructs the happening of the other”.<ref>Cited according to: Grinstead and Snell’s Introduction to Probability. In: The CHANCE Project. Version of July 4, 2006.</ref> If there are <math>n</math> independent events, the probability that all of them happen was computed as the product of the probabilities of these <math>n</math> events. Apparently, there was a conviction that this formula was a consequence of the above definition (sometimes it was called the Multiplication Theorem). Of course, a proof of this assertion cannot work without further, more formal, tacit assumptions.

The definition of independence given in this article became the standard definition (now used in all books) after it appeared in 1933 as part of Kolmogorov's axiomatization of probability.<ref>[[Andrey Kolmogorov|Kolmogorov, Andrey]] (1933). ''Grundbegriffe der Wahrscheinlichkeitsrechnung'' (in German). Berlin: Julius Springer. Translation: Kolmogorov, Andrey (1956). ''Foundations of the Theory of Probability'' (2nd ed.). New York: Chelsea. ISBN 978-0-8284-0023-7.</ref> [[Andrey Kolmogorov|Kolmogorov]] credited it to [[Sergei Bernstein|S.N. Bernstein]], and quoted a publication which had appeared in Russian in 1927.<ref>[[Sergei Bernstein|S.N. Bernstein]], ''Probability Theory'' (Russian), Moscow, 1927 (4 editions, latest 1946).</ref>

Unfortunately, neither Bernstein nor Kolmogorov was aware of the work of [[Georg Bohlmann]]. Bohlmann had given the same definition for two events in 1901<ref>[[Georg Bohlmann]]: ''Lebensversicherungsmathematik'', Encyklopädie der mathematischen Wissenschaften, Bd I, Teil 2, Artikel I D 4b (1901), 852–917.</ref> and for <math>n</math> events in 1908.<ref>[[Georg Bohlmann]]: ''Die Grundbegriffe der Wahrscheinlichkeitsrechnung in ihrer Anwendung auf die Lebensversicherung'', Atti del IV. Congr. Int. dei Matem. Rom, Bd. III (1908), 244–278.</ref> In the latter paper, he studied his notion in detail. For example, he gave the first example showing that pairwise independence does not imply mutual independence. Even today, Bohlmann is rarely quoted. More about his work can be found in ''On the contributions of Georg Bohlmann to probability theory'' by [[:de:Ulrich Krengel|Ulrich Krengel]].<ref>[[:de:Ulrich Krengel|Ulrich Krengel]]: ''On the contributions of Georg Bohlmann to probability theory'' (PDF; 6.4 MB), Electronic Journal for History of Probability and Statistics, 2011.</ref>

==See also==
*[[Copula (statistics)]]
*[[Independent and identically distributed random variables]]
*[[Mean dependence]]
*[[Normally distributed and uncorrelated does not imply independent]]

==References==
{{Reflist}}

==External links==
*{{Commons category-inline}}

[[Category:Independence (probability theory)| ]]
[[Category:Experiment (probability theory)]]