==Definition==

===For events===

====Two events====
Two events <math>A</math> and <math>B</math> are independent (often written as <math>A \perp B</math> or <math>A \perp\!\!\!\perp B</math>, where the latter symbol is also often used for [[conditional independence]]) if and only if their [[joint probability]] equals the product of their probabilities:<ref name=Florescu>{{cite book | author=Florescu, Ionut | title=Probability and Stochastic Processes | publisher=Wiley | year=2014 | isbn=978-0-470-62455-5}}</ref>{{rp|p. 29}}<ref name=Gallager/>{{rp|p. 10}}

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>\mathrm{P}(A \cap B) = \mathrm{P}(A)\mathrm{P}(B)</math>|{{EquationRef|Eq.1}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

<math>A \cap B \neq \emptyset</math> indicates that two independent events <math>A</math> and <math>B</math> have common elements in their [[sample space]], so they are not [[Mutual exclusivity|mutually exclusive]] (mutually exclusive if and only if <math>A \cap B = \emptyset</math>). Why this defines independence is made clear by rewriting with [[Conditional probability|conditional probabilities]], where <math>P(A \mid B) = \frac{P(A \cap B)}{P(B)}</math> is the probability that event <math>A</math> occurs given that event <math>B</math> has occurred (or is assumed to have occurred):
:<math>\mathrm{P}(A \cap B) = \mathrm{P}(A)\mathrm{P}(B) \iff \mathrm{P}(A\mid B) = \frac{\mathrm{P}(A \cap B)}{\mathrm{P}(B)} = \mathrm{P}(A),</math>
and similarly
:<math>\mathrm{P}(A \cap B) = \mathrm{P}(A)\mathrm{P}(B) \iff \mathrm{P}(B\mid A) = \frac{\mathrm{P}(A \cap B)}{\mathrm{P}(A)} = \mathrm{P}(B).</math>

Thus, the occurrence of <math>B</math> does not affect the probability of <math>A</math>, and vice versa; in other words, <math>A</math> and <math>B</math> are independent of each other. Although the derived expressions may seem more intuitive, they are not the preferred definition, as the conditional probabilities may be undefined if <math>\mathrm{P}(A)</math> or <math>\mathrm{P}(B)</math> is 0. Furthermore, the preferred definition makes clear by symmetry that when <math>A</math> is independent of <math>B</math>, <math>B</math> is also independent of <math>A</math>.
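As a simple illustration, consider one roll of a fair six-sided die, and let <math>A</math> be the event that the outcome is even and <math>B</math> the event that the outcome is at most 4. Then <math>\mathrm{P}(A) = \tfrac{1}{2}</math>, <math>\mathrm{P}(B) = \tfrac{2}{3}</math> and <math>A \cap B = \{2,4\}</math>, so
:<math>\mathrm{P}(A \cap B) = \tfrac{1}{3} = \tfrac{1}{2} \cdot \tfrac{2}{3} = \mathrm{P}(A)\mathrm{P}(B),</math>
and <math>A</math> and <math>B</math> are independent even though they concern the same roll; equivalently, <math>\mathrm{P}(A \mid B) = \tfrac{1}{2} = \mathrm{P}(A)</math>. By contrast, <math>A</math> and the event that the outcome is at most 3 are not independent, since <math>\tfrac{1}{6} \neq \tfrac{1}{2} \cdot \tfrac{1}{2}</math>.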
====Odds====
Stated in terms of [[odds]], two events are independent if and only if the [[odds ratio]] of {{tmath|A}} and {{tmath|B}} is unity (1). Analogously with probability, this is equivalent to the conditional odds being equal to the unconditional odds:
:<math>O(A \mid B) = O(A) \text{ and } O(B \mid A) = O(B),</math>
or to the odds of one event, given the other event, being the same as the odds of the event given the other event not occurring:
:<math>O(A \mid B) = O(A \mid \neg B) \text{ and } O(B \mid A) = O(B \mid \neg A).</math>
The odds ratio can be defined as
:<math>O(A \mid B) : O(A \mid \neg B),</math>
or symmetrically for odds of {{tmath|B}} given {{tmath|A}}, and thus is 1 if and only if the events are independent.

====More than two events====
A finite set of events <math>\{ A_i \}_{i=1}^{n}</math> is [[Pairwise independence|pairwise independent]] if every pair of events is independent<ref name="Feller">{{cite book | last = Feller | first = W. | year = 1971 | title = An Introduction to Probability Theory and Its Applications | publisher = [[John Wiley & Sons|Wiley]] | chapter = Stochastic Independence}}</ref>—that is, if and only if for all distinct pairs of indices <math>m,k</math>,

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>\mathrm{P}(A_m \cap A_k) = \mathrm{P}(A_m)\mathrm{P}(A_k)</math>|{{EquationRef|Eq.2}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

A finite set of events is '''mutually independent''' if every event is independent of any intersection of the other events<ref name="Feller" /><ref name=Gallager/>{{rp|p. 11}}—that is, if and only if for every <math>k \leq n</math> and for every <math>k</math> indices <math>1\le i_1 < \dots < i_k \le n</math>,

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>\mathrm{P}\left(\bigcap_{j=1}^k A_{i_j} \right)=\prod_{j=1}^k \mathrm{P}(A_{i_j})</math>|{{EquationRef|Eq.3}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

This is called the ''multiplication rule'' for independent events. It is [[#Triple-independence but no pairwise-independence|not a single condition]] involving only the product of all the probabilities of all single events; it must hold true for all subsets of events.

For more than two events, a mutually independent set of events is (by definition) pairwise independent, but the converse is [[#Pairwise and mutual independence|not necessarily true]].<ref name=Florescu/>{{rp|p. 30}}

====Log probability and information content====
Stated in terms of [[log probability]], two events are independent if and only if the log probability of the joint event is the sum of the log probabilities of the individual events:
:<math>\log \mathrm{P}(A \cap B) = \log \mathrm{P}(A) + \log \mathrm{P}(B)</math>

In [[information theory]], negative log probability is interpreted as [[information content]], and thus two events are independent if and only if the information content of the combined event equals the sum of the information contents of the individual events:
:<math>\mathrm{I}(A \cap B) = \mathrm{I}(A) + \mathrm{I}(B)</math>

See ''{{slink|Information content|Additivity of independent events}}'' for details.
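As a brief illustration, if <math>A</math> and <math>B</math> are independent events each having probability <math>\tfrac{1}{2}</math> (such as two separate fair coin tosses landing heads), then <math>\mathrm{I}(A) = \mathrm{I}(B) = 1</math> bit, while the combined event <math>A \cap B</math> has probability <math>\tfrac{1}{4}</math> and information content <math>\mathrm{I}(A \cap B) = 2</math> bits, the sum of the individual information contents.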
===For real valued random variables===

====Two random variables====
Two random variables <math>X</math> and <math>Y</math> are independent [[if and only if]] (iff) the elements of the [[Pi system|{{pi}}-system]] generated by them are independent; that is to say, for every <math>x</math> and <math>y</math>, the events <math>\{ X \le x\}</math> and <math>\{ Y \le y\}</math> are independent events (as defined above in {{EquationNote|Eq.1}}). That is, <math>X</math> and <math>Y</math>, with [[cumulative distribution function]]s <math>F_X(x)</math> and <math>F_Y(y)</math>, are independent [[if and only if|iff]] the combined random variable <math>(X,Y)</math> has a [[joint distribution|joint]] cumulative distribution function<ref name=Gallager>{{cite book | first=Robert G. | last=Gallager | title=Stochastic Processes Theory for Applications | publisher=Cambridge University Press | year=2013 | isbn=978-1-107-03975-9}}</ref>{{rp|p. 15}}

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>F_{X,Y}(x,y) = F_X(x) F_Y(y) \quad \text{for all } x,y</math>|{{EquationRef|Eq.4}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

or equivalently, if the [[probability density function|probability densities]] <math>f_X(x)</math> and <math>f_Y(y)</math> and the joint probability density <math>f_{X,Y}(x,y)</math> exist,
:<math>f_{X,Y}(x,y) = f_X(x) f_Y(y) \quad \text{for all } x,y.</math>

====More than two random variables====
A finite set of <math>n</math> random variables <math>\{X_1,\ldots,X_n\}</math> is [[pairwise independent]] if and only if every pair of random variables is independent. Even if the set of random variables is pairwise independent, it is not necessarily ''mutually independent'' as defined next.

A finite set of <math>n</math> random variables <math>\{X_1,\ldots,X_n\}</math> is '''mutually independent''' if and only if for any sequence of numbers <math>\{x_1, \ldots, x_n\}</math>, the events <math>\{X_1 \le x_1\}, \ldots, \{X_n \le x_n\}</math> are mutually independent events (as defined above in {{EquationNote|Eq.3}}). This is equivalent to the following condition on the joint cumulative distribution function {{nowrap|<math>F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)</math>.}} A finite set of <math>n</math> random variables <math>\{X_1,\ldots,X_n\}</math> is mutually independent if and only if<ref name=Gallager/>{{rp|p. 16}}

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = F_{X_1}(x_1) \cdot \ldots \cdot F_{X_n}(x_n) \quad \text{for all } x_1,\ldots,x_n</math>|{{EquationRef|Eq.5}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

It is not necessary here to require that the probability distribution factorizes for all possible {{nowrap|<math>k</math>-element}} subsets, as in the case for <math>n</math> events, because e.g. <math>F_{X_1,X_2,X_3}(x_1,x_2,x_3) = F_{X_1}(x_1) \cdot F_{X_2}(x_2) \cdot F_{X_3}(x_3)</math> implies <math>F_{X_1,X_3}(x_1,x_3) = F_{X_1}(x_1) \cdot F_{X_3}(x_3)</math>.

The measure-theoretically inclined reader may prefer to substitute events <math>\{ X \in A \}</math> for events <math>\{ X \leq x \}</math> in the above definition, where <math>A</math> is any [[Borel algebra|Borel set]]. That definition is exactly equivalent to the one above when the values of the random variables are [[real number]]s. It has the advantage of working also for complex-valued random variables or for random variables taking values in any [[measurable space]] (which includes [[topological space]]s endowed with appropriate σ-algebras).
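As a brief illustration, if <math>X_1,\ldots,X_n</math> are mutually independent random variables, each uniformly distributed on <math>[0,1]</math>, then <math>F_{X_i}(x_i) = x_i</math> for <math>x_i \in [0,1]</math>, and the joint cumulative distribution function factorizes as <math>F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = x_1 \cdots x_n</math> on the unit cube, in accordance with {{EquationNote|Eq.5}}; the factorization for any pair, such as <math>F_{X_1,X_3}(x_1,x_3) = x_1 x_3</math>, then follows by setting the remaining arguments to 1.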
===For real valued random vectors===
Two random vectors <math>\mathbf{X}=(X_1,\ldots,X_m)^\mathrm{T}</math> and <math>\mathbf{Y}=(Y_1,\ldots,Y_n)^\mathrm{T}</math> are called independent if<ref name="Papoulis">{{cite book | last = Papoulis | first = Athanasios | title = Probability, Random Variables and Stochastic Processes | publisher = McGraw-Hill | year = 1991 | isbn = 0-07-048477-5}}</ref>{{rp|p. 187}}

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>F_{\mathbf{X,Y}}(\mathbf{x,y}) = F_{\mathbf{X}}(\mathbf{x}) \cdot F_{\mathbf{Y}}(\mathbf{y}) \quad \text{for all } \mathbf{x},\mathbf{y}</math>|{{EquationRef|Eq.6}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

where <math>F_{\mathbf{X}}(\mathbf{x})</math> and <math>F_{\mathbf{Y}}(\mathbf{y})</math> denote the cumulative distribution functions of <math>\mathbf{X}</math> and <math>\mathbf{Y}</math>, and <math>F_{\mathbf{X,Y}}(\mathbf{x,y})</math> denotes their joint cumulative distribution function. Independence of <math>\mathbf{X}</math> and <math>\mathbf{Y}</math> is often denoted by <math>\mathbf{X} \perp\!\!\!\perp \mathbf{Y}</math>. Written component-wise, <math>\mathbf{X}</math> and <math>\mathbf{Y}</math> are called independent if
:<math>F_{X_1,\ldots,X_m,Y_1,\ldots,Y_n}(x_1,\ldots,x_m,y_1,\ldots,y_n) = F_{X_1,\ldots,X_m}(x_1,\ldots,x_m) \cdot F_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n) \quad \text{for all } x_1,\ldots,x_m,y_1,\ldots,y_n.</math>

===For stochastic processes===

====For one stochastic process====
The definition of independence may be extended from random vectors to a [[stochastic process]]. An independent stochastic process is one for which the random variables obtained by sampling the process at any <math>n</math> times <math>t_1,\ldots,t_n</math> are independent random variables for any <math>n</math>.<ref name=HweiHsu>{{cite book | last1=Hsu | first1=Hwei P. | title=Theory and Problems of Probability, Random Variables, and Random Processes | publisher=McGraw-Hill | year=1997 | isbn=0-07-030644-3 | url-access=registration | url=https://archive.org/details/schaumsoutlineof00hsuh}}</ref>{{rp|p. 163}}

Formally, a stochastic process <math>\left\{ X_t \right\}_{t\in\mathcal{T}}</math> is called independent if and only if for all <math>n\in \mathbb{N}</math> and for all <math>t_1,\ldots,t_n\in\mathcal{T}</math>

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>F_{X_{t_1},\ldots,X_{t_n}}(x_1,\ldots,x_n) = F_{X_{t_1}}(x_1) \cdot \ldots \cdot F_{X_{t_n}}(x_n) \quad \text{for all } x_1,\ldots,x_n</math>|{{EquationRef|Eq.7}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

where {{nowrap|<math>F_{X_{t_1},\ldots,X_{t_n}}(x_1,\ldots,x_n) = \mathrm{P}(X(t_1) \leq x_1,\ldots,X(t_n) \leq x_n)</math>.}} Independence of a stochastic process is a property ''within'' a single stochastic process, not between two stochastic processes.
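As a brief illustration, a discrete-time process <math>\left\{ X_t \right\}_{t\in\mathbb{N}}</math> whose samples are [[Independent and identically distributed random variables|independent and identically distributed]] (for example, a sequence of independent fair coin tosses) is an independent stochastic process: for any distinct times <math>t_1,\ldots,t_n</math>, the joint cumulative distribution function of <math>X_{t_1},\ldots,X_{t_n}</math> factorizes as in {{EquationNote|Eq.7}}. A [[random walk]] built from the partial sums of such a sequence, by contrast, is in general not an independent process, since its value at a later time depends on its values at earlier times.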
====For two stochastic processes====
Independence of two stochastic processes is a property between two stochastic processes <math>\left\{ X_t \right\}_{t\in\mathcal{T}}</math> and <math>\left\{ Y_t \right\}_{t\in\mathcal{T}}</math> that are defined on the same probability space <math>(\Omega,\mathcal{F},P)</math>. Formally, two stochastic processes <math>\left\{ X_t \right\}_{t\in\mathcal{T}}</math> and <math>\left\{ Y_t \right\}_{t\in\mathcal{T}}</math> are said to be independent if for all <math>n\in \mathbb{N}</math> and for all <math>t_1,\ldots,t_n\in\mathcal{T}</math>, the random vectors <math>(X(t_1),\ldots,X(t_n))</math> and <math>(Y(t_1),\ldots,Y(t_n))</math> are independent,<ref name="Lapidoth2017">{{cite book | author=Amos Lapidoth | title=A Foundation in Digital Communication | url=https://books.google.com/books?id=6oTuDQAAQBAJ&q=independence | date=8 February 2017 | publisher=Cambridge University Press | isbn=978-1-107-17732-1}}</ref>{{rp|p. 515}} i.e. if

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>F_{X_{t_1},\ldots,X_{t_n},Y_{t_1},\ldots,Y_{t_n}}(x_1,\ldots,x_n,y_1,\ldots,y_n) = F_{X_{t_1},\ldots,X_{t_n}}(x_1,\ldots,x_n) \cdot F_{Y_{t_1},\ldots,Y_{t_n}}(y_1,\ldots,y_n) \quad \text{for all } x_1,\ldots,x_n,y_1,\ldots,y_n</math>|{{EquationRef|Eq.8}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

===Independent σ-algebras===
The definitions above ({{EquationNote|Eq.1}} and {{EquationNote|Eq.2}}) are both generalized by the following definition of independence for [[sigma algebra|σ-algebras]]. Let <math>(\Omega, \Sigma, \mathrm{P})</math> be a probability space and let <math>\mathcal{A}</math> and <math>\mathcal{B}</math> be two sub-σ-algebras of <math>\Sigma</math>. <math>\mathcal{A}</math> and <math>\mathcal{B}</math> are said to be independent if, whenever <math>A \in \mathcal{A}</math> and <math>B \in \mathcal{B}</math>,
:<math>\mathrm{P}(A \cap B) = \mathrm{P}(A) \mathrm{P}(B).</math>

Likewise, a finite family of σ-algebras <math>(\tau_i)_{i\in I}</math>, where <math>I</math> is an [[index set]], is said to be independent if and only if
:<math>\forall \left(A_i\right)_{i\in I} \in \prod\nolimits_{i\in I}\tau_i \ : \ \mathrm{P}\left(\bigcap\nolimits_{i\in I}A_i\right) = \prod\nolimits_{i\in I}\mathrm{P}\left(A_i\right)</math>
and an infinite family of σ-algebras is said to be independent if all its finite subfamilies are independent.

The new definition relates to the previous ones very directly:
* Two events are independent (in the old sense) [[if and only if]] the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by an event <math>E \in \Sigma</math> is, by definition,
::<math>\sigma(\{E\}) = \{ \emptyset, E, \Omega \setminus E, \Omega \}.</math>
* Two random variables <math>X</math> and <math>Y</math> defined over <math>\Omega</math> are independent (in the old sense) if and only if the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by a random variable <math>X</math> taking values in some [[measurable space]] <math>S</math> consists, by definition, of all subsets of <math>\Omega</math> of the form <math>X^{-1}(U)</math>, where <math>U</math> is any measurable subset of <math>S</math>.

Using this definition, it is easy to show that if <math>X</math> and <math>Y</math> are random variables and <math>Y</math> is constant, then <math>X</math> and <math>Y</math> are independent, since the σ-algebra generated by a constant random variable is the trivial σ-algebra <math>\{ \varnothing, \Omega \}</math>. Probability zero events cannot affect independence, so independence also holds if <math>Y</math> is only Pr-[[almost surely]] constant.
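The multiplication rule in {{EquationNote|Eq.1}} can also be checked numerically. The following sketch is an informal illustration only; the events, sample size, and random seed are arbitrary choices and not part of the definitions above. It estimates <math>\mathrm{P}(A \cap B)</math> and <math>\mathrm{P}(A)\mathrm{P}(B)</math> by Monte Carlo simulation for the fair-die events used earlier, <math>A</math> = "the outcome is even" and <math>B</math> = "the outcome is at most 4":

<syntaxhighlight lang="python">
import random

random.seed(0)        # arbitrary seed, for reproducibility only
n_trials = 100_000    # arbitrary sample size

count_a = count_b = count_ab = 0
for _ in range(n_trials):
    roll = random.randint(1, 6)  # one roll of a fair six-sided die
    in_a = roll % 2 == 0         # event A: outcome is even
    in_b = roll <= 4             # event B: outcome is at most 4
    count_a += in_a
    count_b += in_b
    count_ab += in_a and in_b

p_a, p_b, p_ab = count_a / n_trials, count_b / n_trials, count_ab / n_trials
# For independent events, P(A and B) should be close to P(A) * P(B).
print(f"P(A and B) ~ {p_ab:.3f}, P(A)*P(B) ~ {p_a * p_b:.3f}")
</syntaxhighlight>

For independent events the two printed estimates agree up to sampling error (both are close to <math>\tfrac{1}{3}</math> here); replacing <math>B</math> with the event that the outcome is at most 3 makes them differ (roughly <math>\tfrac{1}{6}</math> versus <math>\tfrac{1}{4}</math>).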