Editing De Finetti's theorem

{{Short description|Conditional independence of exchangeable observations}}
In [[probability theory]], '''de Finetti's theorem''' states that [[exchangeable random variables|exchangeable]] observations are [[conditionally independent]] relative to some [[latent variable]]. An  [[epistemic probability]] [[probability distribution|distribution]] could then be assigned to this variable. It is named in honor of [[Bruno de Finetti]], and one of its uses is in providing a pragmatic approach to de Finetti's well-known dictum "Probability does not exist".<ref>Spiegelhalter, D. 2024 [https://www.scientificamerican.com/article/why-probability-probably-doesnt-exist-but-its-useful-to-act-like-it-does] Why probability doesn't exist (but it's useful to act like it does) ''Scientific American'' 26 December </ref>

For the special case of an exchangeable sequence of [[Bernoulli distribution|Bernoulli]] random variables it states that such a sequence is a "[[mixture distribution|mixture]]" of sequences of [[independent and identically distributed]] (i.i.d.) Bernoulli random variables.

A sequence of random variables is called exchangeable if the joint distribution of the sequence is unchanged by any permutation of a finite set of indices. In general, while the variables of the exchangeable sequence are not ''themselves'' independent, only exchangeable, there is an ''underlying'' family of i.i.d. random variables. That is, there are underlying, generally unobservable, quantities that are i.i.d. – exchangeable sequences are mixtures of i.i.d. sequences.

== Background ==
A Bayesian statistician often seeks the conditional probability distribution of a random quantity given the data. The concept of [[exchangeability]] was introduced by de Finetti.  De Finetti's theorem explains a mathematical relationship between independence and exchangeability.<ref>See the Oxford lecture notes of Steffen Lauritzen  http://www.stats.ox.ac.uk/~steffen/teaching/grad/definetti.pdf</ref>

An infinite sequence

:<math>X_1, X_2, X_3, \dots </math>

of random variables is said to be exchangeable if for any [[natural number]] ''n'' and any finite sequence  ''i''<sub>1</sub>, ..., ''i''<sub>''n''</sub> and any permutation of the sequence π:{''i''<sub>1</sub>, ..., ''i''<sub>''n''</sub> } → {''i''<sub>1</sub>, ..., ''i''<sub>''n''</sub> },

:<math>(X_{i_1},\dots,X_{i_n}) \text{ and } (X_{\pi(i_1)},\dots,X_{\pi(i_n)}) </math>

both have the same [[joint probability distribution]].

If an identically distributed sequence is [[statistical independence|independent]], then the sequence is exchangeable; however, the converse is false—there exist exchangeable random variables that are not statistically independent, for example the [[Pólya urn model]].

== Statement of the theorem ==

A [[random variable]] ''X'' has a [[Bernoulli distribution]] if Pr(''X''&nbsp;=&nbsp;1)&nbsp;=&nbsp;''p'' and Pr(''X''&nbsp;=&nbsp;0)&nbsp;=&nbsp;1&nbsp;&minus;&nbsp;''p'' for some ''p''&nbsp;∈&nbsp;(0,&nbsp;1).

De Finetti's theorem states that the probability distribution of any infinite exchangeable sequence of Bernoulli random variables is a "[[mixture distribution|mixture]]" of the probability distributions of independent and identically distributed sequences of Bernoulli random variables.  "Mixture", in this sense, means a weighted average, but this need not mean a finite or countably infinite (i.e., discrete) weighted average: it can be an [[compound probability distribution|integral over a measure]]  rather than a sum.

More precisely, suppose ''X''<sub>1</sub>, ''X''<sub>2</sub>, ''X''<sub>3</sub>, ... is an infinite exchangeable sequence of Bernoulli-distributed random variables.  Then there is some probability measure ''m'' on the interval [0,&nbsp;1] and some random variable ''Y'' such that
* The probability measure of ''Y'' is ''m'', and
* The [[conditional probability distribution]] of the whole sequence ''X''<sub>1</sub>, ''X''<sub>2</sub>, ''X''<sub>3</sub>, ... given the value of ''Y'' is described by saying that
** ''X''<sub>1</sub>, ''X''<sub>2</sub>, ''X''<sub>3</sub>, ... are [[conditional independence|conditionally independent]] given ''Y'', and
** For any ''i'' ∈ {1, 2, 3, ...}, the conditional probability that ''X''<sub>''i''</sub> = 1, given the value of ''Y'', is ''Y''.

=== Another way of stating the theorem ===

Suppose <math>X_1,X_2,X_3,\ldots</math> is an infinite exchangeable sequence of Bernoulli random variables.  Then <math>X_1,X_2,X_3,\ldots</math> are conditionally independent and identically distributed given the [[Invariant sigma-algebra#Exchangeable sigma-algebra|exchangeable sigma-algebra]] (i.e., the sigma-algebra consisting of events that are measurable with respect to <math>X_1,X_2,\ldots</math> and invariant under finite permutations of the indices).

=== A plain-language consequence of the theorem ===
According to [[David Spiegelhalter]] (ref 1) the theorem provides a pragmatic approach to de Finetti's statement that "Probability does not exist". If our view of the probability of a sequence of events is subjective but remains unaffected by the order in which we make our observations, then the sequence can be regarded as [[Exchangeable_random_variables|''exchangeable'']]. De Finetti's theorem then implies that believing the sequence to be exchangeable is mathematically equivalent to acting as if the events are ''independent'' and have an objective underlying probability of occurring, with our uncertainty about what that probability is being expressed by a subjective probability distribution function. According to Spiegelhalter:  ″This is remarkable: it shows that, starting from a specific, but purely subjective, expression of convictions, we should act as if events were driven by objective chances."

== Example ==

As a concrete example, we construct a sequence

:<math>X_1, X_2, X_3, \dots </math>

of random variables, by "mixing" two i.i.d. sequences as follows.

We assume ''p'' = 2/3 with probability 1/2 and ''p'' = 9/10 with probability 1/2. Given the event ''p'' = 2/3, the conditional distribution of the sequence is that the ''X''<sub>i</sub> are independent and identically distributed and ''X''<sub>1</sub>&nbsp;=&nbsp;1 with probability 2/3 and ''X''<sub>1</sub>&nbsp;=&nbsp;0 with probability 1&nbsp;&minus;&nbsp;2/3.  Given the event  ''p''&nbsp;=&nbsp;9/10, the conditional distribution of the sequence is that the ''X''<sub>i</sub> are independent and identically distributed and ''X''<sub>1</sub> = 1 with probability 9/10 and ''X''<sub>1</sub>&nbsp;=&nbsp;0 with probability 1&nbsp;&minus;&nbsp;9/10.

This can be interpreted as follows: Make two biased coins, one showing "heads" with 2/3 probability and one showing "heads" with 9/10 probability. Flip a fair coin once to decide which biased coin to use for all flips that are recorded. Here "heads" at flip i means X<sub>i</sub>=1.

The independence asserted here is ''conditional'' independence, i.e. the Bernoulli random variables in the sequence are conditionally independent given the event that ''p''&nbsp;=&nbsp;2/3, and are conditionally independent given the event that ''p''&nbsp;=&nbsp;9/10. But they are not unconditionally independent; they are positively [[correlation|correlated]].

In view of the [[law of large numbers|strong law of large numbers]], we can say that

:<math>\lim_{n\rightarrow\infty} \frac{X_1+\cdots+X_n}{n} = \begin{cases}
2/3 & \text{with probability }1/2, \\
9/10 & \text{with probability }1/2.
\end{cases} </math>

Rather than concentrating probability 1/2 at each of two points between 0 and 1, the "mixing distribution" can be any [[probability distribution]] supported on the interval from 0 to 1; which one it is depends on the joint distribution of the infinite sequence of Bernoulli random variables.

The definition of exchangeability, and the statement of the theorem, also makes sense for finite length sequences

:<math>X_1,\dots, X_n,</math>

but the theorem is not generally true in that case.  It is true if the sequence can be extended to an exchangeable sequence that is infinitely long.  The simplest example of an exchangeable sequence of Bernoulli random variables that cannot be so extended is the one in which ''X''<sub>1</sub> = 1&nbsp;&minus;&nbsp;''X''<sub>2</sub> and ''X''<sub>1</sub> is either 0 or 1, each with probability 1/2.  This sequence is exchangeable, but cannot be extended to an exchangeable sequence of length 3, let alone an infinitely long one.

== As a categorical limit == 

De Finetti's theorem can be expressed as a [[limit (category theory)|categorical limit]] in the [[category of Markov kernels]].<ref>{{cite conference | last1 = Jacobs | first1 = Bart | last2 = Staton | first2 = Sam | title = De Finetti's theorem as a categorical limit | book-title = CMCS '20: Proceedings of the 15th IFIP WG 1.3 International Workshop of Coalgebraic Methods in Computer Science | date = 2020 | arxiv = 2003.01964| url = https://link.springer.com/book/10.1007/978-3-030-57201-3}}</ref><ref name="fritz">{{cite journal | first1=Tobias | last1=Fritz | first2=Tomáš | last2=Gonda | first3=Paolo | last3=Perrone | title=De Finetti's theorem in categorical probability | journal=Journal of Stochastic Analysis | volume=2 | issue=4 | year=2021 | doi=10.31390/josa.2.4.06 | url=https://repository.lsu.edu/josa/vol2/iss4/6/ | arxiv=2105.02639}}</ref><ref>{{cite conference | last1 = Moss | first1 = Sean | last2 = Perrone | first2 = Paolo | title = Probability monads with submonads of deterministic states | book-title = LICS '22: Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science | date = 2022 | doi = 10.1145/3531130.3533355 | arxiv = 2204.07003 | url = https://dl.acm.org/doi/10.1145/3531130.3533355}}</ref>

Let <math>(X,\mathcal{A})</math> be a [[standard Borel space]], and consider the space of sequences on <math>X</math>, the countable product <math>X^\mathbb{N}</math> (equipped with the [[product sigma-algebra]]).

Given a finite [[permutation]] <math>\sigma</math>, denote again by <math>\sigma</math> the permutation action on <math>X^\mathbb{N}</math>, as well as the [[Markov kernel]] <math>X^\mathbb{N}\to X^\mathbb{N}</math> induced by it. 
In terms of [[category theory]], we have a [[diagram (category theory)|diagram]] with a single object, <math>X^\mathbb{N}</math>, and a countable number of arrows, one for each permutation.

Recall now that a [[probability measure]] <math>p</math> is equivalently a [[Markov kernel]] from the one-point measurable space. 
A [[probability measure]] <math>p</math> on <math>X^\mathbb{N}</math> is [[exchangeability|exchangeable]] if and only if, as Markov kernels, <math>\sigma\circ p=p</math> for every permutation <math>\sigma</math>.
More generally, given any standard Borel space <math>Y</math>, one can call a Markov kernel <math>k:Y\to X</math> ''exchangeable'' if <math>\sigma\circ k=k</math> for every <math>\sigma</math>, i.e. if the following diagram commutes,

[[File:Definetti-cone.svg|center]]

giving a [[cone (category theory)|cone]].

De Finetti's theorem can be now stated as the fact that the space <math>PX</math> of [[probability measures]] over <math>X</math> ([[Giry monad]]) forms a [[universal property|universal]] (or [[limit (category theory)|limit]]) cone.<ref name="fritz"/>
More in detail, consider the Markov kernel <math>\mathrm{iid}_\mathbb{N}:PX\to X^\mathbb{N}</math> constructed as follows, using the [[Kolmogorov extension theorem]]:
:<math>
\mathrm{iid}_\mathbb{N}(A_1\times\dots\times A_n\times X\times\dots|p) = p(A_1)\cdots p(A_n)
</math>
for all measurable subsets <math>A_1,\dots,A_n</math> of <math>X</math>.
Note that we can interpret this kernel as taking a probability measure <math>p\in PX</math> as input and returning an [[Independent and identically distributed random variables|iid sequence]] on <math>X^\mathbb{N}</math> distributed according to <math>p</math>. Since iid sequences are exchangeable, <math>\mathrm{iid}_\mathbb{N}:PX\to X^\mathbb{N}</math> is an exchangeable kernel in the sense defined above.
The kernel <math>\mathrm{iid}_\mathbb{N}:PX\to X^\mathbb{N}</math> doesn't just form a cone, but a [[limit (category theory)|limit]] cone: given any exchangeable kernel <math>k:Y\to X</math>, there exists a unique kernel <math>\tilde{k}:Y\to PX</math> such that <math>k=\mathrm{iid}_\mathbb{N}\circ\tilde{k}</math>, i.e. making the following diagram commute:

[[File:Definetti-limit.svg|center]]

In particular, for any exchangeable probability measure <math>p</math> on <math>X^\mathbb{N}</math>, there exists a unique probability measure <math>\tilde{p}</math> on <math>PX</math> (i.e. a probability measure over probability measures) such that <math>p=\mathrm{iid}_\mathbb{N}\circ\tilde{p}</math>, i.e. such that for all measurable subsets <math>A_1,\dots,A_n</math> of <math>X</math>,
:<math>
p(A_1\times\dots\times A_n\times X\times\dots) = \int_{PX} \mathrm{iid}_\mathbb{N}(A_1\times\dots\times A_n\times X\times\dots|q) \, \tilde{p}(dq) = \int_{PX} q(A_1)\cdots q(A_n) \, \tilde{p}(dq) .
</math>
In other words, <math>p</math> is a [[compound distribution|mixture]] of [[Independent and identically distributed random variables|iid measures]] on <math>X</math> (the ones formed by <math>q</math> in the integral above).

== Extensions ==
Versions of de&nbsp;Finetti's theorem for  ''finite'' exchangeable sequences,<ref>{{cite journal
 |first1=P. |last1=Diaconis |author-link=Persi Diaconis
 |first2=D. |last2=Freedman |author-link2=David A. Freedman (statistician)
 |title=Finite exchangeable sequences
 |journal=Annals of Probability
 |volume=8 |issue=4 |year=1980 |pages=745&ndash;764
 |doi=10.1214/aop/1176994663 |mr=577313 | zbl = 0434.60034
|doi-access=free }}</ref><ref>{{cite journal
 |first1=G.&nbsp;J. |last1=Szekely |author-link=Gabor J Szekely
 |first2= J.&nbsp;G.  |last2=Kerns  |title=De&nbsp;Finetti's theorem for abstract finite exchangeable sequences |journal=Journal of Theoretical Probability |volume=19 |issue=3 |year=2006 |pages= 589–608 |doi=10.1007/s10959-006-0028-z|s2cid=119981020 }}
</ref> and for ''Markov&nbsp;exchangeable'' sequences<ref>{{cite journal
 |first1=P. |last1=Diaconis |author-link=Persi Diaconis
 |first2=D. |last2=Freedman |author-link2=David A. Freedman (statistician)
 |title=De&nbsp;Finetti's theorem for Markov chains
 |journal=Annals of Probability
 |volume=8 |issue=1 |year=1980 |pages=115&ndash;130
 |doi=10.1214/aop/1176994828 |mr=556418| zbl=0426.60064
|doi-access=free }}</ref> have been proved by Diaconis and Freedman and by Kerns and Szekely. 
Two notions of partial exchangeability of arrays, known as ''separate'' and ''joint exchangeability'' lead to extensions of de&nbsp;Finetti's theorem for arrays by Aldous and Hoover.<ref>Persi Diaconis and [[Svante Janson]] (2008) [http://www.stat.berkeley.edu/~aldous/Research/persi-svante.pdf "Graph Limits and Exchangeable Random Graphs"],''Rendiconti di&nbsp;Matematica'', Ser. VII 28(1), 33–61.</ref>

The computable de&nbsp;Finetti theorem shows that if an exchangeable sequence of real random variables is given by a computer program, then a program which samples from the mixing measure can be automatically recovered.<ref>
Cameron Freer and Daniel Roy (2009) [https://doi.org/10.1007%2F978-3-642-03073-4_23 "Computable exchangeable sequences have computable de&nbsp;Finetti measures"],  ''Proceedings of the 5th Conference on Computability in Europe: Mathematical Theory and Computational Practice'', Lecture Notes in Computer Science, Vol. 5635, pp. 218&ndash;231.</ref>

In the setting of [[free probability]], there is a noncommutative extension of de Finetti's theorem which characterizes noncommutative sequences invariant under quantum permutations.<ref>
{{cite journal |first1=Claus |last1=Koestler |first2=Roland |last2=Speicher |year=2009 |title=A noncommutative de&nbsp;Finetti theorem: Invariance under quantum permutations is equivalent to freeness with amalgamation |journal=Commun. Math. Phys. |volume=291 |issue= 2|pages=473&ndash;490 |doi= 10.1007/s00220-009-0802-8|bibcode=2009CMaPh.291..473K |arxiv=0807.0677 |s2cid=115155584 }}
</ref>

Extensions of de Finetti's theorem to quantum states have been found to be useful in [[quantum information]],<ref>{{Cite journal|last1=Caves|first1=Carlton M.|last2=Fuchs|first2=Christopher A.|last3=Schack|first3=Ruediger|date=2002-08-20|title=Unknown quantum states: The quantum de Finetti representation|journal=Journal of Mathematical Physics|volume=43|issue=9|pages=4537–4559|arxiv=quant-ph/0104088|doi=10.1063/1.1494475|issn=0022-2488|bibcode=2002JMP....43.4537C|s2cid=17416262}}</ref><ref>{{cite web|url=http://math.ucr.edu/home/baez/week251.html|title=This Week's Finds in Mathematical Physics (Week 251)|year=2007|author=J. Baez|access-date=29 April 2012|author-link=John C. Baez}}</ref><ref>{{Cite book|last1=Brandao|first1=Fernando G.S.L.|last2=Harrow|first2=Aram W.|title=Proceedings of the forty-fifth annual ACM symposium on Theory of Computing |chapter=Quantum de finetti theorems under local measurements with applications |date=2013-01-01|series=STOC '13|location=New York, NY, USA|publisher=ACM|pages=861–870|arxiv=1210.6367|doi=10.1145/2488608.2488718|isbn=9781450320290|s2cid=1772280}}</ref> in topics like [[quantum key distribution]]<ref>{{cite arXiv|last=Renner|first=Renato|date=2005-12-30|title=Security of Quantum Key Distribution|arxiv=quant-ph/0512258}}</ref> and [[quantum entanglement|entanglement]] detection.<ref>{{Cite journal|last1=Doherty|first1=Andrew C.|last2=Parrilo|first2=Pablo A.|last3=Spedalieri|first3=Federico M.|date=2005-01-01|title=Detecting multipartite entanglement|journal=Physical Review A|volume=71|issue=3|pages=032333|arxiv=quant-ph/0407143|doi=10.1103/PhysRevA.71.032333|bibcode=2005PhRvA..71c2333D|s2cid=44241800}}</ref> A multivariate extension of de Finetti’s theorem can be used to derive [[Bose–Einstein statistics]] from the statistics of classical (i.e. independent) particles.<ref>{{cite journal |first1=A. |last1=Bach |first2=H. |last2=Blank |first3=H. |last3=Francke |title=Bose-Einstein statistics derived from the statistics of classical particles |journal=Lettere al Nuovo Cimento |volume=43 |pages=195–198 |year=1985 |issue=4 |doi=10.1007/BF02746978 |s2cid=121413539 }}</ref>

==See also==
* [[Choquet theory]]
* [[Hewitt–Savage zero–one law]]
* [[Krein–Milman theorem]]
* [[Invariant sigma-algebra]]

== References ==
{{Reflist}}

==External links==
*{{SpringerEOM|id=De_Finetti_theorem|first=L.|last= Accardi|title=De Finetti theorem}}
*[https://stats.stackexchange.com/q/34465 What is so cool about De Finetti's representation theorem?]
*{{nlab|id=de+Finetti's+theorem|title=De Finetti's theorem}}

[[Category:Theorems in probability theory]]
[[Category:Bayesian statistics]]
[[Category:Integral representations]]