
{{Short description|Conditional probability paradox}}
In [[probability theory]], the '''Borel–Kolmogorov paradox''' (sometimes known as '''Borel's paradox''') is a [[paradox]] relating to [[conditional probability]] with respect to an [[event (probability theory)|event]] of probability zero (also known as a [[null set]]). It is named after [[Émile Borel]] and [[Andrey Kolmogorov]].

== A great circle puzzle ==
Suppose that a [[random variable]] has a [[Uniform distribution (continuous)|uniform distribution]] on a [[unit sphere]]. What is its [[conditional distribution]] on a [[great circle]]? Because of the symmetry of the sphere, one might expect that the distribution is uniform and independent of the choice of coordinates. However, two analyses give contradictory results.  First, note that choosing a point uniformly on the sphere is equivalent to choosing the [[longitude]] <math>\lambda</math> uniformly from <math>[-\pi,\pi]</math> and choosing the [[latitude]] <math>\varphi</math> from <math display="inline">[-\frac{\pi}{2},\frac{\pi}{2}]</math> with density <math display="inline">\frac{1}{2} \cos \varphi</math>.<ref name=Jaynes>{{harvnb|Jaynes|2003|pages=1514–1517}}</ref> Then we can look at two different great circles:
# If the coordinates are chosen so that the great circle is an [[equator]] (latitude <math>\varphi = 0</math>), the conditional density for a longitude <math>\lambda</math> defined on the interval <math>[-\pi,\pi]</math> is <math display="block"> f(\lambda\mid\varphi=0) = \frac{1}{2\pi}.</math>
# If the great circle is a [[line of longitude]] with <math>\lambda = 0</math>, the conditional density for  <math>\varphi</math> on the interval <math display="inline">[-\frac{\pi}{2},\frac{\pi}{2}]</math> is <math display="block">f(\varphi\mid\lambda=0) = \frac{1}{2} \cos \varphi.</math>

One distribution is uniform on the circle, the other is not. Yet both seem to be referring to the same great circle in different coordinate systems.
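
The latitude density used above can be recovered from the area of a thin band of latitude. As a short sketch on the unit sphere (the band width <math>d\varphi</math> is introduced here only for illustration), the band between latitudes <math>\varphi</math> and <math>\varphi + d\varphi</math> is a strip of circumference <math>2\pi\cos\varphi</math> and width <math>d\varphi</math>, so the probability of landing in it is
:<math>\frac{\text{band area}}{\text{sphere area}} = \frac{2\pi \cos\varphi \, d\varphi}{4\pi} = \frac{1}{2} \cos\varphi \, d\varphi ,</math>
which integrates to 1 over <math display="inline">[-\frac{\pi}{2},\frac{\pi}{2}]</math>.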

{{Quote
 | Many quite futile arguments have raged — between otherwise competent probabilists — over which of these results is 'correct'.
 | [[E.T. Jaynes]]<ref name=Jaynes/>
}}

== Explanation and implications ==
In case (1) above, the conditional probability that the longitude ''λ'' lies in a set ''E'' given that ''φ'' = 0 can be written ''P''(''λ'' ∈ ''E'' | ''φ'' = 0). Elementary probability theory suggests this can be computed as ''P''(''λ'' ∈ ''E'' and ''φ'' = 0)/''P''(''φ'' = 0), but that expression is not well-defined since ''P''(''φ'' = 0) = 0.  [[Measure theory]] provides a way to define a conditional probability, using the limit of events ''R''<sub>''ab''</sub> = {''φ'' : ''a'' < ''φ'' < ''b''} which are horizontal rings (curved surface zones of [[spherical segment]]s) consisting of all points with latitude between ''a'' and ''b''.

The resolution of the paradox is to notice that in case (2), ''P''(''φ'' ∈ ''F'' | ''λ'' = 0) is defined using a limit of the events ''L''<sub>''cd''</sub> = {''λ'' : ''c'' < ''λ'' < ''d''}, which are [[Spherical lune|lunes]] (vertical wedges), consisting of all points whose longitude varies between ''c'' and ''d''.  So although ''P''(''λ'' ∈ ''E'' | ''φ'' = 0) and ''P''(''φ'' ∈ ''F'' | ''λ'' = 0) each provide a probability distribution on a great circle, one of them is defined using limits of rings, and the other using limits of lunes.  Since rings and lunes have different shapes, it should be less surprising that ''P''(''λ'' ∈ ''E'' | ''φ'' = 0) and ''P''(''φ'' ∈ ''F'' | ''λ'' = 0) have different distributions.

{{Quote
 | The concept of a conditional probability with regard to an isolated hypothesis whose probability equals 0 is inadmissible. For we can obtain a probability distribution for [the latitude] on the meridian circle only if we regard this circle as an element of the decomposition of the entire spherical surface onto meridian circles with the given poles
| [[Andrey Kolmogorov]]<ref>Originally [[#kol1933|Kolmogorov (1933)]], translated in [[#kol1956|Kolmogorov (1956)]]. Sourced from [[#pol2002|Pollard (2002)]]</ref>
}}
{{Quote
 | … the term 'great circle' is ambiguous until we specify what limiting operation is to produce it. The intuitive symmetry argument presupposes the equatorial limit; yet one eating slices of an orange might presuppose the other.
 | [[E.T. Jaynes]]<ref name=Jaynes/>
}}
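
The ring and lune limits described in this section can be illustrated numerically. The following is a minimal Monte Carlo sketch, not taken from the cited sources; the function names, the half-width <code>eps</code>, the seed and the sample count are all illustrative assumptions. It samples points uniformly on the unit sphere, conditions once on a thin ring around the equator and once on a thin lune around the meridian ''λ'' = 0, and reports how much of the conditional mass lies within an angle of {{pi}}/4 of the point ''φ'' = 0, ''λ'' = 0.

```python
import math
import random


def uniform_sphere_point(rng):
    """Return a uniform point on the unit sphere (normalised Gaussian vector)."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:  # guard against a (practically impossible) zero vector
            return x / norm, y / norm, z / norm


def ring_and_lune_fractions(n_samples=300_000, eps=0.05, seed=1):
    """Estimate the conditional mass within pi/4 of the point (lat, lon) = (0, 0).

    Ring: condition on |latitude| < eps, count longitudes with |lon| < pi/4.
    Lune: condition on |longitude| < eps, count latitudes with |lat| < pi/4.
    """
    rng = random.Random(seed)
    ring_total = ring_hits = 0
    lune_total = lune_hits = 0
    for _ in range(n_samples):
        x, y, z = uniform_sphere_point(rng)
        lat = math.asin(max(-1.0, min(1.0, z)))  # clamp against rounding error
        lon = math.atan2(y, x)
        if abs(lat) < eps:  # thin ring around the equator
            ring_total += 1
            if abs(lon) < math.pi / 4:
                ring_hits += 1
        if abs(lon) < eps:  # thin lune around the meridian lon = 0
            lune_total += 1
            if abs(lat) < math.pi / 4:
                lune_hits += 1
    return ring_hits / ring_total, lune_hits / lune_total


if __name__ == "__main__":
    ring, lune = ring_and_lune_fractions()
    print(f"ring (equator):  {ring:.3f}  (arc-length share of full circle: 0.250)")
    print(f"lune (meridian): {lune:.3f}  (arc-length share of half circle: 0.500)")
```

Under these assumptions the ring estimate should be close to the arc-length share 1/4, matching the uniform density 1/(2{{pi}}), while the lune estimate should be close to sin({{pi}}/4) ≈ 0.707 rather than 1/2, matching the density (1/2) cos ''φ''.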

== Mathematical explication ==

=== Measure theoretic perspective ===

To understand the problem we need to recognize that the distribution of a continuous random variable is described by a density ''f'' only with respect to some measure ''μ''. Both are needed for a full description of the probability distribution. Equivalently, we need to fully specify the space on which ''f'' is defined.

Let Φ and Λ denote two random variables taking values in Ω<sub>1</sub> = <math display="inline">\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]</math> and Ω<sub>2</sub> = [−{{pi}}, {{pi}}], respectively. An event {Φ&nbsp;=&nbsp;''φ'',&nbsp;Λ&nbsp;=&nbsp;''λ''} gives a point on the sphere ''S''(''r'') with radius ''r''. We define the [[coordinate transform]]
:<math>\begin{align}
  x &= r \cos \varphi \cos \lambda \\
  y &= r \cos \varphi \sin \lambda \\
  z &= r \sin \varphi
\end{align}</math>

for which we obtain the [[volume element]]
:<math>\omega_r(\varphi,\lambda) = \left\| {\partial (x,y,z) \over \partial \varphi} \times {\partial (x,y,z) \over \partial \lambda} \right\| = r^2 \cos \varphi \ .</math>

Furthermore, if either ''φ'' or ''λ'' is fixed, we get the volume elements
:<math>\begin{align}
  \omega_r(\lambda) &= \left\| {\partial (x,y,z) \over \partial \varphi } \right\| = r \ , \quad\text{respectively} \\[3pt]
  \omega_r(\varphi) &= \left\| {\partial (x,y,z) \over \partial \lambda } \right\| = r \cos \varphi\ .
\end{align}</math>
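
As a check on the volume elements above, the partial derivatives of the coordinate transform can be written out explicitly (a routine computation, given here as a sketch):
:<math>\begin{align}
  {\partial (x,y,z) \over \partial \varphi} &= r \, (-\sin\varphi \cos\lambda,\ -\sin\varphi \sin\lambda,\ \cos\varphi) \ , \\[3pt]
  {\partial (x,y,z) \over \partial \lambda} &= r \, (-\cos\varphi \sin\lambda,\ \cos\varphi \cos\lambda,\ 0) \ .
\end{align}</math>
The two vectors are orthogonal and have norms <math>r</math> and <math>r \cos\varphi</math> (since <math>\cos\varphi \ge 0</math> on Ω<sub>1</sub>), so the norm of their cross product is the product of the norms, <math>r^2 \cos\varphi</math>.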

Let
:<math>\mu_{\Phi,\Lambda}(d\varphi, d\lambda) = f_{\Phi,\Lambda}(\varphi,\lambda) \omega_r(\varphi,\lambda) \, d\varphi \, d\lambda</math>

denote the joint measure on <math>\mathcal{B}(\Omega_1 \times \Omega_2)</math>, which has a density <math>f_{\Phi,\Lambda}</math> with respect to <math>\omega_r(\varphi,\lambda) \, d\varphi \, d\lambda</math> and let
:<math>\begin{align}
          \mu_\Phi(d\varphi) &= \int_{\lambda \in \Omega_2} \mu_{\Phi,\Lambda}(d\varphi, d\lambda)\ ,\\
  \mu_\Lambda (d\lambda) &= \int_{\varphi \in \Omega_1} \mu_{\Phi,\Lambda}(d\varphi, d\lambda)\ .
\end{align}</math>

If we assume that the density <math>f_{\Phi,\Lambda}</math> is uniform, then
:<math>\begin{align}
  \mu_{\Phi \mid \Lambda}(d\varphi \mid \lambda) &= {\mu_{\Phi,\Lambda}(d\varphi, d\lambda) \over \mu_\Lambda(d\lambda)} = \frac{1}{2r} \omega_r(\varphi) \, d\varphi \ , \quad\text{and} \\[3pt]
  \mu_{\Lambda \mid \Phi}(d\lambda \mid \varphi) &= {\mu_{\Phi,\Lambda}(d\varphi, d\lambda) \over \mu_\Phi(d\varphi)} = \frac{1}{2r\pi} \omega_r(\lambda) \, d\lambda \ .
\end{align}</math>

Hence, <math>\mu_{\Phi \mid \Lambda}</math> has a uniform density with respect to <math>\omega_r(\varphi) \, d\varphi</math> but not with respect to the [[Lebesgue measure]]. On the other hand, <math>\mu_{\Lambda \mid \Phi}</math> has a uniform density with respect to <math>\omega_r(\lambda) \, d\lambda</math> and the Lebesgue measure.
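
The intermediate step can be made explicit. In the following sketch, <math>c</math> denotes the constant value of the uniform density <math>f_{\Phi,\Lambda}</math> (a symbol introduced here only for illustration). Integrating the joint measure gives the marginals
:<math>\begin{align}
  \mu_\Lambda(d\lambda) &= \int_{-\pi/2}^{\pi/2} c \, r^2 \cos\varphi \, d\varphi \, d\lambda = 2 c r^2 \, d\lambda \ , \\[3pt]
  \mu_\Phi(d\varphi) &= \int_{-\pi}^{\pi} c \, r^2 \cos\varphi \, d\lambda \, d\varphi = 2 \pi c r^2 \cos\varphi \, d\varphi \ ,
\end{align}</math>
so that
:<math>\begin{align}
  \mu_{\Phi \mid \Lambda}(d\varphi \mid \lambda) &= \frac{c \, r^2 \cos\varphi \, d\varphi \, d\lambda}{2 c r^2 \, d\lambda} = \frac{1}{2} \cos\varphi \, d\varphi = \frac{1}{2r} \omega_r(\varphi) \, d\varphi \ , \\[3pt]
  \mu_{\Lambda \mid \Phi}(d\lambda \mid \varphi) &= \frac{c \, r^2 \cos\varphi \, d\varphi \, d\lambda}{2 \pi c r^2 \cos\varphi \, d\varphi} = \frac{1}{2\pi} \, d\lambda = \frac{1}{2r\pi} \omega_r(\lambda) \, d\lambda \ .
\end{align}</math>
With respect to the Lebesgue measure these are the densities <math display="inline">\frac{1}{2} \cos\varphi</math> and <math display="inline">\frac{1}{2\pi}</math> from the great circle puzzle.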

=== Proof of contradiction ===
{{Original research|paragraph|discuss=Talk:Borel–Kolmogorov_paradox#Error_in_"Mathematical_Explication"|date=March 2021}} 

Consider a random vector <math>(X,Y,Z)</math> that is uniformly distributed on the unit sphere <math>S^2</math>.

We begin by parametrizing the sphere with the usual [[spherical polar coordinates]]:
:<math>\begin{aligned}
  x &= \cos(\varphi) \cos (\theta) \\
  y &= \cos(\varphi) \sin (\theta) \\
  z &= \sin(\varphi)
\end{aligned}</math>
where <math display="inline">-\frac{\pi}{2} \le \varphi \le \frac{\pi}{2}</math> and <math>-\pi \le \theta \le \pi</math>.

We can define random variables <math>\Phi</math>, <math>\Theta</math> as the values of <math>(X, Y, Z)</math>
under the inverse of this parametrization, or more formally using the [[arctan2|arctan2 function]]:
:<math>\begin{align}
    \Phi &= \arcsin(Z) \\
  \Theta &= \arctan_2\left(\frac{Y}{\sqrt{1 - Z^2}}, \frac{X}{\sqrt{1 - Z^2}}\right)
\end{align}</math>

Using the formulas for the surface area of a [[spherical cap]] and of a [[spherical wedge]], the surface area of a spherical cap wedge is given by
:<math>
\operatorname{Area}(\Theta \le \theta, \Phi \le \varphi) = (1 + \sin(\varphi)) (\theta + \pi)
</math>

Since <math>(X,Y,Z)</math> is uniformly distributed, the probability is proportional to the surface area, giving the [[Joint probability distribution#Joint cumulative distribution function|joint cumulative distribution function]]
:<math>
F_{\Phi, \Theta}(\varphi, \theta) = P(\Theta \le \theta, \Phi \le \varphi) = \frac{1}{4\pi}(1 + \sin(\varphi)) (\theta + \pi)
</math>

The [[Joint probability distribution#Joint density function or mass function|joint probability density function]] is then given by
:<math>
  f_{\Phi, \Theta}(\varphi, \theta) = 
  \frac{\partial^2}{\partial \varphi \partial \theta} F_{\Phi, \Theta}(\varphi, \theta) = 
  \frac{1}{4\pi} \cos(\varphi) 
</math>
Note that <math>\Phi</math> and <math>\Theta</math> are independent random variables.
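
The independence can be read off from the joint density, which factorizes into a function of <math>\varphi</math> times a function of <math>\theta</math>. As a short check (the marginal notation <math>f_\Phi</math>, <math>f_\Theta</math> is assumed here for illustration):
:<math>
  f_{\Phi, \Theta}(\varphi, \theta) = \underbrace{\tfrac{1}{2} \cos(\varphi)}_{f_\Phi(\varphi)} \cdot \underbrace{\tfrac{1}{2\pi}}_{f_\Theta(\theta)} ,
  \qquad
  \int_{-\pi}^{\pi} \int_{-\pi/2}^{\pi/2} \frac{1}{4\pi} \cos(\varphi) \, \mathrm{d}\varphi \, \mathrm{d}\theta = \frac{1}{4\pi} \cdot 2 \cdot 2\pi = 1 .
</math>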

For simplicity, we do not calculate the full conditional distribution on a great circle, only the probability that the random vector lies in the first octant of longitude, that is, that its longitude lies between 0 and <math display="inline">\frac{\pi}{4}</math>. In other words, we attempt to calculate the conditional probability <math>\mathbb{P}(A|B)</math> with
:<math>\begin{aligned}
  A &= \left\{ 0 < \Theta < \frac{\pi}{4} \right\} &&= \{ 0 < X < 1, 0 < Y < X \}\\
  B &= \{ \Phi = 0 \} &&= \{ Z = 0 \}
\end{aligned}</math>

We attempt to evaluate the conditional probability as a limit of conditioning on the events
:<math>B_\varepsilon = \{ | \Phi | < \varepsilon \}</math>

As <math>\Phi</math> and <math>\Theta</math> are independent, so are the events <math>A</math> and <math>B_\varepsilon</math>, therefore
:<math>
  P(A \mid B) \mathrel{\stackrel{?}{=}} \lim_{\varepsilon \to 0} \frac{P(A \cap B_\varepsilon)}{P(B_\varepsilon)} =
  \lim_{\varepsilon \to 0} P(A) = P \left(0 < \Theta < \frac{\pi}{4}\right) = \frac{1}{8}.
</math>

Now we repeat the process with a different parametrization of the sphere:
:<math>\begin{align}
  x &=  \sin(\varphi) \\
  y &=  \cos(\varphi) \sin(\theta) \\
  z &= -\cos(\varphi) \cos(\theta)
\end{align}</math>
This is equivalent to the previous parametrization [[Rotation matrix#Basic rotations|rotated by 90 degrees around the y axis]].

Define new random variables
:<math>\begin{align}
    \Phi' &= \arcsin(X) \\
  \Theta' &= \arctan_2\left(\frac{Y}{\sqrt{1 - X^2}}, \frac{-Z}{\sqrt{1 - X^2}}\right).
\end{align}</math>

Rotation is [[measure-preserving transformation|measure preserving]], so the joint density of <math>\Phi'</math> and <math>\Theta'</math> is the same as that of <math>\Phi</math> and <math>\Theta</math>:
:<math> f_{\Phi', \Theta'}(\varphi, \theta) = \frac{1}{4\pi} \cos(\varphi) </math>.

The expressions for {{mvar|A}} and {{mvar|B}} are:
:<math>\begin{align}
  A &= \left\{ 0 < \Theta < \frac{\pi}{4} \right\}
   &&= \{ 0 < X < 1,\ 0 < Y < X \}
   &&= \left\{ 0 < \Theta' < \pi,\ 0 < \Phi' < \frac{\pi}{2},\ \sin(\Theta') < \tan(\Phi') \right\} \\
  B &= \{ \Phi = 0 \}
   &&= \{ Z = 0 \}
   &&= \left\{ \Theta' = -\frac{\pi}{2} \right\} \cup \left\{ \Theta' = \frac{\pi}{2} \right\}.
\end{align}</math>

We again attempt to evaluate the conditional probability as a limit of conditioning on the events
:<math>B^\prime_\varepsilon = \left\{ \left|\Theta' + \frac{\pi}{2}\right| < \varepsilon \right\} \cup \left\{ \left|\Theta'-\frac{\pi}{2}\right| < \varepsilon \right\}.</math>

Using [[L'Hôpital's rule]] and [[Leibniz integral rule|differentiation under the integral sign]]:
:<math>\begin{align}
  P(A \mid B) &\mathrel{\stackrel{?}{=}} \lim_{\varepsilon \to 0} \frac{P(A \cap B^\prime_\varepsilon )}{P(B^\prime_\varepsilon )}\\
    &=  \lim_{\varepsilon \to 0} \frac{1}{\frac{4\varepsilon}{2\pi}}P\left( \frac{\pi}{2} - \varepsilon < \Theta' < \frac{\pi}{2} + \varepsilon,\ 0 < \Phi' < \frac{\pi}{2},\ \sin(\Theta') < \tan(\Phi') \right)\\
    &= \frac{\pi}{2} \lim_{\varepsilon \to 0} \frac{\partial}{\partial \varepsilon} \int_{{\pi}/{2}-\varepsilon}^{{\pi}/{2}+\varepsilon} \int_0^{{\pi}/{2}} 1_{\sin(\theta) < \tan(\varphi)} f_{\Phi', \Theta'}(\varphi, \theta) \mathrm{d}\varphi \mathrm{d}\theta \\
    &= \pi \int_0^{{\pi}/{2}} 1_{1 < \tan(\varphi)} f_{\Phi', \Theta'}\left(\varphi, \frac{\pi}{2}\right) \mathrm{d}\varphi \\
    &= \pi \int_{\pi/4}^{\pi/2} \frac{1}{4 \pi} \cos(\varphi) \mathrm{d}\varphi \\
    &= \frac{1}{4} \left( 1  - \frac{1}{\sqrt{2}} \right) \neq \frac{1}{8}
\end{align}</math>

The two limits disagree. This shows that a conditional probability given an event of probability zero is not determined by the event alone: it depends on the family of events used in the limit, as explained in [[Conditional probability#Conditioning on an event of probability zero]].
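
The two limits computed in this section can also be estimated by simulation. The sketch below is illustrative only and is not part of the cited sources; the function names, the half-width <code>eps</code>, the seed and the sample count are assumptions. It draws uniform points on the unit sphere and estimates <math>P(A \mid B_\varepsilon)</math> and <math>P(A \mid B^\prime_\varepsilon)</math> for a small fixed <math>\varepsilon</math>, using <math>A = \{0 < Y < X\}</math> and the definitions of <math>\Phi</math> and <math>\Theta'</math> given above.

```python
import math
import random


def sample_sphere(rng):
    """Return a uniform point on the unit sphere (normalised Gaussian vector)."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:  # guard against a (practically impossible) zero vector
            return x / norm, y / norm, z / norm


def estimate_limits(n_samples=200_000, eps=0.05, seed=0):
    """Estimate P(A | B_eps) and P(A | B'_eps) for the same event A."""
    rng = random.Random(seed)
    in_b = in_a_and_b = 0      # B_eps  = {|Phi| < eps}
    in_bp = in_a_and_bp = 0    # B'_eps = {Theta' within eps of +pi/2 or -pi/2}
    for _ in range(n_samples):
        x, y, z = sample_sphere(rng)
        in_a = 0 < y < x                         # A = {0 < Theta < pi/4}
        phi = math.asin(max(-1.0, min(1.0, z)))  # clamp against rounding error
        # atan2 is unchanged by the common positive factor 1/sqrt(1 - X^2).
        theta_p = math.atan2(y, -z)
        if abs(phi) < eps:
            in_b += 1
            in_a_and_b += in_a
        if abs(abs(theta_p) - math.pi / 2) < eps:
            in_bp += 1
            in_a_and_bp += in_a
    return in_a_and_b / in_b, in_a_and_bp / in_bp


if __name__ == "__main__":
    band, lune = estimate_limits()
    print(f"P(A | B_eps)  ~ {band:.4f}  (limit 1/8 = 0.1250)")
    print(f"P(A | B'_eps) ~ {lune:.4f}  "
          f"(limit {(1 - 1 / math.sqrt(2)) / 4:.4f})")
```

Up to sampling noise and the small bias from using a finite <math>\varepsilon</math>, the first estimate should be close to <math display="inline">\frac{1}{8}</math> and the second close to <math display="inline">\frac{1}{4}\left(1 - \frac{1}{\sqrt{2}}\right) \approx 0.073</math>, even though both conditioning events shrink to the same great circle <math>\{Z = 0\}</math>.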

==See also==

* {{annotated link|Disintegration theorem}}

== Notes ==
{{Reflist}}

== References ==
{{refbegin}}
* {{cite book |title = Probability Theory: The Logic of Science |last = Jaynes |first = E. T. |author-link= Edwin Thompson Jaynes |year= 2003 |publisher= Cambridge University Press |isbn = 0-521-59271-2 |pages = 467&ndash;470 |section = 15.7 The Borel-Kolmogorov paradox |mr = 1992316 }}
** [http://omega.math.albany.edu:8008/ETJ-PS/cc15w.ps Fragmentary Edition (1994) (pp.&nbsp;1514&ndash;1517)] {{Webarchive|url=https://web.archive.org/web/20180930025717/http://omega.math.albany.edu:8008/ETJ-PS/cc15w.ps |date=2018-09-30 }}  ([[PostScript]] format)
* {{cite book |title = Grundbegriffe der Wahrscheinlichkeitsrechnung |last = Kolmogorov |first = Andrey |author-link = Andrey Kolmogorov |year = 1933 |publisher= Julius Springer |location = Berlin |language = de |ref = kol1933 }}
** Translation: {{cite book |title = Foundations of the Theory of Probability |edition = 2nd |last = Kolmogorov |first = Andrey |author-link = Andrey Kolmogorov |year = 1956 |publisher = Chelsea |location = New York |isbn = 0-8284-0023-7 |pages = 50&ndash;51 |url = http://www.mathematik.com/Kolmogorov/index.html |ref = kol1956 |chapter = Chapter V, §2. Explanation of a Borel Paradox |chapter-url = http://www.mathematik.com/Kolmogorov/0029.html |access-date = 2009-03-12 |archive-url = https://web.archive.org/web/20180914120850/http://www.mathematik.com/Kolmogorov/0029.html |archive-date = 2018-09-14 |url-status = dead }}
* {{cite book |title = A User's Guide to Measure Theoretic Probability |last = Pollard |first = David |year= 2002 |publisher= Cambridge University Press |isbn = 0-521-00289-3 |pages = 122&ndash;123 |chapter = Chapter 5. Conditioning, Example 17. |ref = pol2002 |mr = 1873379 }}
* {{cite book |doi=10.1016/S0074-6142(02)80219-4 |chapter=16 Probabilistic approach to inverse problems |title=International Handbook of Earthquake and Engineering Seismology |series=International Geophysics |year=2002 |last1=Mosegaard |first1=Klaus |last2=Tarantola |first2=Albert |volume=81 |pages=237–265 |isbn=9780124406520 }}
* {{cite web |last1=Gal |first1=Yarin |title=The Borel–Kolmogorov paradox |url=https://www.cs.ox.ac.uk/people/yarin.gal/website/PDFs/Short-talk-03-2014.pdf}}
{{refend}}

{{DEFAULTSORT:Borel-Kolmogorov Paradox}}
[[Category:Probability theory paradoxes]]