{{Short description|Foundations of probability theory}} {{Probability fundamentals}} The standard '''probability axioms''' are the foundations of [[probability theory]] introduced by Russian mathematician [[Andrey Kolmogorov]] in 1933.<ref name=":0">{{Cite book |title=Foundations of the theory of probability |url=https://archive.org/details/foundationsofthe00kolm |last=Kolmogorov |first=Andrey |publisher=Chelsea Publishing Company |year=1950 |orig-date=1933 |location=New York, US }}</ref> These [[axiom]]s remain central to probability theory and underpin its applications in mathematics, the physical sciences, and everyday reasoning under uncertainty.<ref>{{Cite web |url=https://www.stat.berkeley.edu/~aldous/Real_World/kolmogorov.html |title=What is the significance of the Kolmogorov axioms? |last=Aldous |first=David |website=David Aldous |access-date=November 19, 2019}}</ref> There are several other (equivalent) approaches to formalising probability. [[Bayesian theory|Bayesians]] often motivate the Kolmogorov axioms by invoking [[Cox's theorem]] or the [[Dutch book argument|Dutch book arguments]] instead.<ref>{{Cite journal | last = Cox | first = R. T. | author-link = Richard Threlkeld Cox| doi = 10.1119/1.1990764 | title = Probability, Frequency and Reasonable Expectation | journal = American Journal of Physics | volume = 14 | pages = 1–10 | year = 1946 | issue = 1 | bibcode = 1946AmJPh..14....1C }}</ref><ref>{{cite book|first=R. T. |last=Cox |author-link=Richard Threlkeld Cox |title=The Algebra of Probable Inference |publisher=Johns Hopkins University Press |location=Baltimore, MD |year=1961 }}</ref> == Kolmogorov axioms == The setting for the axioms can be summarised as follows: Let <math>(\Omega, F, P)</math> be a [[measure space]] such that <math>P(E)</math> is the [[probability]] of some [[Event (probability theory)|event]] <math>E</math>, and <math>P(\Omega) = 1</math>. 
Then <math>(\Omega, F, P)</math> is a [[probability space]], with sample space <math>\Omega</math>, event space <math>F</math> and [[probability measure]] <math>P</math>.<ref name=":0" /> ==={{Anchor|Non-negativity}}First axiom === The probability of an event is a non-negative real number: :<math>P(E)\in\mathbb{R}, P(E)\geq 0 \qquad \forall E \in F</math> where <math>F</math> is the event space. It follows (when combined with the second axiom) that <math>P(E)</math> is always finite, in contrast with more general [[Measure (mathematics)|measure theory]]. Theories which assign [[negative probability]] relax the first axiom. === {{Anchor|Unitarity|Normalization}}Second axiom === This is the assumption of [[unit measure]]: that the probability that at least one of the [[elementary event]]s in the entire sample space will occur is 1. : <math>P(\Omega) = 1</math> === {{Anchor|Sigma additivity|Finite additivity|Countable additivity|Finitely additive}}Third axiom === This is the assumption of [[σ-additivity]]: : Any [[countable]] sequence of [[disjoint sets]] (synonymous with ''[[Mutual exclusivity|mutually exclusive]]'' events) <math>E_1, E_2, \ldots</math> satisfies ::<math>P\left(\bigcup_{i = 1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i).</math> Some authors consider merely [[finitely additive]] probability spaces, in which case one just needs an [[field of sets|algebra of sets]], rather than a [[σ-algebra]].<ref>{{Cite web|url=https://plato.stanford.edu/entries/probability-interpret/#KolProCal|title=Interpretations of Probability|last=Hájek|first=Alan|date=August 28, 2019|website=Stanford Encyclopedia of Philosophy|access-date=November 17, 2019}}</ref> [[Quasiprobability distribution]]s in general relax the third axiom. == Consequences == From the Kolmogorov axioms, one can deduce other useful rules for studying probabilities. 
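Before deriving those consequences, the three axioms themselves can be checked numerically on a small finite space. The sketch below is illustrative only (the sample space and point masses are invented for the example, not part of the article); it builds a probability measure from point masses and verifies non-negativity, unit measure, and additivity:

```python
from itertools import combinations

# Illustrative finite probability space: Omega = {"a", "b", "c"},
# F = the full power set of Omega, P induced by point masses.
sample_space = {"a", "b", "c"}
point_mass = {"a": 0.5, "b": 0.3, "c": 0.2}  # non-negative weights summing to 1

def P(event):
    """Probability measure: sum the point masses of the outcomes in the event."""
    return sum(point_mass[w] for w in event)

# Enumerate the event space F (every subset of the sample space).
events = [set(s) for r in range(len(sample_space) + 1)
          for s in combinations(sorted(sample_space), r)]

# First axiom: P(E) is a non-negative real number for every event E in F.
assert all(P(E) >= 0 for E in events)

# Second axiom: unit measure, P(Omega) = 1.
assert abs(P(sample_space) - 1.0) < 1e-12

# Third axiom (finite additivity suffices on a finite space):
# for disjoint events, the probability of the union is the sum.
A, B = {"a"}, {"b", "c"}
assert A.isdisjoint(B)
assert abs(P(A | B) - (P(A) + P(B))) < 1e-12
```

Note that on an infinite sample space the third axiom genuinely requires countable additivity, which a finite enumeration like this cannot exercise.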
The proofs<ref name=":1">{{Cite book|title=A first course in probability|last=Ross, Sheldon M.|year=2014|isbn=978-0-321-79477-2|edition=Ninth|location=Upper Saddle River, New Jersey|pages=27, 28|oclc=827003384}}</ref><ref>{{Cite web|url=https://dcgerard.github.io/stat234/11_proofs_from_axioms.pdf|title=Proofs from axioms|last=Gerard|first=David|date=December 9, 2017|access-date=November 20, 2019}}</ref><ref>{{Cite web|url=http://www.maths.qmul.ac.uk/~bill/MTH4107/notesweek3_10.pdf|title=Probability (Lecture Notes - Week 3)|last=Jackson|first=Bill|date=2010|website=School of Mathematics, Queen Mary University of London|access-date=November 20, 2019}}</ref> of these rules illustrate the power of the third axiom and its interaction with the first two. Four of the immediate corollaries and their proofs are shown below: === Monotonicity === :<math>\quad\text{if}\quad A\subseteq B\quad\text{then}\quad P(A)\leq P(B).</math> If ''A'' is a subset of (or equal to) ''B'', then the probability of ''A'' is at most the probability of ''B''. ==== ''Proof of monotonicity'' ==== To verify the monotonicity property, we set <math>E_1=A</math> and <math>E_2=B\setminus A</math>, where <math>A\subseteq B</math> and <math>E_i=\varnothing</math> for <math>i\geq 3</math>.<ref name=":1" /> From the properties of the [[empty set]] (<math>\varnothing</math>), it is easy to see that the sets <math>E_i</math> are pairwise disjoint and <math>E_1\cup E_2\cup\cdots=B</math>. Hence, we obtain from the third axiom that :<math>P(A)+P(B\setminus A)+\sum_{i=3}^\infty P(E_i)=P(B).</math> Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to <math>P(B)</math> which is finite, we obtain both <math>P(A)\leq P(B)</math> and <math>P(\varnothing)=0</math>. 
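The decomposition at the heart of this proof can also be checked numerically on a finite space; the distribution below is illustrative, not part of the article:

```python
# Illustrative distribution on four outcomes.
weights = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

def P(event):
    """Probability of an event as the sum of its outcome weights."""
    return sum(weights[w] for w in event)

A = {1, 2}
B = {1, 2, 3}
assert A <= B  # A is a subset of B

# Third-axiom decomposition from the proof: B = A ∪ (B \ A), disjointly.
assert abs(P(B) - (P(A) + P(B - A))) < 1e-12

# Monotonicity follows because P(B \ A) >= 0 by the first axiom.
assert P(B - A) >= 0
assert P(A) <= P(B)
```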
=== The probability of the empty set === : <math>P(\varnothing)=0.</math> In many cases, <math>\varnothing</math> is not the only event with probability 0. ==== ''Proof of the probability of the empty set''==== <math>P(\varnothing \cup \varnothing) = P(\varnothing)</math> since <math>\varnothing \cup \varnothing = \varnothing</math>, <math>P(\varnothing)+P(\varnothing) = P(\varnothing)</math> by applying the third axiom to the left-hand side (note <math>\varnothing</math> is disjoint with itself), and so <math>P(\varnothing) = 0</math> by subtracting <math>P(\varnothing)</math> from each side of the equation. === The complement rule === <math>P\left(A^{\complement}\right) = P(\Omega-A) = 1 - P(A)</math> ==== ''Proof of the complement rule'' ==== Given <math>A</math> and <math>A^{\complement}</math> are mutually exclusive and that <math>A \cup A^\complement = \Omega </math>: <math>P(A \cup A^\complement)=P(A)+P(A^\complement) </math> ''... (by axiom 3)'' and, <math> P(A \cup A^\complement)=P(\Omega)=1 </math> ... ''(by axiom 2)'' <math> \Rightarrow P(A)+P(A^\complement)=1 </math> <math>\therefore P(A^\complement)=1-P(A) </math> === The numeric bound === It immediately follows from the monotonicity property that : <math>0\leq P(E)\leq 1\qquad \forall E\in F.</math> ==== ''Proof of the numeric bound'' ==== Given the complement rule <math>P(E^c)=1-P(E) </math> and ''axiom 1'' <math>P(E^c)\geq0 </math>: <math>1-P(E) \geq 0 </math> <math>\Rightarrow 1 \geq P(E) </math> <math>\therefore 0\leq P(E)\leq 1</math> == Further consequences == Another important property is: : <math>P(A \cup B) = P(A) + P(B) - P(A \cap B).</math> This is called the addition law of probability, or the sum rule. That is, the probability that an event in ''A'' ''or'' ''B'' will happen is the sum of the probability of an event in ''A'' and the probability of an event in ''B'', minus the probability of an event that is in both ''A'' ''and'' ''B''. 
The proof of this is as follows: Firstly, :<math>P(A\cup B) = P(A) + P(B\setminus A)</math>. ''(by Axiom 3)'' So, :<math>P(A \cup B) = P(A) + P(B\setminus (A \cap B))</math> (by <math>B \setminus A = B\setminus (A \cap B)</math>). Also, :<math>P(B) = P(B\setminus (A \cap B)) + P(A \cap B)</math> and eliminating <math>P(B\setminus (A \cap B))</math> from both equations gives us the desired result. An extension of the addition law to any number of sets is the [[inclusion–exclusion principle]]. Setting ''B'' to the complement ''A<sup>c</sup>'' of ''A'' in the addition law gives : <math>P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A)</math> That is, the probability that any event will ''not'' happen (or the event's [[Complement (set theory)|complement]]) is 1 minus the probability that it will. == Simple example: coin toss == Consider a single coin toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair or as to whether any bias depends on how the coin is tossed.<ref>{{cite journal |last1=Diaconis |first1=Persi |last2=Holmes |first2=Susan |last3=Montgomery |first3=Richard |title=Dynamical Bias in the Coin Toss |journal= SIAM Review|date=2007 |volume=49 |issue=2 |pages=211–235 |doi=10.1137/S0036144504446436 |bibcode=2007SIAMR..49..211D |url=https://statweb.stanford.edu/~cgates/PERSI/papers/dyn_coin_07.pdf |access-date=5 January 2024}}</ref> We may define: : <math>\Omega = \{H,T\}</math> : <math>F = \{\varnothing, \{H\}, \{T\}, \{H,T\}\}</math> Kolmogorov's axioms imply that: : <math>P(\varnothing) = 0</math> The probability of ''neither'' heads ''nor'' tails is 0. : <math>P(\{H,T\}^c) = 0</math> Equivalently, by the complement rule, the probability of ''either'' heads ''or'' tails is 1. : <math>P(\{H\}) + P(\{T\}) = 1</math> The sum of the probability of heads and the probability of tails is 1. 
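The coin-toss example can be replayed numerically. The bias <code>p</code> below is an arbitrary illustrative value, since the text assumes nothing about the coin being fair:

```python
p = 0.37                      # illustrative bias; any value in [0, 1] works
prob = {"H": p, "T": 1 - p}
omega = {"H", "T"}

def P(event):
    """Probability of an event as the sum of its outcome probabilities."""
    return sum(prob[w] for w in event)

# P(∅) = 0: the probability of neither heads nor tails is 0.
assert P(set()) == 0
assert P(omega - omega) == 0  # {H,T}^c relative to Omega is the empty set

# P({H,T}) = 1: the probability of either heads or tails is 1.
assert abs(P(omega) - 1.0) < 1e-12

# P({H}) + P({T}) = 1: the two elementary probabilities sum to 1.
assert abs(P({"H"}) + P({"T"}) - 1.0) < 1e-12

# Complement rule: P({T}) = 1 - P({H}).
assert abs(P({"T"}) - (1 - P({"H"}))) < 1e-12
```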
== See also == * {{annotated link|Borel algebra}} * {{annotated link|Conditional probability}} * {{annotated link|Fully probabilistic design}} * {{annotated link|Intuitive statistics}} * {{annotated link|Quasiprobability}} * {{annotated link|Set theory}} * {{annotated link|Sigma-algebra|σ-algebra}} == References == {{Reflist}} == Further reading == * {{cite book |first=Morris H. |last=DeGroot |author-link=Morris H. DeGroot |title=Probability and Statistics |location=Reading |publisher=Addison-Wesley |year=1975 |pages=[https://archive.org/details/probabilitystati0000degr/page/12 12–16] |isbn=0-201-01503-X |url=https://archive.org/details/probabilitystati0000degr/page/12 }} * {{cite book |first1=James R. |last1=McCord |first2=Richard M. |last2=Moroney |year=1964 |title=Introduction to Probability Theory |chapter-url=https://archive.org/details/introductiontopr00mcco |chapter-url-access=registration |chapter=Axiomatic Probability |location=New York |publisher=Macmillan |pages=[https://archive.org/details/introductiontopr00mcco/page/13 13–28] }} *[https://web.archive.org/web/20130923121802/http://mws.cs.ru.nl/mwiki/prob_1.html#M2 Formal definition] of probability in the [[Mizar system]], and the [http://mmlquery.mizar.org/cgi-bin/mmlquery/emacs_search?input=(symbol+Probability+%7C+notation+%7C+constructor+%7C+occur+%7C+th)+ordered+by+number+of+ref list of theorems] formally proved about it. {{DEFAULTSORT:Probability Axioms}} [[Category:Probability theory]] [[Category:Mathematical axioms]]