==General probability definition==
Let <math>(\Omega, \mathcal{F}, P)</math> be a [[probability space]], <math>(E, \mathcal{E})</math> be a [[measurable space]], and <math>X : \Omega \to E</math> be an <math>(E, \mathcal{E})</math>-valued random variable. Then the '''probability distribution''' of <math>X</math> is the [[pushforward measure]] of the probability measure <math>P</math> onto <math>(E, \mathcal{E})</math> induced by <math>X</math>. Explicitly, this pushforward measure on <math>(E, \mathcal{E})</math> is given by
<math display="block">X_{*}(P)(B) = P\left( X^{-1}(B) \right)</math>
for <math>B \in \mathcal{E}.</math> Any probability distribution is a [[probability measure]] on <math>(E, \mathcal{E})</math> (in general different from <math>P</math>, unless <math>X</math> happens to be the identity map).{{cn|date=May 2025}}

A probability distribution can be described in various forms, such as by a probability mass function or a cumulative distribution function. One of the most general descriptions, which applies to both absolutely continuous and discrete variables, is by means of a probability function <math>P \colon \mathcal{A} \to \Reals</math> whose '''input space''' <math>\mathcal{A}</math> is a [[σ-algebra]] and whose output is a [[real number]] '''probability''', specifically a number in <math>[0,1] \subseteq \Reals</math>.

The probability function <math>P</math> can take as argument subsets of the sample space itself, as in the coin toss example, where the function <math>P</math> was defined so that {{math|1=''P''(heads) = 0.5}} and {{math|1=''P''(tails) = 0.5}}. However, because of the widespread use of [[random variables]], which transform the sample space into a set of numbers (e.g., <math>\R</math>, <math>\N</math>), it is more common to study probability distributions whose arguments are subsets of these particular kinds of sets (number sets),<ref>{{cite book|last1=Walpole|first1=R.E.|last2=Myers|first2=R.H.|last3=Myers|first3=S.L.|last4=Ye|first4=K.|year=1999|title=Probability and statistics for engineers|publisher=Prentice Hall}}</ref> and all probability distributions discussed in this article are of this type.
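For a finite sample space, the pushforward measure above can be computed by direct summation over preimages. The following is a minimal Python sketch of that idea (the names <code>omega</code>, <code>X</code> and <code>pushforward</code> are illustrative, not from any reference), evaluating the distribution of the number of heads in two fair coin tosses:

<syntaxhighlight lang="python">
from fractions import Fraction
from itertools import product

# Sample space Omega for two fair coin tosses, with the uniform measure P.
omega = list(product("HT", repeat=2))
P = {w: Fraction(1, 4) for w in omega}

# Random variable X : Omega -> {0, 1, 2} counting the number of heads.
def X(w):
    return w.count("H")

# Pushforward measure: X_*(P)(B) = P(X^{-1}(B)) for B a subset of {0, 1, 2}.
def pushforward(B):
    return sum(p for w, p in P.items() if X(w) in B)

print(pushforward({0}))        # 1/4
print(pushforward({1}))        # 1/2
print(pushforward({0, 1, 2}))  # 1, as required of a probability measure
</syntaxhighlight>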
It is common to denote as <math>P(X \in E)</math> the probability that the variable <math>X</math> takes a value in a certain event <math>E</math>.<ref name='ross' /><ref name='degroot' /> The above probability function only characterizes a probability distribution if it satisfies all the [[Kolmogorov axioms]], that is:
# <math>P(X \in E) \ge 0 \; \forall E \in \mathcal{A}</math>, so the probability is non-negative
# <math>P(X \in E) \le 1 \; \forall E \in \mathcal{A}</math>, so no probability exceeds <math>1</math>
# <math>P(X \in \bigcup_{i} E_i ) = \sum_i P(X \in E_i)</math> for any countable family of pairwise disjoint sets <math>\{ E_i \}</math>

The concept of probability function is made more rigorous by defining it as the element of a [[probability space]] <math>(X, \mathcal{A}, P)</math>, where <math>X</math> is the set of possible outcomes, <math>\mathcal{A}</math> is the set of all subsets <math>E \subset X</math> whose probability can be measured, and <math>P</math> is the probability function, or '''probability measure''', that assigns a probability to each of these measurable subsets <math>E \in \mathcal{A}</math>.<ref name='billingsley'>{{cite book|author1=Billingsley, P.|year=1986|title=Probability and measure|publisher=Wiley|isbn=9780471804789}}</ref>

Probability distributions usually belong to one of two classes. A '''discrete probability distribution''' applies when the set of possible outcomes is [[discrete probability distribution|discrete]] (e.g. a coin toss, a roll of a die), and the probabilities are encoded by a discrete list of the probabilities of the outcomes; in this case the function encoding these probabilities is known as the [[probability mass function]]. On the other hand, '''absolutely continuous probability distributions''' apply when the set of possible outcomes can take on values in a continuous range (e.g. real numbers), such as the temperature on a given day. In the absolutely continuous case, probabilities are described by a [[probability density function]], and the probability distribution is by definition the integral of the probability density function.<ref name="ross" /><ref name=":3">{{cite web|title=1.3.6.1. What is a Probability Distribution|url=https://www.itl.nist.gov/div898/handbook/eda/section3/eda361.htm|access-date=2020-09-10|website=www.itl.nist.gov}}</ref><ref name='degroot'>{{cite book|last1=DeGroot|first1=Morris H.|last2=Schervish|first2=Mark J.|title=Probability and Statistics|publisher=Addison-Wesley|year=2002}}</ref> The [[normal distribution]] is a commonly encountered absolutely continuous probability distribution. More complex experiments, such as those involving [[stochastic processes]] defined in [[continuous time]], may demand the use of more general [[probability measure]]s.

A probability distribution whose sample space is one-dimensional (for example real numbers, a list of labels, ordered labels or binary) is called [[Univariate distribution|univariate]], while a distribution whose sample space is a [[vector space]] of dimension 2 or more is called [[Multivariate distribution|multivariate]]. A univariate distribution gives the probabilities of a single [[random variable]] taking on various values; a multivariate distribution (a [[joint probability distribution]]) gives the probabilities of a [[random vector]] – a list of two or more random variables – taking on various combinations of values.
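The contrast between the two classes described above can be made concrete numerically. The following minimal sketch (assuming the SciPy library is available) evaluates a probability mass function at a point in the discrete case, and integrates a density over an interval via the cumulative distribution function in the absolutely continuous case:

<syntaxhighlight lang="python">
from scipy.stats import binom, norm

# Discrete case: the probability mass function assigns probability to points.
# P(X = 3) for X ~ Binomial(n=10, p=0.5).
print(binom.pmf(3, n=10, p=0.5))   # ~0.1172

# Absolutely continuous case: probabilities are integrals of the density, so
# single points have probability zero and intervals carry the probability.
# P(-1 <= Y <= 1) for Y ~ Normal(0, 1), via the cumulative distribution function.
print(norm.cdf(1) - norm.cdf(-1))  # ~0.6827
</syntaxhighlight>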
Important and commonly encountered univariate probability distributions include the [[binomial distribution]], the [[hypergeometric distribution]], and the [[normal distribution]]. A commonly encountered multivariate distribution is the [[multivariate normal distribution]].

Besides the probability function, the cumulative distribution function, the probability mass function and the probability density function, the [[moment generating function]] and the [[characteristic function (probability theory)|characteristic function]] also serve to identify a probability distribution, as they uniquely determine an underlying cumulative distribution function.<ref>{{cite journal|author1=Shephard, N.G.|year=1991|title=From characteristic function to distribution function: a simple framework for the theory|journal=Econometric Theory|volume=7|issue=4|pages=519–529|doi=10.1017/S0266466600004746|s2cid=14668369|url=https://ora.ox.ac.uk/objects/uuid:a4c3ad11-74fe-458c-8d58-6f74511a476c}}</ref>

[[File:Standard deviation diagram.svg|right|thumb|250px|Figure 2: The [[probability density function]] (pdf) of the [[normal distribution]], also called Gaussian or "bell curve", the most important absolutely continuous probability distribution. As shown on the figure, the probabilities of intervals of values correspond to the area under the curve.]]
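Because the characteristic function uniquely determines the distribution, it can be estimated from samples and compared against a known closed form. The following minimal sketch (assuming NumPy is available; variable names are illustrative) compares the empirical characteristic function of standard normal samples with the closed form <math>e^{-t^2/2}</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)

t = 1.5
# Empirical characteristic function: the sample mean of exp(i*t*X).
ecf = np.mean(np.exp(1j * t * samples))
# Closed-form characteristic function of the standard normal: exp(-t**2 / 2).
print(ecf.real, np.exp(-t**2 / 2))  # both ~0.3247, up to sampling noise
</syntaxhighlight>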