{{Short description|Probability distribution modeling a coin toss which need not be fair}}
{{Use American English|date = January 2019}}
{{Probability distribution
 |name =Bernoulli distribution
 |type =mass
 |pdf_image =[[File:Bernoulli Distribution.PNG|325px|Probability mass function of the Bernoulli distribution]] Three examples of Bernoulli distribution:
  {{legend|7F0000|2=<math>P(x=0) = 0{.}2</math> and <math>P(x=1) = 0{.}8</math>}}
  {{legend|00007F|2=<math>P(x=0) = 0{.}8</math> and <math>P(x=1) = 0{.}2</math>}}
  {{legend|007F00|2=<math>P(x=0) = 0{.}5</math> and <math>P(x=1) = 0{.}5</math>}}
 |cdf_image =
 |parameters =<math>0 \leq p \leq 1</math><br /> <math>q = 1 - p</math>
 |support =<math>k \in \{0,1\}</math>
 |pdf =<math>\begin{cases} q=1-p & \text{if }k=0 \\ p & \text{if }k=1 \end{cases}</math>
 |cdf =<math>\begin{cases} 0 & \text{if } k < 0 \\ 1 - p & \text{if } 0 \leq k < 1 \\ 1 & \text{if } k \geq 1 \end{cases}</math>
 |mean =<math>p</math>
 |median =<math>\begin{cases} 0 & \text{if } p < 1/2 \\ \left[0, 1\right] & \text{if } p = 1/2 \\ 1 & \text{if } p > 1/2 \end{cases}</math>
 |mode =<math>\begin{cases} 0 & \text{if } p < 1/2 \\ 0, 1 & \text{if } p = 1/2 \\ 1 & \text{if } p > 1/2 \end{cases}</math>
 |variance =<math>p(1-p) = pq</math>
 |mad =<math>2p(1-p) = 2pq</math>
 |skewness =<math>\frac{q - p}{\sqrt{pq}}</math>
 |kurtosis =<math>\frac{1 - 6pq}{pq}</math>
 |entropy =<math>-q\ln q - p\ln p</math>
 |mgf =<math>q+pe^t</math>
 |char =<math>q+pe^{it}</math>
 |pgf =<math>q+pz</math>
 |fisher =<math>\frac{1}{pq}</math>
}}
{{Probability fundamentals}}

In [[probability theory]] and [[statistics]], the '''Bernoulli distribution''', named after Swiss mathematician [[Jacob Bernoulli]],<ref>{{cite book |first=James Victor |last=Uspensky |title=Introduction to Mathematical Probability |publisher=McGraw-Hill |location=New York |year=1937 |page=45 |oclc=996937 }}</ref> is the [[discrete probability distribution]] of a [[random variable]] which takes the value 1 with probability <math>p</math> and the value 0 with probability <math>q = 1-p</math>. Less formally, it can be thought of as a model for the set of possible outcomes of any single [[experiment]] that asks a [[yes–no question]]. Such questions lead to [[outcome (probability)|outcomes]] that are [[Boolean-valued function|Boolean]]-valued: a single [[bit]] whose value is success/[[yes and no|yes]]/[[Truth value|true]]/[[Binary code|one]] with [[probability]] ''p'' and failure/no/[[false (logic)|false]]/[[Binary code|zero]] with probability ''q''.

It can be used to represent a (possibly biased) [[coin toss]], where 1 and 0 would represent "heads" and "tails", respectively, and ''p'' would be the probability of the coin landing on heads (or vice versa, where 1 would represent tails and ''p'' would be the probability of tails). In particular, unfair coins would have <math>p \neq 1/2.</math> The Bernoulli distribution is a special case of the [[binomial distribution]] where a single trial is conducted (so ''n'' would be 1 for such a binomial distribution). It is also a special case of the '''two-point distribution''', for which the possible outcomes need not be 0 and 1.<ref>{{cite book |last1=Dekking |first1=Frederik |last2=Kraaikamp |first2=Cornelis |last3=Lopuhaä |first3=Hendrik |last4=Meester |first4=Ludolf |title=A Modern Introduction to Probability and Statistics |date=9 October 2010 |publisher=Springer London |isbn=9781849969529 |pages=43–48 |edition=1}}</ref>
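A Bernoulli trial is straightforward to simulate. The following minimal Python sketch (illustrative only; the helper name <code>bernoulli_trial</code> is an arbitrary choice for this example) draws samples by comparing a uniform random number against ''p'', and notes the binomial connection:

<syntaxhighlight lang="python">
import random

def bernoulli_trial(p):
    """Return 1 with probability p and 0 with probability q = 1 - p."""
    return 1 if random.random() < p else 0

# A biased coin with p = 0.8: about 80% of tosses come up 1 ("heads").
tosses = [bernoulli_trial(0.8) for _ in range(10_000)]
print(sum(tosses) / len(tosses))  # sample mean, close to p = 0.8

# Equivalently, a Bernoulli trial is a binomial draw with n = 1
# (random.binomialvariate requires Python 3.12+).
print(random.binomialvariate(n=1, p=0.8))
</syntaxhighlight>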
==Properties==
If <math>X</math> is a random variable with a Bernoulli distribution, then:

:<math>\Pr(X=1) = p, \quad \Pr(X=0) = q = 1 - p.</math>

The [[probability mass function]] <math>f</math> of this distribution, over possible outcomes ''k'', is

:<math>f(k;p) = \begin{cases} p & \text{if }k=1, \\ q = 1-p & \text{if } k = 0. \end{cases}</math><ref name=":0">{{Cite book|title=Introduction to Probability|last=Bertsekas|author-link=Dimitri Bertsekas|first=Dimitri P.|date=2002|publisher=Athena Scientific|others=[[John Tsitsiklis|Tsitsiklis, John N.]], Τσιτσικλής, Γιάννης Ν.|isbn=188652940X|location=Belmont, Mass.|oclc=51441829}}</ref>

This can also be expressed as

:<math>f(k;p) = p^k (1-p)^{1-k} \quad \text{for } k\in\{0,1\}</math>

or as

:<math>f(k;p) = pk + (1-p)(1-k) \quad \text{for } k\in\{0,1\}.</math>

The Bernoulli distribution is a special case of the [[binomial distribution]] with <math>n = 1.</math><ref name="McCullagh1989Ch422">{{cite book | last = McCullagh | first = Peter | author-link= Peter McCullagh |author2=Nelder, John |author-link2=John Nelder | title = Generalized Linear Models, Second Edition | publisher = Boca Raton: Chapman and Hall/CRC | year = 1989 | isbn = 0-412-31760-5 |ref=McCullagh1989 |at=Section 4.2.2 }}</ref>

The [[kurtosis]] goes to infinity for high and low values of <math>p,</math> but for <math>p=1/2</math> the two-point distributions, including the Bernoulli distribution, have a lower [[excess kurtosis]], namely −2, than any other probability distribution.

The Bernoulli distributions for <math>0 \le p \le 1</math> form an [[exponential family]].

The [[maximum likelihood estimator]] of <math>p</math> based on a random sample is the [[sample mean]].

[[File:PMF and CDF of a bernouli distribution.png|thumb|The probability mass function of a Bernoulli experiment along with its corresponding cumulative distribution function.]]

==Mean==
The [[expected value]] of a Bernoulli random variable <math>X</math> is

:<math>\operatorname{E}[X] = p.</math>

This is because for a Bernoulli distributed random variable <math>X</math> with <math>\Pr(X=1)=p</math> and <math>\Pr(X=0)=q</math> we find

:<math>\operatorname{E}[X] = \Pr(X=1)\cdot 1 + \Pr(X=0)\cdot 0 = p \cdot 1 + q \cdot 0 = p.</math><ref name=":0" />

==Variance==
The [[variance]] of a Bernoulli distributed <math>X</math> is

:<math>\operatorname{Var}[X] = pq = p(1-p).</math>

We first find

:<math>\operatorname{E}[X^2] = \Pr(X=1)\cdot 1^2 + \Pr(X=0)\cdot 0^2 = p \cdot 1^2 + q \cdot 0^2 = p = \operatorname{E}[X].</math>

From this follows

:<math>\operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2 = \operatorname{E}[X] - \operatorname{E}[X]^2 = p - p^2 = p(1-p) = pq.</math><ref name=":0" />

From this result it follows that, for any Bernoulli distribution, the variance lies in <math>[0,1/4]</math>: the function <math>p(1-p)</math> attains its maximum value <math>1/4</math> at <math>p = 1/2</math> and vanishes at <math>p = 0</math> and <math>p = 1</math>.
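These formulas can be checked numerically. Below is a short sketch in plain Python (illustrative only; the helper names are arbitrary) that computes the mean and variance exactly by summing over the two outcomes:

<syntaxhighlight lang="python">
# Exact mean and variance of Bernoulli(p) by direct enumeration over {0, 1}.
def bernoulli_pmf(k, p):
    return p if k == 1 else 1 - p

def mean_and_variance(p):
    mean = sum(k * bernoulli_pmf(k, p) for k in (0, 1))
    var = sum((k - mean) ** 2 * bernoulli_pmf(k, p) for k in (0, 1))
    return mean, var

for p in (0.2, 0.5, 0.8):
    m, v = mean_and_variance(p)
    assert abs(m - p) < 1e-12 and abs(v - p * (1 - p)) < 1e-12
    print(f"p={p}: mean={m}, variance={v}")  # variance peaks at 1/4 when p = 0.5
</syntaxhighlight>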
==Skewness==
The [[skewness]] is <math>\frac{q-p}{\sqrt{pq}}=\frac{1-2p}{\sqrt{pq}}</math>. When we take the standardized Bernoulli distributed random variable <math>\frac{X-\operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}</math> we find that this random variable attains <math>\frac{q}{\sqrt{pq}}</math> with probability <math>p</math> and attains <math>-\frac{p}{\sqrt{pq}}</math> with probability <math>q</math>. Thus we get

:<math>\begin{align}
\gamma_1 &= \operatorname{E} \left[\left(\frac{X-\operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}\right)^3\right] \\
&= p \cdot \left(\frac{q}{\sqrt{pq}}\right)^3 + q \cdot \left(-\frac{p}{\sqrt{pq}}\right)^3 \\
&= \frac{1}{\sqrt{pq}^3} \left(pq^3-qp^3\right) \\
&= \frac{pq}{\sqrt{pq}^3} (q^2-p^2) \\
&= \frac{(1-p)^2-p^2}{\sqrt{pq}} \\
&= \frac{1-2p}{\sqrt{pq}} = \frac{q-p}{\sqrt{pq}}.
\end{align}</math>

==Higher moments and cumulants==
The raw moments are all equal because <math>1^k=1</math> and <math>0^k=0</math> for <math>k \ge 1</math>:

:<math>\operatorname{E}[X^k] = \Pr(X=1)\cdot 1^k + \Pr(X=0)\cdot 0^k = p \cdot 1 + q \cdot 0 = p = \operatorname{E}[X].</math>

The central moment of order <math>k</math> is given by

:<math>\mu_k = (1-p)(-p)^k + p(1-p)^k.</math>

The first six central moments are

:<math>\begin{align}
\mu_1 &= 0, \\
\mu_2 &= p(1-p), \\
\mu_3 &= p(1-p)(1-2p), \\
\mu_4 &= p(1-p)(1-3p(1-p)), \\
\mu_5 &= p(1-p)(1-2p)(1-2p(1-p)), \\
\mu_6 &= p(1-p)(1-5p(1-p)(1-p(1-p))).
\end{align}</math>

The higher central moments can be expressed more compactly in terms of <math>\mu_2</math> and <math>\mu_3</math>:

:<math>\begin{align}
\mu_4 &= \mu_2 (1-3\mu_2 ), \\
\mu_5 &= \mu_3 (1-2\mu_2 ), \\
\mu_6 &= \mu_2 (1-5\mu_2 (1-\mu_2 )).
\end{align}</math>

The first six cumulants are

:<math>\begin{align}
\kappa_1 &= p, \\
\kappa_2 &= \mu_2 , \\
\kappa_3 &= \mu_3 , \\
\kappa_4 &= \mu_2 (1-6\mu_2 ), \\
\kappa_5 &= \mu_3 (1-12\mu_2 ), \\
\kappa_6 &= \mu_2 (1-30\mu_2 (1-4\mu_2 )).
\end{align}</math>

==Entropy and Fisher information==
===Entropy===
Entropy is a measure of uncertainty or randomness in a probability distribution. For a Bernoulli random variable <math>X</math> with success probability <math>p</math> and failure probability <math>q = 1 - p</math>, the entropy <math>H(X)</math> is defined as:

:<math>\begin{align}
H(X) &= \operatorname{E}_p\left[\ln \frac{1}{P(X)}\right] = -[P(X = 0) \ln P(X = 0) + P(X = 1) \ln P(X = 1)] \\
&= -(q \ln q + p \ln p), \quad q = P(X = 0),\ p = P(X = 1).
\end{align}</math>

The entropy is maximized when <math>p = 0.5</math>, indicating the highest level of uncertainty when both outcomes are equally likely. The entropy is zero when <math>p = 0</math> or <math>p = 1</math>, where one outcome is certain.

===Fisher information===
Fisher information measures the amount of information that an observable random variable <math>X</math> carries about an unknown parameter <math>p</math> upon which the probability of <math>X</math> depends. For the Bernoulli distribution, the Fisher information with respect to the parameter <math>p</math> is given by:

:<math>I(p) = \frac{1}{pq}.</math>

'''Proof:'''
*The '''likelihood function''' for a Bernoulli random variable <math>X</math> is:
:<math>L(p; X) = p^X (1 - p)^{1 - X}.</math>
:This represents the probability of observing <math>X</math> given the parameter <math>p</math>.
*The '''log-likelihood function''' is:
:<math>\ln L(p; X) = X \ln p + (1 - X) \ln (1 - p).</math>
*The '''score function''' (the first derivative of the log-likelihood with respect to <math>p</math>) is:
:<math>\frac{\partial}{\partial p} \ln L(p; X) = \frac{X}{p} - \frac{1 - X}{1 - p}.</math>
*The second derivative of the log-likelihood function is:
:<math>\frac{\partial^2}{\partial p^2} \ln L(p; X) = -\frac{X}{p^2} - \frac{1 - X}{(1 - p)^2}.</math>
*'''Fisher information''' is calculated as the negative expected value of the second derivative of the log-likelihood, using <math>\operatorname{E}[X] = p</math>:
:<math>I(p) = -\operatorname{E}\left[\frac{\partial^2}{\partial p^2} \ln L(p; X)\right] = \frac{p}{p^2} + \frac{1 - p}{(1 - p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)} = \frac{1}{pq}.</math>

The Fisher information is minimized when <math>p = 0.5</math>, where the variance <math>pq</math> is largest, and it grows without bound as <math>p</math> approaches 0 or 1, where a single observation becomes highly informative about the parameter.
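As an illustration of the two quantities just derived, here is a small Python sketch (illustrative only; the helper names are arbitrary) that evaluates the entropy and the Fisher information for a few values of ''p'':

<syntaxhighlight lang="python">
import math

def entropy(p):
    """Entropy H(X) = -(q ln q + p ln p) in nats, with 0 ln 0 taken as 0."""
    q = 1 - p
    return -sum(x * math.log(x) for x in (p, q) if x > 0)

def fisher_information(p):
    """Fisher information I(p) = 1 / (p q) for 0 < p < 1."""
    return 1 / (p * (1 - p))

for p in (0.1, 0.5, 0.9):
    print(f"p={p}: H={entropy(p):.4f} nats, I(p)={fisher_information(p):.2f}")
# Entropy peaks at p = 0.5 (ln 2 ≈ 0.6931), while the Fisher information
# is minimized there (I = 4) and blows up as p approaches 0 or 1.
</syntaxhighlight>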
==Related distributions==
*If <math>X_1,\dots,X_n</math> are independent, identically distributed ([[Independent and identically distributed random variables|i.i.d.]]) random variables, all [[Bernoulli trial]]s with success probability ''p'', then their [[Sum of independent random variables|sum is distributed]] according to a [[binomial distribution]] with parameters ''n'' and ''p'':
*:<math>\sum_{k=1}^n X_k \sim \operatorname{B}(n,p)</math> ([[binomial distribution]]).<ref name=":0" />
:The Bernoulli distribution is simply <math>\operatorname{B}(1, p)</math>, also written as <math display="inline">\mathrm{Bernoulli}(p).</math>
*The [[categorical distribution]] is the generalization of the Bernoulli distribution for variables with any constant number of discrete values.
*The [[Beta distribution]] is the [[conjugate prior]] of the Bernoulli distribution.<ref>{{Cite web |last1=Orloff |first1=Jeremy |last2=Bloom |first2=Jonathan |title=Conjugate priors: Beta and normal |url=https://math.mit.edu/~dav/05.dir/class15-prep.pdf |access-date=October 20, 2023 |website=math.mit.edu}}</ref>
*The [[geometric distribution]] models the number of independent and identical Bernoulli trials needed to get one success.
*If <math display="inline">Y \sim \mathrm{Bernoulli}\left(\frac{1}{2}\right)</math>, then <math display="inline">2Y - 1</math> has a [[Rademacher distribution]].

==See also==
*[[Bernoulli process]], a [[random process]] consisting of a sequence of [[Independence (probability theory)|independent]] Bernoulli trials
*[[Bernoulli sampling]]
*[[Binary entropy function]]
*[[Binary decision diagram]]

==References==
{{Reflist}}

==Further reading==
*{{cite book |last1=Johnson |first1=N. L. |last2=Kotz |first2=S. |last3=Kemp |first3=A. |year=1993 |title=Univariate Discrete Distributions |edition=2nd |publisher=Wiley |isbn=0-471-54897-9 }}
*{{cite book |first=John G. |last=Peatman |title=Introduction to Applied Statistics |location=New York |publisher=Harper & Row |year=1963 |pages=162–171 }}

==External links==
{{Commons category|Bernoulli distribution}}
*{{springer|title=Binomial distribution|id=p/b016420}}
*{{MathWorld|title=Bernoulli Distribution|urlname=BernoulliDistribution}}
*Interactive graphic: [http://www.math.wm.edu/~leemis/chart/UDR/UDR.html Univariate Distribution Relationships].

{{ProbDistributions|discrete-finite}}

{{DEFAULTSORT:Bernoulli Distribution}}
[[Category:Discrete distributions]]
[[Category:Conjugate prior distributions]]
[[Category:Exponential family distributions]]