Bernoulli distribution
Key quantities of the Bernoulli distribution: excess kurtosis <math>\frac{1 - 6pq}{pq}</math>; entropy <math>-q\ln q - p\ln p</math>; moment-generating function <math>q+pe^t</math>; characteristic function <math>q+pe^{it}</math>; probability-generating function <math>q+pz</math>; Fisher information <math>\frac{1}{pq}</math>.
In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,<ref>Template:Cite book</ref> is the discrete probability distribution of a random variable which takes the value 1 with probability <math>p</math> and the value 0 with probability <math>q = 1-p</math>. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q. It can be used to represent a (possibly biased) coin toss where 1 and 0 would represent "heads" and "tails", respectively, and p would be the probability of the coin landing on heads (or vice versa where 1 would represent tails and p would be the probability of tails). In particular, unfair coins would have <math>p \neq 1/2.</math>
The Bernoulli distribution is a special case of the binomial distribution where a single trial is conducted (so n would be 1 for such a binomial distribution). It is also a special case of the two-point distribution, for which the possible outcomes need not be 0 and 1.<ref>Template:Cite book</ref>
Properties
If <math>X</math> is a random variable with a Bernoulli distribution, then:
- <math>\Pr(X=1) = p, \Pr(X=0) = q =1 - p.</math>
The probability mass function <math>f</math> of this distribution, over possible outcomes k, is
- <math> f(k;p) = \begin{cases}
p & \text{if }k=1, \\ q = 1-p & \text {if } k = 0. \end{cases}</math><ref name=":0">Template:Cite book</ref>
This can also be expressed as
- <math>f(k;p) = p^k (1-p)^{1-k} \quad \text{for } k\in\{0,1\}</math>
or as
- <math>f(k;p)=pk+(1-p)(1-k) \quad \text{for } k\in\{0,1\}.</math>
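As a concrete check, the three expressions for the probability mass function can be compared with a library implementation; the following is a minimal Python sketch using SciPy's scipy.stats.bernoulli, with <math>p = 0.3</math> chosen arbitrarily for illustration:
<syntaxhighlight lang="python">
import math
from scipy.stats import bernoulli

p = 0.3  # arbitrary success probability, chosen only for illustration

for k in (0, 1):
    piecewise   = p if k == 1 else 1 - p        # case-by-case definition
    power_form  = p**k * (1 - p)**(1 - k)       # p^k (1-p)^(1-k)
    linear_form = p*k + (1 - p)*(1 - k)         # pk + (1-p)(1-k)
    library     = bernoulli.pmf(k, p)           # SciPy's implementation
    assert math.isclose(piecewise, power_form)
    assert math.isclose(piecewise, linear_form)
    assert math.isclose(piecewise, library)
</syntaxhighlight>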
The Bernoulli distribution is a special case of the binomial distribution with <math>n = 1.</math><ref name="McCullagh1989Ch422">Template:Cite book</ref>
The excess kurtosis goes to infinity as <math>p</math> approaches 0 or 1, but for <math>p=1/2</math> the Bernoulli distribution, like every two-point distribution with equal weights, has excess kurtosis −2, the lowest possible value for any probability distribution.
The Bernoulli distributions for <math>0 < p < 1</math> form an exponential family.
The maximum likelihood estimator of <math>p</math> based on a random sample is the sample mean.
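As a small illustration of the last property, the sketch below (assuming NumPy is available; the value <math>p = 0.3</math> and the sample size are arbitrary) draws a Bernoulli sample and estimates <math>p</math> by the sample mean:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(seed=0)
p_true = 0.3                                        # arbitrary true parameter
sample = rng.binomial(n=1, p=p_true, size=10_000)   # n=1 gives Bernoulli(p) draws
p_hat = sample.mean()                               # maximum likelihood estimate
print(p_hat)                                        # close to 0.3 for large samples
</syntaxhighlight>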
Mean
The expected value of a Bernoulli random variable <math>X</math> is
- <math>\operatorname{E}[X]=p</math>
This is because for a Bernoulli distributed random variable <math>X</math> with <math>\Pr(X=1)=p</math> and <math>\Pr(X=0)=q</math> we find
- <math>\operatorname{E}[X] = \Pr(X=1)\cdot 1 + \Pr(X=0)\cdot 0
= p \cdot 1 + q\cdot 0 = p.</math><ref name=":0" />
Variance
The variance of a Bernoulli distributed <math>X</math> is
- <math>\operatorname{Var}[X] = pq = p(1-p)</math>
We first find
- <math>\operatorname{E}[X^2] = \Pr(X=1)\cdot 1^2 + \Pr(X=0)\cdot 0^2 </math>
- <math> = p \cdot 1^2 + q\cdot 0^2 = p = \operatorname{E}[X] </math>
From this it follows that
- <math>\operatorname{Var}[X] = \operatorname{E}[X^2]-\operatorname{E}[X]^2 = \operatorname{E}[X]-\operatorname{E}[X]^2 </math>
- <math> = p-p^2 = p(1-p) = pq</math><ref name=":0" />
From this result it follows that, for any Bernoulli distribution, the variance lies in the interval <math>[0,1/4]</math>, with its maximum value <math>1/4</math> attained at <math>p = 1/2</math>.
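A quick numerical check of the mean and variance formulas against SciPy (a minimal sketch; <math>p = 0.3</math> is arbitrary):
<syntaxhighlight lang="python">
import math
from scipy.stats import bernoulli

p = 0.3                                        # arbitrary success probability
mean, var = bernoulli.stats(p, moments='mv')

assert math.isclose(float(mean), p)            # E[X] = p
assert math.isclose(float(var), p * (1 - p))   # Var[X] = p(1-p) = pq
assert p * (1 - p) <= 0.25                     # the variance never exceeds 1/4
</syntaxhighlight>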
Skewness
The skewness is <math>\frac{q-p}{\sqrt{pq}}=\frac{1-2p}{\sqrt{pq}}</math>. When we take the standardized Bernoulli distributed random variable <math>\frac{X-\operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}</math> we find that this random variable attains <math>\frac{q}{\sqrt{pq}}</math> with probability <math>p</math> and attains <math>-\frac{p}{\sqrt{pq}}</math> with probability <math>q</math>. Thus we get
- <math>\begin{align}
\gamma_1 &= \operatorname{E} \left[\left(\frac{X-\operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}\right)^3\right] \\ &= p \cdot \left(\frac{q}{\sqrt{pq}}\right)^3 + q \cdot \left(-\frac{p}{\sqrt{pq}}\right)^3 \\ &= \frac{1}{\sqrt{pq}^3} \left(pq^3-qp^3\right) \\ &= \frac{pq}{\sqrt{pq}^3} (q^2-p^2) \\ &= \frac{(1-p)^2-p^2}{\sqrt{pq}} \\ &= \frac{1-2p}{\sqrt{pq}} = \frac{q-p}{\sqrt{pq}}. \end{align}</math>
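The closed form can likewise be compared with SciPy's skewness for the Bernoulli distribution; a minimal sketch with an arbitrary <math>p = 0.3</math>:
<syntaxhighlight lang="python">
import math
from scipy.stats import bernoulli

p = 0.3
q = 1 - p
closed_form = (q - p) / math.sqrt(p * q)           # (q - p) / sqrt(pq)
library = float(bernoulli.stats(p, moments='s'))   # SciPy's skewness
assert math.isclose(closed_form, library)
</syntaxhighlight>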
Higher moments and cumulants
For <math>k \ge 1</math>, the raw moments are all equal to <math>p</math>, because <math>1^k=1</math> and <math>0^k=0</math>:
- <math>\operatorname{E}[X^k] = \Pr(X=1)\cdot 1^k + \Pr(X=0)\cdot 0^k = p \cdot 1 + q\cdot 0 = p = \operatorname{E}[X].</math>
The central moment of order <math>k</math> is given by
- <math>
\mu_k =(1-p)(-p)^k +p(1-p)^k. </math> The first six central moments are
- <math>\begin{align}
\mu_1 &= 0, \\ \mu_2 &= p(1-p), \\ \mu_3 &= p(1-p)(1-2p), \\ \mu_4 &= p(1-p)(1-3p(1-p)), \\ \mu_5 &= p(1-p)(1-2p)(1-2p(1-p)), \\ \mu_6 &= p(1-p)(1-5p(1-p)(1-p(1-p))). \end{align}</math> The higher central moments can be expressed more compactly in terms of <math>\mu_2</math> and <math>\mu_3</math>
- <math>\begin{align}
\mu_4 &= \mu_2 (1-3\mu_2 ), \\ \mu_5 &= \mu_3 (1-2\mu_2 ), \\ \mu_6 &= \mu_2 (1-5\mu_2 (1-\mu_2 )). \end{align}</math> The first six cumulants are
- <math>\begin{align}
\kappa_1 &= p, \\ \kappa_2 &= \mu_2 , \\ \kappa_3 &= \mu_3 , \\ \kappa_4 &= \mu_2 (1-6\mu_2 ), \\ \kappa_5 &= \mu_3 (1-12\mu_2 ), \\ \kappa_6 &= \mu_2 (1-30\mu_2 (1-4\mu_2 )). \end{align}</math>
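These identities follow from the general central-moment formula above and can be verified numerically; the sketch below (with <math>p = 0.3</math> chosen arbitrarily) checks the compact expressions for <math>\mu_4</math>, <math>\mu_5</math>, <math>\mu_6</math> and the fourth cumulant:
<syntaxhighlight lang="python">
import math

p = 0.3                                   # arbitrary success probability
q = 1 - p

def mu(k):
    """Central moment of order k: E[(X - p)^k] = q(-p)^k + p q^k."""
    return q * (-p)**k + p * q**k

mu2, mu3 = mu(2), mu(3)
assert math.isclose(mu(4), mu2 * (1 - 3 * mu2))
assert math.isclose(mu(5), mu3 * (1 - 2 * mu2))
assert math.isclose(mu(6), mu2 * (1 - 5 * mu2 * (1 - mu2)))
# kappa_4 = mu_4 - 3*mu_2^2, which reduces to mu_2*(1 - 6*mu_2) here
assert math.isclose(mu(4) - 3 * mu2**2, mu2 * (1 - 6 * mu2))
</syntaxhighlight>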
Entropy and Fisher information
Entropy
Entropy is a measure of uncertainty or randomness in a probability distribution. For a Bernoulli random variable <math>X</math> with success probability <math>p</math> and failure probability <math>q = 1 - p</math>, the entropy <math>H(X)</math> is defined as:
- <math>\begin{align}
H(X) &= \operatorname{E}\left[\ln \frac{1}{P(X)}\right] = - [P(X = 0) \ln P(X = 0) + P(X = 1) \ln P(X = 1)] \\ &= - (q \ln q + p \ln p), \quad q = P(X = 0),\ p = P(X = 1) \end{align}</math>
The entropy is maximized when <math>p = 0.5</math>, indicating the highest level of uncertainty when both outcomes are equally likely. The entropy is zero when <math>p = 0</math> or <math>p = 1</math>, where one outcome is certain.
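The entropy formula (in nats) can be checked against SciPy's implementation; a minimal sketch with an arbitrary <math>p = 0.3</math>:
<syntaxhighlight lang="python">
import math
from scipy.stats import bernoulli

p = 0.3
q = 1 - p
closed_form = -(q * math.log(q) + p * math.log(p))    # H(X) in nats
assert math.isclose(closed_form, float(bernoulli.entropy(p)))
# Entropy is maximal at p = 1/2, where it equals ln 2
assert float(bernoulli.entropy(0.5)) > closed_form
</syntaxhighlight>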
Fisher information
Fisher information measures the amount of information that an observable random variable <math>X</math> carries about an unknown parameter <math>p</math> upon which the probability of <math>X</math> depends. For the Bernoulli distribution, the Fisher information with respect to the parameter <math>p</math> is given by:
- <math>\begin{align}
I(p) = \frac{1}{pq} \end{align}</math>
Proof:
- The likelihood function for a Bernoulli random variable <math>X</math> is:
- <math>\begin{align}
L(p; X) = p^X (1 - p)^{1 - X} \end{align}</math> This represents the probability of observing <math>X</math> given the parameter <math>p</math>.
- The log-likelihood function is:
- <math>\begin{align}
\ln L(p; X) = X \ln p + (1 - X) \ln (1 - p) \end{align}</math>
- The score function (the first derivative of the log-likelihood with respect to <math>p</math>) is:
- <math>\begin{align}
\frac{\partial}{\partial p} \ln L(p; X) = \frac{X}{p} - \frac{1 - X}{1 - p}
\end{align}</math>
- The second derivative of the log-likelihood function is:
- <math>\begin{align}
\frac{\partial^2}{\partial p^2} \ln L(p; X) = -\frac{X}{p^2} - \frac{1 - X}{(1 - p)^2} \end{align}</math>
- Fisher information is calculated as the negative expected value of the second derivative of the log-likelihood:
- <math>\begin{align}
I(p) = -E\left[\frac{\partial^2}{\partial p^2} \ln L(p; X)\right] = -\left(-\frac{p}{p^2} - \frac{1 - p}{(1 - p)^2}\right) = \frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1-p)} = \frac{1}{pq} \end{align}</math>
It is minimized at <math>p = 0.5</math>, where it equals 4, and grows without bound as <math>p</math> approaches 0 or 1: near the endpoints each observation constrains <math>p</math> more tightly, while at <math>p = 0.5</math> the outcome is most variable and a single observation is least informative about <math>p</math>.
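A numerical sanity check of <math>I(p) = 1/(pq)</math>: since the Fisher information equals the variance of the score, it can be estimated from simulated data (a minimal sketch assuming NumPy; <math>p = 0.3</math> and the sample size are arbitrary):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.3                                       # arbitrary true parameter
x = rng.binomial(n=1, p=p, size=200_000)      # Bernoulli(p) sample

score = x / p - (1 - x) / (1 - p)             # d/dp ln L(p; X) for each draw
fisher_estimate = score.var()                 # I(p) = Var[score]
print(fisher_estimate, 1 / (p * (1 - p)))     # both close to 1/0.21 ≈ 4.76
</syntaxhighlight>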
Related distributions
- If <math>X_1,\dots,X_n</math> are independent, identically distributed (i.i.d.) random variables, all Bernoulli trials with success probability p, then their sum is distributed according to a binomial distribution with parameters n and p (see the simulation sketch after this list):
- <math>\sum_{k=1}^n X_k \sim \operatorname{B}(n,p)</math> (binomial distribution).<ref name=":0" />
- The Bernoulli distribution is simply <math>\operatorname{B}(1, p)</math>, also written as <math display="inline">\mathrm{Bernoulli} (p).</math>
- The categorical distribution is the generalization of the Bernoulli distribution for variables with any constant number of discrete values.
- The Beta distribution is the conjugate prior of the Bernoulli distribution.
- The geometric distribution models the number of independent and identical Bernoulli trials needed to get one success.
- If <math display="inline">Y \sim \mathrm{Bernoulli}\left(\frac{1}{2}\right)</math>, then <math display="inline">2Y - 1</math> has a Rademacher distribution.
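The binomial-sum relationship in the first item can be checked by simulation; a minimal sketch (assuming NumPy and SciPy; n = 10 and p = 0.3 are arbitrary):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(seed=0)
n, p = 10, 0.3                                    # arbitrary illustration values

# Each row is n i.i.d. Bernoulli(p) draws; summing a row gives one binomial draw.
sums = rng.binomial(n=1, p=p, size=(100_000, n)).sum(axis=1)

# Empirical frequencies of the sums should be close to the Binomial(n, p) pmf.
empirical = np.bincount(sums, minlength=n + 1) / len(sums)
print(np.max(np.abs(empirical - binom.pmf(np.arange(n + 1), n, p))))  # small
</syntaxhighlight>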
See also
- Bernoulli process, a random process consisting of a sequence of independent Bernoulli trials
- Bernoulli sampling
- Binary entropy function
- Binary decision diagram
References
Further reading
External links
- Template:Springer.
- Weisstein, Eric W. "Bernoulli Distribution". MathWorld. https://mathworld.wolfram.com/BernoulliDistribution.html
- Interactive graphic: Univariate Distribution Relationships.