{{short description|Probability distribution}} {{Distinguish|Hypergeometric distribution}} {{Infobox probability distribution 2 | name = Geometric | type = mass | pdf_image = [[File:geometric pmf.svg|300px]] | cdf_image = [[File:geometric cdf.svg|300px]] | parameters = <math>0 < p \leq 1</math> success probability ([[real number|real]]) | support = ''k'' trials where <math>k \in \mathbb{N} = \{1, 2, 3, \dotsc\}</math> | pdf = <math>(1 - p)^{k-1}p</math> | cdf = <math>1-(1 - p)^{\lfloor x\rfloor}</math> for <math>x\geq 1</math>,<br /><math>0</math> for <math>x<1</math> | mean = <math>\frac{1}{p}</math> | median = <math>\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil</math> <br /> (not unique if <math>-1/\log_2(1-p)</math> is an integer) | mode = <math>1</math> | variance = <math>\frac{1-p}{p^2}</math> | skewness = <math>\frac{2-p}{\sqrt{1-p}}</math> | kurtosis = <math>6+\frac{p^2}{1-p}</math> | entropy = <math>\tfrac{-(1-p)\log (1-p) - p \log p}{p}</math> | fisher = <math>\tfrac{1}{p^2 \cdot(1-p)}</math> | mgf = <math>\frac{pe^t}{1-(1-p) e^t},</math><br />for <math>t<-\ln(1-p)</math> | char = <math>\frac{pe^{it}}{1-(1-p)e^{it}}</math> | pgf = <math>\frac{pz}{1-(1-p)z}</math> | parameters2 = <math>0 < p \leq 1</math> success probability ([[real number|real]]) | support2 = ''k'' failures where <math>k \in \mathbb{N}_0 = \{0, 1, 2, \dotsc\}</math> | pdf2 = <math>(1 - p)^k p</math> | cdf2 = <math>1-(1 - p)^{\lfloor x\rfloor+1}</math> for <math>x\geq 0</math>,<br /><math>0</math> for <math>x<0</math> | mean2 = <math>\frac{1-p}{p}</math> | median2 = <math>\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil - 1</math> <br /> (not unique if <math>-1/\log_2(1-p)</math> is an integer) | mode2 = <math>0</math> | variance2 = <math>\frac{1-p}{p^2}</math> | skewness2 = <math>\frac{2-p}{\sqrt{1-p}}</math> | kurtosis2 = <math>6+\frac{p^2}{1-p}</math> | entropy2 = <math>\tfrac{-(1-p)\log (1-p) - p \log p}{p}</math> | fisher2 = <math>\tfrac{1}{p^2 \cdot(1-p)}</math> | mgf2 = <math>\frac{p}{1-(1-p)e^t},</math><br />for <math>t<-\ln(1-p)</math> | char2 = <math>\frac{p}{1-(1-p)e^{it}}</math> | pgf2 = <math>\frac{p}{1-(1-p)z}</math> }} In [[probability theory]] and [[statistics]], the '''geometric distribution''' is either one of two [[discrete probability distribution]]s: * The probability distribution of the number <math>X</math> of [[Bernoulli trial]]s needed to get one success, supported on <math>\mathbb{N} = \{1,2,3,\ldots\}</math>; * The probability distribution of the number <math>Y=X-1</math> of failures before the first success, supported on <math>\mathbb{N}_0 = \{0, 1, 2, \ldots \} </math>. These two different geometric distributions should not be confused with each other. The former (the distribution of <math>X</math>) is often called the ''shifted'' geometric distribution, but to avoid ambiguity it is best to state the intended support explicitly. The geometric distribution gives the probability that the first occurrence of success requires <math>k</math> independent trials, each with success probability <math>p</math>: the probability that the <math>k</math>-th trial is the first success is :<math>\Pr(X = k) = (1-p)^{k-1}p</math> for <math>k=1,2,3,4,\dots</math> The above form of the geometric distribution is used for modeling the number of trials up to and including the first success.
By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success: :<math>\Pr(Y=k) =\Pr(X=k+1)= (1 - p)^k p</math> for <math>k=0,1,2,3,\dots</math> The geometric distribution gets its name because its probabilities follow a [[geometric sequence]]. It is sometimes called the Furry distribution after [[Wendell H. Furry]].<ref name=":8" />{{Rp|page=210}} ==Definition== The geometric distribution is the [[discrete probability distribution]] that describes when the first success in an infinite sequence of [[Independent and identically distributed random variables|independent and identically distributed]] [[Bernoulli trial|Bernoulli trials]] occurs. Its [[probability mass function]] depends on its parameterization and [[Support (mathematics)|support]]. When supported on <math>\mathbb{N}</math>, the probability mass function is <math display="block">P(X = k) = (1 - p)^{k-1} p</math> where <math>k = 1, 2, 3, \dotsc</math> is the number of trials and <math>p</math> is the probability of success in each trial.<ref name=":1">{{Cite book |last1=Nagel |first1=Werner |url=https://onlinelibrary.wiley.com/doi/book/10.1002/9781119243496 |title=Probability and Conditional Expectation: Fundamentals for the Empirical Sciences |last2=Steyer |first2=Rolf |date=2017-04-04 |publisher=Wiley |isbn=978-1-119-24352-6 |edition=1st |series=Wiley Series in Probability and Statistics |pages= |language=en |doi=10.1002/9781119243496}}</ref>{{Rp|pages=260–261}} The support may also be <math>\mathbb{N}_0</math>, defining <math>Y=X-1</math>. This alters the probability mass function into <math display="block">P(Y = k) = (1 - p)^k p</math> where <math>k = 0, 1, 2, \dotsc</math> is the number of failures before the first success.<ref name=":2">{{Cite book |last1=Chattamvelli |first1=Rajan |url=https://link.springer.com/10.1007/978-3-031-02425-2 |title=Discrete Distributions in Engineering and the Applied Sciences |last2=Shanmugam |first2=Ramalingam |publisher=Springer International Publishing |year=2020 |isbn=978-3-031-01297-6 |series=Synthesis Lectures on Mathematics & Statistics |location=Cham |language=en |doi=10.1007/978-3-031-02425-2}}</ref>{{Rp|page=66}} An alternative parameterization of the distribution gives the probability mass function <math display="block">P(Y = k) = \left(\frac{P}{Q}\right)^k \left(1-\frac{P}{Q}\right)</math> where <math>P = \frac{1-p}{p}</math> and <math>Q = \frac{1}{p}</math>.<ref name=":8" />{{Rp|pages=208–209}} An example of a geometric distribution arises from rolling a six-sided [[dice|die]] <!-- "die" is the correct singular form of the plural "dice." -->until a "1" appears. Each roll is [[Independence (probability theory)|independent]] with a <math>1/6</math> chance of success. The number of rolls needed follows a geometric distribution with <math>p=1/6</math>. 
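The two parameterizations can be compared numerically; the following minimal Python sketch is illustrative only (the helper names are not drawn from the cited sources) and uses the die-rolling example:

<syntaxhighlight lang="python">
from math import isclose

p = 1 / 6  # probability of rolling a "1" on a fair six-sided die

def pmf_trials(k: int, p: float) -> float:
    """P(X = k): the first success occurs on the k-th trial, k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

def pmf_failures(k: int, p: float) -> float:
    """P(Y = k): k failures precede the first success, k = 0, 1, 2, ..."""
    return (1 - p) ** k * p

# The two parameterizations differ only by a shift: P(Y = k) = P(X = k + 1).
assert all(isclose(pmf_failures(k, p), pmf_trials(k + 1, p)) for k in range(50))

# Probability that the first "1" appears within six rolls: 1 - (5/6)**6 ≈ 0.665.
print(sum(pmf_trials(k, p) for k in range(1, 7)))
</syntaxhighlight>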
==Properties== === Memorylessness === {{Main article|Memorylessness}} The geometric distribution is the only memoryless discrete probability distribution.<ref>{{Cite book |last1=Dekking |first1=Frederik Michel |url=http://link.springer.com/10.1007/1-84628-168-7 |title=A Modern Introduction to Probability and Statistics |last2=Kraaikamp |first2=Cornelis |last3=Lopuhaä |first3=Hendrik Paul |last4=Meester |first4=Ludolf Erwin |date=2005 |publisher=Springer London |isbn=978-1-85233-896-1 |series=Springer Texts in Statistics |location=London |page=50 |language=en |doi=10.1007/1-84628-168-7}}</ref> It is the discrete version of the same property found in the [[exponential distribution]].<ref name=":8">{{Cite book |last1=Johnson |first1=Norman L. |url=https://onlinelibrary.wiley.com/doi/book/10.1002/0471715816 |title=Univariate Discrete Distributions |last2=Kemp |first2=Adrienne W.|author2-link=Adrienne W. Kemp |last3=Kotz |first3=Samuel |date=2005-08-19 |publisher=Wiley |isbn=978-0-471-27246-5 |edition=1 |series=Wiley Series in Probability and Statistics |page= |language=en |doi=10.1002/0471715816}}</ref>{{Rp|page=228}} The property asserts that the number of previously failed trials does not affect the number of future trials needed for a success. Because there are two definitions of the geometric distribution, there are also two definitions of memorylessness for discrete random variables.<ref>{{Cite web |last=Weisstein |first=Eric W. |title=Memoryless |url=https://mathworld.wolfram.com/ |access-date=2024-07-25 |website=mathworld.wolfram.com |language=en}}</ref> Expressed in terms of [[conditional probability]], the two definitions are<math display="block">\Pr(X>m+n\mid X>n)=\Pr(X>m),</math> and<math display="block">\Pr(Y>m+n\mid Y\geq n)=\Pr(Y>m),</math> where <math>m</math> and <math>n</math> are [[Natural number|natural numbers]], <math>X</math> is a geometrically distributed random variable defined over <math>\mathbb{N}</math>, and <math>Y</math> is a geometrically distributed random variable defined over <math>\mathbb{N}_0</math>. Note that these definitions are not equivalent for discrete random variables; <math>Y</math> does not satisfy the first equation and <math>X</math> does not satisfy the second. ===Moments and cumulants=== The [[expected value]] and [[variance]] of a geometrically distributed [[random variable]] <math>X</math> defined over <math>\mathbb{N}</math> are<ref name=":1" />{{Rp|page=261}}<math display="block">\operatorname{E}(X) = \frac{1}{p}, \qquad\operatorname{var}(X) = \frac{1-p}{p^2}.</math> With a geometrically distributed random variable <math>Y</math> defined over <math>\mathbb{N}_0</math>, the expected value changes into<math display="block">\operatorname{E}(Y) = \frac{1-p} p,</math>while the variance stays the same.<ref name=":0">{{Cite book |last1=Forbes |first1=Catherine |url=https://onlinelibrary.wiley.com/doi/book/10.1002/9780470627242 |title=Statistical Distributions |last2=Evans |first2=Merran |last3=Hastings |first3=Nicholas |last4=Peacock |first4=Brian |date=2010-11-29 |publisher=Wiley |isbn=978-0-470-39063-4 |edition=1st |pages= |language=en |doi=10.1002/9780470627242}}</ref>{{Rp|pages=114–115}} For example, when rolling a six-sided die until landing on a "1", the average number of rolls needed is <math>\frac{1}{1/6} = 6</math> and the average number of failures is <math>\frac{1 - 1/6}{1/6} = 5</math>.
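These averages can be checked by direct simulation; the following short Monte Carlo sketch in Python is illustrative only (the sample size is arbitrary):

<syntaxhighlight lang="python">
import random

def geometric_trials(p: float) -> int:
    """Sample X: Bernoulli(p) trials up to and including the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

p, n = 1 / 6, 200_000
samples = [geometric_trials(p) for _ in range(n)]
mean = sum(samples) / n
variance = sum((x - mean) ** 2 for x in samples) / n

print(mean)      # ≈ 1/p = 6, the average number of rolls until a "1"
print(variance)  # ≈ (1 - p)/p**2 = 30
</syntaxhighlight>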
The [[Moment-generating function|moment generating function]] of the geometric distribution when defined over <math> \mathbb{N} </math> and <math>\mathbb{N}_0</math> respectively is<ref>{{Cite book |last1=Bertsekas |first1=Dimitri P. |url=https://archive.org/details/introductiontopr0000bert_p5i9_2ndedi |title=Introduction to probability |last2=Tsitsiklis |first2=John N. |publisher=Athena Scientific |year=2008 |isbn=978-1-886529-23-6 |edition=2nd |series=Optimization and computation series |location=Belmont |page=235 |language=en}}</ref><ref name=":0" />{{Rp|page=114}}<math display="block">\begin{align} M_X(t) &= \frac{pe^t}{1-(1-p)e^t} \\ M_Y(t) &= \frac{p}{1-(1-p)e^t}, \quad t < -\ln(1-p) \end{align}</math>The moments for the number of failures before the first success are given by : <math> \begin{align} \operatorname{E}(Y^n) & {} =\sum_{k=0}^\infty (1-p)^k p\cdot k^n \\ & {} =p \operatorname{Li}_{-n}(1-p) & (\text{for }n \neq 0) \end{align} </math> where <math> \operatorname{Li}_{-n}(1-p) </math> is the [[Polylogarithm|polylogarithm function]].<ref>{{Cite web |last=Weisstein |first=Eric W. |title=Geometric Distribution |url=https://mathworld.wolfram.com/ |access-date=2024-07-13 |website=[[MathWorld]] |language=en}}</ref> The [[cumulant generating function]] of the geometric distribution defined over <math>\mathbb{N}_0</math> is<ref name=":8" />{{Rp|page=216}} <math display="block">K(t) = \ln p - \ln (1 - (1-p)e^t)</math>The [[cumulant]]s <math>\kappa_r</math> satisfy the recursion<math display="block">\kappa_{r+1} = q \frac{d\kappa_r}{dq}, \quad r=1,2,\dotsc</math>where <math>q = 1-p</math>, when defined over <math>\mathbb{N}_0</math>.<ref name=":8" />{{Rp|page=216}} ==== Proof of expected value ==== Consider the expected value <math>\operatorname{E}(X)</math> of ''X'' as above, i.e. the average number of trials until a success. The first trial either succeeds with probability <math>p</math>, or fails with probability <math>1-p</math>. If it fails, the '''remaining''' mean number of trials until a success is identical to the original mean, since all trials are independent. From this we get the formula: : <math>\operatorname{E}(X) = p + (1-p)(1 + \operatorname{E}(X)) ,</math> which, when solved for <math> \operatorname{E}(X) </math>, gives: : <math>\operatorname{E}(X) = \frac{1}{p}.</math> The expected number of '''failures''' <math>Y</math> can be found from the [[linearity of expectation]], <math>\operatorname{E}(Y) = \operatorname{E}(X-1) = \operatorname{E}(X) - 1 = \frac 1 p - 1 = \frac{1-p}{p}</math>. It can also be shown in the following way: : <math> \begin{align} \operatorname{E}(Y) & =p\sum_{k=0}^\infty(1-p)^k k \\ & = p (1-p) \sum_{k=0}^\infty (1-p)^{k-1} k\\ & = p (1-p) \left(-\sum_{k=0}^\infty \frac{d}{dp}\left[(1-p)^k\right]\right) \\ & = p (1-p) \left[\frac{d}{dp}\left(-\sum_{k=0}^\infty (1-p)^k\right)\right] \\ & = p(1-p)\frac{d}{dp}\left(-\frac{1}{p}\right) \\ & = \frac{1-p}{p}. \end{align} </math> The interchange of summation and differentiation is justified by the fact that convergent [[power series]] [[uniform convergence|converge uniformly]] on [[compact space|compact]] subsets of the set of points where they converge. === Summary statistics === The [[mean]] of the geometric distribution is its expected value which is, as previously discussed in [[Geometric distribution#Moments and cumulants|§ Moments and cumulants]], <math>\frac{1}{p}</math> or <math>\frac{1-p}{p}</math> when defined over <math>\mathbb{N}</math> or <math>\mathbb{N}_0</math> respectively.
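These closed forms can also be recovered symbolically from the moment generating function above; the following sketch uses the sympy library (assumed available; the variable names are illustrative):

<syntaxhighlight lang="python">
import sympy as sp

p, t = sp.symbols('p t', positive=True)

# Moment generating function of the geometric distribution over N (trials),
# valid for t < -ln(1 - p).
M_X = p * sp.exp(t) / (1 - (1 - p) * sp.exp(t))

mean = sp.simplify(M_X.diff(t).subs(t, 0))              # E(X) = 1/p
second_moment = sp.simplify(M_X.diff(t, 2).subs(t, 0))  # E(X^2) = (2 - p)/p^2
variance = sp.simplify(second_moment - mean**2)         # (1 - p)/p^2

print(mean, variance)
</syntaxhighlight>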
The [[median]] of the geometric distribution is <math>\left\lceil -\frac{\log 2}{\log(1-p)} \right\rceil</math> when defined over <math>\mathbb{N}</math><ref>{{Cite book |last=Aggarwal |first=Charu C. |url=https://link.springer.com/10.1007/978-3-031-53282-5 |title=Probability and Statistics for Machine Learning: A Textbook |publisher=Springer Nature Switzerland |year=2024 |isbn=978-3-031-53281-8 |location=Cham |page=138 |language=en |doi=10.1007/978-3-031-53282-5}}</ref> and <math>\left\lfloor-\frac{\log 2}{\log(1-p)}\right\rfloor</math> when defined over <math>\mathbb{N}_0</math>.<ref name=":2" />{{Rp|page=69}} The [[Mode (statistics)|mode]] of the geometric distribution is the first value in the support set. This is 1 when defined over <math>\mathbb{N}</math> and 0 when defined over <math>\mathbb{N}_0</math>.<ref name=":2" />{{Rp|page=69}} The [[skewness]] of the geometric distribution is <math>\frac{2-p}{\sqrt{1-p}}</math>.<ref name=":0" />{{Rp|pages=|page=115}} The [[Kurtosis|kurtosis]] of the geometric distribution is <math>9 + \frac{p^2}{1-p}</math>.<ref name=":0" />{{Rp|pages=|page=115}} The [[excess kurtosis]] of a distribution is the difference between its kurtosis and the kurtosis of a [[normal distribution]], <math>3</math>.<ref name=":4">{{Cite book |last=Chan |first=Stanley |url=https://probability4datascience.com/ |title=Introduction to Probability for Data Science |publisher=[[Michigan Publishing]] |year=2021 |isbn=978-1-60785-747-1 |edition=1st |language=en}}</ref>{{Rp|pages=|page=217}} Therefore, the excess kurtosis of the geometric distribution is <math>6 + \frac{p^2}{1-p}</math>. Since <math>\frac{p^2}{1-p} \geq 0</math>, the excess kurtosis is always positive so the distribution is [[leptokurtic]].<ref name=":2" />{{Rp|page=69}} In other words, the tail of a geometric distribution is heavier than that of a Gaussian, decaying more slowly.<ref name=":4" />{{Rp|pages=|page=217}} ===Entropy and Fisher's information=== ====Entropy (failures before success)==== Entropy is a measure of uncertainty in a probability distribution. For the geometric distribution that models the number of failures before the first success, the probability mass function is: :<math>P(X = k) = (1 - p)^k p, \quad k = 0, 1, 2, \dots</math> The entropy <math>H(X)</math> of this distribution is: :<math>\begin{align} H(X) &= - \sum_{k=0}^{\infty} P(X = k) \ln P(X = k) \\ &= - \sum_{k=0}^{\infty} (1 - p)^k p \ln \left( (1 - p)^k p \right) \\ &= - \sum_{k=0}^{\infty} (1 - p)^k p \left[ k \ln(1 - p) + \ln p \right] \\ &= -\ln p - \frac{1 - p}{p} \ln(1 - p) \end{align}</math> where the last step uses <math>\sum_{k=0}^{\infty} k (1-p)^k p = \operatorname{E}(X) = \tfrac{1-p}{p}</math>. The entropy increases as the probability <math>p</math> decreases, reflecting greater uncertainty as success becomes rarer. ====Fisher's information (failures before success)==== Fisher information measures the amount of information that an observable random variable <math>X</math> carries about an unknown parameter <math>p</math>. For the geometric distribution (failures before the first success), the Fisher information with respect to <math>p</math> is given by: :<math>I(p) = \frac{1}{p^2(1 - p)}</math> '''Proof:''' *The '''Likelihood Function''' for a geometric random variable <math>X</math> is: :<math>L(p; X) = (1 - p)^X p</math> *The '''Log-Likelihood Function''' is: :<math>\ln L(p; X) = X \ln(1 - p) + \ln p</math> *The '''Score Function''' (first derivative of the log-likelihood with respect to
<math>p</math>) is: :<math>\frac{\partial}{\partial p} \ln L(p; X) = \frac{1}{p} - \frac{X}{1 - p}</math> *The second derivative of the log-likelihood function is: :<math>\frac{\partial^2}{\partial p^2} \ln L(p; X) = -\frac{1}{p^2} - \frac{X}{(1 - p)^2}</math> *'''Fisher Information''' is calculated as the negative expected value of the second derivative, using <math>\operatorname{E}(X) = \tfrac{1-p}{p}</math>: :<math>\begin{align} I(p) &= -E\left[\frac{\partial^2}{\partial p^2} \ln L(p; X)\right] \\ &= - \left(-\frac{1}{p^2} - \frac{1 - p}{p (1 - p)^2} \right) \\ &= \frac{1}{p^2(1 - p)} \end{align}</math> Fisher information increases as <math>p</math> decreases, indicating that rarer successes provide more information about the parameter <math>p</math>. ====Entropy (trials until success)==== For the geometric distribution modeling the number of trials until the first success, the probability mass function is: :<math>P(X = k) = (1 - p)^{k - 1} p, \quad k = 1, 2, 3, \dots</math> The entropy <math>H(X)</math> of this distribution is: :<math>\begin{align} H(X) &= - \sum_{k=1}^{\infty} P(X = k) \ln P(X = k) \\ &= - \sum_{k=1}^{\infty} (1 - p)^{k - 1} p \ln \left( (1 - p)^{k - 1} p \right) \\ &= - \sum_{k=1}^{\infty} (1 - p)^{k - 1} p \left[ (k - 1) \ln(1 - p) + \ln p \right] \\ &= - \ln p - \frac{1 - p}{p} \ln(1 - p) \end{align}</math> where the last step uses <math>\operatorname{E}(X-1) = \tfrac{1-p}{p}</math>; the entropy is the same as in the failures-before-success parameterization, since shifting a distribution does not change its entropy. Entropy increases as <math>p</math> decreases, reflecting greater uncertainty as the probability of success in each trial becomes smaller. ====Fisher's information (trials until success)==== Fisher information for the geometric distribution modeling the number of trials until the first success is given by: :<math>I(p) = \frac{1}{p^2(1 - p)}</math> '''Proof:''' *The '''Likelihood Function''' for a geometric random variable <math>X</math> is: :<math>L(p; X) = (1 - p)^{X - 1} p</math> *The '''Log-Likelihood Function''' is: :<math>\ln L(p; X) = (X - 1) \ln(1 - p) + \ln p</math> *The '''Score Function''' (first derivative of the log-likelihood with respect to <math>p</math>) is: :<math>\frac{\partial}{\partial p} \ln L(p; X) = \frac{1}{p} - \frac{X - 1}{1 - p}</math> *The second derivative of the log-likelihood function is: :<math>\frac{\partial^2}{\partial p^2} \ln L(p; X) = -\frac{1}{p^2} - \frac{X - 1}{(1 - p)^2}</math> *'''Fisher Information''' is calculated as the negative expected value of the second derivative, using <math>\operatorname{E}(X - 1) = \tfrac{1-p}{p}</math>: :<math>\begin{align} I(p) &= -E\left[\frac{\partial^2}{\partial p^2} \ln L(p; X)\right] \\ &= - \left(-\frac{1}{p^2} - \frac{1 - p}{p (1 - p)^2} \right) \\ &= \frac{1}{p^2(1 - p)} \end{align}</math> === General properties === * The [[probability-generating function|probability generating function]]s of geometric random variables <math> X </math> and <math> Y </math> defined over <math> \mathbb{N} </math> and <math> \mathbb{N}_0 </math> are, respectively,<ref name=":0" />{{Rp|pages=114–115}} ::<math>\begin{align} G_X(s) & = \frac{s\,p}{1-s\,(1-p)}, \\[10pt] G_Y(s) & = \frac{p}{1-s\,(1-p)}, \quad |s| < (1-p)^{-1}.
\end{align}</math> * The [[Characteristic function (probability theory)|characteristic function]] <math>\varphi(t)</math> is equal to <math>G(e^{it})</math> so the geometric distribution's characteristic function, when defined over <math> \mathbb{N} </math> and <math> \mathbb{N}_0 </math> respectively, is<ref name=":9">{{Cite book |url=http://link.springer.com/10.1007/978-3-642-04898-2 |title=International Encyclopedia of Statistical Science |publisher=Springer Berlin Heidelberg |year=2011 |isbn=978-3-642-04897-5 |editor-last=Lovric |editor-first=Miodrag |edition=1st |location=Berlin, Heidelberg |language=en |doi=10.1007/978-3-642-04898-2}}</ref>{{Rp|page=1630}}<math display="block">\begin{align} \varphi_X(t) &= \frac{pe^{it}}{1-(1-p)e^{it}},\\[10pt] \varphi_Y(t) &= \frac{p}{1-(1-p)e^{it}}. \end{align}</math> * The [[Entropy (information theory)|entropy]] of a geometric distribution with parameter <math>p</math> is<ref name=":7" /><math display="block">-\frac{p \log_2 p + (1-p) \log_2 (1-p)}{p}</math> * Given a mean, the geometric distribution is the [[maximum entropy probability distribution]] among all discrete probability distributions with that mean. The corresponding continuous distribution is the [[exponential distribution]].<ref>{{Cite journal |last1=Lisman |first1=J. H. C. |last2=Zuylen |first2=M. C. A. van |date=March 1972 |title=Note on the generation of most probable frequency distributions |url=https://onlinelibrary.wiley.com/doi/10.1111/j.1467-9574.1972.tb00152.x |journal=[[Statistica Neerlandica]] |language=en |volume=26 |issue=1 |pages=19–23 |doi=10.1111/j.1467-9574.1972.tb00152.x |issn=0039-0402}}</ref> * The geometric distribution defined on <math> \mathbb{N}_0 </math> is [[infinite divisibility (probability)|infinitely divisible]], that is, for any positive integer <math>n</math>, there exist <math>n</math> independent identically distributed random variables whose sum is also geometrically distributed. This is because the negative binomial distribution can be derived from a Poisson-stopped sum of [[Logarithmic distribution|logarithmic random variables]].<ref name=":9" />{{Rp|pages=606–607}} * The decimal digits of the geometrically distributed random variable ''Y'' are a sequence of [[statistical independence|independent]] (and ''not'' identically distributed) random variables.{{citation needed|date=May 2012}} For example, the <!-- "hundreds" is correct; "hundredth" is wrong -->hundreds<!-- "hundreds" is correct; "hundredth" is wrong --> digit ''D'' has this probability distribution: ::<math>\Pr(D=d) = {q^{100d} \over 1 + q^{100} + q^{200} + \cdots + q^{900}},</math> :where ''q'' = 1 − ''p'', and similarly for the other digits and, more generally, for [[numeral system]]s with bases other than 10. When the base is 2, this shows that a geometrically distributed random variable can be written as a sum of independent random variables whose probability distributions are [[indecomposable distribution|indecomposable]].
* [[Golomb coding]] is the optimal [[prefix code]]{{clarify|date=May 2012}} for the geometric distribution.<ref name=":7">{{Cite journal|last1=Gallager|first1=R.|last2=van Voorhis|first2=D.|date=March 1975|title=Optimal source codes for geometrically distributed integer alphabets (Corresp.)|journal=IEEE Transactions on Information Theory|volume=21|issue=2|pages=228–230|doi=10.1109/TIT.1975.1055357|issn=0018-9448}}</ref> ==Related distributions== * The sum of <math>r</math> [[Statistical Independence|independent]] geometric random variables with parameter <math>p</math> is a [[Negative binomial distribution|negative binomial]] random variable with parameters <math>r</math> and <math>p</math>.<ref>{{Cite book |last=Pitman |first=Jim |url=http://link.springer.com/10.1007/978-1-4612-4374-8 |title=Probability |date=1993 |publisher=Springer New York |isbn=978-0-387-94594-1 |location=New York, NY |page=372 |language=en |doi=10.1007/978-1-4612-4374-8}}</ref> The geometric distribution is a special case of the negative binomial distribution, with <math>r=1</math>. *The geometric distribution is a special case of discrete [[compound Poisson distribution]].<ref name=":9" />{{Rp|page=606}} * The minimum of <math>n</math> independent geometric random variables with parameters <math>p_1, \dotsc, p_n</math> is also geometrically distributed with parameter <math>1 - \prod_{i=1}^n (1-p_i)</math>.<ref>{{cite journal |last1=Ciardo |first1=Gianfranco |last2=Leemis |first2=Lawrence M. |last3=Nicol |first3=David |date=1 June 1995 |title=On the minimum of independent geometrically distributed random variables |url=https://dx.doi.org/10.1016/0167-7152%2894%2900130-Z |journal=Statistics & Probability Letters |language=en |volume=23 |issue=4 |pages=313–326 |doi=10.1016/0167-7152(94)00130-Z |s2cid=1505801 |hdl-access=free |hdl=2060/19940028569}}</ref> * Suppose 0 < ''r'' < 1, and for ''k'' = 1, 2, 3, ... the random variable ''X''<sub>''k''</sub> has a [[Poisson distribution]] with expected value ''r''<sup>''k''</sup>/''k''. Then ::<math>\sum_{k=1}^\infty k\,X_k</math> :has a geometric distribution taking values in <math>\mathbb{N}_0</math>, with expected value ''r''/(1 − ''r'').{{citation needed|date=May 2012}} * The [[exponential distribution]] is the continuous analogue of the geometric distribution. Applying the [[Floor and ceiling functions|floor]] function to the exponential distribution with parameter <math>\lambda</math> creates a geometric distribution with parameter <math>p=1-e^{-\lambda}</math> defined over <math>\mathbb{N}_0</math>.<ref name=":2" />{{Rp|page=74}} This can be used to generate geometrically distributed random numbers as detailed in [[Geometric distribution#Random variate generation|§ Random variate generation]]. * If ''p'' = 1/''n'' and ''X'' is geometrically distributed with parameter ''p'', then the distribution of ''X''/''n'' approaches an [[exponential distribution]] with expected value 1 as ''n'' → ∞, since<math display="block"> \begin{align} \Pr(X/n>a)=\Pr(X>na) & = (1-p)^{na} = \left(1-\frac 1 n \right)^{na} = \left[ \left( 1-\frac 1 n \right)^n \right]^{a} \\ & \to [e^{-1}]^{a} = e^{-a} \text{ as } n\to\infty.
\end{align} </math>More generally, if ''p'' = ''λ''/''n'', where ''λ'' is a parameter, then as ''n'' → ∞ the distribution of ''X''/''n'' approaches an exponential distribution with rate ''λ'':<math>\lim_{n \to \infty}\Pr(X>nx)=\lim_{n \to \infty}(1-\lambda /n)^{nx}=e^{-\lambda x},</math> and therefore the distribution function of ''X''/''n'' converges to <math>1-e^{-\lambda x}</math>, which is that of an exponential random variable.{{Cn|date=July 2024}} * The [[index of dispersion]] of the geometric distribution is <math>\frac{1}{p}</math> and its [[coefficient of variation]] is <math>\frac{1}{\sqrt{1-p}}</math>. The distribution is [[Overdispersion|overdispersed]].<ref name=":8" />{{Rp|page=216}} ==Statistical inference== The true parameter <math>p</math> of an unknown geometric distribution can be inferred through estimators and conjugate distributions. === Method of moments === Provided they exist, the first <math>l</math> moments of a probability distribution can be estimated from a sample <math>x_1, \dotsc, x_n</math> using the formula<math display="block">m_i = \frac{1}{n} \sum_{j=1}^n x^i_j</math>where <math>m_i</math> is the <math>i</math>th sample moment and <math>1 \leq i \leq l</math>.<ref name=":5">{{Cite book |last1=Evans |first1=Michael |url=https://www.utstat.toronto.edu/mikevans/jeffrosenthal/ |title=Probability and Statistics: The Science of Uncertainty |last2=Rosenthal |first2=Jeffrey |year=2023 |isbn=978-1429224628 |edition=2nd |pages= |publisher=Macmillan Learning |language=en}}</ref>{{Rp|pages=349–350}} Estimating <math>\mathrm{E}(X)</math> with <math>m_1</math> gives the [[sample mean]], denoted <math> \bar{x} </math>. Substituting this estimate into the formula for the expected value of a geometric distribution and solving for <math> p </math> gives the estimators <math> \hat{p} = \frac{1}{\bar{x}} </math> and <math> \hat{p} = \frac{1}{\bar{x}+1} </math> when supported on <math>\mathbb{N}</math> and <math>\mathbb{N}_0</math> respectively.
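As a numerical illustration of the method of moments (Python; the observations below are hypothetical):

<syntaxhighlight lang="python">
# Hypothetical sample of trial counts (support N): trials until the first success.
data = [3, 1, 7, 2, 4, 1, 9, 2]

x_bar = sum(data) / len(data)  # sample mean, here 3.625
p_hat = 1 / x_bar              # method-of-moments estimate, ≈ 0.276

# Had the data been recorded as failure counts (support N0),
# the estimate would instead be 1 / (x_bar + 1).
print(p_hat)
</syntaxhighlight>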
These estimators are [[Biased estimator|biased]] since <math>\mathrm{E}\left(\frac{1}{\bar{x}}\right) > \frac{1}{\mathrm{E}(\bar{x})} = p</math> as a result of [[Jensen's inequality]].<ref name=":3">{{Cite book |last1=Held |first1=Leonhard |url=https://link.springer.com/10.1007/978-3-662-60792-3 |title=Likelihood and Bayesian Inference: With Applications in Biology and Medicine |last2=Sabanés Bové |first2=Daniel |date=2020 |publisher=Springer Berlin Heidelberg |isbn=978-3-662-60791-6 |series=Statistics for Biology and Health |location=Berlin, Heidelberg |language=en |doi=10.1007/978-3-662-60792-3}}</ref>{{Rp|pages=53–54}} === Maximum likelihood estimation === The [[maximum likelihood estimator]] of <math>p</math> is the value that maximizes the [[likelihood function]] given a sample.<ref name=":5" />{{Rp|page=308}} By finding the [[Zero of a function|zero]] of the [[derivative]] of the [[Log-likelihood|log-likelihood function]] when the distribution is defined over <math>\mathbb{N}</math>, the maximum likelihood estimator can be found to be <math>\hat{p} = \frac{1}{\bar{x}}</math>, where <math>\bar{x}</math> is the sample mean.<ref>{{Cite web |last=Siegrist |first=Kyle |date=2020-05-05 |title=7.3: Maximum Likelihood |url=https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/07%3A_Point_Estimation/7.03%3A_Maximum_Likelihood |access-date=2024-06-20 |website=Statistics LibreTexts |language=en}}</ref> If the domain is <math>\mathbb{N}_0</math>, then the estimator shifts to <math>\hat{p} = \frac{1}{\bar{x}+1}</math>. As previously discussed in [[Geometric distribution#Method of moments|§ Method of moments]], these estimators are biased. Regardless of the domain, the bias is equal to : <math> b \equiv \operatorname{E}\bigg[\;(\hat p_\mathrm{mle} - p)\;\bigg] = \frac{p\,(1-p)}{n} </math> which yields the [[Maximum likelihood estimation#Higher-order properties|bias-corrected maximum likelihood estimator]],{{Cn|date=July 2024}} : <math> \hat{p\,}^*_\text{mle} = \hat{p\,}_\text{mle} - \hat{b\,} </math> === Bayesian inference === In [[Bayesian inference]], the parameter <math>p</math> is a random variable from a [[prior distribution]] with a [[posterior distribution]] calculated using [[Bayes' theorem]] after observing samples.<ref name=":3" />{{Rp|page=167}} If a [[beta distribution]] is chosen as the prior distribution, then the posterior will also be a beta distribution; the beta distribution is thus the [[conjugate distribution]] for the geometric. In particular, if a <math>\mathrm{Beta}(\alpha,\beta)</math> prior is selected, then the posterior, after observing samples <math>k_1, \dotsc, k_n \in \mathbb{N}</math>, is<ref>{{Cite CiteSeerX |citeseerx=10.1.1.157.5540 |first=Daniel |last=Fink |title=A Compendium of Conjugate Priors}}</ref><math display="block">p \sim \mathrm{Beta}\left(\alpha+n,\ \beta+\sum_{i=1}^n (k_i-1)\right). \!</math>Alternatively, if the samples are in <math>\mathbb{N}_0</math>, the posterior distribution is<ref>{{Cite web|url=http://halweb.uc3m.es/esp/Personal/personas/mwiper/docencia/English/PhD_Bayesian_Statistics/ch3_2009.pdf |archive-url=https://web.archive.org/web/20100408092905/http://halweb.uc3m.es/esp/Personal/personas/mwiper/docencia/English/PhD_Bayesian_Statistics/ch3_2009.pdf |archive-date=2010-04-08 |url-status=live|title=3.
Conjugate families of distributions}}</ref><math display="block">p \sim \mathrm{Beta}\left(\alpha+n,\beta+\sum_{i=1}^n k_i\right).</math>Since the expected value of a <math>\mathrm{Beta}(\alpha,\beta)</math> distribution is <math>\frac{\alpha}{\alpha+\beta}</math>,<ref name=":9" />{{Rp|page=145}} as <math>\alpha</math> and <math>\beta</math> approach zero, the posterior mean approaches its maximum likelihood estimate. == Random variate generation == {{Further|Non-uniform random variate generation}} The geometric distribution can be generated experimentally from [[i.i.d.]] [[Standard uniform distribution|standard uniform]] random variables by counting how many such variables are needed until one is less than or equal to <math>p</math>. However, the number of random variables needed is itself geometrically distributed and the algorithm slows as <math>p</math> decreases.<ref name=":6">{{Cite book |last=Devroye |first=Luc |url=http://link.springer.com/10.1007/978-1-4613-8643-8 |title=Non-Uniform Random Variate Generation |publisher=Springer New York |year=1986 |isbn=978-1-4613-8645-2 |location=New York, NY |language=en |doi=10.1007/978-1-4613-8643-8}}</ref>{{Rp|page=498}} Random generation can instead be done in [[constant time]] by truncating [[exponential random numbers]]. A unit-rate exponential random variable <math>E</math> becomes geometrically distributed with parameter <math>p</math> through <math>\lceil -E/\log(1-p) \rceil</math>. In turn, <math>E</math> can be generated from a standard uniform random variable <math>U</math>, which turns the formula into <math>\lceil \log(U) / \log(1-p)\rceil</math>.<ref name=":6" />{{Rp|pages=499–500}}<ref>{{Cite book |last=Knuth |first=Donald Ervin |title=The Art of Computer Programming |publisher=[[Addison-Wesley]] |year=1997 |isbn=978-0-201-89683-1 |edition=3rd |volume=2 |location=Reading, Mass |pages=136 |language=en}}</ref> == Applications == The geometric distribution is used in many disciplines. In [[queueing theory]], the [[M/M/1 queue]] has a steady state following a geometric distribution.<ref>{{Cite book |last=Daskin |first=Mark S.
|url=https://link.springer.com/10.1007/978-3-031-02493-1 |title=Bite-Sized Operations Management |publisher=Springer International Publishing |year=2021 |isbn=978-3-031-01365-2 |series=Synthesis Lectures on Operations Research and Applications |location=Cham |page=127 |language=en |doi=10.1007/978-3-031-02493-1}}</ref> In [[Stochastic process|stochastic processes]], the state of a Yule–Furry process at a fixed time is geometrically distributed.<ref>{{Cite book |last1=Madhira |first1=Sivaprasad |url=https://link.springer.com/10.1007/978-981-99-5601-2 |title=Introduction to Stochastic Processes Using R |last2=Deshmukh |first2=Shailaja |publisher=Springer Nature Singapore |year=2023 |isbn=978-981-99-5600-5 |location=Singapore |page=449 |language=en |doi=10.1007/978-981-99-5601-2}}</ref> The distribution also arises when modeling the lifetime of a device in discrete contexts.<ref>{{Citation |last1=Gupta |first1=Rakesh |title=Some Discrete Parametric Markov–Chain System Models to Analyze Reliability |date=2023 |work=Advances in Reliability, Failure and Risk Analysis |pages=305–306 |editor-last=Garg |editor-first=Harish |url=https://link.springer.com/10.1007/978-981-19-9909-3_14 |access-date=2024-07-13 |place=Singapore |publisher=Springer Nature Singapore |language=en |doi=10.1007/978-981-19-9909-3_14 |isbn=978-981-19-9908-6 |last2=Gupta |first2=Shubham |last3=Ali |first3=Irfan}}</ref> It has also been fitted to empirical data, for example in modeling the spread of [[COVID-19]] among patients.<ref>{{Cite journal |last=Polymenis |first=Athanase |date=2021-10-01 |title=An application of the geometric distribution for assessing the risk of infection with SARS-CoV-2 by location |url=https://www.nepjol.info/index.php/AJMS/article/view/38783 |journal=Asian Journal of Medical Sciences |volume=12 |issue=10 |pages=8–11 |doi=10.3126/ajms.v12i10.38783 |issn=2091-0576|doi-access=free }}</ref> == See also == * [[Hypergeometric distribution]] * [[Coupon collector's problem]] * [[Compound Poisson distribution]] * [[Negative binomial distribution]] ==References== {{reflist}} {{ProbDistributions|discrete-infinite}} [[Category:Discrete distributions]] [[Category:Exponential family distributions]] [[Category:Infinitely divisible probability distributions]] [[Category:Articles with example R code]]