== Properties ==

=== Memorylessness ===
{{Main article|Memorylessness}}
The geometric distribution is the only memoryless discrete probability distribution.<ref>{{Cite book |last1=Dekking |first1=Frederik Michel |url=http://link.springer.com/10.1007/1-84628-168-7 |title=A Modern Introduction to Probability and Statistics |last2=Kraaikamp |first2=Cornelis |last3=Lopuhaä |first3=Hendrik Paul |last4=Meester |first4=Ludolf Erwin |date=2005 |publisher=Springer London |isbn=978-1-85233-896-1 |series=Springer Texts in Statistics |location=London |page=50 |language=en |doi=10.1007/1-84628-168-7}}</ref> It is the discrete version of the same property found in the [[exponential distribution]].<ref name=":8">{{Cite book |last1=Johnson |first1=Norman L. |url=https://onlinelibrary.wiley.com/doi/book/10.1002/0471715816 |title=Univariate Discrete Distributions |last2=Kemp |first2=Adrienne W. |author2-link=Adrienne W. Kemp |last3=Kotz |first3=Samuel |date=2005-08-19 |publisher=Wiley |isbn=978-0-471-27246-5 |edition=1 |series=Wiley Series in Probability and Statistics |language=en |doi=10.1002/0471715816}}</ref>{{Rp|page=228}} The property asserts that the number of previously failed trials does not affect the number of future trials needed for a success.

Because there are two definitions of the geometric distribution, there are also two definitions of memorylessness for discrete random variables.<ref>{{Cite web |last=Weisstein |first=Eric W. |title=Memoryless |url=https://mathworld.wolfram.com/ |access-date=2024-07-25 |website=mathworld.wolfram.com |language=en}}</ref> Expressed in terms of [[conditional probability]], the two definitions are
<math display="block">\Pr(X>m+n\mid X>n)=\Pr(X>m)</math>
and
<math display="block">\Pr(Y>m+n\mid Y\geq n)=\Pr(Y>m),</math>
where <math>m</math> and <math>n</math> are [[Natural number|natural numbers]], <math>X</math> is a geometrically distributed random variable defined over <math>\mathbb{N}</math>, and <math>Y</math> is a geometrically distributed random variable defined over <math>\mathbb{N}_0</math>. Note that these definitions are not equivalent for discrete random variables; <math>Y</math> does not satisfy the first equation and <math>X</math> does not satisfy the second.

=== Moments and cumulants ===
The [[expected value]] and [[variance]] of a geometrically distributed [[random variable]] <math>X</math> defined over <math>\mathbb{N}</math> are<ref name=":1" />{{Rp|page=261}}
<math display="block">\operatorname{E}(X) = \frac{1}{p}, \qquad \operatorname{var}(X) = \frac{1-p}{p^2}.</math>
With a geometrically distributed random variable <math>Y</math> defined over <math>\mathbb{N}_0</math>, the expected value changes into
<math display="block">\operatorname{E}(Y) = \frac{1-p}{p},</math>
while the variance stays the same.<ref name=":0">{{Cite book |last1=Forbes |first1=Catherine |url=https://onlinelibrary.wiley.com/doi/book/10.1002/9780470627242 |title=Statistical Distributions |last2=Evans |first2=Merran |last3=Hastings |first3=Nicholas |last4=Peacock |first4=Brian |date=2010-11-29 |publisher=Wiley |isbn=978-0-470-39063-4 |edition=1st |language=en |doi=10.1002/9780470627242}}</ref>{{Rp|pages=114–115}} For example, when rolling a six-sided die until landing on a "1", the average number of rolls needed is <math>\frac{1}{1/6} = 6</math> and the average number of failures is <math>\frac{1 - 1/6}{1/6} = 5</math>.
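The memorylessness identity and the die-rolling averages above can be checked numerically. The following Python sketch is illustrative only (the values m = 4 and n = 7 are arbitrary choices, not from the article); it uses the tail probability Pr(X > k) = (1 − p)^k and a truncated series for E(X):

```python
# Numerical sanity check of two facts above, for p = 1/6 (the die example):
# 1) memorylessness: Pr(X > m+n | X > n) = Pr(X > m), via Pr(X > k) = (1-p)**k;
# 2) the mean: E(X) = 1/p = 6, approximated by truncating the defining series.
p = 1 / 6

def tail(k):
    """Pr(X > k) for X counting trials until the first success."""
    return (1 - p) ** k

m, n = 4, 7  # arbitrary illustrative choices
conditional = tail(m + n) / tail(n)  # Pr(X > m+n | X > n)
assert abs(conditional - tail(m)) < 1e-12  # memorylessness holds

# E(X) = sum_{k>=1} k * p * (1-p)**(k-1); terms beyond k = 2000 are negligible
mean_x = sum(k * p * (1 - p) ** (k - 1) for k in range(1, 2000))
assert abs(mean_x - 6) < 1e-9  # matches 1/p = 6 for the die example
```

The same truncated-series approach recovers the mean number of failures, (1 − p)/p = 5, by summing over k − 1 instead of k.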
The [[Moment-generating function|moment generating function]] of the geometric distribution when defined over <math>\mathbb{N}</math> and <math>\mathbb{N}_0</math> respectively is<ref>{{Cite book |last1=Bertsekas |first1=Dimitri P. |url=https://archive.org/details/introductiontopr0000bert_p5i9_2ndedi |title=Introduction to probability |last2=Tsitsiklis |first2=John N. |publisher=Athena Scientific |year=2008 |isbn=978-1-886529-23-6 |edition=2nd |series=Optimization and computation series |location=Belmont |page=235 |language=en}}</ref><ref name=":0" />{{Rp|page=114}}
<math display="block">\begin{align} M_X(t) &= \frac{pe^t}{1-(1-p)e^t}, \\ M_Y(t) &= \frac{p}{1-(1-p)e^t}, \quad t < -\ln(1-p). \end{align}</math>
The moments for the number of failures before the first success are given by
<math display="block">\begin{align} \operatorname{E}(Y^n) & =\sum_{k=0}^\infty (1-p)^k p\cdot k^n \\ & =p \operatorname{Li}_{-n}(1-p) & (\text{for }n \neq 0), \end{align}</math>
where <math>\operatorname{Li}_{-n}(1-p)</math> is the [[Polylogarithm|polylogarithm function]].<ref>{{Cite web |last=Weisstein |first=Eric W. |title=Geometric Distribution |url=https://mathworld.wolfram.com/ |access-date=2024-07-13 |website=[[MathWorld]] |language=en}}</ref>

The [[cumulant generating function]] of the geometric distribution defined over <math>\mathbb{N}_0</math> is<ref name=":8" />{{Rp|page=216}}
<math display="block">K(t) = \ln p - \ln (1 - (1-p)e^t).</math>
The [[cumulant]]s <math>\kappa_r</math> satisfy the recursion
<math display="block">\kappa_{r+1} = q \frac{d\kappa_r}{dq}, \qquad r=1,2,\dotsc</math>
where <math>q = 1-p</math>, when defined over <math>\mathbb{N}_0</math>.<ref name=":8" />{{Rp|page=216}}

==== Proof of expected value ====
Consider the expected value <math>\operatorname{E}(X)</math> of <math>X</math> as above, i.e. the average number of trials until a success. The first trial either succeeds with probability <math>p</math>, or fails with probability <math>1-p</math>.
If it fails, the '''remaining''' mean number of trials until a success is identical to the original mean; this follows from the fact that all trials are independent. From this we get the formula
<math display="block">\operatorname{E}(X) = p + (1-p)(1 + \operatorname{E}(X)),</math>
which, when solved for <math>\operatorname{E}(X)</math>, gives
<math display="block">\operatorname{E}(X) = \frac{1}{p}.</math>
The expected number of '''failures''' <math>Y</math> can be found from the [[linearity of expectation]], <math>\operatorname{E}(Y) = \operatorname{E}(X-1) = \operatorname{E}(X) - 1 = \frac{1}{p} - 1 = \frac{1-p}{p}</math>. It can also be shown in the following way:
<math display="block">\begin{align} \operatorname{E}(Y) & =p\sum_{k=0}^\infty(1-p)^k k \\ & = p (1-p) \sum_{k=0}^\infty (1-p)^{k-1} k\\ & = p (1-p) \left(-\sum_{k=0}^\infty \frac{d}{dp}\left[(1-p)^k\right]\right) \\ & = p (1-p) \left[\frac{d}{dp}\left(-\sum_{k=0}^\infty (1-p)^k\right)\right] \\ & = p(1-p)\frac{d}{dp}\left(-\frac{1}{p}\right) \\ & = \frac{1-p}{p}. \end{align}</math>
The interchange of summation and differentiation is justified by the fact that convergent [[power series]] [[uniform convergence|converge uniformly]] on [[compact space|compact]] subsets of the set of points where they converge.

=== Summary statistics ===
The [[mean]] of the geometric distribution is its expected value which is, as previously discussed in [[Geometric distribution#Moments and cumulants|§ Moments and cumulants]], <math>\frac{1}{p}</math> or <math>\frac{1-p}{p}</math> when defined over <math>\mathbb{N}</math> or <math>\mathbb{N}_0</math> respectively.

The [[median]] of the geometric distribution is <math>\left\lceil -\frac{\log 2}{\log(1-p)} \right\rceil</math> when defined over <math>\mathbb{N}</math><ref>{{Cite book |last=Aggarwal |first=Charu C. |url=https://link.springer.com/10.1007/978-3-031-53282-5 |title=Probability and Statistics for Machine Learning: A Textbook |publisher=Springer Nature Switzerland |year=2024 |isbn=978-3-031-53281-8 |location=Cham |page=138 |language=en |doi=10.1007/978-3-031-53282-5}}</ref> and <math>\left\lfloor-\frac{\log 2}{\log(1-p)}\right\rfloor</math> when defined over <math>\mathbb{N}_0</math>.<ref name=":2" />{{Rp|page=69}}

The [[Mode (statistics)|mode]] of the geometric distribution is the first value in the support set. This is 1 when defined over <math>\mathbb{N}</math> and 0 when defined over <math>\mathbb{N}_0</math>.<ref name=":2" />{{Rp|page=69}}

The [[skewness]] of the geometric distribution is <math>\frac{2-p}{\sqrt{1-p}}</math>.<ref name=":0" />{{Rp|page=115}}

The [[kurtosis]] of the geometric distribution is <math>9 + \frac{p^2}{1-p}</math>.<ref name=":0" />{{Rp|page=115}} The [[excess kurtosis]] of a distribution is the difference between its kurtosis and the kurtosis of a [[normal distribution]], <math>3</math>.<ref name=":4">{{Cite book |last=Chan |first=Stanley |url=https://probability4datascience.com/ |title=Introduction to Probability for Data Science |publisher=[[Michigan Publishing]] |year=2021 |isbn=978-1-60785-747-1 |edition=1st |language=en}}</ref>{{Rp|page=217}} Therefore, the excess kurtosis of the geometric distribution is <math>6 + \frac{p^2}{1-p}</math>. Since <math>\frac{p^2}{1-p} \geq 0</math>, the excess kurtosis is always positive, so the distribution is [[leptokurtic]].<ref name=":2" />{{Rp|page=69}} In other words, the tail of a geometric distribution decays faster than a Gaussian.<ref name=":4" />{{Rp|page=217}}
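The median formula and the skewness expression above can be spot-checked directly against the distribution. The Python sketch below is illustrative only, reusing p = 1/6 from the die example; it finds the median as the smallest k whose CDF reaches 1/2 and computes the skewness from a truncated series:

```python
import math

# Spot-check two summary statistics for X on {1, 2, ...} with p = 1/6.
p = 1 / 6

# Median: the formula ceil(-log 2 / log(1-p)) should match the smallest k
# with Pr(X <= k) = 1 - (1-p)**k >= 1/2.
median_formula = math.ceil(-math.log(2) / math.log(1 - p))
k = 1
while 1 - (1 - p) ** k < 0.5:
    k += 1
assert median_formula == k  # both give 4 when p = 1/6

# Skewness: E[((X - mu)/sigma)**3] from a truncated series should match
# the closed form (2-p)/sqrt(1-p).
mu = 1 / p
sigma = math.sqrt((1 - p) / p ** 2)
skew = sum(((j - mu) / sigma) ** 3 * p * (1 - p) ** (j - 1)
           for j in range(1, 5000))
assert abs(skew - (2 - p) / math.sqrt(1 - p)) < 1e-6
```

The same truncated-series pattern extends to the kurtosis, replacing the third power by the fourth and comparing against 9 + p²/(1 − p).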