==Entropy and Fisher's Information==

===Entropy (Geometric Distribution, Failures Before Success)===
Entropy is a measure of uncertainty in a probability distribution. For the geometric distribution that models the number of failures before the first success, the probability mass function is:

:<math>P(X = k) = (1 - p)^k p, \quad k = 0, 1, 2, \dots</math>

The entropy <math>H(X)</math> for this distribution is defined as:

:<math>\begin{align}
H(X) &= - \sum_{k=0}^{\infty} P(X = k) \ln P(X = k) \\
&= - \sum_{k=0}^{\infty} (1 - p)^k p \ln \left( (1 - p)^k p \right) \\
&= - \sum_{k=0}^{\infty} (1 - p)^k p \left[ k \ln(1 - p) + \ln p \right] \\
&= -\ln p - \frac{1 - p}{p} \ln(1 - p),
\end{align}</math>

where the last step uses <math>\sum_{k=0}^{\infty} k (1 - p)^k p = \operatorname{E}[X] = \frac{1 - p}{p}</math>. The entropy increases as the probability <math>p</math> decreases, reflecting greater uncertainty as success becomes rarer.

===Fisher's Information (Geometric Distribution, Failures Before Success)===
Fisher information measures the amount of information that an observable random variable <math>X</math> carries about an unknown parameter <math>p</math>. For the geometric distribution (failures before the first success), the Fisher information with respect to <math>p</math> is given by:

:<math>I(p) = \frac{1}{p^2(1 - p)}</math>

'''Proof:'''
*The '''likelihood function''' for a geometric random variable <math>X</math> is:
:<math>L(p; X) = (1 - p)^X p</math>
*The '''log-likelihood function''' is:
:<math>\ln L(p; X) = X \ln(1 - p) + \ln p</math>
*The '''score function''' (the first derivative of the log-likelihood with respect to <math>p</math>) is:
:<math>\frac{\partial}{\partial p} \ln L(p; X) = \frac{1}{p} - \frac{X}{1 - p}</math>
*The second derivative of the log-likelihood is:
:<math>\frac{\partial^2}{\partial p^2} \ln L(p; X) = -\frac{1}{p^2} - \frac{X}{(1 - p)^2}</math>
*The '''Fisher information''' is the negative expected value of the second derivative. Substituting <math>\operatorname{E}[X] = \frac{1 - p}{p}</math>:
:<math>\begin{align}
I(p) &= -\operatorname{E}\left[\frac{\partial^2}{\partial p^2} \ln L(p; X)\right] \\
&= - \left(-\frac{1}{p^2} - \frac{1 - p}{p (1 - p)^2} \right) \\
&= \frac{1}{p^2(1 - p)}
\end{align}</math>

As a function of <math>p</math>, the Fisher information is smallest at <math>p = 2/3</math> and grows without bound as <math>p</math> approaches 0 or 1, so observations are most informative about <math>p</math> when successes are very rare or very common.

===Entropy (Geometric Distribution, Trials Until Success)===
For the geometric distribution modeling the number of trials until the first success, the probability mass function is:

:<math>P(X = k) = (1 - p)^{k - 1} p, \quad k = 1, 2, 3, \dots</math>

The entropy <math>H(X)</math> for this distribution is given by:

:<math>\begin{align}
H(X) &= - \sum_{k=1}^{\infty} P(X = k) \ln P(X = k) \\
&= - \sum_{k=1}^{\infty} (1 - p)^{k - 1} p \ln \left( (1 - p)^{k - 1} p \right) \\
&= - \sum_{k=1}^{\infty} (1 - p)^{k - 1} p \left[ (k - 1) \ln(1 - p) + \ln p \right] \\
&= - \ln p - \frac{1 - p}{p} \ln(1 - p),
\end{align}</math>

using <math>\operatorname{E}[X - 1] = \frac{1 - p}{p}</math>. This is the same value as in the failures-before-success case: shifting a distribution by a constant does not change its entropy. Entropy increases as <math>p</math> decreases, reflecting greater uncertainty as the probability of success in each trial becomes smaller.
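The closed form above can be checked numerically. The following is a minimal sketch in Python (standard library only; the truncation length and test values of <math>p</math> are illustrative choices), comparing the closed-form entropy in nats against a truncated version of the defining series:

<syntaxhighlight lang="python">
import math

def entropy_closed_form(p):
    # H(X) = -ln p - ((1 - p) / p) * ln(1 - p), in nats
    return -math.log(p) - (1 - p) / p * math.log(1 - p)

def entropy_by_summation(p, terms=10_000):
    # Truncated sum of -P(X = k) ln P(X = k) over k = 1, 2, ...
    # (trials-until-success parameterization; the shifted support
    # k = 0, 1, 2, ... gives the same value).
    h = 0.0
    for k in range(1, terms + 1):
        pk = (1 - p) ** (k - 1) * p
        if pk == 0.0:  # underflow; the remaining terms are negligible
            break
        h -= pk * math.log(pk)
    return h

for p in (0.1, 0.5, 0.9):
    print(p, entropy_closed_form(p), entropy_by_summation(p))
</syntaxhighlight>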
===Fisher's Information (Geometric Distribution, Trials Until Success)===
Fisher information for the geometric distribution modeling the number of trials until the first success is given by:

:<math>I(p) = \frac{1}{p^2(1 - p)}</math>

'''Proof:'''
*The '''likelihood function''' for a geometric random variable <math>X</math> is:
:<math>L(p; X) = (1 - p)^{X - 1} p</math>
*The '''log-likelihood function''' is:
:<math>\ln L(p; X) = (X - 1) \ln(1 - p) + \ln p</math>
*The '''score function''' (the first derivative of the log-likelihood with respect to <math>p</math>) is:
:<math>\frac{\partial}{\partial p} \ln L(p; X) = \frac{1}{p} - \frac{X - 1}{1 - p}</math>
*The second derivative of the log-likelihood is:
:<math>\frac{\partial^2}{\partial p^2} \ln L(p; X) = -\frac{1}{p^2} - \frac{X - 1}{(1 - p)^2}</math>
*The '''Fisher information''' is the negative expected value of the second derivative. Substituting <math>\operatorname{E}[X - 1] = \frac{1 - p}{p}</math>:
:<math>\begin{align}
I(p) &= -\operatorname{E}\left[\frac{\partial^2}{\partial p^2} \ln L(p; X)\right] \\
&= - \left(-\frac{1}{p^2} - \frac{1 - p}{p (1 - p)^2} \right) \\
&= \frac{1}{p^2(1 - p)}
\end{align}</math>

The two parameterizations therefore carry the same Fisher information about <math>p</math>, as expected, since they differ only by a deterministic shift.
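Because the score has mean zero, <math>I(p)</math> also equals <math>\operatorname{E}\left[\left(\frac{\partial}{\partial p} \ln L(p; X)\right)^2\right]</math>, which can be verified by simulation. A minimal Python sketch (standard library only; the parameter value and sample size are illustrative, and the failures-before-success case is analogous):

<syntaxhighlight lang="python">
import random
import statistics

def sample_geometric(p):
    # Number of Bernoulli(p) trials until the first success (support 1, 2, ...).
    n = 1
    while random.random() >= p:
        n += 1
    return n

def score(p, x):
    # d/dp of ln L(p; x) = (x - 1) ln(1 - p) + ln p
    return 1 / p - (x - 1) / (1 - p)

p = 0.3
mean_sq_score = statistics.fmean(
    score(p, sample_geometric(p)) ** 2 for _ in range(200_000)
)
print(mean_sq_score)  # close to 1 / (p**2 * (1 - p)) = 15.873...
</syntaxhighlight>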
=== General properties ===
* The [[probability-generating function|probability generating function]]s of geometric random variables <math> X </math> and <math> Y </math> defined over <math> \mathbb{N} </math> and <math> \mathbb{N}_0 </math> are, respectively (a numerical sanity check follows this list),<ref name=":0" />{{Rp|pages=114–115}}
::<math>\begin{align}
G_X(s) & = \frac{s\,p}{1-s\,(1-p)}, \\[10pt]
G_Y(s) & = \frac{p}{1-s\,(1-p)}, \quad |s| < (1-p)^{-1}.
\end{align}</math>
* The [[Characteristic function (probability theory)|characteristic function]] <math>\varphi(t)</math> is equal to <math>G(e^{it})</math>, so the geometric distribution's characteristic function, when defined over <math> \mathbb{N} </math> and <math> \mathbb{N}_0 </math> respectively, is<ref name=":9">{{Cite book |url=http://link.springer.com/10.1007/978-3-642-04898-2 |title=International Encyclopedia of Statistical Science |publisher=Springer Berlin Heidelberg |year=2011 |isbn=978-3-642-04897-5 |editor-last=Lovric |editor-first=Miodrag |edition=1st |location=Berlin, Heidelberg |language=en |doi=10.1007/978-3-642-04898-2}}</ref>{{Rp|page=1630}}<math display="block">\begin{align}
\varphi_X(t) &= \frac{pe^{it}}{1-(1-p)e^{it}},\\[10pt]
\varphi_Y(t) &= \frac{p}{1-(1-p)e^{it}}.
\end{align}</math>
* The [[Entropy (information theory)|entropy]] of a geometric distribution with parameter <math>p</math> is<ref name=":7" /><math display="block">-\frac{p \log_2 p + (1-p) \log_2 (1-p)}{p}</math>
* Given a [[mean]], the geometric distribution is the [[maximum entropy probability distribution]] of all discrete probability distributions. The corresponding continuous distribution is the [[exponential distribution]].<ref>{{Cite journal |last1=Lisman |first1=J. H. C. |last2=Zuylen |first2=M. C. A. van |date=March 1972 |title=Note on the generation of most probable frequency distributions |url=https://onlinelibrary.wiley.com/doi/10.1111/j.1467-9574.1972.tb00152.x |journal=[[Statistica Neerlandica]] |language=en |volume=26 |issue=1 |pages=19–23 |doi=10.1111/j.1467-9574.1972.tb00152.x |issn=0039-0402}}</ref>
* The geometric distribution defined on <math> \mathbb{N}_0 </math> is [[infinite divisibility (probability)|infinitely divisible]]: for any positive integer <math>n</math>, there exist <math>n</math> independent identically distributed random variables whose sum is also geometrically distributed. This is because the negative binomial distribution can be derived from a Poisson-stopped sum of [[Logarithmic distribution|logarithmic random variables]].<ref name=":9" />{{Rp|pages=606–607}}
* The decimal digits of the geometrically distributed random variable ''Y'' are a sequence of [[statistical independence|independent]] (and ''not'' identically distributed) random variables.{{citation needed|date=May 2012}} For example, the <!-- "hundreds" is correct; "hundredth" is wrong -->hundreds digit ''D'' has this probability distribution:
::<math>\Pr(D=d) = {q^{100d} \over 1 + q^{100} + q^{200} + \cdots + q^{900}},</math>
:where ''q'' = 1 − ''p''; similar formulas hold for the other digits and, more generally, for [[numeral system]]s with bases other than 10. When the base is 2, this shows that a geometrically distributed random variable can be written as a sum of independent random variables whose probability distributions are [[indecomposable distribution|indecomposable]].
* [[Golomb coding]] is the optimal [[prefix code]]{{clarify|date=May 2012}} for the geometric discrete distribution.<ref name=":7">{{Cite journal |last1=Gallager |first1=R. |last2=van Voorhis |first2=D. |date=March 1975 |title=Optimal source codes for geometrically distributed integer alphabets (Corresp.) |journal=IEEE Transactions on Information Theory |volume=21 |issue=2 |pages=228–230 |doi=10.1109/TIT.1975.1055357 |issn=0018-9448}}</ref>
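As referenced in the first item above, the probability generating function over <math> \mathbb{N} </math> can be checked against a Monte Carlo estimate of <math>\operatorname{E}[s^X]</math>. A minimal Python sketch (standard library only; the values of <math>p</math> and <math>s</math> are illustrative):

<syntaxhighlight lang="python">
import random

def sample_geometric(p):
    # Number of Bernoulli(p) trials until the first success (support 1, 2, ...).
    n = 1
    while random.random() >= p:
        n += 1
    return n

p, s = 0.4, 0.6
closed_form = s * p / (1 - s * (1 - p))  # G_X(s) for X defined over N
n_samples = 100_000
estimate = sum(s ** sample_geometric(p) for _ in range(n_samples)) / n_samples
print(closed_form, estimate)  # both close to 0.375
</syntaxhighlight>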