==Statistical inference==
The true parameter <math>p</math> of an unknown geometric distribution can be inferred through estimators and conjugate distributions.

=== Method of moments ===
Provided they exist, the first <math>l</math> moments of a probability distribution can be estimated from a sample <math>x_1, \dotsc, x_n</math> using the formula<math display="block">m_i = \frac{1}{n} \sum_{j=1}^n x^i_j</math>where <math>m_i</math> is the <math>i</math>th sample moment and <math>1 \leq i \leq l</math>.<ref name=":5">{{Cite book |last1=Evans |first1=Michael |url=https://www.utstat.toronto.edu/mikevans/jeffrosenthal/ |title=Probability and Statistics: The Science of Uncertainty |last2=Rosenthal |first2=Jeffrey |year=2023 |isbn=978-1429224628 |edition=2nd |publisher=Macmillan Learning |language=en}}</ref>{{Rp|pages=349–350}} Estimating <math>\mathrm{E}(X)</math> with <math>m_1</math> gives the [[sample mean]], denoted <math>\bar{x}</math>. Substituting this estimate in the formula for the expected value of a geometric distribution and solving for <math>p</math> gives the estimators <math>\hat{p} = \frac{1}{\bar{x}}</math> and <math>\hat{p} = \frac{1}{\bar{x}+1}</math> when supported on <math>\mathbb{N}</math> and <math>\mathbb{N}_0</math> respectively. These estimators are [[Biased estimator|biased]] since <math>\mathrm{E}\left(\frac{1}{\bar{x}}\right) > \frac{1}{\mathrm{E}(\bar{x})} = p</math> as a result of [[Jensen's inequality]].<ref name=":3">{{Cite book |last1=Held |first1=Leonhard |url=https://link.springer.com/10.1007/978-3-662-60792-3 |title=Likelihood and Bayesian Inference: With Applications in Biology and Medicine |last2=Sabanés Bové |first2=Daniel |date=2020 |publisher=Springer Berlin Heidelberg |isbn=978-3-662-60791-6 |series=Statistics for Biology and Health |location=Berlin, Heidelberg |language=en |doi=10.1007/978-3-662-60792-3}}</ref>{{Rp|pages=53–54}}

=== Maximum likelihood estimation ===
The [[maximum likelihood estimator]] of <math>p</math> is the value that maximizes the [[likelihood function]] given a sample.<ref name=":5" />{{Rp|page=308}} By finding the [[Zero of a function|zero]] of the [[derivative]] of the [[Log-likelihood|log-likelihood function]] when the distribution is defined over <math>\mathbb{N}</math>, the maximum likelihood estimator can be found to be <math>\hat{p} = \frac{1}{\bar{x}}</math>, where <math>\bar{x}</math> is the sample mean.<ref>{{Cite web |last=Siegrist |first=Kyle |date=2020-05-05 |title=7.3: Maximum Likelihood |url=https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/07%3A_Point_Estimation/7.03%3A_Maximum_Likelihood |access-date=2024-06-20 |website=Statistics LibreTexts |language=en}}</ref> If the domain is <math>\mathbb{N}_0</math>, then the estimator shifts to <math>\hat{p} = \frac{1}{\bar{x}+1}</math>. As previously discussed in [[Geometric distribution#Method of moments|§ Method of moments]], these estimators are biased.
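For example, under a hypothetical sample of five observations supported on <math>\mathbb{N}</math>, say <math>x_1, \dotsc, x_5 = 3, 1, 2, 5, 4</math>, the sample mean is<math display="block">\bar{x} = \frac{3+1+2+5+4}{5} = 3,</math>so the method of moments and maximum likelihood estimates coincide at <math>\hat{p} = \frac{1}{\bar{x}} = \frac{1}{3} \approx 0.33</math>. If the same values were instead counts on <math>\mathbb{N}_0</math>, the estimate would be <math>\hat{p} = \frac{1}{\bar{x}+1} = \frac{1}{4}</math>.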
Regardless of the domain, the bias of the maximum likelihood estimator is equal to

: <math> b \equiv \operatorname{E}\bigg[\;(\hat p_\mathrm{mle} - p)\;\bigg] = \frac{p\,(1-p)}{n}, </math>

which yields the [[Maximum likelihood estimation#Higher-order properties|bias-corrected maximum likelihood estimator]]{{Cn|date=July 2024}}

: <math> \hat{p\,}^*_\text{mle} = \hat{p\,}_\text{mle} - \hat{b\,}, </math>

where <math>\hat{b\,} = \frac{\hat{p\,}_\text{mle}\,(1-\hat{p\,}_\text{mle})}{n}</math> is the bias evaluated at the maximum likelihood estimate.

=== Bayesian inference ===
In [[Bayesian inference]], the parameter <math>p</math> is treated as a random variable with a [[prior distribution]], and its [[posterior distribution]] is calculated using [[Bayes' theorem]] after observing samples.<ref name=":3" />{{Rp|page=167}} If a [[beta distribution]] is chosen as the prior, then the posterior is also a beta distribution; the beta distribution is therefore the [[conjugate distribution]] for this parameter. In particular, if a <math>\mathrm{Beta}(\alpha,\beta)</math> prior is selected, then the posterior, after observing samples <math>k_1, \dotsc, k_n \in \mathbb{N}</math>, is<ref>{{Cite CiteSeerX |citeseerx=10.1.1.157.5540 |first=Daniel |last=Fink |title=A Compendium of Conjugate Priors}}</ref><math display="block">p \sim \mathrm{Beta}\left(\alpha+n,\ \beta+\sum_{i=1}^n (k_i-1)\right). \!</math>Alternatively, if the samples are in <math>\mathbb{N}_0</math>, the posterior distribution is<ref>{{Cite web |url=http://halweb.uc3m.es/esp/Personal/personas/mwiper/docencia/English/PhD_Bayesian_Statistics/ch3_2009.pdf |archive-url=https://web.archive.org/web/20100408092905/http://halweb.uc3m.es/esp/Personal/personas/mwiper/docencia/English/PhD_Bayesian_Statistics/ch3_2009.pdf |archive-date=2010-04-08 |url-status=live |title=3. Conjugate families of distributions}}</ref><math display="block">p \sim \mathrm{Beta}\left(\alpha+n,\ \beta+\sum_{i=1}^n k_i\right).</math>Since the expected value of a <math>\mathrm{Beta}(\alpha,\beta)</math> distribution is <math>\frac{\alpha}{\alpha+\beta}</math>,<ref name=":9" />{{Rp|page=145}} as <math>\alpha</math> and <math>\beta</math> approach zero, the posterior mean approaches the maximum likelihood estimate.
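As a worked illustration with the same hypothetical sample <math>3, 1, 2, 5, 4</math> on <math>\mathbb{N}</math> and a uniform <math>\mathrm{Beta}(1,1)</math> prior, the posterior is<math display="block">p \sim \mathrm{Beta}\left(1+5,\ 1+\sum_{i=1}^5 (k_i-1)\right) = \mathrm{Beta}(6,11),</math>with posterior mean <math>\frac{6}{17} \approx 0.35</math>. Taking <math>\alpha, \beta \to 0</math> instead gives a posterior mean of <math>\frac{5}{15} = \frac{1}{3}</math>, which equals the maximum likelihood estimate obtained from the hypothetical sample above.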