Binomial distribution
== Statistical inference ==

=== Estimation of parameters ===
{{see also|Beta distribution#Bayesian inference}}

When {{math|''n''}} is known, the parameter {{math|''p''}} can be estimated using the proportion of successes:

: <math> \widehat{p} = \frac{x}{n}.</math>

This estimator is found using [[maximum likelihood estimator|maximum likelihood estimation]] and also the [[method of moments (statistics)|method of moments]]. It is [[Bias of an estimator|unbiased]] and achieves uniformly [[Minimum-variance unbiased estimator|minimum variance]], as proven using the [[Lehmann–Scheffé theorem]], since it is based on a [[minimal sufficient]] and [[Completeness (statistics)|complete]] statistic (i.e.: {{math|''x''}}). It is also [[Consistent estimator|consistent]] both in probability and in [[Mean squared error|MSE]]. This statistic is [[Asymptotic distribution|asymptotically]] [[normal distribution|normal]] by the [[central limit theorem]], because it is the same as taking the [[arithmetic mean|mean]] over Bernoulli samples. It has variance <math>\operatorname{var}(\widehat{p}) = \frac{p(1-p)}{n}</math>, a property which is used in various ways, such as in [[Binomial_proportion_confidence_interval#Wald_interval|Wald's confidence intervals]].

A closed form [[Bayes estimator]] for {{math|''p''}} also exists when using the [[Beta distribution]] as a [[Conjugate prior|conjugate]] [[prior distribution]]. When using a general <math>\operatorname{Beta}(\alpha, \beta)</math> as a prior, the [[Bayes estimator#Posterior mean|posterior mean]] estimator is:

: <math> \widehat{p}_b = \frac{x+\alpha}{n+\alpha+\beta}.</math>

The Bayes estimator is [[Asymptotic efficiency (Bayes)|asymptotically efficient]], and as the sample size approaches infinity ({{math|''n'' → ∞}}) it approaches the [[Maximum likelihood estimation|MLE]] solution.<ref>{{Cite journal |last=Wilcox |first=Rand R. |date=1979 |title=Estimating the Parameters of the Beta-Binomial Distribution |url=http://journals.sagepub.com/doi/10.1177/001316447903900302 |journal=Educational and Psychological Measurement |language=en |volume=39 |issue=3 |pages=527–535 |doi=10.1177/001316447903900302 |s2cid=121331083 |issn=0013-1644|url-access=subscription }}</ref> The Bayes estimator is [[Bias of an estimator|biased]] (how much depends on the priors), [[Bayes estimator#Admissibility|admissible]] and [[Consistent estimator|consistent]] in probability. The Bayes estimator with a Beta prior can also be used with [[Thompson sampling]].

For the special case of using the [[standard uniform distribution]] as a [[non-informative prior]], <math>\operatorname{Beta}(\alpha=1, \beta=1) = U(0,1)</math>, the posterior mean estimator becomes:

:<math> \widehat{p}_b = \frac{x+1}{n+2}.</math>

(A [[Bayes estimator#Posterior mode|posterior mode]] should just lead to the standard estimator.) This method is called the [[rule of succession]], which was introduced in the 18th century by [[Pierre-Simon Laplace]].

When relying on the [[Jeffreys prior]], the prior is <math>\operatorname{Beta}(\alpha=\frac{1}{2}, \beta=\frac{1}{2})</math>,<ref>Marko Lalovic (https://stats.stackexchange.com/users/105848/marko-lalovic), Jeffreys prior for binomial likelihood, URL (version: 2019-03-04): https://stats.stackexchange.com/q/275608</ref> which leads to the estimator:

: <math> \widehat{p}_\text{Jeffreys} = \frac{x+\frac{1}{2}}{n+1}.</math>

When estimating {{math|''p''}} with very rare events and a small {{math|''n''}} (e.g.: if {{math|1=''x'' = 0}}), using the standard estimator leads to <math> \widehat{p} = 0,</math> which sometimes is unrealistic and undesirable.
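The point estimators discussed above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the article's sources; the function names and the example counts are made up for the demonstration.

```python
def p_hat(x, n):
    """Standard estimator (MLE and method of moments): proportion of successes x/n."""
    return x / n

def p_bayes(x, n, alpha=1.0, beta=1.0):
    """Posterior mean under a Beta(alpha, beta) conjugate prior.

    alpha = beta = 1 is the uniform prior, giving Laplace's rule of
    succession (x + 1) / (n + 2); alpha = beta = 1/2 is the Jeffreys
    prior, giving (x + 1/2) / (n + 1).
    """
    return (x + alpha) / (n + alpha + beta)

# e.g. 3 successes in 10 trials
print(p_hat(3, 10))               # 0.3
print(p_bayes(3, 10))             # (3 + 1) / (10 + 2), about 0.333
print(p_bayes(3, 10, 0.5, 0.5))   # Jeffreys: (3 + 0.5) / (10 + 1), about 0.318
```

Note that for {{math|1=''x'' = 0}}, `p_hat` returns exactly 0 while `p_bayes` returns the strictly positive 1/(''n'' + 2), illustrating why the Bayes estimator is preferred for rare events.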
In such cases there are various alternative estimators.<ref>{{cite journal |last=Razzaghi |first=Mehdi |title=On the estimation of binomial success probability with zero occurrence in sample |journal=Journal of Modern Applied Statistical Methods |volume=1 |issue=2 |year=2002 |pages=326–332 |doi=10.22237/jmasm/1036110000 |doi-access=free }}</ref> One way is to use the Bayes estimator <math> \widehat{p}_b</math>, leading to:

: <math> \widehat{p}_b = \frac{1}{n+2}.</math>

Another method is to use the upper bound of the [[confidence interval]] obtained using the [[Rule of three (statistics)|rule of three]]:

: <math> \widehat{p}_{\text{rule of 3}} = \frac{3}{n}.</math>

=== Confidence intervals for the parameter p ===
{{Main|Binomial proportion confidence interval}}
{{see also|Z-test#Comparing the Proportions of Two Binomials}}

Even for quite large values of ''n'', the actual distribution of the mean is significantly nonnormal.<ref name=Brown2001>{{Citation |first1=Lawrence D. |last1=Brown |first2=T. Tony |last2=Cai |first3=Anirban |last3=DasGupta |year=2001 |title = Interval Estimation for a Binomial Proportion |url=http://www-stat.wharton.upenn.edu/~tcai/paper/html/Binomial-StatSci.html |journal=Statistical Science |volume=16 |issue=2 |pages=101–133 |access-date = 2015-01-05 |doi=10.1214/ss/1009213286|citeseerx=10.1.1.323.7752 }}</ref> Because of this problem, several methods to estimate confidence intervals have been proposed.

In the equations for confidence intervals below, the variables have the following meaning:
* ''n''<sub>1</sub> is the number of successes out of ''n'', the total number of trials
* <math> \widehat{p\,} = \frac{n_1}{n}</math> is the proportion of successes
* <math>z</math> is the <math>1 - \tfrac{1}{2}\alpha</math> [[quantile]] of a [[standard normal distribution]] (i.e., [[probit]]) corresponding to the target error rate <math>\alpha</math>.
For example, for a 95% [[confidence level]] the error <math>\alpha</math> = 0.05, so <math>1 - \tfrac{1}{2}\alpha</math> = 0.975 and <math>z</math> = 1.96.

==== Wald method ====
{{Main|Binomial proportion confidence interval#Wald interval}}

: <math> \widehat{p\,} \pm z \sqrt{ \frac{ \widehat{p\,} ( 1 -\widehat{p\,} )}{ n } } .</math>

A [[continuity correction]] of {{math|0.5/''n''}} may be added.{{clarify|date=July 2012}}

==== Agresti–Coull method ====
{{Main|Binomial proportion confidence interval#Agresti–Coull interval}}<ref name=Agresti1988>{{Citation |last1=Agresti |first1=Alan |last2=Coull |first2=Brent A. |date=May 1998 |title=Approximate is better than 'exact' for interval estimation of binomial proportions |url = http://www.stat.ufl.edu/~aa/articles/agresti_coull_1998.pdf |journal=The American Statistician |volume=52 |issue=2 |pages=119–126 |access-date=2015-01-05 |doi=10.2307/2685469 |jstor=2685469 }}</ref>

: <math> \tilde{p} \pm z \sqrt{ \frac{ \tilde{p} ( 1 - \tilde{p} )}{ n + z^2 } }</math>

Here the estimate of {{math|''p''}} is modified to

: <math> \tilde{p}= \frac{ n_1 + \frac{1}{2} z^2}{ n + z^2 } </math>

This method works well for {{math|''n'' > 10}} and {{math|''n''<sub>1</sub> ≠ 0, ''n''}}.<ref>{{cite web|last1=Gulotta|first1=Joseph|title=Agresti-Coull Interval Method|url=https://pellucid.atlassian.net/wiki/spaces/PEL/pages/25722894/Agresti-Coull+Interval+Method#:~:text=The%20Agresti%2DCoull%20Interval%20Method,%2C%20or%20per%20100%2C000%2C%20etc|website=pellucid.atlassian.net|access-date=18 May 2021}}</ref> See here for <math>n\leq 10</math>.<ref>{{cite web|title=Confidence intervals|url=https://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm|website=itl.nist.gov|access-date=18 May 2021}}</ref> For {{math|1=''n''<sub>1</sub> = 0, ''n''}} use the Wilson (score) method below.

==== Arcsine method ====
{{Main|Binomial proportion confidence interval#Arcsine transformation}}<ref name="Pires00">{{cite book |last=Pires |first=M. A. |chapter-url=https://www.math.tecnico.ulisboa.pt/~apires/PDFs/AP_COMPSTAT02.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://www.math.tecnico.ulisboa.pt/~apires/PDFs/AP_COMPSTAT02.pdf |archive-date=2022-10-09 |url-status=live |chapter=Confidence intervals for a binomial proportion: comparison of methods and software evaluation |editor-last=Klinke |editor-first=S. |editor2-last=Ahrend |editor2-first=P. |editor3-last=Richter |editor3-first=L. |title=Proceedings of the Conference CompStat 2002 |others=Short Communications and Posters |year=2002 }}</ref>

: <math>\sin^2 \left(\arcsin \left(\sqrt{\widehat{p\,}}\right) \pm \frac{z}{2\sqrt{n}} \right).</math>

==== Wilson (score) method ====
{{Main|Binomial proportion confidence interval#Wilson score interval}}

The notation in the formula below differs from the previous formulas in two respects:<ref name="Wilson1927">{{Citation |last = Wilson |first=Edwin B. |date = June 1927 |title = Probable inference, the law of succession, and statistical inference |url = http://psych.stanford.edu/~jlm/pdfs/Wison27SingleProportion.pdf |journal = Journal of the American Statistical Association |volume=22 |issue=158 |pages=209–212 |access-date= 2015-01-05 |doi = 10.2307/2276774 |url-status=dead |archive-url = https://web.archive.org/web/20150113082307/http://psych.stanford.edu/~jlm/pdfs/Wison27SingleProportion.pdf |archive-date = 2015-01-13 |jstor = 2276774 }}</ref>
* Firstly, {{math|''z''<sub>''x''</sub>}} has a slightly different interpretation in the formula below: it has its ordinary meaning of 'the {{math|''x''}}th quantile of the standard normal distribution', rather than being a shorthand for 'the {{math|(1 − ''x'')}}th quantile'.
* Secondly, this formula does not use a plus-minus to define the two bounds. Instead, one may use <math>z = z_{\alpha / 2}</math> to get the lower bound, or use <math>z = z_{1 - \alpha/2}</math> to get the upper bound.
For example, for a 95% confidence level the error <math>\alpha</math> = 0.05, so one gets the lower bound by using <math>z = z_{\alpha/2} = z_{0.025} = -1.96</math>, and one gets the upper bound by using <math>z = z_{1 - \alpha/2} = z_{0.975} = 1.96</math>.

: <math>\frac{ \widehat{p\,} + \frac{z^2}{2n} + z \sqrt{ \frac{\widehat{p\,}(1 - \widehat{p\,})}{n} + \frac{z^2}{4 n^2} } }{ 1 + \frac{z^2}{n} }</math><ref>{{cite book | chapter = Confidence intervals | chapter-url = http://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm | title = Engineering Statistics Handbook | publisher = NIST/Sematech | year = 2012 | access-date = 2017-07-23 }}</ref>

==== Comparison ====

The so-called "exact" ([[Binomial proportion confidence interval#Clopper–Pearson interval|Clopper–Pearson]]) method is the most conservative.<ref name="Brown2001" /> (''Exact'' does not mean perfectly accurate; rather, it indicates that the estimates will not be less conservative than the true value.) The Wald method, although commonly recommended in textbooks, is the most biased.{{clarify|reason=what sense of bias is this|date=July 2012}}
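As a minimal Python sketch (not from the article's sources), the Wald, Agresti–Coull and Wilson intervals above can be computed directly. The function names are illustrative, and <math>z</math> is fixed at the 95% value used in the examples:

```python
import math

Z_95 = 1.959963984540054  # the 0.975 standard normal quantile, for a 95% CI

def wald_interval(n1, n, z=Z_95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p = n1 / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def agresti_coull_interval(n1, n, z=Z_95):
    """Agresti-Coull interval: a Wald-style interval around the
    modified estimate p_tilde = (n1 + z^2/2) / (n + z^2)."""
    p = (n1 + 0.5 * z**2) / (n + z**2)
    half = z * math.sqrt(p * (1 - p) / (n + z**2))
    return p - half, p + half

def wilson_interval(n1, n, z=Z_95):
    """Wilson score interval, written with a plus-minus for convenience;
    the two roots solve (p_hat - p)^2 = z^2 * p * (1 - p) / n for p."""
    p = n1 / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# e.g. 7 successes in 20 trials
print(wald_interval(7, 20))
print(agresti_coull_interval(7, 20))
print(wilson_interval(7, 20))
```

Unlike the Wald interval, the Wilson interval never extends below 0 or above 1, and for {{math|1=''n''<sub>1</sub> = 0}} it still produces a nondegenerate interval, which is why the text recommends it in that case.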