== Related distributions ==

=== Sums of binomials ===
If {{math|''X'' ~ B(''n'', ''p'')}} and {{math|''Y'' ~ B(''m'', ''p'')}} are independent binomial variables with the same probability {{math|''p''}}, then {{math|''X'' + ''Y''}} is again a binomial variable; its distribution is {{math|1=''Z'' = ''X'' + ''Y'' ~ B(''n'' + ''m'', ''p'')}}:<ref>{{cite book |last1=Dekking |first1=F.M. |last2=Kraaikamp |first2=C. |last3=Lopuhaä |first3=H.P. |last4=Meester |first4=L.E. |title=A Modern Introduction to Probability and Statistics |date=2005 |publisher=Springer-Verlag London |isbn=978-1-84628-168-6 |edition=1 |url=https://www.springer.com/gp/book/9781852338961}}</ref>

: <math>\begin{align} \operatorname P(Z=k) &= \sum_{i=0}^k\left[\binom{n}i p^i (1-p)^{n-i}\right]\left[\binom{m}{k-i} p^{k-i} (1-p)^{m-k+i}\right]\\ &= \binom{n+m}k p^k (1-p)^{n+m-k} \end{align}</math>

A binomially distributed random variable {{math|''X'' ~ B(''n'', ''p'')}} can be considered as the sum of {{math|''n''}} Bernoulli distributed random variables. So the sum of two binomially distributed random variables {{math|''X'' ~ B(''n'', ''p'')}} and {{math|''Y'' ~ B(''m'', ''p'')}} is equivalent to the sum of {{math|''n'' + ''m''}} Bernoulli distributed random variables, which means {{math|1=''Z'' = ''X'' + ''Y'' ~ B(''n'' + ''m'', ''p'')}}. This can also be proven directly using the addition rule.

However, if {{math|''X''}} and {{math|''Y''}} do not have the same probability {{math|''p''}}, then the variance of the sum will be [[Binomial sum variance inequality|smaller than the variance of a binomial variable]] distributed as {{math|B(''n'' + ''m'', {{overline|''p''}})}}.
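The identity can be checked numerically. The following is a minimal sketch using NumPy and SciPy; the parameter values ({{math|1=''n'' = 30}}, {{math|1=''m'' = 20}}, {{math|1=''p'' = 0.3}}) and the sample size are arbitrary choices for illustration only:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(seed=0)
n, m, p = 30, 20, 0.3                  # arbitrary illustration values
size = 100_000
z = rng.binomial(n, p, size) + rng.binomial(m, p, size)   # Z = X + Y

# Empirical frequencies of Z should match the B(n + m, p) pmf
# up to Monte Carlo error.
for k in range(12, 19):
    print(f"k={k}: empirical {np.mean(z == k):.4f}, "
          f"B(n+m, p) pmf {binom.pmf(k, n + m, p):.4f}")
</syntaxhighlight>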
=== Poisson binomial distribution ===
The binomial distribution is a special case of the [[Poisson binomial distribution]], which is the distribution of a sum of {{math|''n''}} independent non-identical [[Bernoulli trials]] {{math|B(''p''<sub>''i''</sub>)}}.<ref>{{cite journal | volume = 3 | issue = 2 | pages = 295–312 | last = Wang | first = Y. H. | title = On the number of successes in independent trials | journal = Statistica Sinica | year = 1993 | url = http://www3.stat.sinica.edu.tw/statistica/oldpdf/A3n23.pdf | url-status = dead | archive-url = https://web.archive.org/web/20160303182353/http://www3.stat.sinica.edu.tw/statistica/oldpdf/A3n23.pdf | archive-date = 2016-03-03 }}</ref>

=== Ratio of two binomial distributions ===
Let {{nowrap|''X'' ~ B(''n'', ''p''<sub>1</sub>)}} and {{nowrap|''Y'' ~ B(''m'', ''p''<sub>2</sub>)}} be independent, and let {{nowrap|1=''T'' = (''X''/''n'') / (''Y''/''m'')}}. Then log(''T'') is approximately normally distributed with mean log(''p''<sub>1</sub>/''p''<sub>2</sub>) and variance {{nowrap|((1/''p''<sub>1</sub>) − 1)/''n'' + ((1/''p''<sub>2</sub>) − 1)/''m''}}. This result was first derived by Katz and co-authors in 1978.<ref name=Katz1978>{{cite journal |last1=Katz |first1=D. |display-authors=1 |first2=J. |last2=Baptista |first3=S. P. |last3=Azen |first4=M. C. |last4=Pike |year=1978 |title=Obtaining confidence intervals for the risk ratio in cohort studies |journal=Biometrics |volume=34 |issue=3 |pages=469–474 |doi=10.2307/2530610 |jstor=2530610 }}</ref>

=== Conditional binomials ===
If ''X'' ~ B(''n'', ''p'') and ''Y'' | ''X'' ~ B(''X'', ''q'') (the conditional distribution of ''Y'', given ''X''), then ''Y'' is a simple binomial random variable with distribution ''Y'' ~ B(''n'', ''pq'').

For example, imagine throwing ''n'' balls at a basket ''U<sub>X</sub>'' and then throwing the balls that hit at another basket ''U<sub>Y</sub>''. If ''p'' is the probability of hitting ''U<sub>X</sub>'', then ''X'' ~ B(''n'', ''p'') is the number of balls that hit ''U<sub>X</sub>''. If ''q'' is the probability of hitting ''U<sub>Y</sub>'', then the number of balls that hit ''U<sub>Y</sub>'' is ''Y'' ~ B(''X'', ''q''), and therefore ''Y'' ~ B(''n'', ''pq'').

{{hidden begin|style=width:60%|ta1=center|border=1px #aaa solid|title=[Proof]}}
Since <math> X \sim B(n, p) </math> and <math> Y \sim B(X, q) </math>, by the [[law of total probability]],
: <math>\begin{align} \Pr[Y = m] &= \sum_{k = m}^{n} \Pr[Y = m \mid X = k] \Pr[X = k] \\[2pt] &= \sum_{k=m}^n \binom{n}{k} \binom{k}{m} p^k q^m (1-p)^{n-k} (1-q)^{k-m} \end{align}</math>
Since <math>\tbinom{n}{k} \tbinom{k}{m} = \tbinom{n}{m} \tbinom{n-m}{k-m},</math> the equation above can be expressed as
: <math> \Pr[Y = m] = \sum_{k=m}^{n} \binom{n}{m} \binom{n-m}{k-m} p^k q^m (1-p)^{n-k} (1-q)^{k-m} </math>
Factoring <math> p^k = p^m p^{k-m} </math> and pulling all the terms that do not depend on <math> k </math> out of the sum now yields
: <math>\begin{align} \Pr[Y = m] &= \binom{n}{m} p^m q^m \left( \sum_{k=m}^n \binom{n-m}{k-m} p^{k-m} (1-p)^{n-k} (1-q)^{k-m} \right) \\[2pt] &= \binom{n}{m} (pq)^m \left( \sum_{k=m}^n \binom{n-m}{k-m} \left(p(1-q)\right)^{k-m} (1-p)^{n-k} \right) \end{align}</math>
After substituting <math> i = k - m </math> in the expression above, we get
: <math> \Pr[Y = m] = \binom{n}{m} (pq)^m \left( \sum_{i=0}^{n-m} \binom{n-m}{i} (p - pq)^i (1-p)^{n-m - i} \right) </math>
Notice that the sum (in the parentheses) above equals <math> (p - pq + 1 - p)^{n-m} </math> by the [[binomial theorem]]. Substituting this in finally yields
: <math>\begin{align} \Pr[Y=m] &= \binom{n}{m} (pq)^m (p - pq + 1 - p)^{n-m}\\[4pt] &= \binom{n}{m} (pq)^m (1-pq)^{n-m} \end{align}</math>
and thus <math> Y \sim B(n, pq) </math> as desired.
{{hidden end}}
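The balls-and-baskets example translates directly into a simulation. A minimal sketch using NumPy and SciPy; the values {{math|1=''n'' = 50}}, {{math|1=''p'' = 0.4}}, {{math|1=''q'' = 0.6}} are arbitrary illustration choices:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(seed=0)
n, p, q = 50, 0.4, 0.6                 # arbitrary illustration values

x = rng.binomial(n, p, size=200_000)   # balls that hit U_X
y = rng.binomial(x, q)                 # of those, balls that also hit U_Y

# The empirical distribution of Y should match B(n, p*q).
for m in range(9, 16):
    print(f"m={m}: empirical {np.mean(y == m):.4f}, "
          f"B(n, pq) pmf {binom.pmf(m, n, p * q):.4f}")
</syntaxhighlight>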
=== Bernoulli distribution ===
The [[Bernoulli distribution]] is a special case of the binomial distribution, where {{math|1=''n'' = 1}}. Symbolically, {{math|''X'' ~ B(1, ''p'')}} has the same meaning as {{math|''X'' ~ Bernoulli(''p'')}}. Conversely, any binomial distribution, {{math|B(''n'', ''p'')}}, is the distribution of the sum of {{math|''n''}} independent [[Bernoulli trials]], {{math|Bernoulli(''p'')}}, each with the same probability {{math|''p''}}.<ref>{{cite web|last1=Taboga|first1=Marco|title=Lectures on Probability Theory and Mathematical Statistics|url=https://www.statlect.com/probability-distributions/binomial-distribution#hid3|website=statlect.com|access-date=18 December 2017}}</ref>

=== Normal approximation ===
{{see also|Binomial proportion confidence interval#Normal approximation interval}}
[[File:Binomial Distribution.svg|right|250px|thumb|Binomial [[probability mass function]] and normal [[probability density function]] approximation for {{math|1=''n'' = 6}} and {{math|1=''p'' = 0.5}}]]

If {{math|''n''}} is large enough, then the skew of the distribution is not too great. In this case, a reasonable approximation to {{math|B(''n'', ''p'')}} is given by the [[normal distribution]]
: <math> \mathcal{N}(np,\,np(1-p)),</math>
and this basic approximation can be improved in a simple way by using a suitable [[continuity correction]]. The basic approximation generally improves as {{math|''n''}} increases (at least 20) and is better when {{math|''p''}} is not close to 0 or 1.<ref name="bhh">{{cite book|title=Statistics for experimenters|url=https://archive.org/details/statisticsforexp00geor|url-access=registration|author=Box, Hunter and Hunter|publisher=Wiley|year=1978|page=[https://archive.org/details/statisticsforexp00geor/page/130 130]|isbn=9780471093152}}</ref> Various [[Rule of thumb|rules of thumb]] may be used to decide whether {{math|''n''}} is large enough, and {{math|''p''}} is far enough from the extremes of zero or one (the first two rules are also evaluated in the sketch below):

* One rule<ref name="bhh"/> is that for {{math|''n'' > 5}} the normal approximation is adequate if the absolute value of the skewness is strictly less than 0.3; that is, if
*: <math>\frac{|1-2p|}{\sqrt{np(1-p)}}=\frac1{\sqrt{n}}\left|\sqrt{\frac{1-p}p}-\sqrt{\frac{p}{1-p}}\,\right|<0.3.</math>
: This can be made precise using the [[Berry–Esseen theorem]].

* A stronger rule states that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values; that is, only if
*: <math>\mu\pm3\sigma=np\pm3\sqrt{np(1-p)}\in(0,n).</math>
: This 3-standard-deviation rule is equivalent to the following conditions, which also imply the first rule above:
:: <math>n>9 \left(\frac{1-p}{p} \right)\quad\text{and}\quad n>9\left(\frac{p}{1-p}\right).</math>

{{hidden begin|style=width:66%|ta1=center|border=1px #aaa solid|title=[Proof]}}
The rule <math> np\pm3\sqrt{np(1-p)}\in(0,n)</math> is equivalent to requiring that
: <math>np-3\sqrt{np(1-p)}>0\quad\text{and}\quad np+3\sqrt{np(1-p)}<n.</math>
Moving terms around yields:
: <math>np>3\sqrt{np(1-p)}\quad\text{and}\quad n(1-p)>3\sqrt{np(1-p)}.</math>
Since <math>0<p<1</math>, we can square both sides and divide by the respective factors <math>np^2</math> and <math>n(1-p)^2</math> to obtain the desired conditions:
: <math>n>9 \left(\frac{1-p}p\right) \quad\text{and}\quad n>9 \left(\frac{p}{1-p}\right).</math>
Notice that these conditions automatically imply that <math>n>9</math>. On the other hand, taking the square root again and dividing by 3 gives
: <math>\frac{\sqrt{n}}3>\sqrt{\frac{1-p}p}>0 \quad \text{and} \quad \frac{\sqrt{n}}3 > \sqrt{\frac{p}{1-p}}>0.</math>
Subtracting the second inequality from the first yields
: <math>\frac{\sqrt{n}}3>\sqrt{\frac{1-p}p}-\sqrt{\frac{p}{1-p}}>-\frac{\sqrt{n}}3;</math>
and so, the desired first rule is satisfied,
: <math>\left|\sqrt{\frac{1-p}p}-\sqrt{\frac{p}{1-p}}\,\right|<\frac{\sqrt{n}}3.</math>
{{hidden end}}
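The two rules stated above are easy to evaluate programmatically. A minimal sketch in plain Python; the example values of {{math|''n''}} and {{math|''p''}} are arbitrary illustration choices:

<syntaxhighlight lang="python">
import math

def normal_approx_rules(n: int, p: float) -> dict:
    """Evaluate the skewness rule and the 3-standard-deviation rule for B(n, p)."""
    sigma = math.sqrt(n * p * (1 - p))
    skewness_ok = n > 5 and abs(1 - 2 * p) / sigma < 0.3
    three_sigma_ok = n * p - 3 * sigma > 0 and n * p + 3 * sigma < n
    return {"skewness_rule": skewness_ok, "3_sigma_rule": three_sigma_ok}

print(normal_approx_rules(100, 0.50))  # both rules hold
print(normal_approx_rules(100, 0.02))  # both rules fail: p too close to 0
</syntaxhighlight>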
* Another commonly used rule is that both values {{math|''np''}} and {{math|''n''(1 − ''p'')}} must be greater than or equal to 5.<ref>{{Cite book |last=Chen |first=Zac |title=H2 Mathematics Handbook |publisher=Educational Publishing House |year=2011 |isbn=9789814288484 |edition=1 |location=Singapore |pages=350}}</ref><ref>{{Cite web |date=2023-05-29 |title=6.4: Normal Approximation to the Binomial Distribution - Statistics LibreTexts |url=https://stats.libretexts.org/Courses/Las_Positas_College/Math_40:_Statistics_and_Probability/06:_Continuous_Random_Variables_and_the_Normal_Distribution/6.04:_Normal_Approximation_to_the_Binomial_Distribution |access-date=2023-10-07 |archive-date=2023-05-29 |archive-url=https://web.archive.org/web/20230529211919/https://stats.libretexts.org/Courses/Las_Positas_College/Math_40:_Statistics_and_Probability/06:_Continuous_Random_Variables_and_the_Normal_Distribution/6.04:_Normal_Approximation_to_the_Binomial_Distribution |url-status=bot: unknown }}</ref> However, the specific number varies from source to source and depends on how good an approximation one wants. In particular, if one uses 9 instead of 5, the rule implies the results stated in the previous paragraphs.

{{hidden begin|style=width:66%|ta1=center|border=1px #aaa solid|title=[Proof]}}
Assume that both values <math>np</math> and <math>n(1-p)</math> are greater than 9. Since <math>0<p<1</math>, we have
: <math>np>9>9(1-p)\quad\text{and}\quad n(1-p)>9>9p.</math>
Dividing by the respective factors <math>p</math> and <math>1-p</math> yields the alternative form of the 3-standard-deviation rule:
: <math>n>9 \left(\frac{1-p}p\right) \quad\text{and}\quad n>9 \left(\frac{p}{1-p}\right).</math>
{{hidden end}}

The following is an example of applying a [[continuity correction]]. Suppose one wishes to calculate {{math|Pr(''X'' ≤ 8)}} for a binomial random variable {{math|''X''}}. If {{math|''Y''}} has a distribution given by the normal approximation, then {{math|Pr(''X'' ≤ 8)}} is approximated by {{math|Pr(''Y'' ≤ 8.5)}}. The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
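A numerical comparison shows how much the correction helps. A minimal sketch using SciPy, with the arbitrary illustration values {{math|1=''n'' = 20}} and {{math|1=''p'' = 0.5}} (so {{math|1=''μ'' = 10}} and {{math|1=''σ''<sup>2</sup> = 5}}):

<syntaxhighlight lang="python">
from math import sqrt
from scipy.stats import binom, norm

n, p = 20, 0.5                          # arbitrary illustration values
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = binom.cdf(8, n, p)              # Pr(X <= 8),   about 0.2517
corrected = norm.cdf(8.5, mu, sigma)    # Pr(Y <= 8.5), about 0.2512
uncorrected = norm.cdf(8.0, mu, sigma)  # Pr(Y <= 8.0), about 0.1855

print(exact, corrected, uncorrected)
</syntaxhighlight>

Here the corrected value differs from the exact probability by less than 0.001, while the uncorrected one is off by roughly 0.066.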
This approximation, known as the [[de Moivre–Laplace theorem]], greatly reduces the labour of computing binomial probabilities by hand (exact calculations with large {{math|''n''}} are very onerous); historically, it was the first use of the normal distribution, introduced in [[Abraham de Moivre]]'s book ''[[The Doctrine of Chances]]'' in 1738. Nowadays, it can be seen as a consequence of the [[central limit theorem]] since {{math|B(''n'', ''p'')}} is a sum of {{math|''n''}} independent, identically distributed [[Bernoulli distribution|Bernoulli variables]] with parameter {{math|''p''}}.

This fact is the basis of a [[hypothesis test]], a "proportion z-test", for the value of {{math|''p''}} using {{math|''x''/''n''}}, the sample proportion and estimator of {{math|''p''}}, in a [[common test statistics|common test statistic]].<ref>[[NIST]]/[[SEMATECH]], [http://www.itl.nist.gov/div898/handbook/prc/section2/prc24.htm "7.2.4. Does the proportion of defectives meet requirements?"] ''e-Handbook of Statistical Methods.''</ref>

For example, suppose one randomly samples {{math|''n''}} people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will depend on the sample. If groups of {{math|''n''}} people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion {{math|''p''}} of agreement in the population and with standard deviation
: <math>\sigma = \sqrt{\frac{p(1-p)}{n}}.</math>

=== Poisson approximation ===
The binomial distribution converges towards the [[Poisson distribution]] as the number of trials goes to infinity while the product {{math|''np''}} converges to a finite limit. Therefore, the Poisson distribution with parameter {{math|1=''λ'' = ''np''}} can be used as an approximation to the binomial distribution {{math|B(''n'', ''p'')}} if {{math|''n''}} is sufficiently large and {{math|''p''}} is sufficiently small. According to rules of thumb, this approximation is good if {{math|''n'' ≥ 20}} and {{math|''p'' ≤ 0.05}}<ref>{{cite news |date=2023-03-28 |title=12.4 – Approximating the Binomial Distribution {{!}} STAT 414 |newspaper=Pennstate: Statistics Online Courses |url=https://online.stat.psu.edu/stat414/lesson/12/12.4 |access-date=2023-10-08 |archive-date=2023-03-28 |archive-url=https://web.archive.org/web/20230328081322/https://online.stat.psu.edu/stat414/lesson/12/12.4 |url-status=bot: unknown }}</ref> such that {{math|''np'' ≤ 1}}, or if {{math|''n'' > 50}} and {{math|''p'' < 0.1}} such that {{math|''np'' < 5}},<ref>{{Cite book |last=Chen |first=Zac |title=H2 Mathematics Handbook |publisher=Educational Publishing House |year=2011 |isbn=9789814288484 |edition=1 |location=Singapore |pages=348}}</ref> or if {{math|''n'' ≥ 100}} and {{math|''np'' ≤ 10}}.<ref name="nist">[[NIST]]/[[SEMATECH]], [http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc331.htm "6.3.3.1. Counts Control Charts"], ''e-Handbook of Statistical Methods.''</ref><ref>{{Cite web |date=2023-03-13 |title=The Connection Between the Poisson and Binomial Distributions |url=https://mathcenter.oxford.emory.edu/site/math117/connectingPoissonAndBinomial/ |access-date=2023-10-08 |archive-date=2023-03-13 |archive-url=https://web.archive.org/web/20230313085931/https://mathcenter.oxford.emory.edu/site/math117/connectingPoissonAndBinomial/ |url-status=bot: unknown }}</ref> Concerning the accuracy of the Poisson approximation, see Novak,<ref>Novak S.Y. (2011) ''Extreme Value Methods with Applications to Finance''. London: CRC/Chapman & Hall/Taylor & Francis. {{ISBN|9781439835746}}.</ref> ch. 4, and references therein.
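The quality of the approximation can be inspected by comparing the two probability mass functions directly. A minimal sketch using SciPy, with {{math|1=''n'' = 100}} and {{math|1=''p'' = 0.02}} chosen arbitrarily (so {{math|1=''λ'' = 2}}, within the rules of thumb above):

<syntaxhighlight lang="python">
from scipy.stats import binom, poisson

n, p = 100, 0.02                       # n large, p small; lambda = n*p = 2
lam = n * p
for k in range(6):
    print(f"k={k}: binomial {binom.pmf(k, n, p):.4f}, "
          f"Poisson {poisson.pmf(k, lam):.4f}")
</syntaxhighlight>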
=== Limiting distributions ===
* ''[[Poisson limit theorem]]'': As {{math|''n''}} approaches {{math|∞}} and {{math|''p''}} approaches 0 with the product {{math|''np''}} held fixed, the {{math|Binomial(''n'', ''p'')}} distribution approaches the [[Poisson distribution]] with [[expected value]] {{math|1=''λ'' = ''np''}}.<ref name="nist"/>
* ''[[de Moivre–Laplace theorem]]'': As {{math|''n''}} approaches {{math|∞}} while {{math|''p''}} remains fixed, the distribution of
*: <math>\frac{X-np}{\sqrt{np(1-p)}}</math>
: approaches the [[normal distribution]] with expected value 0 and [[variance]] 1. This result is sometimes loosely stated by saying that the distribution of {{math|''X''}} is [[Asymptotic normality|asymptotically normal]] with expected value {{math|''np''}} and [[variance]] {{math|''np''(1 − ''p'')}}. This result is a specific case of the [[central limit theorem]].

=== Beta distribution ===
The binomial distribution and the beta distribution are different views of the same model of repeated Bernoulli trials. The binomial distribution is the [[Probability mass function|PMF]] of {{mvar|k}} successes given {{mvar|n}} independent events each with a probability {{mvar|p}} of success. Mathematically, when {{math|1=''α'' = ''k'' + 1}} and {{math|1=''β'' = ''n'' − ''k'' + 1}}, the probability density function of the beta distribution and the probability mass function of the binomial distribution are related by a factor of {{math|''n'' + 1}}:
: <math>\operatorname{Beta}(p;\alpha;\beta) = (n+1)B(k;n;p)</math>

[[Beta distribution]]s also provide a family of [[prior distribution|prior probability distribution]]s for binomial distributions in [[Bayesian inference]]:<ref name=MacKay>{{cite book| last=MacKay| first=David| title=Information Theory, Inference and Learning Algorithms |year=2003 |publisher=Cambridge University Press; First Edition |isbn=978-0521642989}}</ref>
: <math>P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\operatorname{Beta}(\alpha,\beta)}.</math>
Given a uniform prior, the posterior distribution for the probability of success {{mvar|p}} given {{mvar|n}} independent events with {{mvar|k}} observed successes is a beta distribution, specifically {{math|Beta(''k'' + 1, ''n'' − ''k'' + 1)}}.<ref>{{Cite web|url=https://www.statlect.com/probability-distributions/beta-distribution|title=Beta distribution}}</ref>
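Both the factor-of-{{math|(''n'' + 1)}} identity and the form of the posterior can be verified numerically. A minimal sketch using SciPy; the values {{math|1=''n'' = 10}}, {{math|1=''k'' = 3}}, {{math|1=''p'' = 0.35}} are arbitrary illustration choices:

<syntaxhighlight lang="python">
from scipy.stats import beta, binom

n, k, p = 10, 3, 0.35                  # arbitrary illustration values

# Density-to-pmf identity: Beta(p; k+1, n-k+1) = (n+1) * B(k; n, p)
print(beta.pdf(p, k + 1, n - k + 1))   # approx 2.7744
print((n + 1) * binom.pmf(k, n, p))    # approx 2.7744

# Posterior for p under a uniform prior, after k successes in n trials
posterior = beta(k + 1, n - k + 1)
print(posterior.mean())                # (k+1)/(n+2) = 4/12, approx 0.3333
</syntaxhighlight>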