==Definitions==
Imagine a sequence of independent [[Bernoulli trial]]s: each trial has two potential outcomes called "success" and "failure". In each trial the probability of success is <math>p</math> and of failure is <math>1-p</math>. We observe this sequence until a predefined number <math>r</math> of successes occurs. Then the random number of observed failures, <math>X</math>, follows the '''negative binomial''' distribution:
: <math> X\sim\operatorname{NB}(r, p) </math>

===Probability mass function===
The [[probability mass function]] of the negative binomial distribution is
:<math> f(k; r, p) \equiv \Pr(X = k) = \binom{k+r-1}{k} (1-p)^k p^r </math>
where {{mvar|r}} is the number of successes, {{mvar|k}} is the number of failures, and {{mvar|p}} is the probability of success on each trial. Here, the quantity in parentheses is the [[binomial coefficient]], and is equal to
:<math> \binom{k+r-1}{k} = \frac{(k+r-1)!}{(r-1)!\,(k)!} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k + r)}{k!\ \Gamma(r)}. </math>
Note that {{math|Γ(''r'')}} is the [[Gamma function]]. There are {{mvar|k}} failures chosen from {{math|''k'' + ''r'' − 1}} trials rather than {{math|''k'' + ''r''}} because the last of the {{math|''k'' + ''r''}} trials is by definition a success.

This quantity can alternatively be written in the following manner, explaining the name "negative binomial":
:<math> \begin{align} & \frac{(k+r-1)\dotsm(r)}{k!} \\[10pt] = {} & (-1)^k \frac{\overbrace{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}^{k\text{ factors}}}{k!} = (-1)^k\binom{-r}{\phantom{-}k}. \end{align} </math>

Note that by the last expression and the [[binomial series]], for every {{math|0 ≤ ''p'' < 1}} and <math>q=1-p</math>,
:<math> p^{-r} = (1-q)^{-r} = \sum_{k=0}^\infty \binom{-r}{\phantom{-}k}(-q)^k = \sum_{k=0}^\infty \binom{k+r-1}{k}q^k </math>
hence the terms of the probability mass function indeed add up to one, as below.
:<math> \sum_{k=0}^\infty \binom{k+r-1}{k}(1-p)^kp^r = p^{-r}p^r = 1 </math>

To understand the above definition of the probability mass function, note that the probability for every specific sequence of {{mvar|r}} successes and {{mvar|k}} failures is {{math|''p''{{sup|''r''}}(1 − ''p''){{sup|''k''}}}}, because the outcomes of the {{math|''k'' + ''r''}} trials are supposed to happen [[independence (probability theory)|independently]]. Since the {{mvar|r}}-th success always comes last, it remains to choose the {{mvar|k}} trials with failures out of the remaining {{math|''k'' + ''r'' − 1}} trials. The above binomial coefficient, due to its combinatorial interpretation, gives precisely the number of all these sequences of length {{math|''k'' + ''r'' − 1}}.

===Cumulative distribution function===
The [[cumulative distribution function]] can be expressed in terms of the [[regularized incomplete beta function]]:<ref name="Wolfram" /><ref name="Cook" />
: <math> F(k; r, p) \equiv \Pr(X\le k) = I_{p}(r, k+1). </math>
(This formula uses the same parameterization as the article's table, with {{mvar|r}} the number of successes and <math>p=r/(r+\mu)</math>, where <math>\mu</math> is the mean.) It can also be expressed in terms of the [[cumulative distribution function]] of the [[binomial distribution]]:<ref>Morris, K. W. (1963), "A note on direct and inverse sampling", ''Biometrika'', 50, 544–545.</ref>
: <math> F(k; r, p) = F_\text{binomial}(k;n=k+r,1-p). </math>
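Both functions above are straightforward to evaluate numerically. The following minimal sketch (an illustration assuming the [[SciPy]] library; the variable names and values are arbitrary) uses <code>scipy.stats.nbinom</code>, whose convention matches the primary definition here: {{mvar|X}} counts the failures {{mvar|k}} before the {{mvar|r}}-th success, with success probability {{mvar|p}}.

<syntaxhighlight lang="python">
from scipy.special import comb
from scipy.stats import nbinom

r, p = 3, 0.5   # illustrative values: r successes, success probability p
k = 4           # number of observed failures

# Direct evaluation of the formula above: C(k+r-1, k) * (1-p)^k * p^r
pmf_direct = comb(k + r - 1, k) * (1 - p) ** k * p ** r

# SciPy's nbinom uses the same convention (k failures before the r-th success).
pmf_scipy = nbinom.pmf(k, r, p)
cdf_scipy = nbinom.cdf(k, r, p)   # equals the regularized incomplete beta I_p(r, k+1)

print(pmf_direct, pmf_scipy, cdf_scipy)
</syntaxhighlight>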
===Alternative formulations===
Some sources may define the negative binomial distribution slightly differently from the primary one here. The most common variations are where the random variable {{mvar|X}} is counting different things. These variations can be seen in the table here:

{| class="wikitable"
|
!{{mvar|X}} is counting...
!Probability mass function
!Formula
!Alternate formula (using equivalent binomial)
!Alternate formula (simplified using: <math display="inline">n=k+r </math>)
!Support
|-
|1
|{{mvar|k}} failures, given {{mvar|r}} successes
|<math display="inline">f(k; r, p) \equiv \Pr(X = k) = </math>
|<math display="inline">\binom{k+r-1}{k} p^r(1-p)^k </math><ref>{{Cite web|url=http://www.mathworks.com/help/stats/negative-binomial-distribution.html|title=Mathworks: Negative Binomial Distribution}}</ref><ref name="Cook">{{Cite web|url=http://www.johndcook.com/negative_binomial.pdf|title=Notes on the Negative Binomial Distribution|last=Cook|first=John D.}}</ref><ref>{{Cite web|url=http://www.stat.ufl.edu/~abhisheksaha/sta4321/lect14.pdf|title=Introduction to Probability / Fundamentals of Probability: Lecture 14|last=Saha|first=Abhishek}}</ref>
|<math display="inline">\binom{k+r-1}{r-1} p^r(1-p)^k </math><ref name="Wolfram" /><ref>[[SAS Institute]], "[https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n0zb2l2xnsw2ctn1qe2os5yk2l9c.htm Negative Binomial Distribution]", ''SAS(R) 9.4 Functions and CALL Routines: Reference, Fourth Edition'', SAS Institute, Cary, NC, 2016.</ref><ref name="Crawley 2012">{{cite book|url=https://books.google.com/books?id=XYDl0mlH-moC|title=The R Book|last=Crawley|first=Michael J.|publisher=Wiley|year=2012|isbn=978-1-118-44896-0}}</ref><ref name=":0">{{Cite web|url=http://www.math.ntu.edu.tw/~hchen/teaching/StatInference/notes/lecture16.pdf|title=Set theory: Section 3.2.5 – Negative Binomial Distribution}}</ref>
| rowspan="2" |<math display="inline">\binom{n-1}{k} p^r(1-p)^k </math>
|<math>\text{for }k = 0, 1, 2, \ldots</math>
|-
|2
|{{mvar|n}} trials, given {{mvar|r}} successes
|<math display="inline">f(n; r, p) \equiv \Pr(X = n) = </math>
|<math display="inline">\binom{n-1}{r-1} p^r(1-p)^{n-r} </math><ref name="Cook" /><ref name=":0" /><ref>{{Cite web|url=http://www.randomservices.org/random/bernoulli/NegativeBinomial.html|title=Randomservices.org, Chapter 10: Bernoulli Trials, Section 4: The Negative Binomial Distribution}}</ref><ref>{{Cite web|url=http://stattrek.com/probability-distributions/negative-binomial.aspx|title=Stat Trek: Negative Binomial Distribution}}</ref><ref>{{Cite web|url=http://www.stat.purdue.edu/~zhanghao/STAT511/handout/Stt511%20Sec3.5.pdf|title=Distinguishing Between Binomial, Hypergeometric and Negative Binomial Distributions|last=Wroughton|first=Jacqueline}}</ref>
|<math display="inline">\binom{n-1}{n-r} p^r(1-p)^{n-r} </math>
| rowspan="2" |<math>\text{for }n = r, r+1, r+2, \dotsc</math>
|-
|3
|{{mvar|n}} trials, given {{mvar|r}} failures
|<math display="inline">f(n; r, p) \equiv \Pr(X = n) = </math>
|<math display="inline">\binom{n-1}{r-1} p^{n-r}(1-p)^{r} </math>
|<math display="inline">\binom{n-1}{n-r} p^{n-r}(1-p)^{r} </math>
| rowspan="2" |<math display="inline">\binom{n-1}{k} p^{k}(1-p)^{r} </math>
|-
|4
|{{mvar|k}} successes, given {{mvar|r}} failures
|<math display="inline">f(k; r, p) \equiv \Pr(X = k) = </math>
|<math display="inline">\binom{k+r-1}{k} p^k(1-p)^r </math>
|<math display="inline">\binom{k+r-1}{r-1} p^k(1-p)^r </math>
|<math>\text{for }k = 0, 1, 2, \ldots</math>
|-
| -
|{{mvar|k}} successes, given {{mvar|n}} trials
|<math display="inline">f(k; n, p) \equiv \Pr(X = k) = </math>
| colspan="3" |This is the [[binomial distribution]], not the negative binomial: <math display="inline">\binom{n}{k} p^k(1-p)^{n-k}=\binom{n}{n-k} p^k(1-p)^{n-k}=\binom{n}{k} p^k(1-p)^{r}</math>
|<math>\text{for }k = 0, 1, 2, \dotsc, n</math>
|}

Each of the four definitions of the negative binomial distribution can be expressed in slightly different but equivalent ways. The first alternative formulation is simply an equivalent form of the binomial coefficient, that is: <math display="inline"> \binom ab = \binom a{a-b} \quad \text{for }\ 0\leq b\leq a</math>. The second alternate formulation somewhat simplifies the expression by recognizing that the total number of trials is simply the number of successes and failures, that is: <math display="inline">n=r+k </math>. These alternative formulations may be more intuitive to understand; however, they are perhaps less practical as they have more terms.

* The definition where {{mvar|X}} is the number {{mvar|n}} of '''trials''' that occur for a given number {{mvar|r}} of '''successes''' is similar to the primary definition, except that the number of trials is counted instead of the number of failures. This adds {{mvar|r}} to the value of the random variable, shifting its support and mean.
* The definition where {{mvar|X}} is the number {{mvar|k}} of '''successes''' (or the number {{mvar|n}} of '''trials''') that occur for a given number {{mvar|r}} of '''failures''' is similar to the primary definition used in this article, except that the numbers of failures and successes are switched when considering what is being counted and what is given. Note, however, that {{mvar|p}} still refers to the probability of "success".
* The definition of the negative binomial distribution can be extended to the case where the parameter {{mvar|r}} can take on a positive [[real number|real]] value. Although it is impossible to visualize a non-integer number of "failures", we can still formally define the distribution through its probability mass function. The problem of extending the definition to real-valued (positive) {{mvar|r}} boils down to extending the binomial coefficient to its real-valued counterpart, based on the [[gamma function]]:
:: <math> \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} </math>
: After substituting this expression in the original definition, we say that {{mvar|X}} has a negative binomial (or '''Pólya''') distribution if it has a [[probability mass function]]:
:: <math> f(k; r, p) \equiv \Pr(X = k) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} (1-p)^k p^r \quad\text{for }k = 0, 1, 2, \dotsc </math>
: Here {{mvar|r}} is a real, positive number; a short numerical sketch of this extension is given below.
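For non-integer {{mvar|r}}, the gamma-function form of the probability mass function above can be evaluated directly. The following minimal sketch (an illustration in Python; the helper name <code>nbinom_pmf</code> and the parameter values are arbitrary) works in log space for numerical stability and checks that the probabilities sum to approximately one for a non-integer {{mvar|r}}.

<syntaxhighlight lang="python">
import math

def nbinom_pmf(k, r, p):
    """PMF of the negative binomial (Polya) distribution for real-valued r > 0:
    Gamma(k+r) / (k! * Gamma(r)) * (1-p)**k * p**r."""
    log_coeff = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coeff + k * math.log1p(-p) + r * math.log(p))

# Illustrative check with a non-integer r: the terms should sum to ~1.
r, p = 2.5, 0.3
print(sum(nbinom_pmf(k, r, p) for k in range(2000)))  # ~1.0
</syntaxhighlight>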
In negative binomial regression,<ref name="neg bin reg2">{{cite book|url=https://books.google.com/books?id=0Q_ijxOEBjMC|title=Negative Binomial Regression|last=Hilbe|first=Joseph M.|authorlink=Joseph Hilbe|publisher=Cambridge University Press|year=2011|isbn=978-0-521-19815-8|edition=Second|location=Cambridge, UK}}</ref> the distribution is specified in terms of its mean, <math display="inline">m=\frac{r(1-p)}{p}</math>, which is then related to explanatory variables as in [[linear regression]] or other [[generalized linear model]]s. From the expression for the mean {{mvar|m}}, one can derive <math display="inline">p=\frac{r}{m+r}</math> and <math display="inline">1-p=\frac{m}{m+r}</math>. Substituting these expressions into [[#Extension to real-valued r|the one for the probability mass function when {{mvar|r}} is real-valued]] then yields this parametrization of the probability mass function in terms of {{mvar|m}}:
:<math> \Pr(X = k) = \frac{\Gamma(r+k)}{k! \, \Gamma(r)} \left(\frac{r}{r+m}\right)^r \left(\frac{m}{r+m}\right)^k \quad\text{for }k = 0, 1, 2, \dotsc </math>
The variance can then be written as <math display="inline">m+\frac{m^2}{r}</math>. Some authors prefer to set <math display="inline">\alpha = \frac{1}{r}</math>, and express the variance as <math display="inline">m+\alpha m^2</math>. In this context, and depending on the author, either the parameter {{mvar|r}} or its reciprocal {{mvar|α}} is referred to as the "dispersion parameter", "[[shape parameter]]" or "[[clustering coefficient]]",<ref>{{cite journal|last=Lloyd-Smith|first=J. O.|year=2007|title=Maximum Likelihood Estimation of the Negative Binomial Dispersion Parameter for Highly Overdispersed Data, with Applications to Infectious Diseases|journal=[[PLoS ONE]]|volume=2|issue=2|pages=e180|doi=10.1371/journal.pone.0000180|pmid=17299582|pmc=1791715|bibcode=2007PLoSO...2..180L|doi-access=free}} {{open access}}</ref> or the "heterogeneity"<ref name="neg bin reg2" /> or "aggregation" parameter.<ref name="Crawley 2012"/> The term "aggregation" is particularly used in ecology when describing counts of individual organisms. A decrease of the aggregation parameter {{mvar|r}} towards zero corresponds to increasing aggregation of the organisms; an increase of {{mvar|r}} towards infinity corresponds to absence of aggregation, as can be described by [[Poisson regression]].

===Alternative parameterizations===
Sometimes the distribution is parameterized in terms of its mean {{mvar|μ}} and variance {{math|''σ''{{sup|2}}}}:
:: <math> \begin{align} & p =\frac{\mu}{\sigma^2}, \\[6pt] & r =\frac{\mu^2}{\sigma^2-\mu}, \\[3pt] & \Pr(X=k) = {k+\frac{\mu^2}{\sigma^2-\mu}-1 \choose k} \left(1-\frac{\mu}{\sigma^2}\right)^k \left(\frac \mu {\sigma^2}\right)^{\mu^2/(\sigma^2-\mu)} \\ & \operatorname{E}(X) = \mu \\ & \operatorname{Var}(X) = \sigma^2 . \end{align} </math>
Another popular parameterization uses {{mvar|r}} and the failure [[odds]] {{mvar|β}}:
::<math> \begin{align} & p = \frac{1}{1+\beta} \\ & \Pr(X=k) = {k+r-1 \choose k} \left(\frac{\beta}{1+\beta}\right)^k \left(\frac {1} {1+\beta}\right)^r \\ & \operatorname{E}(X) = r\beta \\ & \operatorname{Var}(X) = r\beta(1+\beta) . \end{align} </math>
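These parameterizations are interchangeable. As a minimal sketch (in Python, with arbitrary helper names and example numbers), the following converts a mean/variance pair or a failure-odds pair into the {{math|(''r'', ''p'')}} form used in the primary definition.

<syntaxhighlight lang="python">
def mean_variance_to_r_p(mu, sigma2):
    """Convert the (mean, variance) parameterization (sigma2 > mu > 0) to (r, p)."""
    p = mu / sigma2
    r = mu ** 2 / (sigma2 - mu)
    return r, p

def r_beta_to_r_p(r, beta):
    """Convert the (r, failure-odds beta) parameterization to (r, p)."""
    return r, 1.0 / (1.0 + beta)

# Illustrative round trip: mean 4 and variance 12 give r = 2, p = 1/3;
# the failure odds are beta = (1 - p)/p = 2, so E[X] = r*beta = 4 as expected.
r, p = mean_variance_to_r_p(4.0, 12.0)
beta = (1 - p) / p
print(r, p, r * beta)   # 2.0 0.333... 4.0
</syntaxhighlight>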
===Examples===

====Length of hospital stay====
Hospital [[length of stay]] is an example of real-world data that can be modelled well with a negative binomial distribution via [[negative binomial regression]].<ref name="carter">{{cite journal |author=Carter, E.M., Potts, H.W.W. |date=4 April 2014 |title=Predicting length of stay from an electronic patient record system: a primary total knee replacement example |journal=BMC Medical Informatics and Decision Making |volume=14 |pages=26 |doi=10.1186/1472-6947-14-26 |pmc=3992140 |pmid=24708853 |doi-access=free }} {{open access}}</ref><ref>{{Cite journal |last1=Orooji |first1=Arezoo |last2=Nazar |first2=Eisa |last3=Sadeghi |first3=Masoumeh |last4=Moradi |first4=Ali |last5=Jafari |first5=Zahra |last6=Esmaily |first6=Habibollah |date=2021-04-30 |title=Factors associated with length of stay in hospital among the elderly patients using count regression models |url=http://mjiri.iums.ac.ir/article-1-6183-en.html |journal=Medical Journal of the Islamic Republic of Iran |volume=35 |page=5 |doi=10.47176/mjiri.35.5 |pmc=8111647 |pmid=33996656}}</ref>

====Selling candy====
Pat Collis is required to sell candy bars to raise money for the 6th grade field trip. Pat is (somewhat harshly) not supposed to return home until five candy bars have been sold. So the child goes door to door, selling candy bars. At each house, there is a 0.6 probability of selling one candy bar and a 0.4 probability of selling nothing.

''What's the probability of selling the last candy bar at the'' {{mvar|n}}-th ''house?''

Successfully selling candy enough times is what defines our stopping criterion (as opposed to failing to sell it), so {{mvar|k}} in this case represents the number of failures and {{mvar|r}} represents the number of successes. Recall that the {{math|NB(''r'', ''p'')}} distribution describes the probability of {{mvar|k}} failures and {{mvar|r}} successes in {{math|''k'' + ''r''}} {{math|Bernoulli(''p'')}} trials with success on the last trial. Selling five candy bars means getting five successes, so {{math|1=''r'' = 5}}, and the probability of success at each house is {{math|1=''p'' = 0.6}}. The number of trials (i.e. houses) this takes is therefore {{math|1=''k'' + 5 = ''n''}}. The random variable we are interested in is the number of houses, so we substitute {{math|1=''k'' = ''n'' − 5}} into a {{math|NB(5, 0.6)}} mass function and obtain the following mass function of the distribution of houses (for {{math|''n'' ≥ 5}}):
:<math> f(n) = {(n-5) + 5 - 1 \choose n-5} \; 0.6^5 \; (1-0.6)^{n-5} = {n-1 \choose n-5} \; 3^5 \; \frac{2^{n-5}}{5^n}. </math>

''What's the probability that Pat finishes on the tenth house?''
:<math> f(10) = \frac{979776}{9765625} \approx 0.10033. </math>

''What's the probability that Pat finishes on or before reaching the eighth house?''

To finish on or before the eighth house, Pat must finish at the fifth, sixth, seventh, or eighth house. Sum those probabilities:
:<math> f(5) = \frac{243}{3125} \approx 0.07776 </math>
:<math> f(6) = \frac{486}{3125} \approx 0.15552 </math>
:<math> f(7) = \frac{2916}{15625} \approx 0.18662 </math>
:<math> f(8) = \frac{13608}{78125} \approx 0.17418 </math>
:<math>\sum_{j=5}^8 f(j) = \frac{46413}{78125} \approx 0.59409.</math>

''What's the probability that Pat exhausts all 30 houses that happen to stand in the neighborhood?''

This can be expressed as the probability that Pat [[Complementary event|does not]] finish on the fifth through the thirtieth house:
:<math>1-\sum_{j=5}^{30} f(j) = 1 - I_{0.6}(5, 30-5+1) \approx 1 - 0.999999823 = 0.000000177. </math>
Because of the rather high probability that Pat will sell to each house (60 percent), the probability of her ''not'' fulfilling her quest is vanishingly slim.
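These figures are easy to reproduce numerically. A minimal sketch (assuming [[SciPy]]; the helper name is arbitrary) uses <code>scipy.stats.nbinom</code> with {{math|1=''r'' = 5}} successes and success probability {{math|1=''p'' = 0.6}}, counting the failures {{math|''n'' − 5}} before the fifth sale:

<syntaxhighlight lang="python">
from scipy.stats import nbinom

r, p = 5, 0.6   # five sales needed, 0.6 chance of a sale at each house

def prob_finish_at_house(n):
    """Probability the fifth sale happens exactly at house n (n >= 5)."""
    return nbinom.pmf(n - r, r, p)   # k = n - 5 failures before the 5th success

print(prob_finish_at_house(10))                           # ~0.10033
print(sum(prob_finish_at_house(n) for n in range(5, 9)))  # ~0.59409
print(1 - nbinom.cdf(30 - r, r, p))                       # ~1.77e-7
</syntaxhighlight>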