
===Alternative formulations===
Some sources define the negative binomial distribution slightly differently from the primary definition used here. The most common variations differ in what the random variable {{mvar|X}} counts. These variations are summarized in the following table:
{| class="wikitable"
|
!{{mvar|X}} is counting...
!Probability mass function
!Formula
!Alternate formula
(using equivalent binomial)
!Alternate formula
(simplified using: <math display="inline">n=k+r
</math>)
!Support
|-
|1
|{{mvar|k}} failures, given {{mvar|r}} successes
|<math display="inline">f(k; r, p) \equiv \Pr(X = k) =
</math>
|<math display="inline">\binom{k+r-1}{k} p^r(1-p)^k
</math><ref>{{Cite web|url=http://www.mathworks.com/help/stats/negative-binomial-distribution.html|title=Mathworks: Negative Binomial Distribution}}</ref><ref name="Cook">{{Cite web|url=http://www.johndcook.com/negative_binomial.pdf|title=Notes on the Negative Binomial Distribution|last=Cook|first=John D.}}</ref><ref>{{Cite web|url=http://www.stat.ufl.edu/~abhisheksaha/sta4321/lect14.pdf|title=Introduction to Probability / Fundamentals of Probability: Lecture 14|last=Saha|first=Abhishek}}</ref>
|<math display="inline">\binom{k+r-1}{r-1} p^r(1-p)^k
</math><ref name="Wolfram" />
<ref>[[SAS Institute]], "[https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n0zb2l2xnsw2ctn1qe2os5yk2l9c.htm Negative Binomial Distribution]", ''SAS(R) 9.4 Functions and CALL Routines: Reference, Fourth Edition'', SAS Institute, Cary, NC, 2016.</ref><ref name="Crawley 2012">{{cite book|url=https://books.google.com/books?id=XYDl0mlH-moC|title=The R Book|last=Crawley|first=Michael J.|publisher=Wiley|year=2012|isbn=978-1-118-44896-0}}</ref><ref name=":0">{{Cite web|url=http://www.math.ntu.edu.tw/~hchen/teaching/StatInference/notes/lecture16.pdf|title=Set theory: Section 3.2.5 – Negative Binomial Distribution}}</ref>
| rowspan="2" |<math display="inline">\binom{n-1}{k} p^r(1-p)^k
</math>
|<math>\text{for }k = 0, 1, 2, \ldots</math>
|-
|2
|{{mvar|n}} trials, given {{mvar|r}} successes
|<math display="inline">f(n; r, p) \equiv \Pr(X = n) =
</math>
|<math display="inline">\binom{n-1}{r-1} p^r(1-p)^{n-r}
</math><ref name="Cook" /><ref name=":0" /><ref>{{Cite web|url=http://www.randomservices.org/random/bernoulli/NegativeBinomial.html|title=Randomservices.org, Chapter 10: Bernoulli Trials, Section 4: The Negative Binomial Distribution}}</ref><ref>{{Cite web|url=http://stattrek.com/probability-distributions/negative-binomial.aspx|title=Stat Trek: Negative Binomial Distribution}}</ref><ref>{{Cite web|url=http://www.stat.purdue.edu/~zhanghao/STAT511/handout/Stt511%20Sec3.5.pdf|title=Distinguishing Between Binomial, Hypergeometric and Negative Binomial Distributions|last=Wroughton|first=Jacqueline}}</ref>
|<math display="inline">\binom{n-1}{n-r} p^r(1-p)^{n-r}
</math>
| rowspan="2" |<math>\text{for }n = r, r+1, r+2, \dotsc</math>
|-
|3
|{{mvar|n}} trials, given {{mvar|r}} failures
|<math display="inline">f(n; r, p) \equiv \Pr(X = n) =
</math>
|<math display="inline">\binom{n-1}{r-1} p^{n-r}(1-p)^{r}
</math>
|<math display="inline">\binom{n-1}{n-r} p^{n-r}(1-p)^{r}
</math>
| rowspan="2" |<math display="inline">\binom{n-1}{k} p^{k}(1-p)^{r}
</math>
|-
|4
|{{mvar|k}} successes, given {{mvar|r}} failures
|<math display="inline">f(k; r, p) \equiv \Pr(X = k) =
</math>
|<math display="inline">\binom{k+r-1}{k} p^k(1-p)^r
</math>
|<math display="inline">\binom{k+r-1}{r-1} p^k(1-p)^r
</math>
|<math>\text{for }k = 0, 1, 2, \ldots</math>
|-
| -
|{{mvar|k}} successes, given {{mvar|n}} trials
|<math display="inline">f(k; n, p) \equiv \Pr(X = k) = </math>
| colspan="3" |This is the [[binomial distribution]], not the negative binomial:  <math display="inline">\binom{n}{k} p^k(1-p)^{n-k}=\binom{n}{n-k} p^k(1-p)^{n-k}=\binom{n}{k} p^k(1-p)^{r}</math>, where <math display="inline">r=n-k</math> is the number of failures
|<math>\text{for }k = 0, 1, 2, \dotsc, n</math>
|}
Each of the four definitions of the negative binomial distribution can be expressed in slightly different but equivalent ways. The first alternative formulation simply uses an equivalent form of the binomial coefficient: <math display="inline"> \binom ab = \binom a{a-b} \quad \text{for }\ 0\leq b\leq a</math>.  The second alternative formulation simplifies the expression by recognizing that the total number of trials is the sum of the numbers of successes and failures: <math display="inline">n=r+k</math>.  These second formulations may be more intuitive, but they are perhaps less practical because they involve more variables.
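
The equivalences described above can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function names (<code>pmf_failures</code>, <code>pmf_trials</code>, <code>pmf_successes</code>) are hypothetical and chosen for illustration, and {{mvar|r}} is assumed to be a positive integer with 0 &lt; {{mvar|p}} &lt; 1.

```python
from math import comb, isclose

def pmf_failures(k, r, p):
    """Row 1 of the table: k failures, given r successes."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

def pmf_trials(n, r, p):
    """Row 2 of the table: n trials, given r successes."""
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

def pmf_successes(k, r, p):
    """Row 4 of the table: k successes, given r failures."""
    return comb(k + r - 1, k) * p**k * (1 - p)**r

r, p = 3, 0.4  # illustrative values, not taken from the article
for k in range(6):
    n = k + r  # total trials = failures + successes
    # The three forms of the binomial coefficient agree.
    assert comb(k + r - 1, k) == comb(k + r - 1, r - 1) == comb(n - 1, k)
    # Counting trials instead of failures only shifts the variable by r.
    assert isclose(pmf_failures(k, r, p), pmf_trials(n, r, p))
    # Swapping the roles of success and failure is the same as swapping p and 1 - p.
    assert isclose(pmf_successes(k, r, p), pmf_failures(k, r, 1 - p))
```

The loop only tests a handful of values, so it illustrates the identities rather than proving them.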
* The definition where {{mvar|X}} is the number of '''trials''' {{mvar|n}} needed to obtain a given number of '''successes''' {{mvar|r}} is similar to the primary definition, except that the number of trials is counted instead of the number of failures.  This adds {{mvar|r}} to the value of the random variable, shifting its support and mean.
* The definition where {{mvar|X}} is the number of '''successes''' {{mvar|k}} (or '''trials''' {{mvar|n}}) that occur before a given number of '''failures''' {{mvar|r}} is similar to the primary definition used in this article, except that the roles of failures and successes are switched in what is counted and what is given.  Note, however, that {{mvar|p}} still refers to the probability of "success".
* The definition of the negative binomial distribution can be extended to the case where the parameter {{mvar|r}} takes a positive [[real number|real]] value.  Although it is impossible to visualize a non-integer number of "successes", the distribution can still be formally defined through its probability mass function.  Extending the definition to real-valued (positive) {{mvar|r}} amounts to extending the binomial coefficient to its real-valued counterpart, based on the [[gamma function]]:
:: <math>
   \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}
  </math>
: After substituting this expression in the original definition, we say that {{mvar|X}} has a negative binomial (or '''Pólya''') distribution if it has a [[probability mass function]]:
:: <math>
    f(k; r, p) \equiv \Pr(X = k) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} (1-p)^k p^r \quad\text{for }k = 0, 1, 2, \dotsc
  </math>
: Here {{mvar|r}} is a real, positive number.
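
As an illustration of the real-valued extension, the gamma-function form of the probability mass function can be evaluated with the log-gamma function, which avoids overflow for large arguments. This is a minimal sketch under stated assumptions: the helper name <code>nb_pmf</code> is hypothetical, only the Python standard library is used, and the inputs are assumed to satisfy {{mvar|r}} &gt; 0 and 0 &lt; {{mvar|p}} &lt; 1.

```python
from math import lgamma, log, exp, comb, isclose

def nb_pmf(k, r, p):
    """Gamma(k+r) / (k! Gamma(r)) * (1-p)^k * p^r for real r > 0 and 0 < p < 1."""
    # Work in log space: log Gamma(k+r) - log k! - log Gamma(r).
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + r * log(p) + k * log(1 - p))

# For integer r the result agrees with the binomial-coefficient form.
assert isclose(nb_pmf(4, 3, 0.4), comb(4 + 3 - 1, 4) * 0.4**3 * 0.6**4)

# For non-integer r the probabilities still sum to 1 (truncated sum, negligible tail).
total = sum(nb_pmf(k, 2.5, 0.4) for k in range(400))
assert isclose(total, 1.0)
```

The values {{mvar|r}} = 2.5 and {{mvar|p}} = 0.4 are arbitrary and serve only to show that the formula remains a valid distribution when {{mvar|r}} is not an integer.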

In negative binomial regression,<ref name="neg bin reg2">{{cite book|url=https://books.google.com/books?id=0Q_ijxOEBjMC|title=Negative Binomial Regression|last=Hilbe|first=Joseph M.|authorlink=Joseph Hilbe|publisher=Cambridge University Press|year=2011|isbn=978-0-521-19815-8|edition=Second|location=Cambridge, UK}}</ref> the distribution is specified in terms of its mean, <math display="inline">m=\frac{r(1-p)}{p}</math>, which is then related to explanatory variables as in [[linear regression]] or other [[generalized linear model]]s.  From the expression for the mean {{mvar|m}}, one can derive <math display="inline">p=\frac{r}{m+r}</math> and <math display="inline">1-p=\frac{m}{m+r}</math>.  Substituting these expressions into [[#Extension to real-valued r|the probability mass function for real-valued {{mvar|r}}]] yields this parametrization of the probability mass function in terms of&nbsp;{{mvar|m}}:

:<math>
    \Pr(X = k) = \frac{\Gamma(r+k)}{k! \, \Gamma(r)} \left(\frac{r}{r+m}\right)^r \left(\frac{m}{r+m}\right)^k \quad\text{for }k = 0, 1, 2, \dotsc
  </math>
The variance can then be written as <math display="inline">m+\frac{m^2}{r}</math>.  Some authors prefer to set <math display="inline">\alpha = \frac{1}{r}</math>, and express the variance as <math display="inline">m+\alpha m^2</math>.  In this context, and depending on the author, either the parameter {{mvar|r}} or its reciprocal {{mvar|α}} is referred to as the "dispersion parameter", "[[shape parameter]]" or "[[clustering coefficient]]",<ref>{{cite journal|last=Lloyd-Smith|first=J. O.|year=2007|title=Maximum Likelihood Estimation of the Negative Binomial Dispersion Parameter for Highly Overdispersed Data, with Applications to Infectious Diseases|journal=[[PLoS ONE]]|volume=2|issue=2|pages=e180|doi=10.1371/journal.pone.0000180|pmid=17299582|pmc=1791715|bibcode=2007PLoSO...2..180L|doi-access=free}} {{open access}}</ref> or the "heterogeneity"<ref name="neg bin reg2" /> or "aggregation" parameter.<ref name="Crawley 2012"/> The term "aggregation" is particularly used in ecology when describing counts of individual organisms. Decrease of the aggregation parameter {{mvar|r}} towards zero corresponds to increasing aggregation of the organisms; increase of {{mvar|r}} towards infinity corresponds to absence of aggregation, as can be described by [[Poisson regression]].
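
The mean parametrization and the variance formula <math display="inline">m+\frac{m^2}{r}</math> can be checked numerically. The sketch below is illustrative only: the function name <code>nb_pmf_mean</code> and the values of {{mvar|m}} and {{mvar|r}} are assumptions made for the example, and the infinite sums are truncated at a point where the remaining tail is negligible for these values.

```python
from math import lgamma, log, exp, isclose

def nb_pmf_mean(k, m, r):
    """Pr(X = k) in terms of the mean m > 0 and the parameter r > 0."""
    log_coeff = lgamma(r + k) - lgamma(k + 1) - lgamma(r)
    # p = r / (r + m) and 1 - p = m / (r + m), as derived from the mean.
    return exp(log_coeff + r * log(r / (r + m)) + k * log(m / (r + m)))

m, r = 2.0, 0.5   # small r, i.e. strong overdispersion (illustrative values)
ks = range(500)   # truncation point; the tail beyond it is negligible here
probs = [nb_pmf_mean(k, m, r) for k in ks]

mean = sum(k * q for k, q in zip(ks, probs))
var = sum((k - mean) ** 2 * q for k, q in zip(ks, probs))

assert isclose(mean, m, rel_tol=1e-6)
assert isclose(var, m + m**2 / r, rel_tol=1e-6)
```

With <math display="inline">\alpha = \frac{1}{r}</math>, the same check applies to the form <math display="inline">m+\alpha m^2</math>; increasing {{mvar|r}} in the sketch moves the computed variance towards the mean, as for a Poisson distribution.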