Editing Binomial distribution (section)

== Properties ==
=== Expected value and variance ===

If {{math|''X'' ~ ''B''(''n'', ''p'')}}, that is, {{math|''X''}} is a binomially distributed random variable, {{mvar|n}} being the total number of experiments and ''p'' the probability of each experiment yielding a successful result, then the [[expected value]] of {{math|''X''}} is:<ref>See [https://proofwiki.org/wiki/Expectation_of_Binomial_Distribution Proof Wiki]</ref>
: <math> \operatorname{E}[X] = np.</math>

This follows from the linearity of the expected value along with the fact that {{mvar|X}} is the sum of {{mvar|n}} identical Bernoulli random variables, each with expected value {{mvar|p}}.  In other words, if <math>X_1, \ldots, X_n</math> are identical (and independent) Bernoulli random variables with parameter {{mvar|p}}, then {{math|1=''X'' = ''X''<sub>1</sub> + ... + ''X''<sub>''n''</sub>}} and
: <math>\operatorname{E}[X] = \operatorname{E}[X_1 + \cdots + X_n] = \operatorname{E}[X_1] + \cdots + \operatorname{E}[X_n] = p + \cdots + p = np.</math>

The [[variance]] is:
: <math> \operatorname{Var}(X) = npq = np(1 - p).</math>

This similarly follows from the fact that the variance of a sum of independent random variables is the sum of the variances.

=== Higher moments ===
<!-- Please stop changing the equation \mu_1 = 0, it is correct. The first central moment is not the mean. -->
The first 6 [[central moment]]s, defined as <math> \mu _{c}=\operatorname {E} \left[(X-\operatorname {E} [X])^{c}\right] </math>, are given by 
: <math>\begin{align}
\mu_1 &= 0, \\ 
\mu_2 &= np(1-p),\\
\mu_3 &= np(1-p)(1-2p),\\
\mu_4 &= np(1-p)(1+(3n-6)p(1-p)),\\
\mu_5 &= np(1-p)(1-2p)(1+(10n-12)p(1-p)),\\
\mu_6 &= np(1-p)(1-30p(1-p)(1-4p(1-p))+5np(1-p)(5-26p(1-p))+15n^2 p^2 (1-p)^2).
\end{align}</math>

The non-central moments satisfy
: <math>\begin{align}
\operatorname {E}[X] &= np, \\ 
\operatorname {E}[X^2] &= np(1-p)+n^2p^2,
\end{align}</math>
and in general
<ref name="Andreas2008">
{{citation 
 |last1=Knoblauch |first1=Andreas 
 |title=Closed-Form Expressions for the Moments of the Binomial Probability Distribution 
 |year=2008
 |journal=SIAM Journal on Applied Mathematics 
 |url=https://www.jstor.org/stable/40233780
 |volume=69 
 |issue=1 
 |pages=197–204
 |doi=10.1137/070700024 
 |jstor=40233780 
|url-access=subscription 
 }}</ref><ref name="Nguyen2021">
{{citation 
 |last1=Nguyen |first1=Duy 
 |title=A probabilistic approach to the moments of binomial random variables and application 
 |year=2021
 |journal=The American Statistician 
 |url=https://www.tandfonline.com/doi/abs/10.1080/00031305.2019.1679257?journalCode=utas20
 |volume=75
 |issue=1 
 |pages=101–103 
 |doi=10.1080/00031305.2019.1679257 
 |s2cid=209923008 
|url-access=subscription 
 }}</ref>
: <math>
\operatorname {E}[X^c] = \sum_{k=0}^c \left\{ {c \atop k} \right\} n^{\underline{k}} p^k,
</math>
where <math>\textstyle \left\{{c\atop k}\right\}</math> are the [[Stirling numbers of the second kind]], and <math>n^{\underline{k}} = n(n-1)\cdots(n-k+1)</math> is the <math>k</math>th [[Falling and rising factorials|falling power]] of <math>n</math>.
A simple bound
<ref>{{Citation |last1=D. Ahle |first1=Thomas |title=Sharp and Simple Bounds for the raw Moments of the Binomial and Poisson Distributions
|year=2022
|volume=182
|doi=10.1016/j.spl.2021.109306
|journal=Statistics & Probability Letters
|page=109306 |arxiv=2103.17027 }}</ref> follows by bounding the Binomial moments via the [[Poisson distribution#Higher moments|higher Poisson moments]]: 
: <math>
\operatorname {E}[X^c] \le
\left(\frac{c}{\ln(c/(np)+1)}\right)^c \le (np)^c \exp\left(\frac{c^2}{2np}\right).
</math>
This shows that if <math>c=O(\sqrt{np})</math>, then <math>\operatorname {E}[X^c]</math> is at most a constant factor away from <math>\operatorname {E}[X]^c</math>

=== Mode ===

Usually the [[mode (statistics)|mode]] of a binomial {{math|''B''(''n'', ''p'')}} distribution is equal to <math>\lfloor (n+1)p\rfloor</math>, where  <math>\lfloor\cdot\rfloor</math> is the [[floor function]]. However, when {{math|(''n'' + 1)''p''}} is an integer and {{math|''p''}} is neither 0 nor 1, then the distribution has two modes: {{math|(''n'' + 1)''p''}} and {{math|(''n'' + 1)''p'' − 1}}. When {{math|''p''}} is equal to 0 or 1, the mode will be 0 and {{math|''n''}} correspondingly. These cases can be summarized as follows:
: <math>\text{mode} =
      \begin{cases}
        \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\
        (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\
        n & \text{if }(n+1)p = n + 1.
      \end{cases}</math>

'''Proof:''' Let
: <math>f(k)=\binom nk p^k q^{n-k}.</math>

For <math>p=0</math> only <math>f(0)</math> has a nonzero value with <math>f(0)=1</math>. For <math>p=1</math> we find <math>f(n)=1</math> and <math>f(k)=0</math> for <math>k\neq n</math>. This proves that the mode is 0 for <math>p=0</math> and <math>n</math> for <math>p=1</math>.

Let <math>0 < p < 1</math>. We find
:<math>\frac{f(k+1)}{f(k)} = \frac{(n-k)p}{(k+1)(1-p)}</math>.

From this follows
: <math>\begin{align}
k > (n+1)p-1 \Rightarrow f(k+1) < f(k) \\
k = (n+1)p-1 \Rightarrow f(k+1) = f(k) \\
k < (n+1)p-1 \Rightarrow f(k+1) > f(k)
\end{align}</math>

So when <math>(n+1)p-1</math> is an integer, then <math>(n+1)p-1</math> and <math>(n+1)p</math> is a mode. In the case that <math>(n+1)p-1\notin \Z</math>, then only <math>\lfloor (n+1)p-1\rfloor+1=\lfloor (n+1)p\rfloor</math> is a mode.<ref>See also {{cite web |first=André |last=Nicolas |title=Finding mode in Binomial distribution |work=[[Stack Exchange]] |date=January 7, 2019 |url=https://math.stackexchange.com/q/117940 }}</ref>

=== Median ===
In general, there is no single formula to find the [[median]] for a binomial distribution, and it may even be non-unique. However, several special results have been established:
* If {{math|''np''}} is an integer, then the mean, median, and mode coincide and equal {{math|''np''}}.<ref>{{cite journal|last=Neumann|first=P.|year=1966|title=Über den Median der Binomial- and Poissonverteilung|journal=Wissenschaftliche Zeitschrift der Technischen Universität Dresden|volume=19|pages=29–33|language=de}}</ref><ref>Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", [[The Mathematical Gazette]] 94, 331-332.</ref>
* Any median {{math|''m''}} must lie within the interval <math>\lfloor np \rfloor\leq m \leq \lceil np \rceil</math>.<ref name="KaasBuhrman">{{cite journal|first1=R.|last1=Kaas|first2=J.M.|last2=Buhrman|title=Mean, Median and Mode in Binomial Distributions|journal=Statistica Neerlandica|year=1980|volume=34|issue=1|pages=13–18|doi=10.1111/j.1467-9574.1980.tb00681.x}}</ref>
* A median {{math|''m''}} cannot lie too far away from the mean:<math>|m-np|\leq \min\{{\ln2}, \max\{p,1-p\}\}</math> .<ref name="Hamza">
{{cite journal
 | last1 = Hamza | first1 = K.
 | doi = 10.1016/0167-7152(94)00090-U
 | title = The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions
 | journal = Statistics & Probability Letters
 | volume = 23
 | pages = 21–25
 | year = 1995
}}</ref>
* The median is unique and equal to {{math|1=''m'' = [[Rounding|round]](''np'')}} when {{math|1={{abs|''m'' − ''np''}} ≤ min{{brace|''p'', 1 − ''p''}}}} (except for the case when {{math|1=''p'' = 1/2}} and {{math|''n''}} is odd).<ref name="KaasBuhrman"/>
* When {{math|''p''}} is a rational number (with the exception of {{math|1=''p'' = 1/2}}\ and {{math|''n''}} odd) the median is unique.<ref name="Nowakowski">
{{cite journal
 | last1 = Nowakowski | first1 = Sz.
 | doi = 10.37418/amsj.10.4.9
 | issn=1857-8365
 | title = Uniqueness of a Median of a Binomial Distribution with Rational Probability
 | journal = Advances in Mathematics: Scientific Journal
 | volume = 10
 | issue = 4
 | pages = 1951–1958
 | year = 2021
 | arxiv = 2004.03280
 | s2cid = 215238991
}}</ref>
* When <math>p= \frac{1}{2} </math> and {{math|''n''}} is odd, any number {{math|''m''}} in the interval <math> \frac{1}{2} \bigl(n-1\bigr)\leq m \leq  \frac{1}{2}  \bigl(n+1\bigr)</math>  is a median of the binomial distribution. If <math>p= \frac{1}{2}  </math> and {{math|''n''}} is even, then  <math>m= \frac{n}{2} </math> is the unique median.

=== Tail bounds ===
For {{math|''k'' ≤ ''np''}}, upper bounds can be derived for the lower tail of the cumulative distribution function <math>F(k;n,p) = \Pr(X \le k)</math>, the probability that there are at most {{math|''k''}} successes. Since <math>\Pr(X \ge k) = F(n-k;n,1-p) </math>, these bounds can also be seen as bounds for the upper tail of the cumulative distribution function for {{math|''k'' ≥ ''np''}}.

[[Hoeffding's inequality]] yields the simple bound
: <math> F(k;n,p) \leq \exp\left(-2 n\left(p-\frac{k}{n}\right)^2\right), \!</math>
which is however not very tight. In particular, for {{math|1=''p'' = 1}}, we have that {{math|1=''F''(''k''; ''n'', ''p'') = 0}} (for fixed {{math|''k''}}, {{math|''n''}} with {{math|''k'' < ''n''}}), but Hoeffding's bound evaluates to a positive constant.

A sharper bound can be obtained from the [[Chernoff bound]]:<ref name="ag">{{cite journal |first1=R. |last1=Arratia |first2=L. |last2=Gordon |title=Tutorial on large deviations for the binomial distribution |journal=Bulletin of Mathematical Biology |volume=51 |issue=1 |year=1989 |pages=125–131 |doi=10.1007/BF02458840 |pmid=2706397 |s2cid=189884382 }}</ref>
: <math> F(k;n,p) \leq \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right)  </math>
where {{math|''D''(''a'' ∥ ''p'')}} is the [[Kullback–Leibler divergence|relative entropy (or Kullback-Leibler divergence)]] between an {{math|''a''}}-coin and a {{math|''p''}}-coin (i.e. between the {{math|Bernoulli(''a'')}} and {{math|Bernoulli(''p'')}} distribution):
: <math> D(a\parallel p)=(a)\ln\frac{a}{p}+(1-a)\ln\frac{1-a}{1-p}. \!</math>

Asymptotically, this bound is reasonably tight; see <ref name="ag"/> for details.

One can also obtain ''lower'' bounds on the tail {{math|''F''(''k''; ''n'', ''p'')}}, known as anti-concentration bounds. By approximating the binomial coefficient with [[Stirling's approximation|Stirling's formula]] it can be shown that<ref>{{cite book |author1=Robert B. Ash |title=Information Theory |url=https://archive.org/details/informationtheor00ashr |url-access=limited |date=1990 |publisher=Dover Publications |page=[https://archive.org/details/informationtheor00ashr/page/n81 115]|isbn=9780486665214 }}</ref>
: <math> F(k;n,p) \geq \frac{1}{\sqrt{8n\tfrac{k}{n}(1-\tfrac{k}{n})}} \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right),</math>
which implies the simpler but looser bound
: <math> F(k;n,p) \geq \frac1{\sqrt{2n}} \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right).</math>

For {{math|1=''p'' = 1/2}} and {{math|''k'' ≥ 3''n''/8}} for even {{math|''n''}}, it is possible to make the denominator constant:<ref>{{cite web |last1=Matoušek |first1=J. |last2=Vondrak |first2=J. |title=The Probabilistic Method |work=lecture notes |url=https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f09/www/handouts/matousek-vondrak-prob-ln.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f09/www/handouts/matousek-vondrak-prob-ln.pdf |archive-date=2022-10-09 |url-status=live }}</ref>
: <math> F(k;n,\tfrac{1}{2}) \geq \frac{1}{15} \exp\left(- 16n \left(\frac{1}{2} -\frac{k}{n}\right)^2\right). \!</math>