
== Sums of independent Bernoulli random variables ==
The bounds in the following sections for [[Bernoulli random variable]]s are derived using the fact that, for a Bernoulli random variable <math>X_i</math> that equals 1 with probability ''p'' and for any real ''t'', the inequality <math>1 + y \le e^y</math> gives

:<math>\operatorname E \left[e^{t\cdot X_i} \right] = (1 - p) e^0 + p e^t = 1 + p (e^t -1) \leq e^{p (e^t - 1)}.</math>
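
The inequality above can be checked numerically. The following is a minimal sketch, not part of any cited source; the function names <code>bernoulli_mgf</code>, <code>mgf_upper_bound</code> and <code>check_mgf_bound</code> are hypothetical and chosen only for illustration.

```python
import math


def bernoulli_mgf(p, t):
    """Exact value of E[exp(t * X)] for X ~ Bernoulli(p)."""
    return 1.0 + p * (math.exp(t) - 1.0)


def mgf_upper_bound(p, t):
    """Upper bound exp(p * (e^t - 1)), obtained from 1 + y <= e^y."""
    return math.exp(p * (math.exp(t) - 1.0))


def check_mgf_bound(ps, ts, rel_tol=1e-12):
    """Return True if the bound holds for every (p, t) pair on the grid.

    rel_tol only absorbs floating-point rounding; it is an assumption
    of this sketch, not part of the inequality itself.
    """
    return all(
        bernoulli_mgf(p, t) <= mgf_upper_bound(p, t) * (1.0 + rel_tol)
        for p in ps
        for t in ts
    )
```

Equality holds at ''t'' = 0, where both sides equal 1, so the check uses a small relative tolerance rather than a strict comparison.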

Chernoff bounds come in several flavors: the original ''additive form'' (which bounds the [[Approximation error|absolute error]]) and the more practical ''multiplicative form'' (which bounds the [[Approximation error|relative error]] with respect to the mean).

=== Multiplicative form (relative error) ===
'''Multiplicative Chernoff bound.''' Suppose {{math|''X''<sub>1</sub>, ..., ''X<sub>n</sub>''}} are [[Statistical independence|independent]] random variables taking values in {{math|{0, 1}.}} Let {{mvar|X}} denote their sum and let {{math|''μ'' {{=}} E[''X'']}} denote the sum's expected value. Then for any {{math|''δ'' > 0}},
:<math>\Pr ( X \ge (1+\delta)\mu) \leq \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^\mu.</math>
A similar proof strategy can be used to show that for {{math|0 < ''δ'' < 1}}

:<math>\Pr(X \le (1-\delta)\mu) \leq \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^\mu.</math>
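
As an illustration, the two bounds can be compared with exact tail probabilities in the special case where all the summands share the same success probability, so that the sum is binomial. This is only a sketch under that assumption (the theorem itself does not require identical distributions), and all function names are hypothetical.

```python
import math


def binom_pmf(n, p, k):
    """Pr(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def upper_tail_exact(n, p, k):
    """Exact Pr(X >= k) for an integer threshold k."""
    return sum(binom_pmf(n, p, j) for j in range(max(k, 0), n + 1))


def lower_tail_exact(n, p, k):
    """Exact Pr(X <= k) for an integer threshold k."""
    return sum(binom_pmf(n, p, j) for j in range(0, min(k, n) + 1))


def chernoff_upper(mu, delta):
    """(e^delta / (1+delta)^(1+delta))^mu for delta > 0, in log space."""
    return math.exp(mu * (delta - (1.0 + delta) * math.log1p(delta)))


def chernoff_lower(mu, delta):
    """(e^-delta / (1-delta)^(1-delta))^mu for 0 < delta < 1, in log space."""
    return math.exp(mu * (-delta - (1.0 - delta) * math.log1p(-delta)))
```

For example, with ''n'' = 100 and ''p'' = 0.3 (so ''μ'' = 30) and ''δ'' = 0.5, the thresholds are 45 and 15, and <code>upper_tail_exact(100, 0.3, 45)</code> and <code>lower_tail_exact(100, 0.3, 15)</code> can be compared with <code>chernoff_upper(30.0, 0.5)</code> and <code>chernoff_lower(30.0, 0.5)</code> respectively.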

The formulas above are often unwieldy in practice, so the following looser but more convenient bounds<ref name="MitzenmacherUpfal">{{cite book | url=https://books.google.com/books?id=0bAYl6d7hvkC | title=Probability and Computing: Randomized Algorithms and Probabilistic Analysis | publisher=Cambridge University Press |author1=Mitzenmacher, Michael  |author2=Upfal, Eli | year=2005 | isbn=978-0-521-83540-4}}</ref> are commonly used instead; they follow from the inequality <math>\textstyle\frac{2\delta}{2+\delta} \le \log(1+\delta)</math> from [[List of logarithmic identities#Inequalities|the list of logarithmic inequalities]]:

:<math>\Pr( X \ge (1+\delta)\mu)\le e^{-\delta^2\mu/(2+\delta)}, \qquad 0 \le \delta,</math>
:<math>\Pr( X \le (1-\delta)\mu) \le e^{-\delta^2\mu/2}, \qquad 0 < \delta < 1,</math>
:<math>\Pr( |X - \mu| \ge \delta\mu) \le 2e^{-\delta^2\mu/3}, \qquad 0 < \delta < 1.</math>

Note that the bounds are trivial for <math>\delta = 0</math>, since their right-hand sides are then at least 1.
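
How much is lost by the simplification can be seen by evaluating both versions side by side. The sketch below is only an illustration with hypothetical function names; it checks on a grid of ''δ'' values that each convenient bound is at least as large as the sharper expression it replaces, and that the two-sided bound dominates the sum of the two one-sided ones.

```python
import math


def tight_upper(mu, delta):
    """Sharper upper-tail bound (e^delta / (1+delta)^(1+delta))^mu."""
    return math.exp(mu * (delta - (1.0 + delta) * math.log1p(delta)))


def tight_lower(mu, delta):
    """Sharper lower-tail bound (e^-delta / (1-delta)^(1-delta))^mu."""
    return math.exp(mu * (-delta - (1.0 - delta) * math.log1p(-delta)))


def loose_upper(mu, delta):
    """Convenient upper-tail bound exp(-delta^2 mu / (2 + delta))."""
    return math.exp(-delta**2 * mu / (2.0 + delta))


def loose_lower(mu, delta):
    """Convenient lower-tail bound exp(-delta^2 mu / 2)."""
    return math.exp(-delta**2 * mu / 2.0)


def two_sided(mu, delta):
    """Convenient two-sided bound 2 exp(-delta^2 mu / 3)."""
    return 2.0 * math.exp(-delta**2 * mu / 3.0)


def loose_bounds_dominate(mu, deltas, rel_tol=1e-12):
    """Check, for 0 < delta < 1, that every convenient bound is no smaller
    than the sharper quantity it replaces (rel_tol absorbs rounding)."""
    slack = 1.0 + rel_tol
    return all(
        tight_upper(mu, d) <= loose_upper(mu, d) * slack
        and tight_lower(mu, d) <= loose_lower(mu, d) * slack
        and loose_upper(mu, d) + loose_lower(mu, d) <= two_sided(mu, d) * slack
        for d in deltas
    )
```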

In addition, based on the Taylor expansion for the [[Lambert W function]], the following parameterized bound holds:<ref name="DillencourtGM"> {{cite journal 
 | last1 = Dillencourt
 | first1 = Michael
 | last2 = Goodrich
 | first2 = Michael
 | last3 = Mitzenmacher
 | first3 = Michael
 | title = Leveraging Parameterized Chernoff Bounds for Simplified Algorithm Analyses
 | journal = Information Processing Letters
 | number = 106516
 | year = 2024
 | volume = 187
 | doi = 10.1016/j.ipl.2024.106516
| doi-access = free
 }}</ref>

:<math>\Pr( X \ge R)\le 2^{-xR}, \qquad x > 0, \  R \ge (2^x e -1)\mu.</math>
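
A small numerical check of this bound is sketched below for a binomial sum. The helper names are hypothetical, and the binomial setting is an assumption made only so that the exact tail can be computed.

```python
import math


def min_threshold(mu, x):
    """Smallest R allowed by the condition R >= (2^x * e - 1) * mu."""
    return (2.0**x * math.e - 1.0) * mu


def parameterized_bound(x, R):
    """Right-hand side 2^(-x R) of the parameterized bound."""
    return 2.0 ** (-x * R)


def binomial_upper_tail(n, p, k):
    """Exact Pr(X >= k) for X ~ Binomial(n, p) and integer k."""
    return sum(
        math.comb(n, j) * p**j * (1.0 - p) ** (n - j)
        for j in range(max(k, 0), n + 1)
    )
```

For instance, with ''n'' = 200 and ''p'' = 0.05 (so ''μ'' = 10) and ''x'' = 1, any integer ''R'' of at least <code>math.ceil(min_threshold(10.0, 1.0))</code> satisfies the condition, and <code>binomial_upper_tail(200, 0.05, R)</code> can be compared with <code>parameterized_bound(1.0, R)</code>.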

=== Additive form (absolute error) ===
The following theorem is due to [[Wassily Hoeffding]]<ref>{{cite journal
 |last1=Hoeffding |first1=W.
 |year=1963
 |title=Probability Inequalities for Sums of Bounded Random Variables
 |journal=[[Journal of the American Statistical Association]]
 |volume=58 |issue=301 |pages=13–30
 |doi=10.2307/2282952
 |jstor=2282952
|url=http://repository.lib.ncsu.edu/bitstream/1840.4/2170/1/ISMS_1962_326.pdf
 }}</ref> and hence is called the Chernoff–Hoeffding theorem.

:'''Chernoff–Hoeffding theorem.''' Suppose {{math|''X''<sub>1</sub>, ..., ''X<sub>n</sub>''}} are [[i.i.d.]] random variables taking values in {{math|{0, 1}.}} Let {{math|''p'' {{=}} E[''X''<sub>1</sub>]}} and {{math|''ε'' > 0}}. Then

::<math>\begin{align}
\Pr \left (\frac{1}{n} \sum X_i \geq p + \varepsilon \right ) \leq \left (\left (\frac{p}{p + \varepsilon}\right )^{p+\varepsilon} {\left (\frac{1 - p}{1-p- \varepsilon}\right )}^{1 - p- \varepsilon}\right )^n &= e^{-D(p+\varepsilon\parallel p) n} \\
\Pr \left (\frac{1}{n} \sum X_i \leq p - \varepsilon \right ) \leq \left (\left (\frac{p}{p - \varepsilon}\right )^{p-\varepsilon} {\left (\frac{1 - p}{1-p+ \varepsilon}\right )}^{1 - p+ \varepsilon}\right )^n &= e^{-D(p-\varepsilon\parallel p) n}
\end{align}</math>
:where
::<math> D(x\parallel y) = x \ln \frac{x}{y} + (1-x) \ln \left (\frac{1-x}{1-y} \right )</math>
:is the [[Kullback–Leibler divergence]] between [[Bernoulli distribution|Bernoulli distributed]] random variables with parameters ''x'' and ''y'' respectively. If {{math|''p'' ≥ {{sfrac|1|2}},}} then <math>D(p+\varepsilon\parallel p)\ge \tfrac{\varepsilon^2}{2p(1-p)}</math> which means

::<math> \Pr\left ( \frac{1}{n}\sum X_i \geq p+\varepsilon \right ) \leq \exp \left (-\frac{\varepsilon^2 n}{2p(1-p)} \right ).</math>
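
The upper-tail statement of the theorem can be evaluated directly. The following sketch uses hypothetical function names and takes the threshold as an integer count ''k'' (so that ''p'' + ''ε'' = ''k''/''n'') to avoid rounding issues; it is an illustration, not a reference implementation.

```python
import math


def kl_bernoulli(x, y):
    """D(x || y) between Bernoulli(x) and Bernoulli(y), for 0 < y < 1.

    Terms with a zero coefficient are skipped (convention 0 * ln 0 = 0).
    """
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / y)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - y))
    return total


def chernoff_hoeffding_upper(n, p, k):
    """Bound exp(-n D(k/n || p)) on Pr(sum >= k); returns 1 if k/n <= p."""
    q = k / n
    if q <= p:
        return 1.0
    return math.exp(-n * kl_bernoulli(q, p))


def variance_form_bound(n, p, eps):
    """Weaker bound exp(-eps^2 n / (2 p (1 - p))), stated for p >= 1/2."""
    return math.exp(-(eps**2) * n / (2.0 * p * (1.0 - p)))


def binomial_upper_tail(n, p, k):
    """Exact Pr(sum >= k) for i.i.d. Bernoulli(p) summands."""
    return sum(
        math.comb(n, j) * p**j * (1.0 - p) ** (n - j)
        for j in range(max(k, 0), n + 1)
    )
```

With ''n'' = 100, ''p'' = 1/2 and ''k'' = 60 (that is, ''ε'' = 0.1), the exact tail, the divergence bound and the variance-form bound can be compared by calling the three functions with these arguments.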

A simpler bound follows by relaxing the theorem using {{math|''D''(''p'' + ''ε'' {{!!}} ''p'') ≥ 2''ε''<sup>2</sup>}}. This holds because {{math|''D''(''p'' + ''ε'' {{!!}} ''p'')}} and its first derivative with respect to {{mvar|ε}} both vanish at {{math|''ε'' {{=}} 0}}, and because of the [[Convex function|convexity]] estimate

:<math>\frac{d^2}{d\varepsilon^2} D(p+\varepsilon\parallel p) = \frac{1}{(p+\varepsilon)(1-p-\varepsilon) } \geq 4 =\frac{d^2}{d\varepsilon^2}(2\varepsilon^2).</math>
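
The relaxation can be checked numerically on a grid. This is a minimal sketch with hypothetical function names; the tolerance only absorbs floating-point rounding.

```python
import math


def kl_bernoulli(x, y):
    """D(x || y) between Bernoulli(x) and Bernoulli(y), for 0 < y < 1."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / y)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - y))
    return total


def relaxation_gap(p, eps):
    """D(p + eps || p) - 2 eps^2, which should be non-negative."""
    return kl_bernoulli(p + eps, p) - 2.0 * eps**2


def simplified_bound(n, eps):
    """Relaxed tail bound exp(-2 eps^2 n) obtained from D >= 2 eps^2."""
    return math.exp(-2.0 * eps**2 * n)


def relaxation_holds(ps, epsilons, tol=1e-12):
    """Check the gap on all grid points with 0 < p + eps < 1."""
    return all(
        relaxation_gap(p, e) >= -tol
        for p in ps
        for e in epsilons
        if 0.0 < p + e < 1.0
    )
```

The gap is smallest near ''p'' = 1/2, where the second derivative attains its minimum value of 4.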

This result is a special case of [[Hoeffding's inequality]]. Sometimes, the bounds

:<math>
\begin{align}
D( (1+x) p \parallel p) \geq \frac{1}{4} x^2 p, & & & {-\tfrac{1}{2}} \leq x \leq \tfrac{1}{2},\\[6pt]
D(x \parallel y) \geq \frac{3(x-y)^2}{2(2y+x)}, \\[6pt]
D(x \parallel y) \geq \frac{(x-y)^2}{2y}, & & & x \leq y,\\[6pt]
D(x \parallel y) \geq \frac{(x-y)^2}{2x}, & & & x \geq y
\end{align}
</math>

which can be stronger than the {{math|2''ε''<sup>2</sup>}} relaxation when {{mvar|p}} is small (for instance {{math|''p'' < {{sfrac|1|8}}}}), are also used.
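
These lower bounds on the divergence can also be checked numerically. The sketch below uses hypothetical helper names and is meant only as an illustration of how the bounds relate to the exact divergence.

```python
import math


def kl_bernoulli(x, y):
    """D(x || y) between Bernoulli(x) and Bernoulli(y), for 0 < y < 1."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / y)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - y))
    return total


def relative_form_bound(p, x):
    """Lower bound x^2 p / 4 on D((1+x) p || p), for -1/2 <= x <= 1/2."""
    return x * x * p / 4.0


def kl_lower_bounds(x, y):
    """Lower bounds on D(x || y) that apply to the pair (x, y)."""
    diff_sq = (x - y) ** 2
    bounds = [3.0 * diff_sq / (2.0 * (2.0 * y + x))]  # no side condition
    if x <= y:
        bounds.append(diff_sq / (2.0 * y))
    if x >= y:
        bounds.append(diff_sq / (2.0 * x))
    return bounds


def bounds_hold(points, tol=1e-12):
    """Check every applicable lower bound on all pairs from the grid."""
    return all(
        kl_bernoulli(x, y) >= b - tol
        for x in points
        for y in points
        for b in kl_lower_bounds(x, y)
    )
```

For a small parameter such as ''p'' = 0.05 and ''ε'' = 0.02, the values returned by <code>kl_lower_bounds(0.07, 0.05)</code> can be compared with the relaxation 2''ε''<sup>2</sup> to see the improvement.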