Azuma's inequality
{{Short description|Theorem in probability theory}}
In [[probability theory]], the '''Azuma–Hoeffding inequality''' (named after [[Kazuoki Azuma]] and [[Wassily Hoeffding]]) gives a [[concentration inequality|concentration result]] for the values of [[martingale (probability theory)|martingale]]s that have bounded differences.

Suppose <math>\{X_k: k=0,1,2,3,\dots\}</math> is a [[Martingale (probability theory)|martingale]] (or [[Martingale_(probability_theory)#Submartingales.2C_supermartingales.2C_and_relationship_to_harmonic_functions|super-martingale]]) and

:<math>|X_k - X_{k-1}| \leq c_k, \, </math>

[[almost surely]]. Then for all positive integers ''N'' and all positive [[real number|reals]] <math>\epsilon</math>,

:<math>\text{P}(X_N - X_0 \geq \epsilon) \leq \exp\left ({-\epsilon^2 \over 2\sum_{k=1}^N c_k^2} \right).</math>

And symmetrically (when ''X''<sub>''k''</sub> is a sub-martingale):

:<math>\text{P}(X_N - X_0 \leq -\epsilon) \leq \exp\left ({-\epsilon^2 \over 2\sum_{k=1}^N c_k^2} \right).</math>

If ''X'' is a martingale, using both inequalities above and applying the [[union bound]] yields a two-sided bound:

:<math>\text{P}(|X_N - X_0| \geq \epsilon) \leq 2\exp\left ({-\epsilon^2 \over 2\sum_{k=1}^N c_k^2} \right).</math>

==Proof==
The proof uses the same idea as the proof of the general form of Azuma's inequality given below; indeed, this statement can be viewed as a direct corollary of that general form.

==A general form of Azuma's inequality==
===Limitation of the vanilla Azuma's inequality===
Note that the vanilla Azuma's inequality requires symmetric bounds on the martingale increments, i.e. <math>-c_t \leq X_t - X_{t-1} \leq c_t</math>. So, if the known bounds are asymmetric, e.g. <math>a_t \leq X_t - X_{t-1} \leq b_t</math>, then to use Azuma's inequality one needs to choose <math>c_t = \max(|a_t|, |b_t|)</math>, which may waste information on the boundedness of <math>X_t - X_{t-1}</math>.
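This loss from symmetrizing can be seen numerically. The sketch below (the increment distribution and all parameters are illustrative assumptions, not from the article) takes mean-zero increments bounded in <math>[-1/4, 1]</math>, applies the vanilla bound with the forced choice <math>c_t = \max(1/4, 1) = 1</math>, and compares it against a Monte Carlo estimate of the true tail probability:

```python
import math
import random

# Illustrative mean-zero increments with asymmetric range [-1/4, 1]:
# D = 1 with probability 1/5, D = -1/4 with probability 4/5 (E[D] = 0).
# Vanilla Azuma forces the symmetric bound c = max(|-1/4|, |1|) = 1.

def vanilla_azuma(eps, n, c=1.0):
    # One-sided bound: P(X_n - X_0 >= eps) <= exp(-eps^2 / (2 n c^2))
    return math.exp(-eps * eps / (2 * n * c * c))

n, eps = 1000, 60.0
bound = vanilla_azuma(eps, n)

# Monte Carlo estimate of the true tail probability P(X_n >= eps).
random.seed(0)
trials = 20000
hits = 0
for _ in range(trials):
    walk = sum(1.0 if random.random() < 0.2 else -0.25 for _ in range(n))
    if walk >= eps:
        hits += 1
empirical = hits / trials

# The bound holds, but is loose here: using c = 1 everywhere ignores
# that most increments can move down by only 1/4.
print(f"vanilla bound: {bound:.4f}, empirical tail: {empirical:.5f}")
```

With these parameters the vanilla bound evaluates to <math>e^{-1.8} \approx 0.165</math>, while the simulated tail probability is orders of magnitude smaller, illustrating the slack introduced by the symmetric choice of <math>c_t</math>.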
However, this issue can be resolved, and a tighter probability bound obtained, with the following general form of Azuma's inequality.

===Statement===
Let <math>\left\{X_0, X_1, \cdots \right\}</math> be a martingale (or supermartingale) with respect to the [[Filtration_(probability_theory)|filtration]] <math>\left\{\mathcal{F}_0, \mathcal{F}_1, \cdots \right\}</math>. Assume there are [[Predictable_process|predictable processes]] <math>\left\{A_0, A_1, \cdots\right\}</math> and <math>\left\{ B_0, B_1, \dots \right\}</math> with respect to <math>\left\{ \mathcal{F}_0, \mathcal{F}_1, \cdots \right\}</math>, i.e. for all <math>t</math>, <math>A_t, B_t</math> are <math>\mathcal{F}_{t-1}</math>-[[Measurable_function|measurable]], and constants <math>0<c_1, c_2, \cdots<\infty</math> such that

:<math> A_t \leq X_t - X_{t-1} \leq B_t \quad \text{and} \quad B_t - A_t \leq c_t </math>

almost surely. Then for all <math>\epsilon>0</math>,

:<math> \text{P}(X_n - X_0 \geq \epsilon) \leq \exp \left( - \frac{2\epsilon^2}{ \sum_{t=1}^{n} c_t^2 } \right). </math>

Since a submartingale is a supermartingale with signs reversed, if <math>\left\{X_0, X_1, \dots \right\}</math> is instead a martingale (or submartingale), then

:<math> \text{P}(X_n - X_0 \leq -\epsilon) \leq \exp \left(- \frac{2\epsilon^2}{ \sum_{t=1}^{n} c_t^2 } \right). </math>

If <math>\left\{X_0, X_1, \dots \right\}</math> is a martingale, then it is both a supermartingale and a submartingale, so applying the union bound to the two inequalities above yields the two-sided bound:

:<math> \text{P}(|X_n - X_0| \geq \epsilon) \leq 2\exp \left(- \frac{2\epsilon^2}{ \sum_{t=1}^{n} c_t^2 } \right). </math>

===Proof===
We will prove the supermartingale case only, as the other cases follow from it.
By the [[Doob_decomposition_theorem|Doob decomposition]], we can decompose the supermartingale <math>\left\{X_t\right\}</math> as <math>X_t = Y_t + Z_t</math>, where <math>\left\{Y_t, \mathcal{F}_t\right\}</math> is a martingale and <math>\left\{Z_t, \mathcal{F}_t\right\}</math> is a nonincreasing predictable sequence (note that if <math>\left\{X_t\right\}</math> is itself a martingale, then <math>Z_t = 0</math>). From <math>A_t \leq X_t - X_{t-1} \leq B_t</math>, we have

:<math> -(Z_t - Z_{t-1}) + A_t \leq Y_t - Y_{t-1} \leq -(Z_t - Z_{t-1}) + B_t. </math>

Applying the [[Chernoff bound]] to <math>Y_n - Y_0</math>, we have for <math>\epsilon>0</math>,

:<math>\begin{align} \text{P}(Y_n-Y_0 \geq \epsilon) & \leq \underset{s>0}{\min} \ e^{-s\epsilon} \mathbb{E} [e^{s (Y_n-Y_0) }] \\ & = \underset{s>0}{\min} \ e^{-s\epsilon} \mathbb{E} \left[\exp \left( s \sum_{t=1}^{n}(Y_t-Y_{t-1}) \right) \right] \\ & = \underset{s>0}{\min} \ e^{-s\epsilon} \mathbb{E} \left[\exp \left( s \sum_{t=1}^{n-1}(Y_t-Y_{t-1}) \right) \mathbb{E} \left[\exp \left( s(Y_n-Y_{n-1}) \right) \mid \mathcal{F}_{n-1} \right] \right] \end{align}</math>

For the inner expectation term, since (i) <math>\mathbb{E}[Y_t - Y_{t-1} \mid \mathcal{F}_{t-1}]=0</math>, as <math>\left\{Y_t\right\}</math> is a martingale; (ii) <math> -(Z_t - Z_{t-1}) + A_t \leq Y_t - Y_{t-1} \leq -(Z_t - Z_{t-1}) + B_t </math>; (iii) <math>-(Z_t - Z_{t-1}) + A_t</math> and <math>-(Z_t - Z_{t-1}) + B_t</math> are both <math>\mathcal{F}_{t-1}</math>-measurable, as <math>\left\{Z_t\right\}</math> is a predictable process; and (iv) <math>B_t - A_t \leq c_t</math>; by applying [[Hoeffding's lemma]]{{NoteTag|It is not a direct application of Hoeffding's lemma. The statement of Hoeffding's lemma handles the total expectation, but it also holds when the expectation is a conditional expectation and the bounds are measurable with respect to the sigma-field the conditional expectation is conditioned on.
The proof is the same as for the classical Hoeffding's lemma.}}, we have

:<math> \mathbb{E} \left[\exp \left( s(Y_t-Y_{t-1}) \right) \mid \mathcal{F}_{t-1} \right] \leq \exp \left(\frac{s^2 (B_t - A_t)^2}{8} \right) \leq \exp \left(\frac{s^2 c_t^2}{8} \right). </math>

Repeating this step, one obtains

:<math> \text{P}(Y_n-Y_0 \geq \epsilon) \leq \underset{s>0}{\min} \ e^{-s\epsilon} \exp \left(\frac{s^2 \sum_{t=1}^{n}c_t^2}{8}\right). </math>

The minimum is achieved at <math>s = \frac{4 \epsilon}{\sum_{t=1}^{n}c_t^2}</math>, so we have

:<math> \text{P}(Y_n-Y_0 \geq \epsilon) \leq \exp \left(-\frac{2 \epsilon^2}{\sum_{t=1}^{n}c_t^2}\right). </math>

Finally, since <math>X_n - X_0 = (Y_n - Y_0) + (Z_n - Z_0)</math> and <math>Z_n - Z_0 \leq 0</math>, as <math>\left\{Z_n \right\}</math> is nonincreasing, the event <math>\left\{X_n - X_0 \geq \epsilon\right\}</math> implies the event <math>\left\{Y_n - Y_0 \geq \epsilon\right\}</math>, and therefore

:<math> \text{P}(X_n-X_0 \geq \epsilon) \leq \text{P}(Y_n-Y_0 \geq \epsilon) \leq \exp \left(-\frac{2 \epsilon^2}{\sum_{t=1}^{n}c_t^2}\right). \square </math>

===Remark===
Note that by setting <math>A_t = -c_t, B_t = c_t</math>, we recover the vanilla Azuma's inequality. Note also that for either a submartingale or a supermartingale, only one side of Azuma's inequality holds: we cannot say much about how fast a submartingale with bounded increments rises (or a supermartingale falls).

This general form of Azuma's inequality, applied to the [[Doob martingale]], gives [[Doob_martingale#McDiarmid's_inequality|McDiarmid's inequality]], which is common in the analysis of [[randomized algorithm]]s.

==Simple example of Azuma's inequality for coin flips==
Let ''F''<sub>''i''</sub> be a sequence of independent and identically distributed random coin flips (i.e., let ''F''<sub>''i''</sub> be equally likely to be −1 or 1, independent of the other values of ''F''<sub>''i''</sub>).
Defining <math>X_i = \sum_{j=1}^i F_j</math> yields a [[Martingale (probability theory)|martingale]] with |''X''<sub>''k''</sub> − ''X''<sub>''k''−1</sub>| ≤ 1, allowing us to apply Azuma's inequality. Specifically, we get

:<math> \operatorname{P}(X_n > t) \leq \exp\left(\frac{-t^2}{2 n}\right).</math>

For example, if we set ''t'' proportional to ''n'', then this tells us that although the ''maximum'' possible value of ''X''<sub>''n''</sub> scales linearly with ''n'', the ''probability'' that the sum scales linearly with ''n'' [[exponential decay|decreases exponentially fast]] with ''n''.

If we set <math>t=\sqrt{2 n \ln n}</math>, we get:

:<math> \operatorname{P}\left(X_n > \sqrt{2 n \ln n}\right) \leq \frac1n,</math>

which means that the probability of deviating by more than <math>\sqrt{2 n \ln n}</math> approaches 0 as ''n'' goes to infinity.

==Remark==
A [[Bernstein inequalities (probability theory)|similar inequality]] was proved under weaker assumptions by [[Sergei Bernstein]] in 1937. Hoeffding proved this result for independent variables rather than martingale differences, and also observed that slight modifications of his argument establish the result for martingale differences (see page 9 of his 1963 paper).

==See also==
* [[Concentration inequality]] – a summary of tail bounds on random variables.

==Notes==
{{NoteFoot}}

==References==
* {{cite book|first1=N. |last1=Alon |first2=J. |last2=Spencer|title=The Probabilistic Method|publisher=Wiley|location=New York|year=1992}}
* {{cite journal|doi=10.2748/tmj/1178243286|first=K. |last=Azuma|title=Weighted Sums of Certain Dependent Random Variables|journal=[[Tôhoku Mathematical Journal]]|volume=19|pages=357–367|year=1967|issue=3|url=http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.tmj/1178243286|format=PDF|mr=0221571|doi-access=free|url-access=subscription}}
* {{cite journal| last=Bernstein | first=Sergei N.
| authorlink=Sergei Natanovich Bernstein | year=1937 |trans-title=On certain modifications of Chebyshev's inequality | journal=Doklady Akademii Nauk SSSR | volume=17 | issue=6 | pages=275–277 |script-title=ru:О некоторых модификациях неравенства Чебышёва|language=Russian }} (vol. 4, item 22 in the collected works)
* {{cite book|first=C. |last=McDiarmid|chapter=On the method of bounded differences|title=Surveys in Combinatorics|series=London Math. Soc. Lecture Note Series 141|publisher=Cambridge Univ. Press|location=Cambridge|year=1989|pages=148–188|mr=1036755}}
* {{Cite journal|doi=10.2307/2282952|first1=W. |last1=Hoeffding|title=Probability inequalities for sums of bounded random variables|journal=Journal of the American Statistical Association|volume=58|issue=301|pages=13–30|year=1963|mr=0144363|jstor=2282952|url=http://www.lib.ncsu.edu/resolver/1840.4/2170|url-access=subscription}}
* {{Cite book|first1=A. P. |last1=Godbole |first2=P. |last2=Hitczenko|title=Microsurveys in Discrete Probability|chapter=Beyond the method of bounded differences|series=DIMACS Series in Discrete Mathematics and Theoretical Computer Science|volume=41|pages=43–58|year=1998|mr=1630408|doi=10.1090/dimacs/041/03|isbn=9780821808276}}

[[Category:Probabilistic inequalities]]
[[Category:Martingale theory]]