== Matrix Chernoff bound ==
{{main|Matrix Chernoff bound}}

[[Rudolf Ahlswede]] and [[Andreas Winter]] introduced a Chernoff bound for matrix-valued random variables.<ref>{{cite journal |last1=Ahlswede |first1=R. |last2=Winter |first2=A. |year=2003 |title=Strong Converse for Identification via Quantum Channels |volume=48 |issue=3 |pages=569–579 |journal=[[IEEE Transactions on Information Theory]] |arxiv=quant-ph/0012127 |doi=10.1109/18.985947 |s2cid=523176 }}</ref> The following version of the inequality can be found in the work of Tropp.<ref>{{cite journal |last1=Tropp |first1=J. |year=2010 |title=User-friendly tail bounds for sums of random matrices |arxiv=1004.4389 |doi=10.1007/s10208-011-9099-z |volume=12 |issue=4 |journal=Foundations of Computational Mathematics |pages=389–434 |s2cid=17735965 }}</ref>

Let {{math|''M''<sub>1</sub>, ..., ''M<sub>t</sub>''}} be independent matrix-valued random variables such that <math>M_i \in \mathbb{C}^{d_1 \times d_2}</math> and <math>\mathbb{E}[M_i] = 0</math>, and denote by <math>\lVert M \rVert</math> the operator norm of the matrix <math>M</math>. If <math>\lVert M_i \rVert \leq \gamma</math> holds almost surely for all <math>i \in \{1, \ldots, t\}</math>, then for every {{math|''ε'' > 0}}

:<math>\Pr\left( \left\| \frac{1}{t} \sum_{i=1}^t M_i \right\| > \varepsilon \right) \leq (d_1+d_2) \exp \left( -\frac{3\varepsilon^2 t}{8\gamma^2} \right).</math>

Notice that in order to conclude that the deviation from 0 is bounded by {{math|''ε''}} with high probability, we need to choose a number of samples <math>t</math> proportional to the logarithm of <math>d_1 + d_2</math>: the right-hand side is at most <math>\delta</math> exactly when

:<math>t \geq \frac{8\gamma^2}{3\varepsilon^2} \ln\left(\frac{d_1 + d_2}{\delta}\right).</math>

In general, a dependence on <math>\log(\min(d_1, d_2))</math> is unfortunately inevitable: take for example a diagonal random sign matrix of dimension <math>d \times d</math>. The operator norm of the sum of ''t'' independent samples is precisely the maximum deviation among ''d'' independent random walks of length ''t'', and achieving a fixed bound on this maximum deviation with constant probability forces ''t'' to grow logarithmically with ''d''.<ref>{{cite arXiv |last1=Magen |first1=A. |author1-link=Avner Magen |last2=Zouzias |first2=A. |year=2011 |title=Low Rank Matrix-Valued Chernoff Bounds and Approximate Matrix Multiplication |class=cs.DM |eprint=1005.2724 }}</ref> The following theorem avoids this dependency on the dimensions by assuming that ''M'' has low rank.

=== Theorem without the dependency on the dimensions ===
Let {{math|0 < ''ε'' < 1}} and let ''M'' be a random symmetric real matrix with <math>\| \operatorname E[M] \| \leq 1</math> and <math>\| M \| \leq \gamma</math> almost surely. Assume that every matrix in the support of ''M'' has rank at most ''r''. Set

:<math>t = \Omega \left( \frac{\gamma\log (\gamma/\varepsilon^2)}{\varepsilon^2} \right).</math>

If <math>r \leq t</math> holds almost surely, then

:<math>\Pr\left(\left\| \frac{1}{t} \sum_{i=1}^t M_i - \operatorname E[M] \right\| > \varepsilon \right) \leq \frac{1}{\mathbf{poly}(t)},</math>

where {{math|''M''<sub>1</sub>, ..., ''M<sub>t</sub>''}} are i.i.d. copies of ''M''.
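The diagonal random sign example above lends itself to a quick numerical sanity check. The following Python sketch is illustrative only and is not taken from the cited sources; names such as <code>chernoff_tail_bound</code> are hypothetical. It draws i.i.d. diagonal <math>\pm 1</math> matrices, for which <math>\mathbb{E}[M_i] = 0</math> and <math>\gamma = 1</math>, and compares the empirical frequency of <math>\lVert \tfrac{1}{t}\sum_i M_i \rVert > \varepsilon</math> with the bound stated above.

<syntaxhighlight lang="python">
import numpy as np

def chernoff_tail_bound(eps, t, gamma, d1, d2):
    """Right-hand side of the matrix Chernoff bound stated above."""
    return (d1 + d2) * np.exp(-3 * eps**2 * t / (8 * gamma**2))

rng = np.random.default_rng(0)
d, t, eps, trials = 50, 2000, 0.1, 200

exceed = 0
for _ in range(trials):
    # Diagonals of M_1, ..., M_t: each M_i is a diagonal +/-1 matrix,
    # so E[M_i] = 0 and ||M_i|| = 1 (i.e. gamma = 1).
    signs = rng.choice([-1.0, 1.0], size=(t, d))
    mean_diag = signs.mean(axis=0)  # diagonal of (1/t) * sum_i M_i
    # The operator norm of a diagonal matrix is its largest |entry|.
    if np.abs(mean_diag).max() > eps:
        exceed += 1

print("empirical Pr(||sample mean|| > eps):", exceed / trials)
print("Chernoff bound on that probability:",
      chernoff_tail_bound(eps, t, gamma=1.0, d1=d, d2=d))
</syntaxhighlight>

Because each <math>M_i</math> is diagonal, the operator norm of the sample mean is simply the largest coordinate-wise average, so the simulation reduces to tracking ''d'' scalar random walks, in line with the lower-bound discussion above.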