==Extensions==
Several extensions of Chebyshev's inequality have been developed.

===Selberg's inequality===
Selberg derived a generalization of Chebyshev's inequality to arbitrary intervals.<ref name=Selberg1940>{{cite journal |last=Selberg |first=Henrik L. |title=Zwei Ungleichungen zur Ergänzung des Tchebycheffschen Lemmas |trans-title=Two Inequalities Supplementing the Tchebycheff Lemma |journal=Skandinavisk Aktuarietidskrift (Scandinavian Actuarial Journal) |year=1940 |volume=1940 |issue=3–4 |pages=121–125 |doi=10.1080/03461238.1940.10404804 |language=de |issn=0346-1238 |oclc=610399869}}</ref> Suppose ''X'' is a random variable with mean ''μ'' and variance ''σ''<sup>2</sup>. Selberg's inequality states<ref name="Godwin55">{{Cite journal|last=Godwin|first=H. J.|date=September 1955|title=On Generalizations of Tchebychef's Inequality|url=http://www.tandfonline.com/doi/abs/10.1080/01621459.1955.10501978|journal=Journal of the American Statistical Association|language=en|volume=50|issue=271|pages=923–945|doi=10.1080/01621459.1955.10501978|issn=0162-1459}}</ref> that if <math>\beta \geq \alpha \geq 0</math>, then

: <math> \Pr( X \in [\mu - \alpha, \mu + \beta] ) \ge \begin{cases}\frac{ \alpha^2 }{\alpha^2 + \sigma^2} &\text{if } \alpha(\beta-\alpha) \geq 2\sigma^2 \\ \frac{4\alpha\beta - 4\sigma^2}{(\alpha + \beta)^2} &\text{if } 2\alpha\beta \geq 2\sigma^2 \geq \alpha(\beta - \alpha) \\ 0 &\text{if } \sigma^2 \geq \alpha\beta\end{cases} </math>

When <math>\alpha = \beta</math>, this reduces to Chebyshev's inequality. These bounds are known to be the best possible.<ref name=Conlon00>{{cite web |last1=Conlon |first1=J. |last2=Dulá |first2=J. H. |title=A geometric derivation and interpretation of Tchebyscheff's Inequality |url=http://www.people.vcu.edu/~jdula/WORKINGPAPERS/tcheby.pdf |access-date=2 October 2012}}</ref>
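Selberg's bound can be checked numerically. The following Python sketch is only illustrative: the normal distribution, the sample size, and the parameters ''α'' and ''β'' are arbitrary choices, not part of the theorem.

<syntaxhighlight lang="python">
import numpy as np

def selberg_lower_bound(alpha: float, beta: float, sigma2: float) -> float:
    """Selberg's lower bound on Pr(X in [mu - alpha, mu + beta])."""
    if alpha * (beta - alpha) >= 2 * sigma2:
        return alpha**2 / (alpha**2 + sigma2)
    if 2 * alpha * beta >= 2 * sigma2:   # here 2*sigma2 > alpha*(beta - alpha)
        return (4 * alpha * beta - 4 * sigma2) / (alpha + beta) ** 2
    return 0.0                           # sigma^2 >= alpha * beta

# Monte Carlo estimate of Pr(X in [mu - alpha, mu + beta]) for N(0, 1).
rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0
alpha, beta = 1.0, 2.0                   # requires beta >= alpha >= 0
x = rng.normal(mu, sigma, 1_000_000)
empirical = np.mean((x >= mu - alpha) & (x <= mu + beta))
print(empirical, ">=", selberg_lower_bound(alpha, beta, sigma**2))  # ~0.82 >= 0.44
</syntaxhighlight>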
===Finite-dimensional vector===
{{main|Multidimensional Chebyshev's inequality}}

Chebyshev's inequality naturally extends to the multivariate setting, where one has ''n'' random variables {{mvar|X<sub>i</sub>}} with mean {{mvar|μ<sub>i</sub>}} and variance ''σ''<sub>''i''</sub><sup>2</sup>. Then the following inequality holds:

:<math>\Pr\left(\sum_{i=1}^n (X_i - \mu_i)^2 \ge k^2 \sum_{i=1}^n \sigma_i^2 \right) \le \frac{1}{k^2} </math>

This is known as the Birnbaum–Raymond–Zuckerman inequality after the authors who proved it for two dimensions.<ref name=Birnbaum1947>{{cite journal |last1=Birnbaum |first1=Z. W. |last2=Raymond |first2=J. |last3=Zuckerman |first3=H. S. |title=A Generalization of Tshebyshev's Inequality to Two Dimensions |journal=The Annals of Mathematical Statistics |issn=0003-4851 |year=1947 |volume=18 |issue=1 |pages=70–79 |doi=10.1214/aoms/1177730493 |mr=19849 |zbl=0032.03402 |url=http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoms/1177730493 |access-date=7 October 2012 |doi-access=free}}</ref> This result can be rewritten in terms of [[Multivariate random variable|vectors]] {{math|''X'' {{=}} (''X''<sub>1</sub>, ''X''<sub>2</sub>, ...)}} with mean {{math|''μ'' {{=}} (''μ''<sub>1</sub>, ''μ''<sub>2</sub>, ...)}} and standard deviation {{math|''σ'' {{=}} (''σ''<sub>1</sub>, ''σ''<sub>2</sub>, ...)}}, using the Euclidean norm {{math|{{!!}} ⋅ {{!!}}}}:<ref name=Ferentinos1982>{{cite journal |last1=Ferentinos |first1=K. |year=1982 |title=On Tchebycheff type inequalities |journal=Trabajos Estadıst Investigacion Oper |volume=33 |pages=125–132 |doi=10.1007/BF02888707 |s2cid=123762564}}</ref>

: <math> \Pr(\| X - \mu \| \ge k \| \sigma \|) \le \frac{ 1 }{ k^2 }. </math>

One can also obtain a similar [[Multidimensional Chebyshev's inequality#Infinite dimensions|infinite-dimensional Chebyshev's inequality]].

A second related inequality has been derived by Chen.<ref name=Chen2007>{{cite arXiv |author=Xinjia Chen |eprint=0707.0805v2 |title=A New Generalization of Chebyshev Inequality for Random Vectors |year=2007 |class=math.ST}}</ref> Let {{mvar|n}} be the [[dimension]] of the stochastic vector {{mvar|X}} and let {{math|E(''X'')}} be the mean of {{mvar|X}}. Let {{mvar|S}} be the [[covariance matrix]] and {{math|''k'' > 0}}. Then

: <math> \Pr \left( ( X - \operatorname{E}(X) )^T S^{-1} (X - \operatorname{E}(X)) < k \right) \ge 1 - \frac{n}{k} </math>

where ''Y''<sup>T</sup> denotes the [[transpose]] of {{mvar|Y}}. The inequality can be written in terms of the [[Mahalanobis distance]] as

: <math> \Pr \left( d^2_S(X,\operatorname{E}(X)) < k \right) \ge 1 - \frac{n}{k} </math>

where the Mahalanobis distance based on {{mvar|S}} is defined by

: <math> d_S(x,y) = \sqrt{ (x - y)^T S^{-1} (x - y) }. </math>

Navarro<ref name=Navarro2014>{{cite journal |author=Jorge Navarro |title=Can the bounds in the multivariate Chebyshev inequality be attained? |journal=Statistics and Probability Letters |year=2014 |volume=91 |pages=1–5 |doi=10.1016/j.spl.2014.03.028}}</ref> proved that these bounds are sharp; that is, they are the best possible bounds for those regions when only the mean and the covariance matrix of {{mvar|X}} are known. Stellato et al.<ref name=":0">{{Cite journal|last1=Stellato|first1=Bartolomeo|last2=Parys|first2=Bart P. G. Van|last3=Goulart|first3=Paul J.|date=2016-05-31|title=Multivariate Chebyshev Inequality with Estimated Mean and Variance|journal=The American Statistician|volume=71|issue=2|pages=123–127|doi=10.1080/00031305.2016.1186559|issn=0003-1305|arxiv=1509.08398|s2cid=53407286}}</ref> showed that this multivariate version of the Chebyshev inequality can be derived analytically as a special case of the result of Vandenberghe et al.,<ref>{{Cite journal|last1=Vandenberghe|first1=L.|last2=Boyd|first2=S.|last3=Comanor|first3=K.|date=2007-01-01|title=Generalized Chebyshev Bounds via Semidefinite Programming|journal=SIAM Review|volume=49|issue=1|pages=52–64|doi=10.1137/S0036144504440543|issn=0036-1445|bibcode=2007SIAMR..49...52V|citeseerx=10.1.1.126.9105}}</ref> in which the bound is computed by solving a [[Semidefinite programming|semidefinite program]] (SDP).
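Chen's bound can likewise be illustrated by Monte Carlo simulation. In the Python sketch below, the multivariate normal distribution, the covariance matrix, and the threshold {{mvar|k}} are arbitrary illustrative choices.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, k = 3, 6.0                             # dimension and threshold (illustrative)
mean = np.zeros(n)
S = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])           # an arbitrary positive definite covariance
S_inv = np.linalg.inv(S)

x = rng.multivariate_normal(mean, S, size=1_000_000)
# Squared Mahalanobis distance d_S^2(X, E(X)) for each sample.
d2 = np.einsum("ij,jk,ik->i", x - mean, S_inv, x - mean)
print(np.mean(d2 < k), ">=", 1 - n / k)   # ~0.89 >= 0.5
</syntaxhighlight>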
==== Known correlation ====
If the variables are independent this inequality can be sharpened.<ref name=Kotz2000>{{cite book |last1=Kotz |first1=Samuel |author-link1=Samuel Kotz |last2=Balakrishnan |first2=N. |last3=Johnson |first3=Norman L. |author-link3=Norman Lloyd Johnson |title=Continuous Multivariate Distributions, Volume 1, Models and Applications |year=2000 |publisher=Wiley |location=New York |isbn=978-0-471-18387-7 |url=http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471183873.html |edition=2nd |access-date=7 October 2012}}</ref>

:<math>\Pr\left(\bigcap_{i = 1}^n \frac{|X_i - \mu_i|}{\sigma_i} \le k_i \right) \ge \prod_{i=1}^n \left(1 - \frac{1}{k_i^2} \right)</math>

Berge derived an inequality for two correlated variables {{math|''X''<sub>1</sub>, ''X''<sub>2</sub>}}.<ref name=Berge1938>{{cite journal |last1=Berge |first1=P. O. |year=1938 |title=A note on a form of Tchebycheff's theorem for two variables |journal=Biometrika |volume=29 |issue=3/4 |pages=405–406 |doi=10.2307/2332015 |jstor=2332015}}</ref> Let {{mvar|ρ}} be the correlation coefficient between ''X''<sub>1</sub> and ''X''<sub>2</sub> and let ''σ''<sub>''i''</sub><sup>2</sup> be the variance of {{mvar|X<sub>i</sub>}}. Then

: <math> \Pr\left( \bigcap_{i = 1}^2 \left[ \frac{ | X_i - \mu_i | }{ \sigma_i } < k \right] \right) \ge 1 - \frac{ 1 + \sqrt{ 1 - \rho^2 } }{ k^2 }. </math>

This result can be sharpened to allow different bounds for the two random variables<ref name=Lal1955>Lal D. N. (1955) A note on a form of Tchebycheff's inequality for two or more variables. [[Sankhya (journal)|Sankhya]] 15(3):317–320</ref> and asymmetric bounds, as in Selberg's inequality.<ref name=Isii1959>Isii K. (1959) On a method for generalizations of Tchebycheff's inequality. Ann Inst Stat Math 10: 65–88</ref>

Olkin and Pratt derived an inequality for {{mvar|n}} correlated variables.<ref name=Olkin1958>{{cite journal |last1=Olkin |first1=Ingram |author-link1=Ingram Olkin |last2=Pratt |first2=John W. |author-link2=John W. Pratt |title=A Multivariate Tchebycheff Inequality |journal=The Annals of Mathematical Statistics |year=1958 |volume=29 |issue=1 |pages=226–234 |doi=10.1214/aoms/1177706720 |zbl=0085.35204 |mr=93865 |doi-access=free}}</ref>

: <math> \Pr\left(\bigcap_{i = 1}^n \frac{|X_i - \mu_i|}{\sigma_i} < k_i \right) \ge 1 - \frac{1}{n^2} \left(\sqrt{u} + \sqrt{n-1} \sqrt{n \sum_{i=1}^n \frac{1}{k_i^2} - u} \right)^2 </math>

where

: <math> u = \sum_{i=1}^n \frac{1}{k_i^2} + 2\sum_{i=1}^n \sum_{j<i} \frac{\rho_{ij}}{k_i k_j} </math>

and {{mvar|ρ<sub>ij</sub>}} is the correlation between {{mvar|X<sub>i</sub>}} and {{mvar|X<sub>j</sub>}}. Olkin and Pratt's inequality was subsequently generalised by Godwin.<ref name=Godwin1964>Godwin H. J. (1964) Inequalities on distribution functions. New York, Hafner Pub. Co.</ref>

===Higher moments===
[[Michael Mitzenmacher|Mitzenmacher]] and [[Eli Upfal|Upfal]]<ref name=Mitzenmacher2005>{{cite book |last1=Mitzenmacher |first1=Michael |author-link1=Michael Mitzenmacher |last2=Upfal |first2=Eli |author-link2=Eli Upfal |title=Probability and Computing: Randomized Algorithms and Probabilistic Analysis |date=January 2005 |publisher=Cambridge Univ. Press |location=Cambridge [u.a.] |isbn=978-0-521-83540-4 |url=http://www.cambridge.org/us/knowledge/isbn/item1171566/?site_locale=en_US |edition=Repr. |access-date=6 October 2012}}</ref> note that by applying Markov's inequality to the nonnegative variable <math>| X - \operatorname{E}(X) |^n</math>, one can obtain a family of tail bounds

:<math> \Pr\left(| X - \operatorname{E}(X) | \ge k \operatorname{E}\left(|X - \operatorname{E}(X) |^n \right)^{ \frac{1}{n} }\right) \le \frac{1}{k^n}, \qquad k > 0,\ n \geq 2.</math>

For ''n'' = 2 we obtain Chebyshev's inequality. For ''k'' ≥ 1, ''n'' > 4 and assuming that the ''n''<sup>th</sup> moment exists, this bound is tighter than Chebyshev's inequality.{{citation needed|date=November 2021}} This strategy, called the [[Method of moments (probability theory)|method of moments]], is often used to prove tail bounds; an illustrative sketch follows.
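The Python sketch below compares the ''n'' = 6 moment bound with the empirical tail of an arbitrary test distribution; the distribution and the values of {{mvar|k}} and {{mvar|n}} are illustrative choices only.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 1_000_000)       # arbitrary test distribution
mu = x.mean()
k, n = 3.0, 6                             # threshold and moment order (illustrative)

scale = np.mean(np.abs(x - mu) ** n) ** (1.0 / n)   # E(|X - E(X)|^n)^(1/n)
tail = np.mean(np.abs(x - mu) >= k * scale)
print(tail, "<=", 1.0 / k**n)             # the n-th moment bound
print("Chebyshev (n = 2) at the same k:", 1.0 / k**2)
</syntaxhighlight>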
===Exponential moment===
A related inequality sometimes known as the exponential Chebyshev's inequality<ref name=RassoulAgha2010>[http://www.math.utah.edu/~firas/Papers/rassoul-seppalainen-ldp.pdf Section 2.1] {{webarchive |url=https://web.archive.org/web/20150430075226/http://www.math.utah.edu/~firas/Papers/rassoul-seppalainen-ldp.pdf |date=April 30, 2015 }}</ref> is the inequality

:<math> \Pr(X \ge \varepsilon) \le e^{ -t \varepsilon }\operatorname{E}\left( e^{ t X } \right), \qquad t > 0.</math>

Let {{math|''K''(''t'')}} be the [[cumulant generating function]],

: <math> K(t) = \log \left(\operatorname{E}\left( e^{ t X } \right) \right). </math>

Taking the [[Legendre–Fenchel transformation]]{{clarify|reason=articles should be reasonably self contained, more explanation needed|date=May 2012}} of {{math|''K''(''t'')}} and using the exponential Chebyshev's inequality we have

: <math>-\log( \Pr (X \ge \varepsilon )) \ge \sup_t( t \varepsilon - K(t) ). </math>

This inequality may be used to obtain exponential inequalities for unbounded variables.<ref name=Baranoski2001>{{cite journal |last1=Baranoski |first1=Gladimir V. G. |last2=Rokne |first2=Jon G. |last3=Xu |first3=Guangwu |title=Applying the exponential Chebyshev inequality to the nondeterministic computation of form factors |journal=Journal of Quantitative Spectroscopy and Radiative Transfer |date=15 May 2001 |volume=69 |issue=4 |pages=199–200 |doi=10.1016/S0022-4073(00)00095-9 |bibcode=2001JQSRT..69..447B}} (the references for this article are corrected by {{cite journal|last1=Baranoski |first1=Gladimir V. G. |last2=Rokne |first2=Jon G. |author3=Guangwu Xu |title=Corrigendum to: 'Applying the exponential Chebyshev inequality to the nondeterministic computation of form factors' |journal=Journal of Quantitative Spectroscopy and Radiative Transfer |date=15 January 2002 |volume=72 |issue=2 |pages=199–200 |doi=10.1016/S0022-4073(01)00171-6 |bibcode=2002JQSRT..72..199B|doi-access=free }})</ref> A worked instance for a standard normal variable is sketched at the end of this section.

===Bounded variables===
If P(''x'') has finite support on the interval {{math|[''a'', ''b'']}}, let {{math|''M'' {{=}} max({{!}}''a''{{!}}, {{!}}''b''{{!}})}}, where |''x''| is the [[absolute value]] of {{mvar|x}}. If the mean of P(''x'') is zero, then for all {{math|''k'' > 0}}<ref name=Dufour2003>Dufour (2003) [http://www2.cirano.qc.ca/~dufourj/Web_Site/ResE/Dufour_1999_C_TS_Moments.pdf Properties of moments of random variables]</ref>

: <math>\frac{\operatorname{E}(|X|^r) - k^r}{M^r} \le \Pr( | X | \ge k ) \le \frac{\operatorname{E}(| X |^r)}{ k^r }.</math>

The second of these inequalities, with {{math|''r'' {{=}} 2}}, is the Chebyshev bound. The first provides a lower bound for the value of P(''x'').
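Both directions of this bound are easy to check numerically. In the Python sketch below, a uniform distribution on [−1, 1] (so ''M'' = 1) stands in for P(''x''); the distribution and the values of ''r'' and ''k'' are illustrative only.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
# Arbitrary zero-mean variable supported on [-1, 1], so M = 1.
x = rng.uniform(-1.0, 1.0, 1_000_000)
M, r, k = 1.0, 2, 0.5                     # illustrative parameter choices

moment = np.mean(np.abs(x) ** r)          # E(|X|^r); equals 1/3 here
tail = np.mean(np.abs(x) >= k)
lower = (moment - k**r) / M**r
upper = moment / k**r
print(lower, "<=", tail, "<=", upper)     # ~0.083 <= 0.5 <= 1.33
</syntaxhighlight>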
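Finally, the promised instance of the exponential moment bound: a standard normal variable has cumulant generating function {{math|''K''(''t'') {{=}} ''t''<sup>2</sup>/2}}, so {{math|sup<sub>''t''</sub>(''tε'' − ''K''(''t'')) {{=}} ''ε''<sup>2</sup>/2}} (attained at {{math|''t'' {{=}} ''ε''}}), giving {{math|Pr(''X'' ≥ ''ε'') ≤ ''e''<sup>−''ε''<sup>2</sup>/2</sup>}}. The Python sketch below checks this for an arbitrary threshold.

<syntaxhighlight lang="python">
import math
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 1_000_000)       # standard normal sample
eps = 2.0                                 # arbitrary threshold

# K(t) = t^2/2 for N(0, 1), so sup_t (t*eps - K(t)) = eps^2/2 at t = eps,
# and the exponential Chebyshev bound is Pr(X >= eps) <= exp(-eps^2/2).
print(np.mean(x >= eps), "<=", math.exp(-(eps**2) / 2))  # ~0.023 <= 0.135
</syntaxhighlight>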