Editing Chebyshev's inequality

{{short description|Bound on probability of a random variable being far from its mean}}
{{For|the similarly named inequality involving series|Chebyshev's sum inequality}}

In [[probability theory]], '''Chebyshev's inequality''' (also called the '''Bienaymé–Chebyshev inequality''') provides an upper bound on the probability of deviation of a [[random variable]] (with finite variance) from its mean. More specifically, the probability that a random variable deviates from its mean by more than <math>k\sigma</math> is at most <math>1/k^2</math>, where <math>k</math> is any positive constant and <math>\sigma</math> is the [[standard deviation]] (the square root of the variance).

The rule is often called Chebyshev's theorem, about the range of standard deviations around the mean, in statistics. The inequality has great utility because it can be applied to any probability distribution in which the mean and variance are defined. For example, it can be used to prove the [[weak law of large numbers]].

Its practical usage is similar to the [[68–95–99.7 rule]], which applies only to [[normal distribution]]s. Chebyshev's inequality is more general, stating that a minimum of just 75% of values must lie within two standard deviations of the mean and 88.88% within three standard deviations for a broad range of different [[probability distributions]].<ref name=Kvanli>{{cite book |last1=Kvanli |first1=Alan H. |last2=Pavur |first2=Robert J. |last3=Keeling |first3= Kellie B. |title=Concise Managerial Statistics |url=https://books.google.com/books?id=h6CQ1J0gwNgC&pg=PT95 |year=2006 |publisher=[[cEngage Learning]] |isbn=978-0-324-22388-0 |pages=81–82}}</ref><ref name=Chernick>{{cite book |last=Chernick |first=Michael R. |title=The Essentials of Biostatistics for Physicians, Nurses, and Clinicians |url=https://books.google.com/books?id=JP4azqd8ONEC&pg=PA50 |year=2011 |publisher=[[John Wiley & Sons]] |isbn=978-0-470-64185-9 |pages=49–50}}</ref>

The term ''Chebyshev's inequality'' may also refer to [[Markov's inequality]], especially in the context of analysis. They are closely related, and some authors refer to [[Markov's inequality]] as "Chebyshev's First Inequality," and the similar one referred to on this page as "Chebyshev's Second Inequality."

Chebyshev's inequality is tight in the sense that for each chosen positive constant, there exists a random variable such that the inequality is in fact an equality.<ref>{{Cite web |title=Error Term of Chebyshev inequality? |url=https://math.stackexchange.com/a/776424/352034 |access-date=2023-12-11 |website=Mathematics Stack Exchange |language=en}}</ref>

==History==
The theorem is named after Russian mathematician [[Pafnuty Chebyshev]], although it was first formulated by his friend and colleague [[Irénée-Jules Bienaymé]].<ref>{{cite book
 |title=The Art of Computer Programming: Fundamental Algorithms, Volume 1
 |edition=3rd
 |last1=Knuth
 |first1=Donald
 |author-link1=Donald Knuth
 |year=1997
 |publisher=Addison–Wesley
 |location=Reading, Massachusetts
 |isbn=978-0-201-89683-1
 |url=http://www-cs-faculty.stanford.edu/~uno/taocp.html
 |access-date=1 October 2012
 |ref=KnuthTAOCP1
 |archive-date=26 February 2009
 |archive-url=https://web.archive.org/web/20090226183954/http://www-cs-faculty.stanford.edu/~uno/taocp.html
 |url-status=dead
 }}</ref>{{rp|98}} The theorem was first proved by Bienaymé in 1853<ref name="Bienaymé1853">{{cite journal | last1 = Bienaymé | first1 = I.-J. | year = 1853 | title = Considérations àl'appui de la découverte de Laplace | journal = Comptes Rendus de l'Académie des Sciences | volume = 37 | pages = 309–324 }}</ref> and more generally proved by Chebyshev in 1867.<ref name=Chebyshev1867>{{cite journal|last=Tchebichef|first=P.|title=Des valeurs moyennes|journal=Journal de Mathématiques Pures et Appliquées|year=1867|volume=12|series=2|pages=177–184}}</ref><ref>{{Cite book| last = Routledge| first = Richard| title = Chebyshev's inequality| url = https://www.britannica.com/science/Chebyshevs-inequality|publisher = Encyclopedia Britannica}}</ref> His student [[Andrey Markov]] provided another proof in his 1884 Ph.D. thesis.<ref name=Markov1884>Markov A. (1884) On certain applications of algebraic continued fractions, Ph.D. thesis, St. Petersburg</ref>

==Statement==
Chebyshev's inequality is usually stated for [[random variable]]s, but can be generalized to a statement about [[measure theory|measure spaces]].

===Probabilistic statement===
Let ''X'' (integrable) be a [[random variable]] with finite non-zero [[variance]] ''σ''<sup>2</sup> (and thus finite [[expected value]] ''μ'').<ref>Feller, W., 1968. An introduction to probability theory and its applications, vol. 1. p227 (Wiley, New York).</ref> Then for any [[real number]] {{nowrap|''k'' > 0}},
: <math>
    \Pr(|X-\mu|\geq k\sigma) \leq \frac{1}{k^2}.
  </math>

Only the case <math>k > 1</math> is useful. When <math>k \leq 1</math> the right-hand side <math> \frac{1}{k^2} \geq 1 </math> and the inequality is trivial as all probabilities are&nbsp;≤&nbsp;1.

As an example, using <math>k = \sqrt{2}</math> shows that the probability values lie outside the interval <math>(\mu - \sqrt{2}\sigma, \mu + \sqrt{2}\sigma)</math> does not exceed <math>\frac{1}{2}</math>. Equivalently, it implies that the probability of values lying within the interval (i.e. its [[Coverage_probability|"coverage"]]) is ''at least'' <math>\frac{1}{2}</math>.

Because it can be applied to completely arbitrary distributions provided they have a known finite mean and variance, the inequality generally gives a poor bound compared to what might be deduced if more aspects are known about the distribution involved.

{|class="wikitable" style="background-color:#FFFFFF; text-align:center"
|-
! k
! Min. % within ''k'' standard<br />deviations of mean
! Max. % beyond ''k'' standard<br />deviations from mean
|-
|  1
|| 0%
|| 100%
|-
|  {{sqrt|2}}
|| 50%
|| 50%
|-
|  1.5
|| 55.55%
|| 44.44%
|-
|  2
|| 75%
|| 25%
|-
|  2{{sqrt|2}}
|| 87.5%
|| 12.5%
|-
|  3
|| 88.8888%
|| 11.1111%
|-
|  4
|| 93.75%
|| 6.25%
|-
|  5
|| 96%
|| 4%
|-
|  6
|| 97.2222%
|| 2.7778%
|-
|  7
|| 97.9592%
|| 2.0408%
|-
|  8
|| 98.4375%
|| 1.5625%
|-
|  9
|| 98.7654%
|| 1.2346%
|-
|  10
|| 99%
|| 1%
|}

===Measure-theoretic statement===
Let <math>(X,\,\Sigma,\,\mu)</math> be a [[measure space]], and let ''f'' be an [[extended real number line|extended real]]-valued [[measurable function]] defined on ''X''. Then for any real number <math>t > 0</math> and <math>0 < p < \infty</math>,

:<math>\mu(\{x\in X\,:\,\,|f(x)|\geq t\}) \leq {1\over t^p} \int_{X} |f|^p \, d\mu.</math>

More generally, if ''g'' is an extended real-valued measurable function, nonnegative and nondecreasing, with <math>g(t) \neq 0</math> then: {{citation needed|date=May 2012}}

:<math>\mu(\{x\in X\,:\,\,f(x)\geq t\}) \leq {1\over g(t)} \int_X g\circ f\, d\mu.</math>

This statement follows from the [[Markov inequality]], <math>
\mu(\{x\in X:|F(x)|\geq \varepsilon\}) \leq\frac1\varepsilon \int_X|F|d\mu
</math>, with <math>F=g\circ f</math> and <math>\varepsilon=g(t)</math>, since in this case <math>\mu(\{x\in X\,:\,\,g\circ f(x)\geq g(t)\}) \geq \mu(\{x\in X\,:\,\,f(x)\geq t\}) </math>.
The previous statement then follows by defining <math>g(x)</math> as <math>|x|^p</math> if <math>x\ge t</math> and <math>0</math> otherwise.

==Example==
Suppose we randomly select a journal article from a source with an average of 1000 words per article, with a standard deviation of 200 words. We can then infer that the probability that it has between 600 and 1400 words (i.e. within <math>k=2</math> standard deviations of the mean) must be at least 75%, because there is no more than <math>1/k^2 = 1/4</math> chance to be outside that range, by Chebyshev's inequality. But if we additionally know that the distribution is [[normal distribution|normal]], we can say there is a 75% chance the word count is between 770 and 1230 (which is an even tighter bound).

==Sharpness of bounds==
As shown in the example above, the theorem typically provides rather loose bounds. However, these bounds cannot in general (remaining true for arbitrary distributions) be improved upon. The bounds are sharp for the following example: for any ''k''&nbsp;≥&nbsp;1,
: <math>
    X = \begin{cases}
          -1, & \text{with probability }\;\;\frac{1}{2k^2} \\
\phantom{-}0, & \text{with probability }1 - \frac{1}{k^2} \\
          +1, & \text{with probability }\;\;\frac{1}{2k^2}
        \end{cases}
  </math>

For this distribution, the mean ''μ'' = 0 and the variance ''σ''<sup>2</sup> = {{sfrac|(−1)<sup>2</sup>|2''k''<sup>2</sup>}} + 0 + {{sfrac|1<sup>2</sup>|2''k''<sup>2</sup>}} = {{sfrac|1|''k''<sup>2</sup>}}, so the standard deviation ''σ'' = {{sfrac|1|''k'' }} and
: <math>
    \Pr(|X-\mu| \ge k\sigma) = \Pr(|X| \ge 1) = \frac{1}{k^2}.
  </math>
Chebyshev's inequality is an equality for precisely those distributions which are [[affine transformation]]s of this example.

==Proof==

[[Markov's inequality]] states that for any non-negative real-valued random variable ''Y'' and any positive number ''a'', we have <math>\Pr(|Y| \geq a) \leq \mathbb{E}[|Y|]/a</math>. One way to prove Chebyshev's inequality is to apply Markov's inequality to the random variable <math>Y = (X - \mu)^2</math> with <math>a = (k \sigma)^2</math>:
:<math> \Pr(|X - \mu| \geq k\sigma) = \Pr((X - \mu)^2 \geq k^2\sigma^2) \leq \frac{\mathbb{E}[(X - \mu)^2]}{k^2\sigma^2} = \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}. </math>

It can also be proved directly using [[conditional expectation]]:
:<math>\begin{align}
\sigma^2&=\mathbb{E}[(X-\mu)^2]\\[5pt]
&=\mathbb{E}[(X-\mu)^2\mid k\sigma\leq |X-\mu|]\Pr[k\sigma\leq|X-\mu|]+\mathbb{E}[(X-\mu)^2\mid k\sigma>|X-\mu|]\Pr[k\sigma>|X-\mu|] \\[5pt]
&\geq(k\sigma)^2\Pr[k\sigma\leq|X-\mu|]+0\cdot\Pr[k\sigma>|X-\mu|] \\[5pt]
&=k^2\sigma^2\Pr[k\sigma\leq|X-\mu|]
\end{align}</math>
Chebyshev's inequality then follows by dividing by ''k''<sup>2</sup>''&sigma;''<sup>2</sup>.
This proof also shows why the bounds are quite loose in typical cases: the conditional expectation on the event where |''X''&nbsp;−&nbsp;''&mu;''|&nbsp;<&nbsp;''k&sigma;'' is thrown away, and the lower bound of ''k''<sup>2</sup>''&sigma;''<sup>2</sup> on the event |''X''&nbsp;−&nbsp;''&mu;''|&nbsp;&ge;&nbsp;''k&sigma;'' can be quite poor.

Chebyshev's inequality can also be obtained directly from a simple comparison of areas, starting from the representation of an expected value as the difference of two improper Riemann integrals ([[Expected value#EX as difference of integrals|last formula]] in the [[Expected value#Arbitrary real-valued random variables|definition of expected value for arbitrary real-valued random variables]]).<ref>{{cite book |last1=Uhl |first1=Roland |title=Charakterisierung des Erwartungswertes am Graphen der Verteilungsfunktion |trans-title=Characterization of the expected value on the graph of the cumulative distribution function |date=2023 |publisher=Technische Hochschule Brandenburg |doi=10.25933/opus4-2986 |doi-access=free |url=https://opus4.kobv.de/opus4-fhbrb/files/2986/Uhl2023.pdf}} p.&nbsp;5.</ref>

==Extensions==
Several extensions of Chebyshev's inequality have been developed.

===Selberg's inequality===
Selberg derived a generalization to arbitrary intervals.<ref name=Selberg1940>{{cite journal |last=Selberg |first=Henrik L. |title=Zwei Ungleichungen zur Ergänzung des Tchebycheffschen Lemmas |trans-title=Two Inequalities Supplementing the Tchebycheff Lemma |journal=Skandinavisk Aktuarietidskrift (Scandinavian Actuarial Journal) |year=1940 |volume=1940 |issue=3–4 |pages=121–125 |doi=10.1080/03461238.1940.10404804 |language=de |issn=0346-1238 |oclc=610399869}}</ref> Suppose ''X'' is a random variable with mean ''μ'' and variance ''σ''<sup>''2''</sup>. Selberg's inequality states<ref name="Godwin55">{{Cite journal|last=Godwin|first=H. J.|date=September 1955|title=On Generalizations of Tchebychef's Inequality|url=http://www.tandfonline.com/doi/abs/10.1080/01621459.1955.10501978|journal=Journal of the American Statistical Association|language=en|volume=50|issue=271|pages=923–945|doi=10.1080/01621459.1955.10501978|issn=0162-1459}}</ref> that if <math>\beta \geq \alpha \geq 0</math>,

: <math> \Pr( X \in [\mu - \alpha, \mu + \beta] ) \ge \begin{cases}\frac{ \alpha^2 }{\alpha^2 + \sigma^2} &\text{if } \alpha(\beta-\alpha) \geq 2\sigma^2 \\ \frac{4\alpha\beta - 4\sigma^2}{(\alpha + \beta)^2} &\text{if } 2\alpha\beta \geq 2\sigma^2 \geq \alpha(\beta - \alpha) \\ 0 & \sigma^2 \geq \alpha\beta\end{cases}  </math>

When <math>\alpha = \beta</math>, this reduces to Chebyshev's inequality. These are known to be the best possible bounds.<ref name=Conlon00>{{cite web |last1=Conlon |first1=J. |last2=Dulá |first2=J. H. |title=A geometric derivation and interpretation of Tchebyscheff's Inequality |url=http://www.people.vcu.edu/~jdula/WORKINGPAPERS/tcheby.pdf |access-date=2 October 2012}}</ref>

===Finite-dimensional vector===
{{main|Multidimensional Chebyshev's inequality}}

Chebyshev's inequality naturally extends to the multivariate setting, where one has ''n'' random variables {{mvar|X<sub>i</sub>}} with mean {{mvar|μ<sub>i</sub>}} and variance ''σ''<sub>i</sub><sup>2</sup>. Then the following inequality holds.

:<math>\Pr\left(\sum_{i=1}^n (X_i - \mu_i)^2 \ge k^2 \sum_{i=1}^n \sigma_i^2 \right) \le \frac{1}{k^2} </math>

This is known as the Birnbaum–Raymond–Zuckerman inequality after the authors who proved it for two dimensions.<ref name=Birnbaum1947>{{cite journal |last1=Birnbaum |first1=Z. W. |last2=Raymond |first2=J. |last3=Zuckerman |first3=H. S. |title=A Generalization of Tshebyshev's Inequality to Two Dimensions |journal=The Annals of Mathematical Statistics |issn=0003-4851 |year=1947 |volume=18 |issue=1 |pages=70–79 |doi=10.1214/aoms/1177730493 |mr=19849 |zbl=0032.03402 |url=http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoms/1177730493 |access-date=7 October 2012|doi-access=free }}</ref> This result can be rewritten in terms of [[Multivariate random variable|vectors]] {{math|''X'' {{=}} (''X''<sub>1</sub>, ''X''<sub>2</sub>, ...)}} with mean {{math|''μ'' {{=}} (''μ''<sub>1</sub>, ''μ''<sub>2</sub>, ...)}}, standard deviation ''σ'' = (''σ''<sub>1</sub>, ''σ''<sub>2</sub>, ...), in the Euclidean norm {{math|{{!!}} ⋅ {{!!}}}}.<ref name=Ferentinos1982>{{cite journal | last1 = Ferentinos | first1 = K | year = 1982 | title = On Tchebycheff type inequalities | journal = Trabajos Estadıst Investigacion Oper | volume = 33 | pages = 125–132 | doi = 10.1007/BF02888707 | s2cid = 123762564 }}</ref>

: <math> \Pr(\| X - \mu \| \ge k \| \sigma \|) \le \frac{ 1 } { k^2 }. </math>

One can also get a similar [[Multidimensional Chebyshev's inequality#Infinite dimensions|infinite-dimensional Chebyshev's inequality]]. A second related inequality has also been derived by Chen.<ref name=Chen2007>{{cite arXiv |author=Xinjia Chen |eprint=0707.0805v2 |title=A New Generalization of Chebyshev Inequality for Random Vectors |year=2007 |class=math.ST }}</ref> Let {{mvar|n}} be the [[dimension]] of the stochastic vector {{mvar|X}} and let {{math|E(''X'')}} be the mean of {{mvar|X}}. Let {{mvar|S}} be the [[covariance matrix]] and {{math|''k'' > 0}}. Then

: <math> \Pr \left( ( X - \operatorname{E}(X) )^T S^{-1} (X - \operatorname{E}(X)) < k  \right) \ge 1 - \frac{n}{k} </math>

where ''Y''<sup>T</sup> is the [[transpose]] of {{mvar|Y}}.
The inequality can be written in terms of the [[Mahalanobis distance]] as

: <math> \Pr \left( d^2_S(X,\operatorname{E}(X)) < k  \right) \ge 1 - \frac{n}{k} </math>

where the Mahalanobis distance based on S is defined by

: <math> d_S(x,y) =\sqrt{ (x -y)^T S^{-1} (x -y) } </math>

Navarro<ref name=Navarro2014>{{cite journal |author=Jorge Navarro |volume=91 |pages=1–5 |title=Can the bounds in the multivariate Chebyshev inequality be attained?   |journal=Statistics and Probability Letters  |year=2014 |doi=10.1016/j.spl.2014.03.028}}</ref> proved that these bounds are sharp, that is, they are the best possible bounds for that regions when we just know the mean and the covariance matrix of X.

Stellato et al.<ref name=":0">{{Cite journal|last1=Stellato|first1=Bartolomeo|last2=Parys|first2=Bart P. G. Van|last3=Goulart|first3=Paul J.|date=2016-05-31|title=Multivariate Chebyshev Inequality with Estimated Mean and Variance|journal=The American Statistician|volume=71|issue=2|pages=123–127|doi=10.1080/00031305.2016.1186559|issn=0003-1305|arxiv=1509.08398|s2cid=53407286}}</ref> showed that this multivariate version of the Chebyshev inequality can be easily derived analytically as a special case of Vandenberghe et al.<ref>{{Cite journal|last1=Vandenberghe|first1=L.|last2=Boyd|first2=S.|last3=Comanor|first3=K.|date=2007-01-01|title=Generalized Chebyshev Bounds via Semidefinite Programming|journal=SIAM Review|volume=49|issue=1|pages=52–64|doi=10.1137/S0036144504440543|issn=0036-1445|bibcode=2007SIAMR..49...52V|citeseerx=10.1.1.126.9105}}</ref> where the bound is computed by solving a [[Semidefinite programming|semidefinite program (SDP).]]

==== Known correlation ====

If the variables are independent this inequality can be sharpened.<ref name=Kotz2000>{{cite book |last1=Kotz |first1=Samuel  |author-link1=Samuel Kotz |last2=Balakrishnan |first2=N. |last3= Johnson |first3=Norman L. |author-link3=Norman Lloyd Johnson |title=Continuous Multivariate Distributions, Volume 1, Models and Applications |year=2000 |publisher=Houghton Mifflin |location=Boston [u.a.] |isbn=978-0-471-18387-7 |url=http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471183873.html |edition=2nd |access-date=7 October 2012}}</ref>

:<math>\Pr\left (\bigcap_{i = 1}^n \frac{|X_i - \mu_i|}{\sigma_i} \le k_i \right ) \ge \prod_{i=1}^n \left (1 - \frac{1}{k_i^2} \right)</math>

Berge derived an inequality for two correlated variables {{math|''X''<sub>1</sub>, ''X''<sub>2</sub>}}.<ref name=Berge1938>{{cite journal | last1 = Berge | first1 = P. O. | year = 1938 | title = A note on a form of Tchebycheff's theorem for two variables | journal = Biometrika | volume = 29 | issue = 3/4| pages = 405–406 | doi=10.2307/2332015| jstor = 2332015 }}</ref> Let {{mvar|ρ}} be the correlation coefficient between ''X''<sub>1</sub> and ''X''<sub>2</sub> and let ''σ''<sub>''i''</sub><sup>2</sup> be the variance of {{mvar|X<sub>i</sub>}}. Then

: <math> \Pr\left( \bigcap_{ i = 1}^2 \left[ \frac{ | X_i - \mu_i | } { \sigma_i }  < k \right] \right) \ge 1 - \frac{ 1 + \sqrt{ 1 - \rho^2 } } { k^2 }.</math>

This result can be sharpened to having different bounds for the two random variables<ref name=Lal1955>Lal D. N. (1955) A note on a form of Tchebycheff's inequality for two or more variables. [[Sankhya (journal)|Sankhya]] 15(3):317–320</ref> and having asymmetric bounds, as in Selberg's inequality.
<ref name=Isii1959>Isii K. (1959) On a method for generalizations of Tchebycheff's inequality. Ann Inst Stat Math 10: 65–88</ref>

Olkin and Pratt derived an inequality for {{mvar|n}} correlated variables.<ref name=Olkin1958>{{cite journal|last1=Olkin|first1=Ingram |author-link1=Ingram Olkin | last2=Pratt |first2=John W. |author-link2=John W. Pratt |title=A Multivariate Tchebycheff Inequality| journal=The Annals of Mathematical Statistics|year=1958|volume=29|issue=1|pages=226–234|doi=10.1214/aoms/1177706720|zbl=0085.35204 |mr=93865 |doi-access=free}}</ref>

: <math> \Pr\left(\bigcap_{i = 1 }^n \frac{|X_i - \mu_i|}{\sigma_i} < k_i \right) \ge 1 - \frac{1}{n^2} \left(\sqrt{u} + \sqrt{n-1} \sqrt{n \sum_i \frac 1 { k_i^2} - u} \right)^2 </math>

where the sum is taken over the ''n'' variables and

: <math> u = \sum_{i=1}^n \frac{1}{ k_i^2} + 2\sum_{i=1}^n \sum_{j<i} \frac{\rho_{ij}}{k_i k_j} </math>

where {{mvar|ρ<sub>ij</sub>}} is the correlation between {{mvar|X<sub>i</sub>}} and {{mvar|X<sub>j</sub>}}.

Olkin and Pratt's inequality was subsequently generalised by Godwin.<ref name=Godwin1964>Godwin H. J. (1964) Inequalities on distribution functions. New York, Hafner Pub. Co.</ref>

===Higher moments===

[[Michael Mitzenmacher|Mitzenmacher]] and [[Eli Upfal|Upfal]]<ref name=Mitzenmacher2005>{{cite book |last1=Mitzenmacher |first1=Michael |author-link1=Michael Mitzenmacher |last2=Upfal |first2=Eli |author-link2=Eli Upfal |title=Probability and Computing: Randomized Algorithms and Probabilistic Analysis |date=January 2005 |publisher=Cambridge Univ. Press |location=Cambridge [u.a.] |isbn=978-0-521-83540-4 |url=http://www.cambridge.org/us/knowledge/isbn/item1171566/?site_locale=en_US |edition=Repr. |access-date=6 October 2012}}</ref> note that by applying Markov's inequality to the nonnegative variable <math>| X - \operatorname{E}(X) |^n</math>, one can get a family of tail bounds

:<math> \Pr\left(| X - \operatorname{E}(X) | \ge k \operatorname{E}(|X - \operatorname{E}(X) |^n )^{ \frac{1}{n} }\right) \le \frac{1 } {k^n}, \qquad k >0, n \geq 2.</math>

For ''n'' = 2 we obtain Chebyshev's inequality. For ''k'' ≥ 1, ''n'' > 4 and assuming that the ''n''<sup>th</sup> moment exists, this bound is tighter than Chebyshev's inequality.{{citation needed|date=November 2021}} This strategy, called the [[Method of moments (probability theory)|method of moments]], is often used to prove tail bounds.

===Exponential moment===
A related inequality sometimes known as the exponential Chebyshev's inequality<ref name=RassoulAgha2010>[http://www.math.utah.edu/~firas/Papers/rassoul-seppalainen-ldp.pdf  Section 2.1] {{webarchive |url=https://web.archive.org/web/20150430075226/http://www.math.utah.edu/~firas/Papers/rassoul-seppalainen-ldp.pdf |date=April 30, 2015 }}</ref> is the inequality

:<math> \Pr(X \ge \varepsilon) \le e^{ -t \varepsilon }\operatorname{E}\left (e^{ t X } \right), \qquad t > 0.</math>

Let {{math|''K''(''t'')}} be the [[cumulant generating function]],

: <math> K( t ) = \log \left(\operatorname{E}\left( e^{ t x } \right) \right). </math>

Taking the [[Legendre–Fenchel transformation]]{{clarify|reason=articles should be reasonably self contained, more explanation needed|date=May 2012}} of {{math|''K''(''t'')}} and using the exponential Chebyshev's inequality we have

: <math>-\log( \Pr (X \ge \varepsilon )) \ge \sup_t( t \varepsilon - K( t ) ). </math>

This inequality may be used to obtain exponential inequalities for unbounded variables.<ref name=Baranoski2001>{{cite journal |last1=Baranoski |first1=Gladimir V. G. |last2=Rokne |first2=Jon G. |last3=Xu |first3=Guangwu |title=Applying the exponential Chebyshev inequality to the nondeterministic computation of form factors |journal=Journal of Quantitative Spectroscopy and Radiative Transfer |date=15 May 2001 |volume=69 |issue=4 |pages=199–200 |doi=10.1016/S0022-4073(00)00095-9 |bibcode=2001JQSRT..69..447B}} (the references for this article are corrected by {{cite journal|last1=Baranoski |first1=Gladimir V. G. |last2=Rokne |first2=Jon G. |author3=Guangwu Xu |title=Corrigendum to: 'Applying the exponential Chebyshev inequality to the nondeterministic computation of form factors' |journal=Journal of Quantitative Spectroscopy and Radiative Transfer |date=15 January 2002 |volume=72 |issue=2 |pages=199–200 |doi=10.1016/S0022-4073(01)00171-6 |bibcode=2002JQSRT..72..199B|doi-access=free }})</ref>

===Bounded variables===
If P(''x'') has finite support based on the interval {{math|[''a'', ''b'']}}, let {{math|''M'' {{=}} max({{!}}''a''{{!}}, {{!}}''b''{{!}})}} where |''x''| is the [[absolute value]] of {{mvar|x}}. If the mean of P(''x'') is zero then for all {{math|''k'' > 0}}<ref name=Dufour2003>Dufour (2003) [http://www2.cirano.qc.ca/~dufourj/Web_Site/ResE/Dufour_1999_C_TS_Moments.pdf Properties of moments of random variables]</ref>

: <math>\frac{\operatorname{E}(|X|^r ) - k^r }{M^r} \le \Pr( | X |  \ge k ) \le \frac{\operatorname{E}(| X |^r ) }{ k^r }.</math>

The second of these inequalities with {{math|''r'' {{=}} 2}} is the Chebyshev bound. The first provides a lower bound for the value of P(''x'').

==Finite samples==

=== Univariate case ===
Saw ''et al'' extended Chebyshev's inequality to cases where the population mean and variance are not known and may not exist, but the sample mean and sample standard deviation from ''N'' samples are to be employed to bound the expected value of a new drawing from the same distribution.<ref name=":1">{{cite journal
  |title = Chebyshev Inequality with Estimated Mean and Variance
  |last1 = Saw
  |first1 = John G.
  |last2 = Yang
  |first2 = Mark C. K.
  |last3 = Mo
  |first3 = Tse Chin
  |journal = [[The American Statistician]]
  |issn = 0003-1305
  |volume = 38
  |issue = 2
  |year = 1984
  |pages = 130–2
  |doi = 10.2307/2683249
  |jstor = 2683249
  }}</ref> The following simpler version of this inequality is given by Kabán.<ref name="Kabán2011">{{cite journal
  |last = Kabán
  |first = Ata
  |title = Non-parametric detection of meaningless distances in high dimensional data
  |journal = [[Statistics and Computing]]
  |volume = 22
  |issue = 2
  |pages = 375–85
  |year = 2012
  |doi = 10.1007/s11222-011-9229-0
|s2cid = 6018114
 }}</ref>

: <math>\Pr( | X - m | \ge ks ) \le \frac 1 {N + 1} \left\lfloor \frac {N+1} N \left(\frac{N - 1}{k^2} + 1 \right) \right\rfloor</math>

where ''X'' is a random variable which we have sampled ''N'' times, ''m'' is the sample mean, ''k'' is a constant and ''s'' is the sample standard deviation.

This inequality holds even when the population moments do not exist, and when the sample is only [[Exchangeable random variables|weakly exchangeably]] distributed; this criterion is met for randomised sampling. A table of values for the Saw–Yang–Mo inequality for finite sample sizes (''N'' < 100) has been determined by Konijn.<ref name=Konijn1987>{{cite journal |last=Konijn |first=Hendrik S. |title=Distribution-Free and Other Prediction Intervals |journal=[[The American Statistician]] |date=February 1987 |volume=41 |issue=1 |pages=11–15 |jstor=2684311 |doi=10.2307/2684311  }}</ref> The table allows the calculation of various confidence intervals for the mean, based on multiples, C, of the standard error of the mean as calculated from the sample. For example, Konijn shows that for ''N''&nbsp;=&nbsp;59, the 95 percent confidence interval for the mean ''m'' is {{nowrap|(''m'' − ''Cs'', ''m'' + ''Cs'')}} where {{nowrap|1=''C'' = 4.447 × 1.006 = 4.47}} (this is 2.28 times larger than the value found on the assumption of normality showing the loss on precision resulting from ignorance of the precise nature of the distribution).

An equivalent inequality can be derived in terms of the sample mean instead,<ref name="Kabán2011" />

: <math>\Pr( | X - m | \ge km ) \le \frac{N - 1} N \frac 1 {k^2} \frac{s^2}{m^2} + \frac 1 N.</math>

A table of values for the Saw–Yang–Mo inequality for finite sample sizes (''N'' < 100) has been determined by Konijn.<ref name="Konijn1987"/>

For fixed ''N'' and large ''m'' the Saw–Yang–Mo inequality is approximately<ref name=Beasley2004>{{cite journal |last1=Beasley |first1=T. Mark |last2=Page |first2=Grier P. |last3=Brand |first3=Jaap P. L. |last4=Gadbury |first4=Gary L. |last5=Mountz |first5=John D. |last6=Allison |first6=David B. |author-link6=David B. Allison |title=Chebyshev's inequality for nonparametric testing with small ''N'' and α in microarray research |journal=Journal of the Royal Statistical Society |issn=1467-9876 |date=January 2004 |volume=53 |series=C (Applied Statistics) |issue=1 |pages=95–108 |doi=10.1111/j.1467-9876.2004.00428.x |s2cid=122678278 |doi-access=free }}</ref>

: <math> \Pr( | X - m | \ge ks ) \le \frac 1 {N + 1}. </math>

Beasley ''et al'' have suggested a modification of this inequality<ref name=Beasley2004 />

: <math> \Pr( | X - m | \ge ks ) \le \frac 1 {k^2( N + 1 )}. </math>

In empirical testing this modification is conservative but appears to have low statistical power. Its theoretical basis currently remains unexplored.

====Dependence on sample size====
The bounds these inequalities give on a finite sample are less tight than those the Chebyshev inequality gives for a distribution. To illustrate this let the sample size ''N'' = 100 and let ''k'' = 3. Chebyshev's inequality states that at most approximately 11.11% of the distribution will lie at least three standard deviations away from the mean. Kabán's version of the inequality for a finite sample states that at most approximately 12.05% of the sample lies outside these limits. The dependence of the confidence intervals on sample size is further illustrated below.

For ''N'' = 10, the 95% confidence interval is approximately ±13.5789 standard deviations.

For ''N'' = 100 the 95% confidence interval is approximately ±4.9595 standard deviations; the 99% confidence interval is approximately ±140.0 standard deviations.

For ''N'' = 500 the 95% confidence interval is approximately ±4.5574 standard deviations; the 99% confidence interval is approximately ±11.1620 standard deviations.

For ''N'' = 1000 the 95% and 99% confidence intervals are approximately ±4.5141 and approximately ±10.5330 standard deviations respectively.

The Chebyshev inequality for the distribution gives 95% and 99% confidence intervals of approximately ±4.472 standard deviations and ±10 standard deviations respectively.

====Samuelson's inequality====
{{main|Samuelson's inequality}}
Although Chebyshev's inequality is the best possible bound for an arbitrary distribution, this is not necessarily true for finite samples. [[Samuelson's inequality]] states that all values of a sample must lie within  {{radic|''N''&nbsp;−&nbsp;1}} sample standard deviations of the mean.

By comparison, Chebyshev's inequality states that all but a ''1/N'' fraction of the sample will lie within {{radic|''N''}} standard deviations of the mean. Since there are ''N'' samples, this means that no samples will lie outside {{radic|''N''}} standard deviations of the mean, which is worse than Samuelson's inequality. However, the benefit of Chebyshev's inequality is that it can be applied more generally to get confidence bounds for ranges of standard deviations that do not depend on the number of samples.

====Semivariances====
An alternative method of obtaining sharper bounds is through the use of [[Variance#Semivariance|semivariance]]s (partial variances). The upper (''σ''<sub>+</sub><sup>2</sup>) and lower (''σ''<sub>−</sub><sup>2</sup>) semivariances are defined as

: <math> \sigma_+^2 = \frac { \sum_{x>m} (x - m)^2 } { n - 1 } ,</math>

: <math> \sigma_-^2 = \frac { \sum_{x<m} (m - x)^2 } { n - 1 }, </math>

where ''m'' is the arithmetic mean of the sample and ''n'' is the number of elements in the sample.

The variance of the sample is the sum of the two semivariances:

: <math> \sigma^2 = \sigma_+^2 + \sigma_-^2. </math>

In terms of the lower semivariance Chebyshev's inequality can be written<ref name=Berck1982>{{cite journal|author-link1=Peter Berck |last1=Berck |first1=Peter |last2=Hihn |first2=Jairus M. |title=Using the Semivariance to Estimate Safety-First Rules |journal=American Journal of Agricultural Economics |date=May 1982 |volume=64 |issue=2 |pages=298–300 |doi=10.2307/1241139 |issn=0002-9092|jstor=1241139 |doi-access= }}</ref>

: <math> \Pr(x \le m - a \sigma_-) \le \frac { 1 } { a^2 }.</math>

Putting

: <math> a = \frac{ k \sigma } { \sigma_- }. </math>

Chebyshev's inequality can now be written

: <math> \Pr(x \le m - k \sigma) \le \frac { 1 } { k^2 } \frac { \sigma_-^2 } { \sigma^2 }.</math>

A similar result can also be derived for the upper semivariance.

If we put

: <math> \sigma_u^2 = \max(\sigma_-^2, \sigma_+^2) , </math>

Chebyshev's inequality can be written

: <math> \Pr(| x \le m - k \sigma |) \le \frac 1 {k^2} \frac { \sigma_u^2 } { \sigma^2 } .</math>

Because ''σ''<sub>u</sub><sup>2</sup> ≤ ''σ''<sup>2</sup>, use of the semivariance sharpens the original inequality.

If the distribution is known to be symmetric, then

: <math> \sigma_+^2 = \sigma_-^2  = \frac{ 1 } { 2 } \sigma^2 </math>

and

: <math> \Pr(x \le m - k \sigma) \le \frac 1 {2k^2} .</math>

This result agrees with that derived using standardised variables.

;Note: The inequality with the lower semivariance has been found to be of use in estimating downside risk in finance and agriculture.<ref name="Berck1982"/><ref name=Nantell1979>{{cite journal |last1=Nantell |first1=Timothy J. |last2=Price |first2=Barbara |title=An Analytical Comparison of Variance and Semivariance Capital Market Theories |journal=[[The Journal of Financial and Quantitative Analysis]] |date=June 1979 |volume=14 |issue=2 |pages=221–42 |doi=10.2307/2330500 |jstor=2330500  |s2cid=154652959 }}</ref><ref name=Neave2008>{{cite journal
  |title = Distinguishing upside potential from downside risk
  |last1 = Neave
  |first1 = Edwin H.
  |last2 = Ross
  |first2 = Michael N.
  |last3 = Yang
  |first3 = Jun
  |journal = [[Management Research News]]
  |issn = 0140-9174
  |year = 2009
  |volume = 32
  |issue = 1
  |pages = 26–36
  |doi = 10.1108/01409170910922005
  }}</ref>

=== Multivariate case ===
Stellato et al.<ref name=":0" /> simplified the notation and extended the empirical Chebyshev inequality from Saw et al.<ref name=":1" /> to the multivariate case. Let <math display="inline">\xi \in \mathbb{R}^{n_\xi}</math> be a random variable and let <math display="inline">N \in \mathbb{Z}_{\geq n_\xi}</math>. We draw <math display="inline">N+1</math> iid samples of <math display="inline">\xi</math> denoted as <math display="inline">\xi^{(1)},\dots,\xi^{(N)},\xi^{(N+1)} \in \mathbb{R}^{n_\xi}</math>. Based on the first <math display="inline">N</math> samples, we define the empirical mean as <math display="inline">\mu_N = \frac 1 N \sum_{i=1}^N \xi^{(i)}</math> and the unbiased empirical covariance as <math display="inline">\Sigma_N = \frac 1 N \sum_{i=1}^N (\xi^{(i)} - \mu_{N})(\xi^{(i)} - \mu_N)^\top</math>. If <math>\Sigma_N</math> is nonsingular, then for all <math>\lambda \in \mathbb{R}_{\geq 0} </math> then

: <math>
\begin{align}
& P^{N+1} \left((\xi^{(N+1)} - \mu_N)^\top \Sigma_N^{-1}(\xi^{(N+1)} - \mu_N) \geq \lambda^2\right) \\[8pt]
\leq {} & \min\left\{1, \frac 1 {N+1} \left\lfloor \frac{n_\xi(N+1)(N^2 - 1 + N\lambda^2)}{N^2\lambda^2}\right\rfloor\right\}.
\end{align}
</math>

==== Remarks ====
In the univariate case, i.e. <math display="inline">n_\xi = 1</math>, this inequality corresponds to the one from Saw et al.<ref name=":1" />  Moreover, the right-hand side can be simplified by upper bounding the floor function by its argument

: <math>P^{N+1}\left((\xi^{(N+1)} - \mu_N)^\top \Sigma_N^{-1}(\xi^{(N+1)} - \mu_N) \geq \lambda^2\right) \leq \min\left\{1,  \frac{n_\xi(N^2 - 1 + N\lambda^2)}{N^2\lambda^2}\right\}.</math>

As <math display="inline">N \to \infty</math>, the right-hand side tends to <math display="inline">\min \left\{1, \frac{n_\xi}{\lambda^2}\right\}</math> which corresponds to the [[#Vector version|multivariate Chebyshev inequality]] over ellipsoids shaped according to <math display="inline">\Sigma</math> and centered in <math display="inline">\mu</math>.

==Sharpened bounds==
Chebyshev's inequality is important because of its applicability to any distribution. As a result of its generality it may not (and usually does not) provide as sharp a bound as alternative methods that can be used if the distribution of the random variable is known. To improve the sharpness of the bounds provided by Chebyshev's inequality a number of methods have been developed; for a review see eg.<ref name="Godwin55"/><ref>[http://nvlpubs.nist.gov/nistpubs/jres/65B/jresv65Bn3p211_A1b.pdf Savage, I. Richard. "Probability inequalities of the Tchebycheff type." Journal of Research of the National Bureau of Standards-B. Mathematics and Mathematical Physics B 65 (1961): 211-222]</ref>

===Cantelli's inequality===
[[Cantelli's inequality]]<ref name=Cantelli1910>Cantelli F. (1910) Intorno ad un teorema fondamentale della teoria del rischio. Bolletino dell Associazione degli Attuari Italiani</ref> due to [[Francesco Paolo Cantelli]] states that for a real random variable (''X'') with mean (''μ'') and variance (''σ''<sup>2</sup>)

: <math> \Pr(X - \mu \ge a) \le \frac{\sigma^2}{ \sigma^2 + a^2 } </math>

where ''a'' ≥ 0.

This inequality can be used to prove a one tailed variant of Chebyshev's inequality with ''k'' > 0<ref name=Grimmett00>Grimmett and Stirzaker, problem 7.11.9. Several proofs of this result can be found in [http://www.mcdowella.demon.co.uk/Chebyshev.html Chebyshev's Inequalities] {{Webarchive|url=https://web.archive.org/web/20190224000121/http://www.mcdowella.demon.co.uk/Chebyshev.html |date=2019-02-24 }} by A. G. McDowell.</ref>

:<math> \Pr(X - \mu \geq k \sigma) \leq \frac{ 1 }{ 1 + k^2 }. </math>

The bound on the one tailed variant is known to be sharp. To see this consider the random variable ''X'' that takes the values

: <math> X = 1 </math> with probability <math> \frac{ \sigma^2 } { 1 + \sigma^2 }</math>
: <math> X = - \sigma^2 </math> with probability <math> \frac{ 1 } { 1 + \sigma^2 }.</math>

Then E(''X'') = 0 and E(''X''<sup>2</sup>) = ''σ''<sup>2</sup> and P(''X'' < 1) = 1 / (1 + ''σ''<sup>2</sup>).

==== An application: distance between the mean and the median ====
<!-- This section is linked from [[median]] and [[exponential distribution]]. -->

The one-sided variant can be used to prove the proposition that for [[probability distribution]]s having an [[expected value]] and a [[median]], the mean and the median can never differ from each other by more than one [[standard deviation]].  To express this in symbols let ''μ'', ''ν'', and ''σ'' be respectively the mean, the median, and the standard deviation. Then

:<math> \left | \mu - \nu \right | \leq \sigma. </math>

There is no need to assume that the variance is finite because this inequality is trivially true if the variance is infinite.

The proof is as follows. Setting ''k''&nbsp;=&nbsp;1 in the statement for the one-sided inequality gives:

:<math>\Pr(X - \mu \geq \sigma) \leq \frac{ 1 }{ 2 } \implies \Pr(X \geq \mu + \sigma) \leq \frac{ 1 }{ 2 }. </math>

Changing the sign of ''X'' and of ''μ'', we get

:<math>\Pr(X \leq \mu - \sigma) \leq \frac{ 1 }{ 2 }. </math>

As the median is by definition any real number&nbsp;''m'' that satisfies the inequalities

:<math>\Pr(X\leq m) \geq \frac{1}{2}\text{ and }\Pr(X\geq m) \geq \frac{1}{2}</math>

this implies that the median lies within one standard deviation of the mean. A proof using Jensen's inequality also [[Median#Inequality relating means and medians|exists]].

===Bhattacharyya's inequality===
Bhattacharyya<ref name=Bhattacharyya1987>{{cite journal |last=Bhattacharyya |first=B. B. |title=One-sided chebyshev inequality when the first four moments are known |journal=Communications in Statistics – Theory and Methods |year=1987 |volume=16 |issue=9 |pages=2789–91 |doi=10.1080/03610928708829540 |issn=0361-0926}}</ref> extended Cantelli's inequality using the third and fourth moments of the distribution.

Let <math>\mu = 0</math> and <math>\sigma^2</math> be the variance. Let <math>\gamma = E[X^3] / \sigma^3</math> and <math>\kappa = E[X^4]/\sigma^4</math>.

If <math>k^2 - k \gamma - 1 > 0</math> then

:<math> \Pr(X > k\sigma) \le \frac{ \kappa - \gamma^2 - 1 }{ (\kappa - \gamma^2 - 1) (1 + k^2) + (k^2 - k\gamma - 1) }.</math>

The necessity of <math>k^2 - k \gamma - 1 > 0</math> may require <math>k</math> to be reasonably large.

In the case <math>E[X^3]=0</math> this simplifies to
:<math>\Pr(X > k\sigma) \le \frac{\kappa-1}{\kappa \left(k^2+1\right)-2}
\quad \text{for } k > 1.
</math>
Since <math>\frac{\kappa-1}{\kappa \left(k^2+1\right)-2} = \frac{1}{2}-\frac{\kappa (k-1)}{2 (\kappa-1)}+O\left((k-1)^2\right)</math> for <math>k</math> close to 1, this bound improves slightly over Cantelli's bound <math>\frac{1}{2}-\frac{k-1}{2}+O\left((k-1)^2\right)</math> as <math>\kappa > 1</math>.

wins a factor 2 over Chebyshev's inequality.

===Gauss's inequality===
{{main|Gauss's inequality}}

In 1823 [[Gauss]] showed that for a [[unimodal distribution|distribution with a unique mode]] at zero,<ref name=Gauss1823>Gauss C. F. Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Pars Prior. Pars Posterior. Supplementum. Theory of the Combination of Observations Least Subject to Errors. Part One. Part Two. Supplement. 1995. Translated by G. W. Stewart. Classics in Applied Mathematics Series, Society for Industrial and Applied Mathematics, Philadelphia</ref>

: <math> \Pr( | X | \ge k ) \le  \frac{ 4 \operatorname{ E }( X^2 ) } { 9k^2 } \quad\text{if} \quad k^2 \ge \frac{ 4 } { 3 } \operatorname{E} (X^2) ,</math>

: <math> \Pr( | X | \ge k ) \le  1 - \frac{ k } { \sqrt{3} \operatorname{ E }( X^2 ) } \quad \text{if} \quad k^2 \le \frac{ 4 } { 3 } \operatorname{ E }( X^2 ). </math>

===Vysochanskij–Petunin inequality===
{{main|Vysochanskij–Petunin inequality}}

The Vysochanskij–Petunin inequality generalizes Gauss's inequality, which only holds for deviation from the mode of a unimodal distribution, to deviation from the mean, or more generally, any center.<ref name="Pukelsheim94">{{Cite journal|last=Pukelsheim|first=Friedrich|date=May 1994|title=The Three Sigma Rule|url=http://www.tandfonline.com/doi/abs/10.1080/00031305.1994.10476030|journal=The American Statistician|language=en|volume=48|issue=2|pages=88–91|doi=10.1080/00031305.1994.10476030|s2cid=122587510 |issn=0003-1305}}</ref> If ''X'' is a [[unimodal distribution]] with mean ''μ'' and variance ''σ''<sup>2</sup>, then the inequality states that

: <math> \Pr( | X - \mu | \ge k \sigma ) \le \frac{ 4 }{ 9k^2 } \quad \text{if} \quad k \ge \sqrt{8/3} = 1.633.</math> 

: <math> \Pr( | X - \mu | \ge k \sigma ) \le \frac{ 4 }{ 3k^2 } - \frac13 \quad \text{if} \quad k \le \sqrt{8/3}.</math>

For symmetrical unimodal distributions, the median and the mode are equal, so both the Vysochanskij–Petunin inequality and Gauss's inequality apply to the same center. Further, for symmetrical distributions, one-sided bounds can be obtained by noticing that

:<math> \Pr(  X - \mu  \ge k \sigma ) = \Pr(  X - \mu  \le -k \sigma ) = \frac{1}{2} \Pr(  |X - \mu|  \ge k \sigma ).</math>

The additional fraction of <math>4/9</math> present in these tail bounds lead to better confidence intervals than Chebyshev's inequality. For example, for any symmetrical unimodal distribution, the Vysochanskij–Petunin inequality states that 4/(9 × 3^2) = 4/81 ≈ 4.9% of the distribution lies outside 3 standard deviations of the mode.

===Bounds for specific distributions===

DasGupta has shown that if the distribution is known to be normal<ref name=DasGupta2000>{{cite journal | last1 = DasGupta | first1 = A | year = 2000 | title = Best constants in Chebychev inequalities with various applications | journal = Metrika | volume = 5 | issue = 1| pages = 185–200 | doi = 10.1007/s184-000-8316-9 | s2cid = 121436601 }}</ref>

: <math> \Pr( | X - \mu | \ge k \sigma ) \le \frac{ 1 }{ 3 k^2 } .</math>

From DasGupta's inequality it follows that for a normal distribution at least 95% lies within approximately 2.582 standard deviations of the mean. This is less sharp than the true figure (approximately 1.96 standard deviations of the mean).

*DasGupta has determined a set of best possible bounds for a [[normal distribution]] for this inequality.<ref name=DasGupta2000 />
*Steliga and Szynal have extended these bounds to the [[Pareto distribution]].<ref name=Steliga2010>{{cite journal |last1=Steliga |first1=Katarzyna |last2=Szynal |first2=Dominik |title=On Markov-Type Inequalities |journal=International Journal of Pure and Applied Mathematics |year=2010 |volume=58 |issue=2 |pages=137–152 |url=http://ijpam.eu/contents/2010-58-2/2/2.pdf |access-date=10 October 2012 |issn=1311-8080}}</ref>
*Grechuk et al. developed a general method for deriving the best possible bounds in Chebyshev's inequality for any family of distributions, and any [[deviation risk measure]] in place of standard deviation. In particular, they derived Chebyshev inequality for distributions with [[Logarithmically concave function|log-concave]] densities.<ref name="cheb">Grechuk, B., Molyboha, A., Zabarankin, M. (2010).
[https://www.researchgate.net/publication/231939730_Chebyshev_inequalities_with_law-invariant_deviation_measures Chebyshev Inequalities with Law Invariant Deviation Measures], Probability in the Engineering and Informational Sciences, 24(1), 145-170.</ref>

==Related inequalities==
Several other related inequalities are also known.

===Paley–Zygmund inequality===
{{main|Paley–Zygmund inequality}}

The Paley–Zygmund inequality gives a lower bound on tail probabilities, as opposed to Chebyshev's inequality which gives an upper bound.<ref name=Godwin1964a>Godwin H. J. (1964) Inequalities on distribution functions. (Chapter 3) New York, Hafner Pub. Co.</ref> Applying it to the square of a random variable, we get

: <math> \Pr( | Z | > \theta \sqrt{E[Z^2]} ) \ge \frac{ ( 1 - \theta^2 )^2 E[Z^2]^2 }{E[Z^4]}.</math>

===Haldane's transformation===
One use of Chebyshev's inequality in applications is to create confidence intervals for variates with an unknown distribution. [[J. B. S. Haldane|Haldane]] noted,<ref name=Haldane1952>{{cite journal | last1 = Haldane | first1 = J. B.|author-link=J. B. S. Haldane | year = 1952 | title = Simple tests for bimodality and bitangentiality | journal = [[Annals of Eugenics]] | volume = 16 | issue = 4| pages = 359–364 | doi = 10.1111/j.1469-1809.1951.tb02488.x | pmid = 14953132}}</ref> using an equation derived by [[Maurice Kendall|Kendall]],<ref name=Kendall1943>Kendall M. G. (1943) The Advanced Theory of Statistics, 1. London</ref> that if a variate (''x'') has a zero mean, unit variance and both finite [[skewness]] (''γ'') and [[kurtosis]] (''κ'') then the variate can be converted to a normally distributed [[standard score]] (''z''):

: <math> z = x - \frac{\gamma}{6} (x^2 - 1) + \frac{ x }{ 72 } [ 2 \gamma^2 (4 x^2 - 7) - 3 \kappa (x^2 - 3) ] + \cdots </math>

This transformation may be useful as an alternative to Chebyshev's inequality or as an adjunct to it for deriving confidence intervals for variates with unknown distributions.

While this transformation may be useful for moderately skewed and/or kurtotic distributions, it performs poorly when the distribution is markedly skewed and/or kurtotic.

===He, Zhang and Zhang's inequality===
For any collection of {{mvar|n}} non-negative independent random variables {{mvar|X<sub>i</sub>}} with expectation 1 <ref name=He2010>{{cite journal
| last1=He | first1=Simai
| last2=Zhang | first2=Jiawei
| last3=Zhang | first3=Shuzhong
| s2cid=11298475
| date=2010
| title=Bounding probability of small deviation: a fourth moment approach
| journal=[[Mathematics of Operations Research]]
| volume=35
| issue=1
| pages=208–232
| doi=10.1287/moor.1090.0438}}</ref>

: <math> \Pr\left ( \frac{\sum_{i=1}^n X_i }{n} - 1 \ge \frac{1}{n} \right) \le \frac{ 7 }{ 8 }. </math>

==Integral Chebyshev inequality==

There is a second (less well known) inequality also named after Chebyshev<ref name=Fink1984>{{cite book |last1=Fink |first1=A. M. |last2=Jodeit |first2=Max Jr. |title= Inequalities in Statistics and Probability|isbn=978-0-940600-04-1 |mr=789242 |editor1-first=Y. L. |editor1-last=Tong |editor2-last=Gupta |editor2-first=Shanti S. |year=1984 |volume=5 |series=Institute of Mathematical Statistics Lecture Notes - Monograph Series |pages=115–120 |doi=10.1214/lnms/1215465637 |chapter-url=http://projecteuclid.org/euclid.lnms/1215465617 |access-date=7 October 2012|chapter=On Chebyshev's other inequality }}</ref>

If ''f'', ''g'' : [''a'', ''b''] → '''R''' are two [[monotonic]] [[function (mathematics)|function]]s of the same monotonicity, then

: <math> \frac{ 1 }{ b - a } \int_a^b \! f(x) g(x) \,dx \ge  \left[ \frac{ 1 }{ b - a } \int_a^b \! f(x) \,dx \right] \left[ \frac{ 1 }{ b - a } \int_a^b \! g(x) \,dx \right] .</math>

If ''f'' and ''g'' are of opposite monotonicity, then the above inequality works in the reverse way.

{{Math theorem|Let <math>f</math> and <math>g</math> be monotonic functions of the same monotonicity on <math>[a,b]</math>. Then for any <math>x, y \in [a,b]</math> we have
<math display="block">(f(x)-f(y))(g(x)-g(y)) \geq 0.</math>|name=Lemma}}

{{Math proof|Integrate this inequality with respect to <math>x</math> and <math>y</math> over <math>[a,b]</math>:
<math display="block">\int_a^b \int_a^b (f(x)-f(y))(g(x)-g(y)) \,dx\,dy \geq 0.</math>

Expanding the integrand gives:
<math display="block">\int_a^b \int_a^b \left[f(x)g(x) - f(x)g(y) - f(y)g(x) + f(y)g(y)\right] \,dx\,dy \geq 0.</math>

Separate the double integral into four parts:
<math display="block">\int_a^b \int_a^b f(x)g(x) \,dx\,dy - \int_a^b \int_a^b f(x)g(y) \,dx\,dy - \int_a^b \int_a^b f(y)g(x) \,dx\,dy + \int_a^b \int_a^b f(y)g(y) \,dx\,dy \geq 0.</math>

Since the integration variable in each inner integral is independent, we have:
* <math>\int_a^b \int_a^b f(x)g(x) \,dx\,dy = (b-a) \int_a^b f(x)g(x) \,dx,</math>
* <math>\int_a^b \int_a^b f(y)g(y) \,dx\,dy = (b-a) \int_a^b f(y)g(y) \,dy = (b-a) \int_a^b f(x)g(x) \,dx,</math>
* <math>\int_a^b \int_a^b f(x)g(y) \,dx\,dy = \left(\int_a^b f(x) \,dx\right)\left(\int_a^b g(y) \,dy\right),</math>
* <math>\int_a^b \int_a^b f(y)g(x) \,dx\,dy = \left(\int_a^b f(y) \,dy\right)\left(\int_a^b g(x) \,dx\right) = \left(\int_a^b f(x) \,dx\right)\left(\int_a^b g(x) \,dx\right).</math>

Let

<math display="block">I = \int_a^b f(x)g(x) \,dx, \quad F = \int_a^b f(x) \,dx, \quad G = \int_a^b g(x) \,dx.</math>

Substitute these into the inequality:

<math display="block">(b-a)I - FG - FG + (b-a)I \geq 0.</math>

Simplify:

<math display="block">2(b-a)I - 2FG \geq 0.</math>

Dividing by <math>2(b-a)</math> (noting that <math>b-a>0</math>):

<math display="block">I \geq \frac{FG}{(b-a)}.</math>

Divide both sides by <math>b-a</math> to obtain:

<math display="block">\frac{1}{b-a} \int_a^b f(x)g(x) \,dx \geq \left(\frac{1}{b-a}\int_a^b f(x) \,dx\right) \left(\frac{1}{b-a}\int_a^b g(x) \,dx\right).</math>

This completes the proof.}}

This inequality is related to [[Jensen's inequality]],<ref name=Niculescu2001>{{cite journal |last=Niculescu |first=Constantin P. |title=An extension of Chebyshev's inequality and its connection with Jensen's inequality |journal=Journal of Inequalities and Applications |year=2001 |volume=6 |issue=4 |pages=451–462 |doi=10.1155/S1025583401000273 |url=http://emis.matem.unam.mx/journals/HOA/JIA/Volume6_4/462.html |access-date=6 October 2012 |issn=1025-5834|citeseerx=10.1.1.612.7056 |doi-access=free }}</ref> [[Kantorovich's inequality]],<ref name=Niculescu2001a>{{cite journal |last1=Niculescu |first1=Constantin P. |last2=Pečarić |first2=Josip |author-link2=Josip Pečarić |title=The Equivalence of Chebyshev's Inequality to the Hermite–Hadamard Inequality |journal=Mathematical Reports |year=2010 |volume=12 |issue=62 |pages=145–156 |url=http://www.csm.ro/reviste/Mathematical_Reports/Pdfs/2010/2/Niculescu.pdf |access-date=6 October 2012 |issn=1582-3067}}</ref> the [[Hermite–Hadamard inequality]]<ref name="Niculescu2001a"/> and [[Walter's conjecture]].<ref name=Malamud2001>{{cite journal |last=Malamud |first=S. M. |title=Some complements to the Jensen and Chebyshev inequalities and a problem of W. Walter |journal=Proceedings of the American Mathematical Society |date=15 February 2001 |volume=129 |issue=9 |pages=2671–2678 |doi=10.1090/S0002-9939-01-05849-X |mr=1838791 |url=https://www.ams.org/journals/proc/2001-129-09/S0002-9939-01-05849-X/ |access-date=7 October 2012 |issn=0002-9939|doi-access=free }}</ref>

===Other inequalities===

There are also a number of other inequalities associated with Chebyshev:

*[[Chebyshev's sum inequality]]
*[[Chebyshev–Markov–Stieltjes inequalities]]

==Notes==

The [[United States Environmental Protection Agency|Environmental Protection Agency]] has suggested best practices for the use of Chebyshev's inequality for estimating confidence intervals.<ref>{{cite report
 | title      = Calculating Upper Confidence Limits for Exposure Point Concentrations at hazardous Waste Sites
 | publisher  = Office of Emergency and Remedial Response of the U.S. Environmental Protection Agency
 |date=December 2002
 | url        = http://nepis.epa.gov/Exe/ZyNET.exe/P100CYCE.TXT?ZyActionD=ZyDocument&Client=EPA&Index=2000+Thru+2005&Docs=&Query=&Time=&EndTime=&SearchMethod=1&TocRestrict=n&Toc=&TocEntry=&QField=&QFieldYear=&QFieldMonth=&QFieldDay=&IntQFieldOp=0&ExtQFieldOp=0&XmlQuery=&File=D%3A%5Czyfiles%5CIndex%20Data%5C00thru05%5CTxt%5C00000029%5CP100CYCE.txt&User=ANONYMOUS&Password=anonymous&SortMethod=h%7C-&MaximumDocuments=1&FuzzyDegree=0&ImageQuality=r75g8/r75g8/x150y150g16/i425&Display=p%7Cf&DefSeekPage=x&SearchBack=ZyActionL&Back=ZyActionS&BackDesc=Results%20page&MaximumPages=1&ZyEntry=1&SeekPage=x&ZyPURL#
 | access-date = 5 August 2016}}</ref>

==See also==
*[[Multidimensional Chebyshev's inequality]]
*[[Concentration inequality]] – a summary of tail-bounds on random variables.
*[[Cornish–Fisher expansion]]
*[[Eaton's inequality]]
*[[Kolmogorov's inequality]]
*[[Law of large numbers/Proof|Proof of the weak law of large numbers]] using Chebyshev's inequality
*[[Le Cam's theorem]]
*[[Paley–Zygmund inequality]]
*[[Vysochanskiï–Petunin inequality]] — a stronger result applicable to [[unimodal probability distributions]]
*[[Lenglart's inequality]]

==References==
{{reflist|30em}}

==Further reading==
* A. Papoulis (1991), ''Probability, Random Variables, and Stochastic Processes'', 3rd ed. McGraw–Hill. {{isbn|0-07-100870-5}}. pp.&nbsp;113–114.
* [[Geoffrey Grimmett|G. Grimmett]] and D. Stirzaker (2001), ''Probability and Random Processes'', 3rd ed. Oxford. {{isbn|0-19-857222-0}}. Section 7.3.

==External links==
{{commons category}}
* {{springer|title=Chebyshev inequality in probability theory|id=p/c021890}}
* [https://web.archive.org/web/20131204193123/http://mws.cs.ru.nl/mwiki/random_2.html#T7 Formal proof] in the [[Mizar system]].

{{Lp spaces}}

[[Category:Articles containing proofs]]
[[Category:Probabilistic inequalities]]
[[Category:Statistical inequalities]]