
File:ConvexFunction.svg
Jensen's inequality generalizes the statement that a secant line of a convex function lies above its graph.
File:Convex 01.ogg
Visualizing convexity and Jensen's inequality

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906,<ref>Template:Cite journal</ref> building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.<ref>Template:Cite journal</ref> Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation (or equivalently, the opposite inequality for concave transformations).<ref>Template:Cite book</ref>

Jensen's inequality generalizes the statement that the secant line of a convex function lies above the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for t ∈ [0,1]),

<math>t f(x_1) + (1-t) f(x_2),</math>

while the graph of the function is the convex function of the weighted means,

<math>f(t x_1 + (1-t) x_2).</math>

Thus, Jensen's inequality in this case is

<math>f(t x_1 + (1-t) x_2) \leq t f(x_1) + (1-t) f(x_2).</math>

In the context of probability theory, it is generally stated in the following form: if X is a random variable and <math>\varphi</math> is a convex function, then

<math qid=Q107203920>\varphi(\operatorname{E}[X]) \leq \operatorname{E} \left[\varphi(X)\right].</math>

The difference between the two sides of the inequality, <math>\operatorname{E} \left[\varphi(X)\right] - \varphi\left(\operatorname{E}[X]\right)</math>, is called the Jensen gap.<ref name="Gao et al.">Template:Cite journal</ref>
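
For a concrete feel for this form, here is a minimal Monte Carlo sketch (assuming NumPy; the normal distribution and the choice <math>\varphi(x)=x^2</math> are purely illustrative) that estimates both sides of the inequality; for this particular <math>\varphi</math> the Jensen gap is exactly the variance of X.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)  # samples of X ~ Normal(1, 2^2)

def phi(t):                     # a convex function (illustrative choice)
    return t ** 2

lhs = phi(x.mean())             # phi(E[X]), estimated from the sample mean
rhs = phi(x).mean()             # E[phi(X)]
print(lhs, rhs, rhs - lhs)      # the Jensen gap rhs - lhs is close to Var(X) = 4 here
</syntaxhighlight>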

== Statements ==

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.

=== Finite form ===

For a real convex function <math>\varphi</math>, numbers <math>x_1, x_2, \ldots, x_n</math> in its domain, and positive weights <math>a_i</math>, Jensen's inequality can be stated as:

<math display="block">\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \le \frac{\sum a_i \varphi (x_i)}{\sum a_i}, \qquad (1)</math>

and the inequality is reversed if <math>\varphi</math> is concave, which is

<math display="block">\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \geq \frac{\sum a_i \varphi (x_i)}{\sum a_i}. \qquad (2)</math>

Equality holds if and only if <math>x_1=x_2=\cdots =x_n</math> or <math>\varphi</math> is linear on a domain containing <math>x_1,x_2,\cdots ,x_n</math>.

As a particular case, if the weights <math>a_i</math> are all equal, then (1) and (2) become

<math display="block">\varphi\left(\frac{\sum x_i}{n}\right) \le \frac{\sum \varphi (x_i)}{n}, \qquad (3)</math>
<math display="block">\varphi\left(\frac{\sum x_i}{n}\right) \geq \frac{\sum \varphi (x_i)}{n}. \qquad (4)</math>

For instance, the function <math>\log(x)</math> is concave, so substituting <math>\varphi(x) = \log(x)</math> in the previous formula (4) establishes the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:

<math display=block>\log\!\left( \frac{\sum_{i=1}^n x_i}{n}\right) \geq \frac{\sum_{i=1}^n \log\!\left( x_i \right)}{n}</math> <math display=block>\exp\!\left(\log\!\left( \frac{\sum_{i=1}^n x_i}{n}\right)\right) \geq \exp\!\left(\frac{\sum_{i=1}^n \log\!\left( x_i \right)} {n}\right)</math> <math display=block>\frac{x_1 + x_2 + \cdots + x_n}{n} \geq \sqrt[n]{x_1 \cdot x_2 \cdots x_n}</math>
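
As a quick numerical sanity check (not part of the derivation; NumPy and the random positive sample are illustrative), one can compare the two means directly:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 10.0, size=1000)       # arbitrary positive numbers x_1, ..., x_n

arithmetic_mean = x.mean()
geometric_mean = np.exp(np.log(x).mean())   # exp of the mean log equals the n-th root of the product

assert geometric_mean <= arithmetic_mean    # AM-GM, i.e. Jensen for the concave log
print(arithmetic_mean, geometric_mean)
</syntaxhighlight>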

A common application has x as a function of another variable (or set of variables) t, that is, <math>x_i = g(t_i)</math>. All of this carries directly over to the general continuous case: the weights <math>a_i</math> are replaced by a non-negative integrable function <math>f(x)</math>, such as a probability distribution, and the summations are replaced by integrals.

=== Measure-theoretic form ===

Let <math>(\Omega, A, \mu)</math> be a probability space. Let <math>f : \Omega \to \mathbb{R}</math> be a <math>\mu</math>-measurable function and <math>\varphi : \mathbb{R} \to \mathbb{R}</math> be convex. Then:<ref>p. 25 of Template:Cite book</ref> <math display='block'> \varphi\left(\int_\Omega f \,\mathrm{d}\mu\right) \leq \int_\Omega \varphi \circ f \,\mathrm{d}\mu </math>

In real analysis, we may require an estimate on

<math>\varphi\left(\int_a^b f(x)\, dx\right)</math>

where <math>a, b \in \mathbb{R}</math>, and <math>f\colon[a, b] \to \R</math> is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of <math>[a, b]</math> need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get<ref>Niculescu, Constantin P. "Integral inequalities", P. 12.</ref>

<math>\varphi\left(\frac{1}{b-a}\int_a^b f(x)\, dx\right) \le \frac{1}{b-a} \int_a^b \varphi(f(x)) \,dx. </math>
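
A numerical check of this rescaled form, approximating the integrals by averages over a uniform grid (NumPy assumed; f and <math>\varphi</math> are illustrative choices):
<syntaxhighlight lang="python">
import numpy as np

a, b = 0.0, 2.0
x = np.linspace(a, b, 100_001)
f = np.sqrt(x) + 1.0        # an illustrative non-negative integrable function on [a, b]
phi = np.exp                # an illustrative convex function

avg_f = f.mean()            # approximates (1/(b-a)) * integral of f over the uniform grid
avg_phi_f = phi(f).mean()   # approximates (1/(b-a)) * integral of phi(f(x))

assert phi(avg_f) <= avg_phi_f
print(phi(avg_f), avg_phi_f)
</syntaxhighlight>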

=== Probabilistic form ===

The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let <math>(\Omega, \mathfrak{F},\operatorname{P})</math> be a probability space, X an integrable real-valued random variable and <math>\varphi</math> a convex function. Then<ref>Template:Cite book</ref> <math display="block">\varphi\big(\operatorname{E}[X]\big) \leq \operatorname{E}[\varphi(X)].</math>

In this probability setting, the measure <math>\mu</math> is intended as a probability <math>\operatorname{P}</math>, the integral with respect to <math>\mu</math> as an expected value <math>\operatorname{E}</math>, and the function <math>f</math> as a random variable X.

Note that the equality holds if and only if <math>\varphi</math> is a linear function on some convex set <math>A</math> such that <math>P(X \in A)=1</math> (which follows by inspecting the measure-theoretical proof below).

=== General inequality in a probabilistic setting ===

More generally, let T be a real topological vector space, and X a T-valued integrable random variable. In this general setting, integrable means that there exists an element <math>\operatorname{E}[X]</math> in T, such that for any element z in the dual space of T: <math>\operatorname{E}|\langle z, X \rangle|<\infty </math>, and <math>\langle z, \operatorname{E}[X]\rangle = \operatorname{E}[\langle z, X \rangle]</math>. Then, for any measurable convex function <math>\varphi</math> and any sub-σ-algebra <math>\mathfrak{G}</math> of <math>\mathfrak{F}</math>:

<math>\varphi\left(\operatorname{E}\left[X\mid\mathfrak{G}\right]\right) \leq \operatorname{E}\left[\varphi(X)\mid\mathfrak{G}\right].</math>

Here <math>\operatorname{E}[\cdot\mid\mathfrak{G}]</math> stands for the expectation conditioned to the σ-algebra <math>\mathfrak{G}</math>. This general statement reduces to the previous ones when the topological vector space T is the real axis, and <math>\mathfrak{G}</math> is the trivial σ-algebra <math>\{\varnothing, \Omega\}</math> (where <math>\varnothing</math> is the empty set, and <math>\Omega</math> is the sample space).<ref>Attention: In this generality additional assumptions on the convex function and/or the topological vector space are needed, see Example (1.3) on p. 53 in Template:Cite journal</ref>
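
The conditional statement can be illustrated on a finite sample, where conditioning on the σ-algebra generated by a discrete label amounts to averaging within each group. A minimal sketch, assuming NumPy and an illustrative convex <math>\varphi</math>:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)                  # samples of X
g = rng.integers(0, 5, size=x.size)           # group labels; the groups generate the sub-sigma-algebra

def phi(t):                                   # a convex function (illustrative choice)
    return np.abs(t) ** 3

for k in range(5):
    in_group = (g == k)
    cond_mean = x[in_group].mean()            # E[X | G = k]
    cond_mean_phi = phi(x[in_group]).mean()   # E[phi(X) | G = k]
    assert phi(cond_mean) <= cond_mean_phi
print("conditional Jensen's inequality holds in every group")
</syntaxhighlight>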

=== A sharpened and generalized form ===

Let X be a one-dimensional random variable with mean <math>\mu</math> and variance <math>\sigma^2\ge 0</math>. Let <math>\varphi(x)</math> be a twice differentiable function, and define the function

<math>h(x)\triangleq\frac{\varphi \left(x\right)-\varphi \left(\mu \right)}{\left(x-\mu \right)^2}-\frac{\varphi '\left(\mu \right)}{x-\mu}.</math>

Then<ref name="Liao & Berg">Template:Cite journal</ref>

<math>\sigma^2\inf \frac{\varphi''(x)}{2} \le \sigma^2\inf h(x) \leq \operatorname{E}\left[\varphi \left(X\right)\right]-\varphi\left(\operatorname{E}[X]\right)\le \sigma^2\sup h(x) \le \sigma^2\sup \frac{\varphi''(x)}{2}.</math>

In particular, when <math>\varphi(x)</math> is convex, then <math>\varphi''(x)\ge 0</math>, and the standard form of Jensen's inequality immediately follows for the case where <math>\varphi(x)</math> is additionally assumed to be twice differentiable.
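
A numerical check of these bounds for a small discrete distribution, with the illustrative choice <math>\varphi = \exp</math> (NumPy assumed; the infimum and supremum are evaluated over the support of X):
<syntaxhighlight lang="python">
import numpy as np

# X uniform on a small discrete support; phi = exp is twice differentiable.
support = np.array([0.0, 1.0, 3.0])
weights = np.full(3, 1 / 3)
phi, dphi = np.exp, np.exp                       # phi and its first derivative

mu = (weights * support).sum()
var = (weights * (support - mu) ** 2).sum()
gap = (weights * phi(support)).sum() - phi(mu)   # E[phi(X)] - phi(E[X])

# h evaluated on the support of X (mu is not a support point, so no 0/0 issue).
h = (phi(support) - phi(mu)) / (support - mu) ** 2 - dphi(mu) / (support - mu)
assert var * h.min() <= gap <= var * h.max()
print(var * h.min(), gap, var * h.max())
</syntaxhighlight>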

== Proofs ==

=== Intuitive graphical proof ===

File:Jensen graph.svg
A graphical "proof" of Jensen's inequality for the probabilistic case. The dashed curve along the X axis is the hypothetical distribution of X, while the dashed curve along the Y axis is the corresponding distribution of Y values. Note that the convex mapping Y(X) increasingly "stretches" the distribution for increasing values of X.
File:Jensen's Inequality Proof Without Words.png
This is a proof without words of Jensen's inequality for n variables. Without loss of generality, the sum of the positive weights is 1. It follows that the weighted point lies in the convex hull of the original points, which lies above the function itself by the definition of convexity. The conclusion follows.<ref>Template:Cite book</ref>

Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where X is a real number (see figure). Assuming a hypothetical distribution of X values, one can immediately identify the position of <math>\operatorname{E}[X]</math> and its image <math> \varphi(\operatorname{E}[X])</math> in the graph. Noticing that for convex mappings <math>Y = \varphi(X)</math> of some X values the corresponding distribution of Y values is increasingly "stretched up" for increasing values of X, it is easy to see that the distribution of Y is broader in the interval corresponding to <math>X > X_0</math> and narrower in <math>X < X_0</math> for any <math>X_0</math>; in particular, this is also true for <math> X_0 = \operatorname{E}[X]</math>. Consequently, in this picture the expectation of Y will always shift upwards with respect to the position of <math> \varphi(\operatorname{E}[X])</math>. A similar reasoning holds if the distribution of X covers a decreasing portion of the convex function, or both a decreasing and an increasing portion of it. This "proves" the inequality, i.e.

<math>\varphi(\operatorname{E}[X]) \leq \operatorname{E}[\varphi(X)] = \operatorname{E}[Y], </math>

with equality when <math>\varphi(X)</math> is not strictly convex, e.g. when it is a straight line, or when X follows a degenerate distribution (i.e. is a constant).

The proofs below formalize this intuitive notion.

=== Proof 1 (finite form) ===

If <math>\lambda_1</math> and <math>\lambda_2</math> are two arbitrary nonnegative real numbers such that <math>\lambda_1 + \lambda_2 = 1</math>, then convexity of <math>\varphi</math> implies

<math>\forall x_1, x_2: \qquad \varphi \left (\lambda_1 x_1+\lambda_2 x_2 \right )\leq \lambda_1\,\varphi(x_1)+\lambda_2\,\varphi(x_2).</math>

This can be generalized: if <math>\lambda_1, \ldots, \lambda_n</math> are nonnegative real numbers such that <math>\lambda_1 + \cdots + \lambda_n = 1</math>, then

<math>\varphi(\lambda_1 x_1+\lambda_2 x_2+\cdots+\lambda_n x_n)\leq \lambda_1\,\varphi(x_1)+\lambda_2\,\varphi(x_2)+\cdots+\lambda_n\,\varphi(x_n),</math>

for any <math>x_1, \ldots, x_n</math>.

The finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for n = 2. Suppose the statement is true for some n, so

<math>\varphi\left(\sum_{i=1}^{n}\lambda_i x_i\right) \leq \sum_{i=1}^{n}\lambda_i \varphi\left(x_i\right)</math>

for any <math>\lambda_1, \ldots, \lambda_n</math> such that <math>\lambda_1 + \cdots + \lambda_n = 1</math>.

One needs to prove it for <math>n+1</math>. At least one of the <math>\lambda_i</math> is strictly smaller than <math>1</math>, say <math>\lambda_{n+1}</math>; therefore by the convexity inequality:

<math>\begin{align} \varphi\left(\sum_{i=1}^{n+1}\lambda_i x_i\right) &= \varphi\left((1-\lambda_{n+1})\sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} x_i + \lambda_{n+1} x_{n+1} \right) \\ &\leq (1-\lambda_{n+1}) \varphi\left(\sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} x_i \right)+\lambda_{n+1}\,\varphi(x_{n+1}). \end{align}</math>

Since <math>\lambda_1 + \lambda_2 + \cdots + \lambda_{n+1} = 1</math>,

<math>\sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} = 1</math>,

applying the inductive hypothesis gives

<math>\varphi\left(\sum_{i=1}^{n}\frac{\lambda_i}{1-\lambda_{n+1}} x_i\right) \leq \sum_{i=1}^{n}\frac{\lambda_i}{1-\lambda_{n+1}} \varphi(x_i)</math>

therefore

<math>\begin{align} \varphi\left(\sum_{i=1}^{n+1}\lambda_i x_i\right) &\leq (1-\lambda_{n+1}) \sum_{i=1}^{n}\frac{\lambda_i}{1-\lambda_{n+1}} \varphi(x_i)+\lambda_{n+1}\,\varphi(x_{n+1}) =\sum_{i=1}^{n+1}\lambda_i \varphi(x_i). \end{align}</math>

We deduce that the inequality is true for <math>n+1</math>; by induction it follows that the result is also true for all integers n greater than 2.

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

<math>\varphi\left(\int x\,d\mu_n(x) \right)\leq \int \varphi(x)\,d\mu_n(x),</math>

where μn is a measure given by an arbitrary convex combination of Dirac deltas:

<math>\mu_n= \sum_{i=1}^n \lambda_i \delta_{x_i}.</math>

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as can be easily verified), the general statement is obtained simply by a limiting procedure.

=== Proof 2 (measure-theoretic form) ===

Let <math>g</math> be a real-valued <math>\mu</math>-integrable function on a probability space <math>\Omega</math>, and let <math>\varphi</math> be a convex function on the real numbers. Since <math>\varphi</math> is convex, at each real number <math>x</math> we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of <math>\varphi</math> at <math>x</math>, but which are below the graph of <math>\varphi</math> at all points (support lines of the graph).

Now, if we define

<math>x_0:=\int_\Omega g\, d\mu,</math>

because of the existence of subderivatives for convex functions, we may choose <math>a</math> and <math>b</math> such that

<math>ax + b \leq \varphi(x),</math>

for all real <math>x</math> and

<math>ax_0+ b = \varphi(x_0).</math>

But then we have that

<math>\varphi \circ g (\omega) \geq ag(\omega)+ b</math>

for almost all <math>\omega \in \Omega</math>. Since we have a probability measure, the integral is monotone with <math>\mu(\Omega) = 1</math> so that

<math>\int_\Omega \varphi \circ g\, d\mu \geq \int_\Omega (ag + b)\, d\mu = a\int_\Omega g\, d\mu + b\int_\Omega d\mu = ax_0 + b = \varphi (x_0) = \varphi \left (\int_\Omega g\, d\mu \right ),</math>

as desired.
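
The support-line argument can be verified numerically. The sketch below (NumPy assumed; Ω = [0, 1] with the uniform measure approximated on a grid, and g, <math>\varphi</math> chosen for illustration) builds the support line at <math>x_0 = \int_\Omega g\, d\mu</math> and checks both the pointwise bound and the integrated inequality:
<syntaxhighlight lang="python">
import numpy as np

# Omega = [0, 1] with the uniform probability measure, approximated on a grid.
omega = np.linspace(0.0, 1.0, 100_001)
g = omega ** 2                  # an illustrative integrable function g on Omega
phi, dphi = np.exp, np.exp      # an illustrative convex differentiable phi and its derivative

x0 = g.mean()                   # x0 = integral of g d(mu)
a = dphi(x0)                    # slope of a support line of phi at x0 (a subderivative)
b = phi(x0) - a * x0            # intercept, so that a*x0 + b = phi(x0)

assert np.all(a * g + b <= phi(g) + 1e-12)   # the support line lies below phi on the range of g
assert phi(x0) <= phi(g).mean()              # integrating the pointwise bound gives Jensen's inequality
print(phi(x0), phi(g).mean())
</syntaxhighlight>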

=== Proof 3 (general inequality in a probabilistic setting) ===

Let X be an integrable random variable that takes values in a real topological vector space T. Since <math>\varphi: T \to \R</math> is convex, for any <math>x,y \in T</math>, the quantity

<math>\frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta},</math>

is decreasing as <math>\theta</math> approaches <math>0^+</math>. In particular, the subdifferential of <math>\varphi</math> evaluated at <math>x</math> in the direction <math>y</math> is well-defined by

<math>(D\varphi)(x)\cdot y:=\lim_{\theta \downarrow 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}=\inf_{\theta \neq 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}.</math>

The subdifferential is linear in <math>y</math> (a fact whose proof in this generality requires the Hahn–Banach theorem) and, since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for <math>\theta = 1</math>, one gets

<math>\varphi(x)\leq \varphi(x+y)-(D\varphi)(x)\cdot y.</math>

In particular, for an arbitrary sub-σ-algebra <math> \mathfrak{G}</math> we can evaluate the last inequality when <math> x = \operatorname{E}[X\mid\mathfrak{G}],\,y=X-\operatorname{E}[X\mid\mathfrak{G}]</math> to obtain

<math>\varphi(\operatorname{E}[X\mid\mathfrak{G}]) \leq \varphi(X)-(D\varphi)(\operatorname{E}[X\mid\mathfrak{G}])\cdot (X-\operatorname{E}[X\mid\mathfrak{G}]).</math>

Now, if we take the expectation conditioned to <math> \mathfrak{G}</math> on both sides of the previous expression, we get the result since:

<math>\operatorname{E} \left [\left[(D\varphi)(\operatorname{E}[X\mid\mathfrak{G}])\cdot (X-\operatorname{E}[X\mid\mathfrak{G}])\right]\mid\mathfrak{G} \right] = (D\varphi)(\operatorname{E}[X\mid\mathfrak{G}])\cdot \operatorname{E}[\left( X-\operatorname{E}[X\mid\mathfrak{G}] \right) \mid \mathfrak{G}]=0,</math>

by the linearity of the subdifferential in the y variable, and the following well-known property of the conditional expectation:

<math>\operatorname{E} \left [ \left(\operatorname{E}[X\mid\mathfrak{G}] \right) \mid\mathfrak{G} \right ] = \operatorname{E}[ X \mid\mathfrak{G}].</math>

== Applications and special cases ==

=== Form involving a probability density function ===

Suppose <math>\Omega</math> is a measurable subset of the real line and f(x) is a non-negative function such that

<math>\int_{-\infty}^\infty f(x)\,dx = 1.</math>

In probabilistic language, f is a probability density function.

Then Jensen's inequality becomes the following statement about convex integrals:

If g is any real-valued measurable function and <math display="inline">\varphi</math> is convex over the range of g, then

<math> \varphi\left(\int_{-\infty}^\infty g(x)f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(g(x)) f(x)\, dx. </math>

If g(x) = x, then this form of the inequality reduces to a commonly used special case:

<math>\varphi\left(\int_{-\infty}^\infty x\, f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(x)\,f(x)\, dx.</math>

This is applied in Variational Bayesian methods.

=== Example: even moments of a random variable ===

If <math>g(x) = x^{2n}</math>, and X is a random variable, then g is convex as

<math> \frac{d^{2}g}{dx^{2}}(x) = 2n(2n - 1)x^{2n - 2} \geq 0\quad \forall\ x \in \R</math>

and so

<math>g(\operatorname{E}[X]) = (\operatorname{E}[X])^{2n} \leq\operatorname{E}[X^{2n}].</math>

In particular, if some even moment 2n of X is finite, X has a finite mean. An extension of this argument shows X has finite moments of every order <math>l\in\N</math> dividing n.
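
A short simulation check of this moment inequality (NumPy assumed; the Student-t distribution with five degrees of freedom is an illustrative choice with a finite fourth moment):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_t(df=5, size=1_000_000)   # a heavy-tailed X whose 4th moment is still finite
n = 2                                      # check the 2n-th moment, here the 4th

lhs = x.mean() ** (2 * n)                  # (E[X])^{2n}
rhs = (x ** (2 * n)).mean()                # E[X^{2n}]
assert lhs <= rhs
print(lhs, rhs)
</syntaxhighlight>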

=== Alternative finite form ===

Let <math>\Omega = \{x_1, \ldots, x_n\}</math>, and take <math>\mu</math> to be the counting measure on <math>\Omega</math>; then the general form reduces to a statement about sums:

<math> \varphi\left(\sum_{i=1}^{n} g(x_i)\lambda_i \right) \le \sum_{i=1}^{n} \varphi(g(x_i)) \lambda_i, </math>

provided that <math>\lambda_i \geq 0</math> and

<math>\lambda_1 + \cdots + \lambda_n = 1.</math>

There is also an infinite discrete form.

=== Statistical physics ===

Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

<math> e^{\operatorname{E}[X]} \leq \operatorname{E} \left [ e^X \right ],</math>

where the expected values are with respect to some probability distribution in the random variable X.

Proof: Let <math>\varphi(x) = e^x</math> in <math>\varphi\left(\operatorname{E}[X]\right) \leq \operatorname{E} \left[ \varphi(X) \right].</math>
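
A minimal numerical check (NumPy assumed; the standard normal distribution is an illustrative choice, for which <math>\operatorname{E}[e^X] = e^{1/2}</math>):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)   # X ~ standard normal (illustrative choice)

lhs = np.exp(x.mean())           # e^{E[X]}, close to 1 here
rhs = np.exp(x).mean()           # E[e^X], close to e^{1/2} for a standard normal
assert lhs <= rhs
print(lhs, rhs)
</syntaxhighlight>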

=== Information theory ===

If p(x) is the true probability density for X, and q(x) is another density, then applying Jensen's inequality for the random variable <math>Y(X) = q(X)/p(X)</math> and the convex function <math>\varphi(y) = -\log(y)</math> gives

<math>\operatorname{E}[\varphi(Y)] \ge \varphi(\operatorname{E}[Y])</math>

Therefore:

<math>-D(p(x)\|q(x))=\int p(x) \log \left (\frac{q(x)}{p(x)} \right ) \, dx \le \log \left ( \int p(x) \frac{q(x)}{p(x)}\,dx \right ) = \log \left (\int q(x)\,dx \right ) =0 </math>

a result called Gibbs' inequality.

It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities p rather than any other distribution q. The quantity that is non-negative is called the Kullback–Leibler divergence of q from p, where <math>D(p(x)\|q(x))=\int p(x) \log \left (\frac{p(x)}{q(x)} \right ) dx</math>.

Since <math>-\log(x)</math> is a strictly convex function for <math>x > 0</math>, it follows that equality holds when <math>p(x)</math> equals <math>q(x)</math> almost everywhere.
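
For a discrete illustration (NumPy assumed; the two distributions are arbitrary), the non-negativity of the Kullback–Leibler divergence can be checked directly:
<syntaxhighlight lang="python">
import numpy as np

p = np.array([0.5, 0.3, 0.2])    # "true" distribution (illustrative, discrete case)
q = np.array([0.25, 0.25, 0.5])  # another distribution on the same support

kl = np.sum(p * np.log(p / q))   # Kullback-Leibler divergence D(p || q)
assert kl >= 0                   # Gibbs' inequality
print(kl)
</syntaxhighlight>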

=== Rao–Blackwell theorem ===

Template:Main article

If L is a convex function and <math>\mathfrak{G}</math> a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get

<math>L(\operatorname{E}[\delta(X) \mid \mathfrak{G}]) \le \operatorname{E}[L(\delta(X)) \mid \mathfrak{G}] \quad \Longrightarrow \quad \operatorname{E}[L(\operatorname{E}[\delta(X) \mid \mathfrak{G}])] \le \operatorname{E}[L(\delta(X))].</math>

So if δ(X) is some estimator of an unobserved parameter θ given a vector of observables X; and if T(X) is a sufficient statistic for θ; then an improved estimator, in the sense of having a smaller expected loss L, can be obtained by calculating

<math>\delta_1 (X) = \operatorname{E}_{\theta}[\delta(X') \mid T(X')= T(X)], </math>

the expected value of δ with respect to θ, taken over all possible vectors of observations X compatible with the same value of T(X) as that observed. Further, because T is a sufficient statistic, <math>\delta_1 (X)</math> does not depend on θ and hence is a statistic.

This result is known as the Rao–Blackwell theorem.
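
A small simulation sketch of the Rao–Blackwell improvement under squared-error loss (NumPy assumed; the Bernoulli model, the crude estimator <math>\delta(X)=X_1</math>, and its Rao–Blackwellization <math>\operatorname{E}[X_1 \mid \textstyle\sum_i X_i]</math>, which equals the sample mean, are illustrative):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(5)
p, m, reps = 0.3, 20, 100_000
samples = rng.binomial(1, p, size=(reps, m))   # repeated samples X_1, ..., X_m ~ Bernoulli(p)

delta = samples[:, 0]            # a crude unbiased estimator: delta(X) = X_1
delta1 = samples.mean(axis=1)    # E[X_1 | sum of X_i] = sample mean, the Rao-Blackwellized estimator

def risk(est):                   # estimated expected squared-error loss (a convex L)
    return ((est - p) ** 2).mean()

print(risk(delta), risk(delta1))     # the conditioned estimator has much smaller risk
assert risk(delta1) <= risk(delta)
</syntaxhighlight>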

=== Risk aversion ===

The relation between risk aversion and declining marginal utility for scalar outcomes can be stated formally with Jensen's inequality: risk aversion can be stated as preferring a certain outcome <math>\operatorname{E}[x]</math> to a fair gamble with a potentially larger but uncertain outcome <math>x</math>:

<math>u(E[x]) > E[u(x)]</math>.

But this is simply Jensen's inequality for a concave <math>u(x)</math>: a utility function that exhibits declining marginal utility.<ref>Template:Cite book</ref>
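
A minimal numerical illustration (NumPy assumed; the gamble and the square-root utility are illustrative):
<syntaxhighlight lang="python">
import numpy as np

# A fair gamble: receive 0 or 100 with equal probability; u is a concave utility.
outcomes = np.array([0.0, 100.0])
probs = np.array([0.5, 0.5])

def u(w):                        # an illustrative utility with declining marginal utility
    return np.sqrt(w)

utility_of_certain = u((probs * outcomes).sum())   # u(E[x])
expected_utility = (probs * u(outcomes)).sum()     # E[u(x)]
assert utility_of_certain > expected_utility       # risk aversion: Jensen for concave u
print(utility_of_certain, expected_utility)
</syntaxhighlight>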

== Generalizations ==

Beyond its classical formulation for real numbers and convex functions, Jensen’s inequality has been extended to the realm of operator theory. In this non‐commutative setting the inequality is expressed in terms of operator convex functions—that is, functions defined on an interval I that satisfy

<math>f\bigl(\lambda x + (1-\lambda)y\bigr)\le\lambda f(x)+(1-\lambda)f(y)</math>

for every pair of self‐adjoint operators x and y (with spectra in I) and every scalar <math>\lambda\in[0,1]</math>. Hansen and Pedersen<ref name="HP2003">Template:Cite journal</ref> established a definitive version of this inequality by considering genuine non‐commutative convex combinations. In particular, if one has an n‑tuple of bounded self‐adjoint operators <math>x_1,\dots,x_n</math> with spectra in I and an n‑tuple of operators <math>a_1,\dots,a_n</math> satisfying

<math>\sum_{i=1}^{n}a_i^*a_i=I,</math>

then the following operator Jensen inequality holds:

<math>f\Bigl(\sum_{i=1}^{n}a_i^*x_ia_i\Bigr)\le\sum_{i=1}^{n}a_i^*f(x_i)a_i.</math>

This result shows that the convex transformation “respects” non-commutative convex combinations, thereby extending the classical inequality to operators without the need for additional restrictions on the interval of definition.<ref name="HP2003" /> A closely related extension is given by the Jensen trace inequality. For a continuous convex function f defined on I, if one considers self‐adjoint matrices <math>x_1,\dots,x_n</math> (with spectra in I) and matrices <math>a_1,\dots,a_n</math> satisfying <math>\sum_{i=1}^{n}a_i^*a_i=I</math>, then one has

<math>\operatorname{Tr}\Bigl(f\Bigl(\sum_{i=1}^{n}a_i^*x_ia_i\Bigr)\Bigr)\le\operatorname{Tr}\Bigl(\sum_{i=1}^{n}a_i^*f(x_i)a_i\Bigr).</math>

This inequality naturally extends to C*-algebras equipped with a finite trace and is particularly useful in applications ranging from quantum statistical mechanics to information theory. Furthermore, contractive versions of these operator inequalities are available when one only assumes <math>\sum_{i=1}^{n}a_i^* a_i\le I</math>, provided that additional conditions such as <math>f(0)\le0</math> (when 0 ∈ I) are imposed. Extensions to continuous fields of operators and to settings involving conditional expectations on C*-algebras further illustrate the broad applicability of these generalizations.
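
Both the operator and the trace forms can be checked numerically on small matrices. The sketch below (NumPy assumed) uses <math>f(x)=x^2</math>, which is operator convex on the whole real line, and random matrices <math>a_i</math> normalized so that <math>\sum_i a_i^*a_i = I</math>:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(6)
d, n = 4, 3                                  # matrix size and number of terms

def random_hermitian(d):
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (m + m.conj().T) / 2

x = [random_hermitian(d) for _ in range(n)]  # self-adjoint x_1, ..., x_n

# Build a_1, ..., a_n with sum a_i^* a_i = I by normalizing random matrices.
b = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(n)]
s = sum(bi.conj().T @ bi for bi in b)
w, v = np.linalg.eigh(s)
s_inv_half = v @ np.diag(w ** -0.5) @ v.conj().T
a = [bi @ s_inv_half for bi in b]

def f(m):                                    # f(t) = t^2 is operator convex on the real line
    return m @ m

lhs = f(sum(ai.conj().T @ xi @ ai for ai, xi in zip(a, x)))
rhs = sum(ai.conj().T @ f(xi) @ ai for ai, xi in zip(a, x))

assert np.linalg.eigvalsh(rhs - lhs).min() >= -1e-10      # operator Jensen: rhs - lhs is positive semidefinite
assert np.trace(lhs).real <= np.trace(rhs).real + 1e-10   # hence the trace inequality as well
print(np.trace(lhs).real, np.trace(rhs).real)
</syntaxhighlight>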



== See also ==

== Notes ==

Template:Reflist

== References ==

== External links ==

* Weisstein, Eric W. "Jensen's inequality". MathWorld. https://mathworld.wolfram.com/JensensInequality.html


Template:Convex analysis and variational analysis