Berry–Esseen theorem

In probability theory, the central limit theorem states that, under certain circumstances, the probability distribution of the scaled mean of a random sample converges to a normal distribution as the sample size increases to infinity. Under stronger assumptions, the Berry–Esseen theorem, or Berry–Esseen inequality, gives a more quantitative result: it also specifies the rate at which this convergence takes place, by giving a bound on the maximal error of approximation between the normal distribution and the true distribution of the scaled sample mean. The approximation is measured by the Kolmogorov–Smirnov distance. In the case of independent samples, the convergence rate is <math>n^{-1/2}</math>, where <math>n</math> is the sample size, and the constant is estimated in terms of the third absolute normalized moment. It is also possible to give non-uniform bounds, which become tighter for more extreme values of the argument.

Statement of the theorem

Statements of the theorem vary, as it was independently discovered by two mathematicians, Andrew C. Berry (in 1941) and Carl-Gustav Esseen (1942), who then, along with other authors, refined it repeatedly over subsequent decades.

Identically distributed summands

One version, sacrificing generality somewhat for the sake of clarity, is the following:

There exists a positive constant C such that if X1, X2, ..., are i.i.d. random variables with E(X1) = 0, E(X1²) = σ² > 0, and E(|X1|³) = ρ < ∞,<ref group="note">Since the random variables are identically distributed, X2, X3, ... all have the same moments as X1.</ref> and if we define
<math>Y_n = {X_1 + X_2 + \cdots + X_n \over n}</math>
the sample mean, with Fn the cumulative distribution function of
<math>{Y_n \sqrt{n} \over {\sigma}},</math>
and Φ the cumulative distribution function of the standard normal distribution, then for all x and n,
<math>\left|F_n(x) - \Phi(x)\right| \le {C \rho \over \sigma^3\sqrt{n}}.\ \ \ \ (1)</math>
[Figure: Illustration of the difference in cumulative distribution functions alluded to in the theorem.]

That is: given a sequence of independent and identically distributed random variables, each having mean zero and positive variance, if additionally the third absolute moment is finite, then the cumulative distribution functions of the standardized sample mean and the standard normal distribution differ (vertically, on a graph) by no more than the specified amount. Note that the approximation error for all n (and hence the rate of convergence as n tends to infinity) is bounded in order by <math>n^{-1/2}</math>.
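
As a rough numerical illustration (not part of the theorem, and assuming NumPy and SciPy are available), the sketch below simulates standardized sample means of centered exponential variables, estimates the Kolmogorov distance to the standard normal cdf, and compares it with the bound in (1) using the constant 0.4748 discussed below; the choice of Exp(1) − 1 summands is purely for illustration.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 50, 200_000

# X = Exp(1) - 1 has mean 0 and variance sigma^2 = 1; estimate rho = E|X|^3 by Monte Carlo.
sigma = 1.0
rho = np.mean(np.abs(rng.exponential(1.0, size=10**6) - 1.0) ** 3)

# Simulate the standardized sample mean  Y_n * sqrt(n) / sigma = (X_1 + ... + X_n) / (sigma * sqrt(n)).
samples = rng.exponential(1.0, size=(trials, n)) - 1.0
z = samples.sum(axis=1) / (sigma * np.sqrt(n))

# Empirical estimate of sup_x |F_n(x) - Phi(x)|, evaluated at the simulated points.
z_sorted = np.sort(z)
ecdf = np.arange(1, trials + 1) / trials
observed = np.max(np.abs(ecdf - norm.cdf(z_sorted)))

bound = 0.4748 * rho / (sigma**3 * np.sqrt(n))
print(f"observed distance ~ {observed:.4f}, Berry-Esseen bound ~ {bound:.4f}")
</syntaxhighlight>

In such experiments the observed distance is typically well below the bound, consistent with the bound holding uniformly over all distributions with the given moments.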

Calculated upper bounds on the constant C have decreased markedly over the years, from the original value of 7.59 by Esseen in 1942.<ref>Template:Harvtxt. For improvements see Template:Harvtxt, Template:Harvtxt, Template:Harvtxt, Template:Harvtxt, Template:Harvtxt, Template:Harvtxt, Template:Harvtxt. The detailed review can be found in the papers Template:Harvtxt and Template:Harvtxt.</ref> The estimate C < 0.4748 follows from the inequality

<math>\sup_{x\in\mathbb R}\left|F_n(x) - \Phi(x)\right| \le {0.33554 (\rho+0.415\sigma^3)\over \sigma^3\sqrt{n}},</math>

since σ³ ≤ ρ and 0.33554 · 1.415 < 0.4748. However, if ρ ≥ 1.286σ³, then the estimate

<math>\sup_{x\in\mathbb R}\left|F_n(x) - \Phi(x)\right| \le {0.3328 (\rho+0.429\sigma^3)\over \sigma^3\sqrt{n}},</math>

is even tighter.Template:Sfnp
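
Spelling out the short calculation behind the first of these estimates: since σ³ ≤ ρ,
<math>\frac{0.33554\,(\rho+0.415\sigma^3)}{\sigma^3\sqrt{n}} \;\le\; \frac{0.33554\cdot 1.415\,\rho}{\sigma^3\sqrt{n}} \;<\; \frac{0.4748\,\rho}{\sigma^3\sqrt{n}},</math>
which is exactly the bound (1) with C = 0.4748.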

Template:Harvtxt proved that the constant also satisfies the lower bound

<math>
   C\geq\frac{\sqrt{10}+3}{6\sqrt{2\pi}} \approx 0.40973 \approx \frac{1}{\sqrt{2\pi}} + 0.01079 .
 </math>

Non-identically distributed summands

Let X1, X2, ..., be independent random variables with E(Xi) = 0, E(Xi²) = σi² > 0, and E(|Xi|³) = ρi < ∞. Also, let
<math>S_n = {X_1 + X_2 + \cdots + X_n \over \sqrt{\sigma_1^2+\sigma_2^2+\cdots+\sigma_n^2} }</math>
be the normalized n-th partial sum. Denote by Fn the cdf of Sn, and by Φ the cdf of the standard normal distribution. For the sake of convenience write
<math>\vec{\sigma}=(\sigma_1,\ldots,\sigma_n),\ \vec{\rho}=(\rho_1,\ldots,\rho_n).</math>
In 1941, Andrew C. Berry proved that there exists an absolute constant C1 such that for all n
<math>\sup_{x\in\mathbb R}\left|F_n(x) - \Phi(x)\right| \le C_1\cdot\psi_1,\ \ \ \ (2)</math>
where
<math>\psi_1=\psi_1\big(\vec{\sigma},\vec{\rho}\big)=\Big({\textstyle\sum\limits_{i=1}^n\sigma_i^2}\Big)^{-1/2}\cdot\max_{1\le i\le n}\frac{\rho_i}{\sigma_i^2}.</math>

Independently, in 1942, Carl-Gustav Esseen proved that there exists an absolute constant C0 such that for all n
<math>\sup_{x\in\mathbb R}\left|F_n(x) - \Phi(x)\right| \le C_0\cdot\psi_0, \ \ \ \ (3)</math>
where
<math>\psi_0=\psi_0\big(\vec{\sigma},\vec{\rho}\big)=\Big({\textstyle\sum\limits_{i=1}^n\sigma_i^2}\Big)^{-3/2}\cdot\sum\limits_{i=1}^n\rho_i.</math>

It is easy to verify that ψ0 ≤ ψ1. For this reason, inequality (3) is conventionally called the Berry–Esseen inequality, and the quantity ψ0 is called the Lyapunov fraction of the third order. Moreover, in the case where the summands X1, ..., Xn have identical distributions

<math>\psi_0=\psi_1=\frac{\rho_1}{\sigma_1^3\sqrt{n}},</math>

and thus the bounds stated by inequalities (1), (2) and (3) coincide apart from the constant.
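
Indeed, in the identically distributed case ∑σi² = nσ1², ∑ρi = nρ1, and ρi/σi² = ρ1/σ1² for every i, so that
<math>\psi_0=\big(n\sigma_1^2\big)^{-3/2}\cdot n\rho_1=\frac{\rho_1}{\sigma_1^3\sqrt{n}} \qquad\text{and}\qquad \psi_1=\big(n\sigma_1^2\big)^{-1/2}\cdot\frac{\rho_1}{\sigma_1^2}=\frac{\rho_1}{\sigma_1^3\sqrt{n}}.</math>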

Regarding C0, the lower bound established by Template:Harvtxt clearly remains valid:

<math>
   C_0\geq\frac{\sqrt{10}+3}{6\sqrt{2\pi}} = 0.4097\ldots.
 </math>

The lower bound is exactly reached only for certain Bernoulli distributions (see Template:Harvtxt for their explicit expressions).

The upper bounds for C0 were subsequently lowered from Esseen's original estimate 7.59 to 0.5600.<ref>Template:Harvtxt; Template:Harvtxt; Template:Harvtxt; Template:Harvtxt; Template:Harvtxt; Template:Harvtxt; Template:Harvtxt.</ref>

Sum of a random number of random variables

Berry–Esseen-type theorems also exist for sums of a random number of random variables. The following is Theorem 1 from Korolev (1989), substituting in the constants from Remark 3.<ref>Template:Cite journal</ref> It is only a portion of the results established there:

Let <math>\{X_i\}</math> be independent, identically distributed random variables with <math>E(X_i) = \mu</math>, <math>\operatorname{Var}(X_i) = \sigma^2</math>, <math>E|X_i - \mu|^3 = \kappa^3</math>. Let <math>N</math> be a non-negative integer-valued random variable, independent of <math>\{X_i\}</math>. Let <math>S_N = X_1 + \cdots + X_N</math>, and define
<math>\Delta = \sup_{z} \left| P\left( \frac{S_N - E(S_N)}{\sqrt{\operatorname{Var}(S_N)}} \leq z \right) - \Phi(z) \right|.</math>

Then
<math>\Delta \leq 3.8696\frac{\kappa^3}{\sqrt{E(N)}\sigma^3} + 1.0395\frac{E|N - E(N)|}{E(N)} + 0.2420\frac{\mu^2 \operatorname{Var}(N)}{\sigma^2 E(N)}.</math>
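
As a small numerical illustration (not from the cited source, and assuming NumPy is available), the sketch below plugs concrete distributions into this bound: Exp(1) summands, so μ = σ = 1, and a Poisson random index N, with the absolute moments κ³ = E|X − μ|³ and E|N − E(N)| estimated by Monte Carlo.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
lam = 10_000.0               # E(N) = Var(N) = lam for N ~ Poisson(lam)
mu, sigma = 1.0, 1.0         # mean and standard deviation of an Exp(1) summand

# Monte Carlo estimates of kappa^3 = E|X - mu|^3 and of E|N - E(N)|.
kappa3 = np.mean(np.abs(rng.exponential(1.0, size=10**6) - mu) ** 3)
mad_N = np.mean(np.abs(rng.poisson(lam, size=10**6) - lam))

bound = (3.8696 * kappa3 / (np.sqrt(lam) * sigma**3)
         + 1.0395 * mad_N / lam
         + 0.2420 * mu**2 * lam / (sigma**2 * lam))  # Var(N) = E(N) = lam here
print(f"upper bound on Delta: {bound:.4f}")
</syntaxhighlight>

For this choice the last term equals 0.2420·μ²/σ² and does not shrink as E(N) grows, since Var(N)/E(N) = 1 for a Poisson index; it is small only when μ is small relative to σ or when N concentrates around its mean.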

Multidimensional version

As with the multidimensional central limit theorem, there is a multidimensional version of the Berry–Esseen theorem.<ref>Bentkus, Vidmantas. "A Lyapunov-type bound in Rd." Theory of Probability & Its Applications 49.2 (2005): 311–323.</ref><ref name=":0" />

Let <math>X_1,\dots,X_n</math> be independent <math>\mathbb R^d</math>-valued random vectors each having mean zero. Write <math>S_n = \sum_{i=1}^n X_i</math> and assume <math>\Sigma_n = \operatorname{Cov}[S_n]</math> is invertible. Let <math>Z_n\sim\operatorname{N}(0,{\Sigma_n})</math> be a <math>d</math>-dimensional Gaussian with the same mean and covariance matrix as <math>S_n</math>. Then for all convex sets <math>U\subseteq\mathbb R^d</math>,
<math>\big|\Pr[S_n\in U]-\Pr[Z_n\in U]\,\big| \le C d^{1/4} \gamma_n</math>,
where <math>C</math> is a universal constant and <math>\gamma_n=\sum_{i=1}^n \operatorname{E}\big[\|\Sigma_n^{-1/2}X_i\|_2^3\big]</math> (the third power of the L2 norm).
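
For instance, if the <math>X_i</math> are additionally i.i.d. with common covariance matrix <math>\Sigma</math>, then <math>\Sigma_n = n\Sigma</math> and <math>\gamma_n</math> exhibits the familiar <math>n^{-1/2}</math> rate:
<math>\gamma_n=\sum_{i=1}^n \operatorname{E}\big[\|(n\Sigma)^{-1/2}X_i\|_2^3\big] = n\cdot n^{-3/2}\operatorname{E}\big[\|\Sigma^{-1/2}X_1\|_2^3\big] = \frac{\operatorname{E}\big[\|\Sigma^{-1/2}X_1\|_2^3\big]}{\sqrt{n}}.</math>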

The dependency on <math>d^{1/4}</math> is conjectured to be optimal, but might not be.<ref name=":0">Template:Cite journal</ref>

Non-uniform bounds

The bounds given above consider the maximal difference between the cdfs. They are 'uniform' in that they do not depend on <math>x</math>, and they quantify the uniform convergence <math>F_n \to \Phi</math>. However, because <math>F_n(x) - \Phi(x)</math> tends to zero for large <math>|x|</math> by general properties of cdfs, these uniform bounds overestimate the difference for such arguments, even though they are sharp in general. It is therefore desirable to obtain upper bounds which depend on <math>x</math> and in this way become smaller for large <math>x</math>.

One such result, going back to Template:Harvard citation and since improved multiple times, is the following.

As above, let X1, X2, ..., be independent random variables with E(Xi) = 0, E(Xi²) = σi² > 0, and E(|Xi|³) = ρi < ∞. Also, let <math>\sigma^2 = \sum_{i=1}^{n} \sigma_i^2</math> and
<math>S_n = {X_1 + X_2 + \cdots + X_n \over \sigma}</math>
be the normalized n-th partial sum. Denote by Fn the cdf of Sn, and by Φ the cdf of the standard normal distribution. Then
<math>|F_n(x) - \Phi(x)| \leq \frac{C_3}{\sigma^{3} + |x|^3} \cdot \sum_{i = 1}^n \rho_i</math>,
where <math>C_3</math> is a universal constant.

The constant <math>C_3</math> may be taken as 114.667.<ref>Template:Cite book</ref> Moreover, if the <math>X_i</math> are identically distributed, it can be taken as <math>C + 8(1+\mathrm{e})</math>, where <math>C</math> is the constant from the first theorem above, and hence 30.2211 works.<ref>Template:Cite journal</ref>
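
In the identically distributed case, with σi = σ1 and ρi = ρ1 for all i, one has σ³ = n³ᐟ²σ1³ and ∑ρi = nρ1, so the bound specializes to
<math>|F_n(x) - \Phi(x)| \leq \frac{C_3\, n\rho_1}{n^{3/2}\sigma_1^3 + |x|^3},</math>
which at x = 0 recovers the uniform <math>n^{-1/2}</math> rate (with a larger constant) and, for fixed n, decays like <math>|x|^{-3}</math> as <math>|x| \to \infty</math>.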

See also

Notes

<references group="note" />

References

Template:Reflist

Bibliography


External links