
==Statement of the theorem==
Statements of the theorem vary, as it was independently discovered by two [[mathematician]]s, [[Andrew C. Berry]] (in 1941) and [[Carl-Gustav Esseen]] (1942), who then, along with other authors, refined it repeatedly over subsequent decades.

===Identically distributed summands===

One version, sacrificing generality somewhat for the sake of clarity, is the following:

:There exists a positive [[Constant (mathematics)|constant]] ''C'' such that if ''X''<sub>1</sub>, ''X''<sub>2</sub>, ..., are [[Independent and identically distributed random variables|i.i.d. random variables]] with [[Expected value|E]](''X''<sub>1</sub>) = 0, E(''X''<sub>1</sub><sup>2</sup>) = ''σ''<sup>2</sup> > 0, and E(|''X''<sub>1</sub>|<sup>3</sup>) = ''ρ'' < ∞,<ref group="note">Since the random variables are identically distributed, ''X''<sub>2</sub>, ''X''<sub>3</sub>, ... all have the same [[moment (mathematics)|moments]] as ''X''<sub>1</sub>.</ref> and if we define
::<math>Y_n = {X_1 + X_2 + \cdots + X_n \over n}</math>
:the [[sample mean]], with ''F''<sub>''n''</sub> the [[cumulative distribution function]] of
::<math>{Y_n \sqrt{n} \over {\sigma}},</math><!-- please DO NOT CHANGE this formula unless you have read and understood the relevant comments on the talk page -->
:and Φ the cumulative distribution function of the [[standard normal distribution]], then for all ''x'' and ''n'',
::<math>\left|F_n(x) - \Phi(x)\right| \le {C \rho \over \sigma^3\sqrt{n}}.\ \ \ \ (1)</math>

[[Image:BerryEsseenTheoremCDFGraphExample.png|thumb|250px|Illustration of the difference in cumulative distribution functions alluded to in the theorem.]]
That is: given a sequence of [[independent and identically distributed random variables]], each having [[mean]] zero and positive [[variance]], if additionally the third absolute [[moment (mathematics)|moment]] is finite, then the [[cumulative distribution function]]s of the [[Standard score|standardized]] sample mean and the standard normal distribution differ (vertically, on a graph) by no more than the specified amount. Note that the approximation error for all ''n'' (and hence the limiting rate of convergence as ''n'' grows) is bounded by the [[Big O notation|order]] of ''n''<sup>−1/2</sup>.

Calculated upper bounds on the constant ''C'' have decreased markedly over the years, from the original value of 7.59 obtained by Esseen in 1942.<ref>{{harvtxt|Esseen|1942}}. For improvements see {{harvtxt|van Beek|1972}}, {{harvtxt|Shiganov|1986}}, {{harvtxt|Shevtsova|2007}}, {{harvtxt|Shevtsova|2008}}, {{harvtxt|Tyurin|2009}}, {{harvtxt|Korolev|Shevtsova|2010a}}, {{harvtxt|Tyurin|2010}}. A detailed review can be found in {{harvtxt|Korolev|Shevtsova|2010a}} and {{harvtxt|Korolev|Shevtsova|2010b}}.</ref> The estimate ''C''&nbsp;<&nbsp;0.4748 follows from the inequality
:<math>\sup_{x\in\mathbb R}\left|F_n(x) - \Phi(x)\right| \le {0.33554 (\rho+0.415\sigma^3)\over \sigma^3\sqrt{n}},</math>
since ''σ''<sup>3</sup>&nbsp;≤&nbsp;''ρ'' and 0.33554&nbsp;·&nbsp;1.415&nbsp;<&nbsp;0.4748. However, if ''ρ''&nbsp;≥&nbsp;1.286''σ''<sup>3</sup>, then the estimate 
:<math>\sup_{x\in\mathbb R}\left|F_n(x) - \Phi(x)\right| \le {0.3328 (\rho+0.429\sigma^3)\over \sigma^3\sqrt{n}},</math>
is even tighter.{{sfnp|Shevtsova|2011}}
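
As a rough numerical illustration, the three bounds quoted above can be evaluated side by side. The following Python sketch is not taken from the cited sources; the function name <code>berry_esseen_bounds</code> and its arguments are illustrative assumptions, and the constants are simply those stated in this section.

```python
import math


def berry_esseen_bounds(rho, sigma, n):
    """Evaluate the bounds on sup_x |F_n(x) - Phi(x)| quoted in this section.

    rho   -- third absolute moment E|X_1|^3 (hypothetical argument name)
    sigma -- standard deviation of X_1
    n     -- number of i.i.d. summands
    Returns (simple, refined_a, refined_b).
    """
    if sigma <= 0 or n < 1:
        raise ValueError("need sigma > 0 and n >= 1")
    # sigma^3 <= rho always holds, so a smaller rho indicates bad input.
    if rho < sigma ** 3 * (1 - 1e-12):
        raise ValueError("rho cannot be smaller than sigma**3")
    scale = sigma ** 3 * math.sqrt(n)
    simple = 0.4748 * rho / scale                           # bound (1) with C = 0.4748
    refined_a = 0.33554 * (rho + 0.415 * sigma ** 3) / scale
    refined_b = 0.3328 * (rho + 0.429 * sigma ** 3) / scale
    return simple, refined_a, refined_b


if __name__ == "__main__":
    for rho in (1.0, 2.0):
        print(rho, berry_esseen_bounds(rho, sigma=1.0, n=100))
```

With ''σ'' = 1 and ''n'' = 100, the first refined bound is the smaller of the two when ''ρ'' = 1, while the second is smaller when ''ρ'' = 2, in line with the threshold ''ρ''&nbsp;≥&nbsp;1.286''σ''<sup>3</sup> mentioned above.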

{{harvtxt|Esseen|1956}} proved that the constant also satisfies the lower bound
: <math>
    C\geq\frac{\sqrt{10}+3}{6\sqrt{2\pi}} \approx 0.40973 \approx \frac{1}{\sqrt{2\pi}} + 0.01079 .
  </math>
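
To see how the two-sided estimates on ''C'' compare with a concrete distribution, one can compute the left-hand side of (1) exactly for fair ±1 summands, for which ''σ'' = ''ρ'' = 1. The following Python sketch is only an illustration under that assumption; the helper names <code>std_normal_cdf</code> and <code>rademacher_sup_distance</code> are hypothetical.

```python
import math


def std_normal_cdf(x):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rademacher_sup_distance(n):
    """Exact sup_x |F_n(x) - Phi(x)| for a standardized sum of n fair +/-1 signs.

    The sum equals 2k - n with probability C(n, k) / 2^n. F_n is a step
    function and Phi is continuous and increasing, so the supremum is
    attained at a jump point, either as a left limit or as the value there.
    """
    total = 2 ** n
    cumulative = 0          # number of sign patterns with sum below the current atom
    worst = 0.0
    for k in range(n + 1):
        x = (2 * k - n) / math.sqrt(n)
        phi = std_normal_cdf(x)
        left = cumulative / total       # F_n just to the left of x
        cumulative += math.comb(n, k)
        right = cumulative / total      # F_n at x
        worst = max(worst, abs(left - phi), abs(right - phi))
    return worst


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        d = rademacher_sup_distance(n)
        # Compare d * sqrt(n) with the upper estimate 0.4748 for C.
        print(n, d, d * math.sqrt(n))
```

For this particular distribution the scaled distance stays below 0.4748 for every ''n'' tried, as the theorem requires.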

===Non-identically distributed summands===

:Let ''X''<sub>1</sub>, ''X''<sub>2</sub>, ..., be independent random variables with [[expected value|E]](''X''<sub>''i''</sub>) = 0, E(''X''<sub>''i''</sub><sup>2</sup>) = ''σ''<sub>''i''</sub><sup>2</sup> > 0, and E(|''X''<sub>''i''</sub>|<sup>3</sup>) = ''ρ''<sub>''i''</sub> < ∞. Also, let
::<math>S_n = {X_1 + X_2 + \cdots + X_n \over \sqrt{\sigma_1^2+\sigma_2^2+\cdots+\sigma_n^2} }</math>
:be the normalized ''n''-th partial sum. Denote by ''F''<sub>''n''</sub> the [[cumulative distribution function|cdf]] of ''S''<sub>''n''</sub>, and by Φ the cdf of the [[standard normal distribution]]. For convenience, denote
::<math>\vec{\sigma}=(\sigma_1,\ldots,\sigma_n),\ \vec{\rho}=(\rho_1,\ldots,\rho_n).</math>
:In 1941, [[Andrew C. Berry]] proved that there exists an absolute constant ''C''<sub>1</sub> such that, for all ''n'',
::<math>\sup_{x\in\mathbb R}\left|F_n(x) - \Phi(x)\right| \le C_1\cdot\psi_1,\ \ \ \ (2)</math>
:where
::<math>\psi_1=\psi_1\big(\vec{\sigma},\vec{\rho}\big)=\Big({\textstyle\sum\limits_{i=1}^n\sigma_i^2}\Big)^{-1/2}\cdot\max_{1\le
i\le n}\frac{\rho_i}{\sigma_i^2}.</math>

:Independently, in 1942, [[Carl-Gustav Esseen]] proved that there exists an absolute constant ''C''<sub>0</sub> such that, for all ''n'',
::<math>\sup_{x\in\mathbb R}\left|F_n(x) - \Phi(x)\right| \le C_0\cdot\psi_0, \ \ \ \ (3)</math>
:where
::<math>\psi_0=\psi_0\big(\vec{\sigma},\vec{\rho}\big)=\Big({\textstyle\sum\limits_{i=1}^n\sigma_i^2}\Big)^{-3/2}\cdot\sum\limits_{i=1}^n\rho_i.</math>

It is easy to check that ψ<sub>0</sub>&nbsp;≤&nbsp;ψ<sub>1</sub>, since <math>\textstyle\sum_{i=1}^n\rho_i \le \max_{1\le i\le n}(\rho_i/\sigma_i^2)\cdot\sum_{i=1}^n\sigma_i^2</math>. For this reason, inequality (3) is conventionally called the Berry–Esseen inequality, and the quantity ψ<sub>0</sub> is called the Lyapunov fraction of the third order. Moreover, in the case where the summands ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> have identical distributions,
::<math>\psi_0=\psi_1=\frac{\rho_1}{\sigma_1^3\sqrt{n}},</math>
and thus the bounds stated by inequalities (1), (2) and (3) coincide apart from the constant.
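
The relation between the two fractions can be checked numerically. The following Python sketch evaluates ψ<sub>0</sub> and ψ<sub>1</sub> directly from their definitions; the function name <code>lyapunov_fractions</code> and the sample inputs are hypothetical and serve only as an illustration.

```python
import math


def lyapunov_fractions(sigmas, rhos):
    """Return (psi0, psi1) for standard deviations sigmas and third absolute moments rhos."""
    if len(sigmas) != len(rhos) or not sigmas:
        raise ValueError("sigmas and rhos must be non-empty and of equal length")
    if any(s <= 0 for s in sigmas):
        raise ValueError("every sigma_i must be positive")
    var_sum = sum(s * s for s in sigmas)            # sigma_1^2 + ... + sigma_n^2
    psi0 = sum(rhos) / var_sum ** 1.5               # fraction appearing in (3)
    psi1 = max(r / (s * s) for s, r in zip(sigmas, rhos)) / math.sqrt(var_sum)  # as in (2)
    return psi0, psi1


if __name__ == "__main__":
    # Identically distributed case: both fractions equal rho / (sigma^3 sqrt(n)).
    print(lyapunov_fractions([2.0] * 25, [10.0] * 25))
    # Non-identically distributed case (made-up moments with rho_i >= sigma_i^3).
    print(lyapunov_fractions([1.0, 2.0], [1.5, 9.0]))
```

In the first call both values agree with ''ρ''<sub>1</sub>/(''σ''<sub>1</sub><sup>3</sup>√''n'') = 10/(8·5) = 0.25, and in the second call ψ<sub>0</sub> is strictly smaller than ψ<sub>1</sub>.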

Since identically distributed summands are a special case, the lower bound established by {{harvtxt|Esseen|1956}} remains valid for ''C''<sub>0</sub>:
: <math>
    C_0\geq\frac{\sqrt{10}+3}{6\sqrt{2\pi}} = 0.4097\ldots.
  </math>

The lower bound is exactly reached only for certain Bernoulli distributions (see {{harvtxt|Esseen|1956}} for their explicit expressions).

The upper bounds for ''C''<sub>0</sub> were subsequently lowered from Esseen's original estimate 7.59 to 0.5600.<ref>{{harvtxt|Esseen|1942}}; {{harvtxt|Zolotarev|1967}}; {{harvtxt|van Beek|1972}}; {{harvtxt|Shiganov|1986}}; {{harvtxt|Tyurin|2009}}; {{harvtxt|Tyurin|2010}}; {{harvtxt|Shevtsova|2010}}.</ref>

===Sum of a random number of random variables===

Berry–Esseen theorems exist for the sum of a random number of random variables. The following is Theorem 1 from Korolev (1989), substituting in the constants from Remark 3.<ref>{{cite journal |last1=Korolev |first1=V. Yu |title=On the Accuracy of Normal Approximation for the Distributions of Sums of a Random Number of Independent Random Variables |journal=Theory of Probability & Its Applications |date=1989 |volume=33 |issue=3 |pages=540–544 |doi=10.1137/1133079}}</ref> It is only a portion of the results established in that paper:

:Let <math>\{X_i\}</math> be independent, identically distributed random variables with <math>E(X_i) = \mu</math>, <math>\operatorname{Var}(X_i) = \sigma^2</math>, <math>E|X_i - \mu|^3 = \kappa^3</math>. Let <math>N</math> be a non-negative integer-valued random variable, independent of <math>\{X_i\}</math>. Let <math>S_N = X_1 + \cdots + X_N</math>, and define
::<math>
  \Delta = \sup_{z} \left|
    P\left(
      \frac{S_N - E(S_N)}{\sqrt{\operatorname{Var}(S_N)}}
      \leq
      z
    \right)
    -
    \Phi(z)
  \right|
</math>
:Then
::<math>
  \Delta \leq
  3.8696\frac{\kappa^3}{\sqrt{E(N)}\sigma^3} +
  1.0395\frac{E|N - E(N)|}{E(N)} +
  0.2420\frac{\mu^2 \operatorname{Var}(N)}{\sigma^2 E(N)}
</math>
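
The right-hand side of this bound is straightforward to evaluate once the moments of the summands and of <math>N</math> are known. The following Python sketch only transcribes the three terms as stated above; the function name <code>random_sum_bound</code> and its argument names are hypothetical, and the caller is assumed to supply <math>E(N)</math>, <math>E|N - E(N)|</math> and <math>\operatorname{Var}(N)</math>.

```python
import math


def random_sum_bound(mu, sigma, kappa3, mean_n, mad_n, var_n):
    """Evaluate the stated upper bound on Delta for a random sum S_N.

    mu, sigma -- mean and standard deviation of each X_i
    kappa3    -- third absolute central moment E|X_i - mu|^3
    mean_n    -- E(N)
    mad_n     -- E|N - E(N)|
    var_n     -- Var(N)
    """
    if sigma <= 0 or mean_n <= 0:
        raise ValueError("need sigma > 0 and E(N) > 0")
    if kappa3 < 0 or mad_n < 0 or var_n < 0:
        raise ValueError("moments must be non-negative")
    term_summands = 3.8696 * kappa3 / (math.sqrt(mean_n) * sigma ** 3)
    term_spread = 1.0395 * mad_n / mean_n
    term_mean = 0.2420 * mu ** 2 * var_n / (sigma ** 2 * mean_n)
    return term_summands + term_spread + term_mean


if __name__ == "__main__":
    # Deterministic N = 100: only the first term survives.
    print(random_sum_bound(0.0, 1.0, 1.0, 100.0, 0.0, 0.0))
    # Made-up moments for a genuinely random N.
    print(random_sum_bound(1.0, 1.0, 1.0, 100.0, 8.0, 100.0))
```

When <math>N</math> is deterministic, the last two terms vanish and only the first term, of order <math>1/\sqrt{E(N)}</math>, remains.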

===Multidimensional version===
As with the [[Central limit theorem#Multidimensional CLT|multidimensional central limit theorem]], there is a multidimensional version of the Berry–Esseen theorem.<ref>Bentkus, Vidmantas. "A Lyapunov-type bound in R<sup>d</sup>." Theory of Probability & Its Applications 49.2 (2005): 311–323.</ref><ref name=":0" />

:Let <math>X_1,\dots,X_n</math> be independent <math>\mathbb R^d</math>-valued random vectors each having mean zero. Write <math>S_n = \sum_{i=1}^n X_i</math> and assume <math>\Sigma_n = \operatorname{Cov}[S_n]</math> is invertible. Let <math>Z_n\sim\operatorname{N}(0,{\Sigma_n})</math> be a <math>d</math>-dimensional Gaussian with the same mean and covariance matrix as <math>S_n</math>. Then for all convex sets <math>U\subseteq\mathbb R^d</math>,
::<math>\big|\Pr[S_n\in U]-\Pr[Z_n\in U]\,\big| \le C d^{1/4} \gamma_n</math>,
:where <math>C</math> is a universal constant and <math>\gamma_n=\sum_{i=1}^n \operatorname{E}\big[\|\Sigma_n^{-1/2}X_i\|_2^3\big]</math> (the third power of the [[L2 norm|L<sup>2</sup> norm]]).

The dependence on <math>d^{1/4}</math> is conjectured to be optimal, but this has not been proven.<ref name=":0">{{Cite journal|last=Raič|first=Martin|date=2019|title=A multivariate Berry--Esseen theorem with explicit constants|journal=Bernoulli|volume=25|issue=4A|pages=2824–2853|doi=10.3150/18-BEJ1072|issn=1350-7265|arxiv=1802.06475|s2cid=119607520}}</ref>