
==Independent sequences==
[[File:IllustrationCentralTheorem.png|thumb|upright=1.4 |right|Whatever the form of the population distribution, the sampling distribution tends to a Gaussian, and its dispersion is given by the central limit theorem.<ref>{{cite book |last=Rouaud |first=Mathieu |title=Probability, Statistics and Estimation|year=2013 |page=10 |url=http://www.incertitudes.fr/book.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://www.incertitudes.fr/book.pdf |archive-date=2022-10-09 |url-status=live}}</ref>]]

===Classical CLT===
Let <math>\{X_1, \ldots, X_n\}</math> be a sequence of [[Independent and identically distributed random variables|i.i.d. random variables]] having a distribution with [[expected value]] given by <math>\mu</math> and finite [[variance]] given by <math>\sigma^2.</math> Suppose we are interested in the [[sample mean|sample average]]

<math display="block">\bar{X}_n \equiv \frac{X_1 + \cdots + X_n}{n}.</math>

By the [[law of large numbers]], the sample average [[Almost sure convergence|converges almost surely]] (and therefore also [[Convergence in probability|converges in probability]]) to the expected value <math>\mu</math> as <math>n\to\infty.</math>

The classical central limit theorem describes the size and the distributional form of the {{linktext|stochastic}} fluctuations around the deterministic number <math>\mu</math> during this convergence. More precisely, it states that as <math>n</math> gets larger, the distribution of the normalized mean <math>\sqrt{n}(\bar{X}_n - \mu)</math>, i.e. the difference between the sample average <math>\bar{X}_n</math> and its limit <math>\mu,</math> scaled by the factor <math>\sqrt{n}</math>, approaches the [[normal distribution]] with mean <math>0</math> and variance <math>\sigma^2.</math> For large enough <math>n,</math> the distribution of <math>\bar{X}_n</math> gets arbitrarily close to the normal distribution with mean <math>\mu</math> and variance <math>\sigma^2/n.</math>

The usefulness of the theorem is that the distribution of <math>\sqrt{n}(\bar{X}_n - \mu)</math> approaches normality regardless of the shape of the distribution of the individual <math>X_i.</math> Formally, the theorem can be stated as follows:

{{math theorem | name = Lindeberg–Lévy CLT | math_statement =
Suppose <math> X_1, X_2, X_3, \ldots</math> is a sequence of [[independent and identically distributed|i.i.d.]] random variables with <math>\operatorname E[X_i] = \mu</math> and <math>\operatorname{Var}[X_i] = \sigma^2 < \infty.</math> Then, as <math>n</math> approaches infinity, the random variables <math>\sqrt{n}(\bar{X}_n - \mu)</math> [[convergence in distribution|converge in distribution]] to a [[normal distribution|normal]] <math>\mathcal{N}(0, \sigma^2)</math>:{{sfnp|Billingsley|1995|p=357}}

<math display="block">\sqrt{n}\left(\bar{X}_n - \mu\right) \mathrel{\overset{d}{\longrightarrow}} \mathcal{N}\left(0,\sigma^2\right) .</math>}}

In the case <math>\sigma > 0,</math> convergence in distribution means that the [[cumulative distribution function]]s of <math>\sqrt{n}(\bar{X}_n - \mu)</math> converge pointwise to the cdf of the <math>\mathcal{N}(0, \sigma^2)</math> distribution: for every real number <math>z,</math>

<math display="block">\lim_{n\to\infty} \mathbb{P}\left[\sqrt{n}(\bar{X}_n-\mu) \le z\right] = \lim_{n\to\infty} \mathbb{P}\left[\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma } \le \frac{z}{\sigma}\right]= 
\Phi\left(\frac{z}{\sigma}\right) ,</math>

where <math>\Phi(z)</math> is the standard normal cdf evaluated at <math>z.</math> The convergence is uniform in <math>z</math> in the sense that

<math display="block">\lim_{n\to\infty}\;\sup_{z\in\R}\;\left|\mathbb{P}\left[\sqrt{n}(\bar{X}_n-\mu) \le z\right] - \Phi\left(\frac{z}{\sigma}\right)\right| = 0~,</math>

where <math>\sup</math> denotes the least upper bound (or [[supremum]]) of the set.{{sfnp|Bauer|2001|loc=Theorem 30.13|p=199}}
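
The uniform convergence above can be observed numerically. The following is a minimal simulation sketch, not a proof: the choice of Exponential(1) summands (so that <math>\mu = \sigma = 1</math>), the function name <code>sup_distance_to_normal</code>, and the replication count are all illustrative assumptions. It estimates the supremum distance between the empirical cdf of <math>\sqrt{n}(\bar{X}_n - \mu)</math> and <math>\Phi(z/\sigma)</math> for a few values of <math>n</math>.

```python
import math
import numpy as np

def sup_distance_to_normal(n, reps=20000, seed=0):
    """Estimate sup_z |P[sqrt(n)(Xbar_n - mu) <= z] - Phi(z / sigma)| by simulation.

    Assumption for illustration: the X_i are Exponential(1), so mu = sigma = 1.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = 1.0, 1.0
    samples = rng.exponential(scale=1.0, size=(reps, n))
    # Sorted realizations of the normalized mean sqrt(n) * (Xbar_n - mu).
    t = np.sort(math.sqrt(n) * (samples.mean(axis=1) - mu))
    # Limiting cdf Phi(z / sigma), evaluated at each simulated point.
    phi = np.array([0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2.0)))) for z in t])
    # Empirical cdf just after and just before each jump (Kolmogorov distance).
    upper = np.arange(1, reps + 1) / reps
    lower = np.arange(0, reps) / reps
    return float(max(np.max(upper - phi), np.max(phi - lower)))

# Estimated distances for increasing n; they should shrink as n grows.
distances = {n: sup_distance_to_normal(n) for n in (1, 5, 50)}
```

Because the distance is estimated from a finite number of replications, it carries Monte Carlo error of order <math>1/\sqrt{\text{reps}}</math> and will not reach exactly zero even for large <math>n</math>.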

===Lyapunov CLT===
In this variant of the central limit theorem the random variables <math display="inline">X_i</math> have to be independent, but not necessarily identically distributed. The theorem also requires that random variables <math display="inline">\left| X_i\right|</math> have [[moment (mathematics)|moment]]s of some order {{nowrap|<math display="inline">(2+\delta)</math>,}} and that the rate of growth of these moments is limited by the Lyapunov condition given below.

{{math theorem | name = Lyapunov CLT{{sfnp|Billingsley|1995|p=362}} | math_statement =
Suppose <math display="inline">\{X_1, \ldots, X_n, \ldots\}</math> is a sequence of independent random variables, each with finite expected value <math display="inline">\mu_i</math> and variance {{nowrap|<math display="inline">\sigma_i^2</math>.}} Define

<math display="block">s_n^2 = \sum_{i=1}^n \sigma_i^2 .</math>

If for some {{nowrap|<math display="inline">\delta > 0</math>,}} ''Lyapunov’s condition''

<math display="block">\lim_{n\to\infty} \; \frac{1}{s_{n}^{2+\delta}} \, \sum_{i=1}^{n} \operatorname E\left[\left|X_{i} - \mu_{i}\right|^{2+\delta}\right] = 0</math>

is satisfied, then the sum of the terms <math display="inline">\frac{X_i - \mu_i}{s_n}</math> converges in distribution to a standard normal random variable as <math display="inline">n</math> goes to infinity:

<math display="block">\frac{1}{s_n}\,\sum_{i=1}^{n} \left(X_i - \mu_i\right) \mathrel{\overset{d}{\longrightarrow}} \mathcal{N}(0,1) .</math>}}

In practice it is usually easiest to check Lyapunov's condition for {{nowrap|<math display="inline">\delta = 1</math>.}}

If a sequence of random variables satisfies Lyapunov's condition, then it also satisfies Lindeberg's condition. The converse implication, however, does not hold.
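
As an illustration of checking Lyapunov's condition with <math display="inline">\delta = 1</math>, consider the following sketch under two added assumptions that are not part of the theorem: the variables are uniformly bounded, <math display="inline">\left|X_i - \mu_i\right| \le M</math> for some constant <math display="inline">M</math> (a symbol introduced here), and <math display="inline">s_n \to \infty</math>. Since <math display="inline">\left|X_i - \mu_i\right|^3 \le M \left(X_i - \mu_i\right)^2</math>,

<math display="block">\frac{1}{s_n^{3}} \sum_{i=1}^{n} \operatorname E\left[\left|X_i - \mu_i\right|^{3}\right] \le \frac{M}{s_n^{3}} \sum_{i=1}^{n} \sigma_i^2 = \frac{M}{s_n} \longrightarrow 0 ,</math>

so the condition holds and the standardized sums converge in distribution to <math display="inline">\mathcal{N}(0,1)</math>.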

===Lindeberg (-Feller) CLT===
{{Main|Lindeberg's condition}}

In the same setting and with the same notation as above, the Lyapunov condition can be replaced with the following weaker one (from [[Jarl Waldemar Lindeberg|Lindeberg]] in 1920).

Suppose that for every <math display="inline">\varepsilon > 0</math>,

<math display="block"> \lim_{n \to \infty} \frac{1}{s_n^2}\sum_{i = 1}^{n} \operatorname E\left[(X_i - \mu_i)^2 \cdot \mathbf{1}_{\left\{\left| X_i - \mu_i \right| > \varepsilon s_n \right\}} \right] = 0</math>

where <math display="inline">\mathbf{1}_{\{\ldots\}}</math> is the [[indicator function]]. Then the distribution of the standardized sums

<math display="block">\frac{1}{s_n}\sum_{i = 1}^n \left( X_i - \mu_i \right)</math>

converges towards the standard normal distribution {{nowrap|<math display="inline">\mathcal{N}(0, 1)</math>.}}
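
As a worked check, Lindeberg's condition holds automatically in the i.i.d. setting of the classical CLT. Assuming <math display="inline">\mu_i = \mu</math> and <math display="inline">\sigma_i^2 = \sigma^2 > 0</math> for all <math display="inline">i</math>, one has <math display="inline">s_n = \sigma\sqrt{n}</math>, and all summands have the same expectation, so

<math display="block">\frac{1}{s_n^2}\sum_{i = 1}^{n} \operatorname E\left[(X_i - \mu)^2 \cdot \mathbf{1}_{\left\{\left| X_i - \mu \right| > \varepsilon \sigma \sqrt{n} \right\}} \right] = \frac{1}{\sigma^2} \operatorname E\left[(X_1 - \mu)^2 \cdot \mathbf{1}_{\left\{\left| X_1 - \mu \right| > \varepsilon \sigma \sqrt{n} \right\}} \right] \longrightarrow 0 ,</math>

where the limit follows from the [[dominated convergence theorem]], because <math display="inline">(X_1 - \mu)^2</math> is integrable and the indicator tends to zero almost surely as <math display="inline">n \to \infty</math>.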

===CLT for the sum of a random number of random variables===

Rather than summing an integer number <math>n</math> of random variables and taking <math>n \to \infty</math>, the sum can be of a random number <math>N</math> of random variables, with conditions on <math>N</math>.

{{math theorem | name = Robbins CLT<ref>{{cite journal |last1=Robbins |first1=Herbert |title=The asymptotic distribution of the sum of a random number of random variables |journal=Bull. Amer. Math. Soc. |date=1948 |volume=54 |issue=12 |pages=1151–1161 |doi=10.1090/S0002-9904-1948-09142-X |url=https://projecteuclid.org/journals/bulletin-of-the-american-mathematical-society/volume-54/issue-12/The-asymptotic-distribution-of-the-sum-of-a-random-number/bams/1183513324.full|doi-access=free }}</ref><ref>{{cite book |last1=Chen |first1=Louis H.Y. |last2=Goldstein |first2=Larry |last3=Shao |first3=Qi-Man |title=Normal Approximation by Stein's Method |date=2011 |publisher=Springer-Verlag |location=Berlin Heidelberg |pages=270–271}}</ref> | math_statement =
Let <math>\{X_i, i \geq 1\}</math> be independent, identically distributed random variables with <math>E(X_i) = \mu</math> and <math>\text{Var}(X_i) = \sigma^2</math>, and let <math>\{N_n, n \geq 1\}</math> be a sequence of non-negative integer-valued random variables that are independent of <math>\{X_i, i \geq 1\}</math>. Assume for each <math>n = 1, 2, \dots</math> that <math>E(N_n^2) < \infty</math> and

<math display="block">
  \frac{N_n - E(N_n)}{\sqrt{\text{Var}(N_n)}} \xrightarrow{\quad d \quad} \mathcal{N}(0,1)
</math>

where <math>\xrightarrow{\,d\,}</math> denotes convergence in distribution and <math>\mathcal{N}(0,1)</math> is the normal distribution with mean 0, variance 1.
Then

<math display="block">
  \frac{\sum_{i=1}^{N_n} X_i - \mu E(N_n)}{\sqrt{\sigma^2E(N_n) + \mu^2\text{Var}(N_n)}} \xrightarrow{\quad d \quad} \mathcal{N}(0,1)
</math>
}}
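
The statement can be illustrated by simulation. The sketch below rests on choices made here purely for illustration: <math>N_n</math> is Poisson with mean <math>n</math> (so <math>E(N_n) = \text{Var}(N_n) = n</math>, and its standardization is asymptotically normal), the <math>X_i</math> are Exponential(1) (so <math>\mu = \sigma^2 = 1</math>), and the function name <code>standardized_random_sums</code> is hypothetical. It draws the random sums and applies the normalization from the theorem.

```python
import numpy as np

def standardized_random_sums(n, reps=5000, seed=0):
    """Simulate the standardized random sum from the theorem.

    Illustrative assumptions: N_n ~ Poisson(n), X_i ~ Exponential(1),
    with N_n drawn independently of the X_i.
    """
    rng = np.random.default_rng(seed)
    mu, sigma2 = 1.0, 1.0                      # mean and variance of Exponential(1)
    counts = rng.poisson(lam=n, size=reps)     # one value of N_n per replication
    draws = rng.exponential(scale=1.0, size=int(counts.sum()))
    # Assign each draw to its replication; a replication with N_n = 0 gets sum 0.
    owner = np.repeat(np.arange(reps), counts)
    sums = np.bincount(owner, weights=draws, minlength=reps)
    mean_n, var_n = float(n), float(n)         # E(N_n) and Var(N_n) for Poisson(n)
    return (sums - mu * mean_n) / np.sqrt(sigma2 * mean_n + mu**2 * var_n)

z = standardized_random_sums(n=200)
# Fraction of simulated values inside the central 95% interval of N(0, 1).
coverage = float(np.mean(np.abs(z) <= 1.96))
```

With these choices the sample mean and variance of <code>z</code> should be close to 0 and 1, and <code>coverage</code> close to 0.95, up to simulation error.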

===Multidimensional CLT===
Proofs that use characteristic functions can be extended to cases where each individual <math display="inline">\mathbf{X}_i</math> is a [[random vector]] in {{nowrap|<math display="inline">\R^k</math>,}} with mean vector <math display="inline">\boldsymbol\mu = \operatorname E[\mathbf{X}_i]</math> and [[covariance matrix]] <math display="inline">\mathbf{\Sigma}</math> (among the components of the vector), and these random vectors are independent and identically distributed. The multidimensional central limit theorem states that when scaled, sums converge to a [[multivariate normal distribution]].<ref name="vanderVaart">{{Cite book |last=van der Vaart |first=A.W. |title=Asymptotic statistics |year=1998 |publisher=Cambridge University Press |location=New York, NY |isbn=978-0-521-49603-2 |lccn=98015176}}</ref> Summation of these vectors is done component-wise. 

For <math>i = 1, 2, 3, \ldots,</math> let

<math display="block">\mathbf{X}_i = \begin{bmatrix} X_{i}^{(1)} \\ \vdots \\ X_{i}^{(k)} \end{bmatrix}</math>

be independent random vectors. The sum of the random vectors <math>\mathbf{X}_1, \ldots, \mathbf{X}_n</math> is

<math display="block">\sum_{i=1}^{n} \mathbf{X}_i = \begin{bmatrix} X_{1}^{(1)} \\ \vdots \\ X_{1}^{(k)} 
\end{bmatrix} + \begin{bmatrix} X_{2}^{(1)} \\ \vdots \\ X_{2}^{(k)} \end{bmatrix} + \cdots + \begin{bmatrix} X_{n}^{(1)} \\ \vdots \\ X_{n}^{(k)} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} X_{i}^{(1)} \\ \vdots \\ \sum_{i=1}^{n} X_{i}^{(k)} \end{bmatrix}</math>

and their average is

<math display="block">\overline{\mathbf{X}}_n = \begin{bmatrix} \bar X_{n}^{(1)} \\ \vdots \\ \bar X_{n}^{(k)} \end{bmatrix} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{X}_i.</math>

Therefore,

<math display="block">\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left[ \mathbf{X}_i - \operatorname E \left( \mathbf{X}_i \right) \right] = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} ( \mathbf{X}_i - \boldsymbol\mu ) = \sqrt{n}\left(\overline{\mathbf{X}}_n - \boldsymbol\mu\right). </math>

The multivariate central limit theorem states that

<math display="block">\sqrt{n}\left( \overline{\mathbf{X}}_n - \boldsymbol\mu \right) \mathrel{\overset{d}{\longrightarrow}} \mathcal{N}_k(0,\boldsymbol\Sigma),</math>
where the [[covariance matrix]] <math>\boldsymbol{\Sigma}</math> is equal to
<math display="block"> \boldsymbol\Sigma = \begin{bmatrix}
{\operatorname{Var} \left (X_{1}^{(1)} \right)} & \operatorname{Cov} \left (X_{1}^{(1)},X_{1}^{(2)} \right) & \operatorname{Cov} \left (X_{1}^{(1)},X_{1}^{(3)} \right) & \cdots & \operatorname{Cov} \left (X_{1}^{(1)},X_{1}^{(k)} \right) \\
\operatorname{Cov} \left (X_{1}^{(2)},X_{1}^{(1)} \right) & \operatorname{Var} \left( X_{1}^{(2)} \right) & \operatorname{Cov} \left(X_{1}^{(2)},X_{1}^{(3)} \right) & \cdots & \operatorname{Cov} \left(X_{1}^{(2)},X_{1}^{(k)} \right) \\
\operatorname{Cov}\left (X_{1}^{(3)},X_{1}^{(1)} \right) & \operatorname{Cov} \left (X_{1}^{(3)},X_{1}^{(2)} \right) & \operatorname{Var} \left (X_{1}^{(3)} \right) & \cdots & \operatorname{Cov} \left (X_{1}^{(3)},X_{1}^{(k)} \right) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov} \left (X_{1}^{(k)},X_{1}^{(1)} \right) & \operatorname{Cov} \left (X_{1}^{(k)},X_{1}^{(2)} \right) & \operatorname{Cov} \left (X_{1}^{(k)},X_{1}^{(3)} \right) & \cdots & \operatorname{Var} \left (X_{1}^{(k)} \right) \\
\end{bmatrix}~.</math>

The multivariate central limit theorem can be proved using the [[Cramér–Wold theorem]].<ref name="vanderVaart"/>
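
A small simulation can make the multivariate statement concrete. In the sketch below, the two-dimensional vectors <math>\mathbf{X}_i = (U_i, U_i + V_i)</math> with independent Exponential(1) variables <math>U_i, V_i</math> are an illustrative assumption, as is the function name <code>scaled_mean_deviations</code>. For this choice <math>\boldsymbol\mu = (1, 2)</math> and <math>\boldsymbol\Sigma</math> has rows <math>(1, 1)</math> and <math>(1, 2)</math>. In the spirit of the Cramér–Wold device, the code also looks at one linear projection <math>\mathbf{t}^\mathsf{T}\sqrt{n}(\overline{\mathbf{X}}_n - \boldsymbol\mu)</math>, whose limiting variance is <math>\mathbf{t}^\mathsf{T}\boldsymbol\Sigma\,\mathbf{t}</math>.

```python
import numpy as np

def scaled_mean_deviations(n, reps=10000, seed=0):
    """Simulate sqrt(n) * (Xbar_n - mu) for 2-dimensional non-Gaussian vectors.

    Illustrative assumption: X_i = (U_i, U_i + V_i) with U_i, V_i ~ Exponential(1).
    """
    rng = np.random.default_rng(seed)
    u = rng.exponential(scale=1.0, size=(reps, n))
    v = rng.exponential(scale=1.0, size=(reps, n))
    x = np.stack([u, u + v], axis=-1)           # shape (reps, n, 2)
    mu = np.array([1.0, 2.0])                   # mean vector of X_i
    return np.sqrt(n) * (x.mean(axis=1) - mu)   # shape (reps, 2)

sigma = np.array([[1.0, 1.0],
                  [1.0, 2.0]])                  # covariance matrix of X_i
dev = scaled_mean_deviations(n=100)
sample_cov = np.cov(dev, rowvar=False)          # should be close to sigma

# One-dimensional projection along a fixed direction t.
t = np.array([2.0, -1.0])
proj_var = float((dev @ t).var())               # sample variance of the projection
target_var = float(t @ sigma @ t)               # t' Sigma t
```

Checking a single direction does not establish multivariate convergence; the Cramér–Wold theorem requires convergence of every such projection.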

The rate of convergence is given by the following [[Berry–Esseen theorem|Berry–Esseen]] type result:

{{math theorem | name = Theorem<ref>{{cite web |first=Ryan |last=O’Donnell | author-link = Ryan O'Donnell (computer scientist) |year=2014 |title=Theorem&nbsp;5.38 |url=http://www.contrib.andrew.cmu.edu/~ryanod/?p=866 |access-date=2017-10-18 |archive-date=2019-04-08 |archive-url=https://web.archive.org/web/20190408054104/http://www.contrib.andrew.cmu.edu/~ryanod/?p=866 |url-status=dead }}</ref> | math_statement =
Let <math>X_1, \dots, X_n, \dots</math> be independent <math>\R^d</math>-valued random vectors, each having mean zero. Write <math>S =\sum^n_{i=1}X_i</math> and assume <math>\Sigma = \operatorname{Cov}[S]</math> is invertible. Let <math>Z \sim \mathcal{N}(0,\Sigma)</math> be a <math>d</math>-dimensional Gaussian with the same mean and same covariance matrix as <math>S</math>. Then for all convex sets {{nowrap|<math>U \subseteq \R^d</math>,}}

<math display="block">\left|\mathbb{P}[S \in U] - \mathbb{P}[Z \in U]\right| \le C \, d^{1/4} \gamma~,</math>
where <math>C</math> is a universal constant, {{nowrap|<math>\gamma = \sum^n_{i=1} \operatorname E \left[\left\| \Sigma^{-1/2}X_i\right\|^3_2\right]</math>,}} and <math>\|\cdot\|_2</math> denotes the Euclidean norm on {{nowrap|<math>\R^d</math>.}}
}}
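
To see the rate this bound gives in the simplest setting, suppose, as an added assumption that is not part of the theorem, that the <math>X_i</math> are also identically distributed with invertible covariance matrix <math>\Sigma_1</math> (a symbol introduced here for illustration). Then <math>\Sigma = n\Sigma_1</math>, so <math>\Sigma^{-1/2} = n^{-1/2}\,\Sigma_1^{-1/2}</math>, and

<math display="block">\gamma = \sum^n_{i=1} \operatorname E \left[\left\| n^{-1/2}\,\Sigma_1^{-1/2}X_i\right\|^3_2\right] = n \cdot n^{-3/2} \operatorname E \left[\left\| \Sigma_1^{-1/2}X_1\right\|^3_2\right] = \frac{1}{\sqrt{n}} \operatorname E \left[\left\| \Sigma_1^{-1/2}X_1\right\|^3_2\right] .</math>

For fixed <math>d</math> and a finite third moment, the bound therefore decays at order <math>n^{-1/2}</math>, matching the rate in the one-dimensional Berry–Esseen theorem.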

It is unknown whether the factor <math display="inline">d^{1/4}</math> is necessary.<ref>{{cite journal |first=V. |last=Bentkus |title=A Lyapunov-type bound in <math>\R^d</math> |journal=Theory Probab. Appl. |volume=49 |year=2005 |issue=2 |pages=311–323 |doi=10.1137/S0040585X97981123 }}</ref>