== Convergence in distribution{{anchor|Convergence in distribution}} ==
{{Infobox
| title = Examples of convergence in distribution
| bodystyle = width: 28em;
| headerstyle = background-color: lightblue; text-align: left; padding-left: 3pt;
| datastyle = text-align: left;
| header1 = Dice factory
| data2 = Suppose a new dice factory has just been built. The first few dice come out quite biased, due to imperfections in the production process. The outcome from tossing any of them will follow a distribution markedly different from the desired [[uniform distribution (discrete)|uniform distribution]]. <br /><br />As the factory is improved, the dice become less and less loaded, and the outcomes from tossing a newly produced die will follow the uniform distribution more and more closely.
| header3 = Tossing coins
| data4 = Let {{mvar|X<sub>n</sub>}} be the fraction of heads after tossing an unbiased coin {{mvar|n}} times. Then {{math|''X''<sub>1</sub>}} has the [[Bernoulli distribution]] with expected value {{math|''μ'' {{=}} 0.5}} and variance {{math|''σ''<sup>2</sup> {{=}} 0.25}}. The subsequent random variables {{math|''X''<sub>2</sub>, ''X''<sub>3</sub>, ...}} will all be distributed [[Binomial distribution|binomially]], scaled by the factor {{math|1/''n''}}.<br /><br />As {{mvar|n}} grows larger, this distribution gradually takes a shape more and more similar to the [[Normal distribution|bell curve]] of the normal distribution. If we shift and rescale {{math|''X<sub>n</sub>''}} appropriately, then <math>\scriptstyle Z_n = \frac{\sqrt{n}}{\sigma}(X_n-\mu)</math> '''converges in distribution''' to the standard normal, a result that follows from the celebrated [[central limit theorem]].
| header5 = Graphic example
| data6 = Suppose {{math|{''X<sub>i</sub>''} }} is an [[Independent and identically distributed|iid]] sequence of [[uniform distribution (continuous)|uniform]] {{math|''U''(−1, 1)}} random variables. Let <math>\scriptstyle Z_n = {\scriptscriptstyle\frac{1}{\sqrt{n}}}\sum_{i=1}^n X_i</math> be their (normalized) sums. Then according to the [[central limit theorem]], the distribution of {{mvar|Z<sub>n</sub>}} approaches the normal {{math|''N''(0, {{sfrac|1|3}})}} distribution. This convergence is shown in the picture: as {{mvar|n}} grows larger, the shape of the probability density function gets closer and closer to the Gaussian curve. [[Image:Convergence in distribution (sum of uniform rvs).gif|center|200px]]
}}
Loosely, with this mode of convergence, we expect the next outcome in a sequence of random experiments to be modeled better and better by a given [[probability distribution]]. More precisely, the distribution of the associated random variable in the sequence becomes arbitrarily close to a specified fixed distribution. Convergence in distribution is the weakest form of convergence typically discussed, since it is implied by all other types of convergence mentioned in this article. However, convergence in distribution is very frequently used in practice; most often it arises from application of the [[central limit theorem]].
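The coin-tossing example can be illustrated numerically. The following sketch (it assumes Python with [[NumPy]], which is not drawn from the sources cited in this article) simulates the rescaled fraction of heads and compares one of its cumulative probabilities with the corresponding standard normal value:

<syntaxhighlight lang="python">
# Illustrative sketch only: empirical check of the "tossing coins" example.
# X_n is the fraction of heads in n fair coin tosses; by the central limit
# theorem, Z_n = sqrt(n) * (X_n - mu) / sigma is approximately standard
# normal for large n, so P(Z_n <= 1) should approach Phi(1) ~ 0.8413.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 0.5          # mean and standard deviation of a single toss

for n in (10, 100, 10_000):
    # 100,000 independent copies of X_n, each a fraction of heads in n tosses
    x_n = rng.binomial(n, 0.5, size=100_000) / n
    z_n = np.sqrt(n) * (x_n - mu) / sigma
    print(n, np.mean(z_n <= 1.0))   # tends toward ~0.8413 as n grows
</syntaxhighlight>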
===Definition===
A sequence <math>X_1, X_2, \ldots </math> of real-valued [[random variable]]s, with [[cumulative distribution function]]s <math>F_1, F_2, \ldots </math>, is said to '''converge in distribution''', or '''converge weakly''', or '''converge in law''' to a random variable {{mvar|X}} with [[cumulative distribution function]] {{mvar|F}} if

: <math>\lim_{n\to\infty} F_n(x) = F(x),</math>

for every number <math>x \in \mathbb{R}</math> at which <math> F </math> is [[continuous function|continuous]].

The requirement that only the continuity points of <math> F </math> should be considered is essential. For example, if <math> X_n </math> are distributed [[Uniform distribution (continuous)|uniformly]] on intervals <math> \left( 0,\frac{1}{n} \right) </math>, then this sequence converges in distribution to the [[degenerate distribution|degenerate]] random variable <math> X=0 </math>. Indeed, <math> F_n(x) = 0 </math> for all <math> n </math> when <math> x\leq 0</math>, and <math> F_n(x) = 1 </math> for all <math> x \geq \frac{1}{n} </math> when <math> n > 0 </math>. However, for this limiting random variable <math> F(0) = 1 </math>, even though <math> F_n(0) = 0 </math> for all <math> n </math>. Thus the convergence of cdfs fails at the point <math> x=0 </math> where <math> F </math> is discontinuous.

Convergence in distribution may be denoted as

{{NumBlk|:|<math>\begin{align} {} & X_n \ \xrightarrow{d}\ X,\ \ X_n \ \xrightarrow{\mathcal{D}}\ X,\ \ X_n \ \xrightarrow{\mathcal{L}}\ X,\ \ X_n \ \xrightarrow{d}\ \mathcal{L}_X, \\ & X_n \rightsquigarrow X,\ \ X_n \Rightarrow X,\ \ \mathcal{L}(X_n)\to\mathcal{L}(X),\\ \end{align}</math> |{{EquationRef|1}}}}

where <math>\scriptstyle\mathcal{L}_X</math> is the law (probability distribution) of {{mvar|X}}. For example, if {{mvar|X}} is standard normal we can write <math style="height:1.5em;position:relative;top:-.3em">X_n\,\xrightarrow{d}\,\mathcal{N}(0,\,1)</math>.

For [[random vector]]s <math>\left\{ X_1,X_2,\dots \right\}\subset \mathbb{R}^k</math> the convergence in distribution is defined similarly. We say that this sequence '''converges in distribution''' to a random {{mvar|k}}-vector {{mvar|X}} if

: <math>\lim_{n\to\infty} \mathbb{P}(X_n\in A) = \mathbb{P}(X\in A)</math>

for every <math>A\subset \mathbb{R}^k</math> which is a [[continuity set]] of {{mvar|X}}.

The definition of convergence in distribution may be extended from random vectors to more general [[random element]]s in arbitrary [[metric space]]s, and even to “random variables” which are not measurable — a situation which occurs for example in the study of [[empirical process]]es. This is the “weak convergence of laws without laws being defined” — except asymptotically.<ref>{{harvnb|Bickel|Klaassen|Ritov|Wellner|1998|loc=A.8, page 475}}</ref> In this case the term '''weak convergence''' is preferable (see [[weak convergence of measures]]), and we say that a sequence of random elements {{math|{''X<sub>n</sub>''} }} converges weakly to {{mvar|X}} (denoted as {{math|''X<sub>n</sub>'' ⇒ ''X''}}) if

: <math>\mathbb{E}^*h(X_n) \to \mathbb{E}\,h(X)</math>

for all continuous bounded functions {{mvar|h}}.<ref>{{harvnb|van der Vaart|Wellner|1996|page=4}}</ref> Here E* denotes the ''outer expectation'', that is the expectation of a “smallest measurable function {{mvar|g}} that dominates {{math|''h''(''X<sub>n</sub>'')}}”.
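As an illustrative numerical sketch of the definition (again assuming Python with [[NumPy]], purely for illustration), the uniform-on-{{math|(0, 1/''n'')}} example above can be checked empirically: the estimated CDFs converge to {{math|''F''(''x'')}} at every continuity point of {{mvar|F}}, but not at the discontinuity {{math|''x'' {{=}} 0}}:

<syntaxhighlight lang="python">
# Illustrative sketch only: empirical CDFs of X_n ~ Uniform(0, 1/n) converge
# to F(x) = 1{x >= 0}, the CDF of the constant X = 0, at every continuity
# point of F, but fail at the discontinuity x = 0, as described above.
import numpy as np

rng = np.random.default_rng(0)

def empirical_cdf(x, n, samples=100_000):
    """Monte Carlo estimate of F_n(x) = P(X_n <= x) for X_n ~ Uniform(0, 1/n)."""
    return np.mean(rng.uniform(0.0, 1.0 / n, size=samples) <= x)

for n in (1, 10, 1000):
    # F_n(0.05) -> 1 = F(0.05) (a continuity point), while F_n(0) stays 0 even
    # though F(0) = 1 for the limiting random variable X = 0
    print(n, empirical_cdf(0.05, n), empirical_cdf(0.0, n))
</syntaxhighlight>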
===Properties===
* Since <math>F(a) = \mathbb{P}(X \le a)</math>, convergence in distribution means that the probability for {{mvar|X<sub>n</sub>}} to be in a given range is approximately equal to the probability that the value of {{mvar|X}} is in that range, provided {{mvar|n}} is [[sufficiently large]].
* In general, convergence in distribution does not imply that the sequence of corresponding [[probability density function]]s will also converge. As an example one may consider random variables with densities {{math|''f<sub>n</sub>''(''x'') {{=}} (1 + cos(2''πnx''))'''1'''<sub>(0,1)</sub>}}. These random variables converge in distribution to a uniform ''U''(0, 1), whereas their densities do not converge at all.<ref>{{harvnb|Romano|Siegel|1985|loc=Example 5.26}}</ref>
** However, according to ''Scheffé’s theorem'', convergence of the [[probability density function]]s implies convergence in distribution.<ref name="Durrett">{{cite book|last1=Durrett|first1=Rick|title=Probability: Theory and Examples|date=2010|page=84}}</ref>
* The [[portmanteau lemma]] provides several equivalent definitions of convergence in distribution. Although these definitions are less intuitive, they are used to prove a number of statistical theorems. The lemma states that {{math|{''X<sub>n</sub>''} }} converges in distribution to {{mvar|X}} if and only if any of the following statements are true:<ref>{{harvnb|van der Vaart|1998|loc=Lemma 2.2}}</ref>
** <math>\mathbb{P}(X_n \le x) \to \mathbb{P}(X \le x)</math> for all continuity points of <math>x\mapsto \mathbb{P}(X \le x)</math>;
** <math>\mathbb{E}f(X_n) \to \mathbb{E}f(X)</math> for all [[Bounded function|bounded]], [[continuous function]]s <math>f</math> (where <math>\mathbb{E}</math> denotes the [[expected value]] operator);
** <math>\mathbb{E}f(X_n) \to \mathbb{E}f(X)</math> for all bounded, [[Lipschitz function]]s <math>f</math>;
** <math>\liminf \mathbb{E}f(X_n) \ge \mathbb{E}f(X)</math> for all nonnegative, continuous functions <math>f</math>;
** <math>\liminf \mathbb{P}(X_n \in G) \ge \mathbb{P}(X \in G)</math> for every [[open set]] <math>G</math>;
** <math>\limsup \mathbb{P}(X_n \in F) \le \mathbb{P}(X \in F)</math> for every [[closed set]] <math>F</math>;
** <math>\mathbb{P}(X_n \in B) \to \mathbb{P}(X \in B)</math> for all [[continuity set]]s <math>B</math> of random variable <math>X</math>;
** <math>\limsup \mathbb{E}f(X_n) \le \mathbb{E}f(X)</math> for every [[upper semi-continuous]] function <math>f</math> bounded above;{{citation needed|date=February 2013}}
** <math>\liminf \mathbb{E}f(X_n) \ge \mathbb{E}f(X)</math> for every [[lower semi-continuous]] function <math>f</math> bounded below.{{citation needed|date=February 2013}}
* The [[continuous mapping theorem]] states that for a continuous function {{mvar|g}}, if the sequence {{math|{''X<sub>n</sub>''} }} converges in distribution to {{mvar|X}}, then {{math|{''g''(''X<sub>n</sub>'')} }} converges in distribution to {{math|''g''(''X'')}}.
** Note however that convergence in distribution of {{math|{''X<sub>n</sub>''} }} to {{mvar|X}} and {{math|{''Y<sub>n</sub>''} }} to {{mvar|Y}} does in general ''not'' imply convergence in distribution of {{math|{''X<sub>n</sub>'' + ''Y<sub>n</sub>''} }} to {{math|''X'' + ''Y''}} or of {{math|{''X<sub>n</sub>Y<sub>n</sub>''} }} to {{mvar|XY}}.
* [[Lévy’s continuity theorem]]: The sequence {{math|{''X<sub>n</sub>''} }} converges in distribution to {{mvar|X}} if and only if the sequence of corresponding [[characteristic function (probability theory)|characteristic function]]s {{math|{''φ<sub>n</sub>''} }} [[pointwise convergence|converges pointwise]] to the characteristic function {{mvar|φ}} of {{mvar|X}}.
* Convergence in distribution is [[metrizable]] by the [[Lévy–Prokhorov metric]].
* A natural link to convergence in distribution is [[Skorokhod's representation theorem]].
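Lévy's continuity theorem can also be illustrated with the uniform-on-{{math|(0, 1/''n'')}} example from the definition above. The characteristic function of a uniform {{math|''U''(0, 1/''n'')}} random variable is <math>\varphi_n(t) = \frac{e^{it/n} - 1}{it/n}</math>, which converges pointwise to 1, the characteristic function of the degenerate limit {{math|''X'' {{=}} 0}}. The following sketch (assuming Python with [[NumPy]], not taken from the cited sources) evaluates this convergence at a few fixed values of {{mvar|t}}:

<syntaxhighlight lang="python">
# Illustrative sketch only: pointwise convergence of characteristic functions
# for X_n ~ Uniform(0, 1/n).  phi_n(t) = (exp(i*t/n) - 1) / (i*t/n) tends to 1
# for every fixed t, which is the characteristic function of the limit X = 0,
# in line with Levy's continuity theorem.
import numpy as np

for n in (1, 10, 1000):
    for t in (1.0, 5.0):
        phi_n = (np.exp(1j * t / n) - 1.0) / (1j * t / n)
        print(n, t, np.round(phi_n, 4))   # approaches 1 + 0j as n grows
</syntaxhighlight>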