== Convergence in probability ==
{{Infobox
| title = Examples of convergence in probability
| bodystyle = width: 28em;
| headerstyle = background-color: lightblue; text-align: left; padding-left: 3pt;
| datastyle = text-align: left;
| header1 = Height of a person
| data2 = Consider the following experiment. First, pick a random person in the street. Let {{mvar|X}} be their height, which is ''ex ante'' a random variable. Then ask other people to estimate this height by eye. Let {{mvar|X<sub>n</sub>}} be the average of the first {{mvar|n}} responses. Then (provided there is no [[systematic error]]) by the [[law of large numbers]], the sequence {{mvar|X<sub>n</sub>}} will converge in probability to the random variable {{mvar|X}}.
| header3 = Predicting random number generation
| data4 = Suppose that a random number generator generates a pseudorandom floating point number between 0 and 1. Let random variable {{mvar|X}} represent the distribution of possible outputs by the algorithm. Because the pseudorandom number is generated deterministically, its next value is not truly random. Suppose that as you observe a sequence of randomly generated numbers, you can deduce a pattern and make increasingly accurate predictions as to what the next randomly generated number will be. Let {{mvar|X<sub>n</sub>}} be your guess of the value of the next random number after observing the first {{mvar|n}} random numbers. As you learn the pattern and your guesses become more accurate, not only will the distribution of {{mvar|X<sub>n</sub>}} converge to the distribution of {{mvar|X}}, but the outcomes of {{mvar|X<sub>n</sub>}} will converge to the outcomes of {{mvar|X}}.
}}
The basic idea behind this type of convergence is that the probability of an "unusual" outcome becomes smaller and smaller as the sequence progresses.

The concept of convergence in probability is used very often in statistics.
For example, an estimator is called [[consistent estimator|consistent]] if it converges in probability to the quantity being estimated. Convergence in probability is also the type of convergence established by the [[weak law of large numbers]].

=== Definition ===
A sequence {''X''<sub>''n''</sub>} of random variables '''converges in probability''' towards the random variable ''X'' if for all ''ε'' > 0
: <math>\lim_{n\to\infty}\mathbb{P}\big(|X_n-X| > \varepsilon\big) = 0.</math>
More explicitly, let ''P''<sub>''n''</sub>(''ε'') be the probability that ''X''<sub>''n''</sub> is outside the ball of radius ''ε'' centered at ''X''. Then {{mvar|X<sub>n</sub>}} is said to converge in probability to ''X'' if for any {{math|''ε'' > 0}} and any ''δ'' > 0 there exists a number ''N'' (which may depend on ''ε'' and ''δ'') such that for all ''n'' ≥ ''N'', ''P''<sub>''n''</sub>(''ε'') < ''δ'' (the definition of limit).

Notice that for the condition to be satisfied, it is not possible that for each ''n'' the random variables ''X'' and ''X''<sub>''n''</sub> are independent (and thus convergence in probability is a condition on the joint cdfs, as opposed to convergence in distribution, which is a condition on the individual cdfs), unless ''X'' is deterministic, as in the weak law of large numbers. At the same time, the case of a deterministic ''X'' cannot be handled by convergence in distribution whenever the deterministic value is a non-isolated discontinuity point, since discontinuity points have to be explicitly excluded there.
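The ε–δ form of the definition can be illustrated numerically. In the sketch below (a hypothetical illustration; the function names, sample sizes, and seed are not from the article), ''X''<sub>''n''</sub> is the mean of ''n'' fair coin flips, which converges in probability to the constant ''X'' = 1/2 by the weak law of large numbers; ''P''<sub>''n''</sub>(''ε'') is estimated by simulation, and an explicit ''N''(''ε'', ''δ'') is obtained from Chebyshev's inequality.

```python
import math
import random

random.seed(0)

# X_n = mean of n fair coin flips; the limit X = 1/2 is a constant,
# so the weak law of large numbers gives X_n -> 1/2 in probability.

def p_n(n, eps, trials=4000):
    """Monte Carlo estimate of P_n(eps) = P(|X_n - 1/2| > eps)."""
    far = 0
    for _ in range(trials):
        mean = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(mean - 0.5) > eps:
            far += 1
    return far / trials

def chebyshev_N(eps, delta):
    """An N such that P_n(eps) < delta for all n >= N.

    Var(X_n) = 1/(4n), so Chebyshev's inequality gives
    P_n(eps) <= 1/(4 n eps^2) < delta whenever n > 1/(4 eps^2 delta).
    """
    return math.floor(1 / (4 * eps**2 * delta)) + 1

# P_n(eps) shrinks toward 0 as n grows:
print([round(p_n(n, eps=0.1), 3) for n in (10, 100, 1000)])
print(chebyshev_N(eps=0.1, delta=0.05))
```

The Chebyshev bound is crude but explicit: it exhibits one valid ''N'' for each (''ε'', ''δ'') pair, which is all the definition requires.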
Convergence in probability is denoted by adding the letter ''p'' over an arrow indicating convergence, or by using the "plim" probability limit operator:
{{NumBlk|:| <math>X_n \ \xrightarrow{p}\ X,\ \ X_n \ \xrightarrow{P}\ X,\ \ \underset{n\to\infty}{\operatorname{plim}}\, X_n = X.</math>|{{EquationRef|2}}}}

For random elements {''X''<sub>''n''</sub>} on a [[separable metric space]] {{math|(''S'', ''d'')}}, convergence in probability is defined similarly by<ref>{{harvnb|Dudley|2002|loc=Chapter 9.2, page 287}}</ref>
: <math>\forall\varepsilon>0, \mathbb{P}\big(d(X_n,X)\geq\varepsilon\big) \to 0.</math>

=== Properties ===
* Convergence in probability implies convergence in distribution.<sup>[[Proofs of convergence of random variables#propA2|[proof]]]</sup>
* In the opposite direction, convergence in distribution implies convergence in probability when the limiting random variable ''X'' is a constant.<sup>[[Proofs of convergence of random variables#propB1|[proof]]]</sup>
* Convergence in probability does not imply almost sure convergence.<sup>[[Proofs of convergence of random variables#propA1i|[proof]]]</sup>
* The [[continuous mapping theorem]] states that for every continuous function <math>g</math>, if <math display="inline">X_n \xrightarrow{p} X</math>, then also {{nowrap|<math display="inline">g(X_n)\xrightarrow{p}g(X)</math>.}}
* Convergence in probability defines a [[topology]] on the space of random variables over a fixed probability space.
This topology is [[metrizable]] by the ''[[Ky Fan]] metric'':<ref>{{harvnb|Dudley|2002|page=289}}</ref>
<math style="position:relative;top:.3em" display="block">d(X,Y) = \inf\!\big\{ \varepsilon>0:\ \mathbb{P}\big(|X-Y|>\varepsilon\big)\leq\varepsilon\big\}</math>
or alternately by the metric
<math display="block">d(X,Y)=\mathbb E\left[\min(|X-Y|, 1)\right].</math>

=== Counterexamples ===
Not every sequence of random variables which converges to another random variable in distribution also converges in probability to that random variable. As an example, consider a sequence of standard normal random variables <math>X_n</math> and a second sequence <math>Y_n = (-1)^nX_n</math>. Notice that the distribution of <math>Y_n</math> is equal to the distribution of <math>X_n</math> for all <math>n</math>, but
<math display="block">P(|X_n - Y_n| \geq \varepsilon) = P\big(|X_n|\cdot|1 - (-1)^n| \geq \varepsilon\big),</math>
which is zero for even <math>n</math> but equal to <math>P(2|X_n| \geq \varepsilon) > 0</math> for odd <math>n</math>, and hence does not converge to <math>0</math>. So we do not have convergence in probability.
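The counterexample can also be checked by simulation (a hypothetical sketch; the function name, tolerance, and seed are illustrative): each ''Y''<sub>''n''</sub> is itself standard normal, yet for odd ''n'' the distance |''X''<sub>''n''</sub> − ''Y''<sub>''n''</sub>| = 2|''X''<sub>''n''</sub>| stays large with fixed positive probability.

```python
import random

random.seed(0)

# X_n standard normal, Y_n = (-1)^n * X_n.  Y_n has the same N(0, 1)
# distribution as X_n for every n, so Y_n trivially converges to X_n
# in distribution, yet |X_n - Y_n| = 2|X_n| for odd n.

def prob_far_apart(n, eps=1.0, trials=5000):
    """Monte Carlo estimate of P(|X_n - Y_n| >= eps)."""
    hits = 0
    for _ in range(trials):
        x = random.gauss(0, 1)
        y = (-1) ** n * x
        if abs(x - y) >= eps:
            hits += 1
    return hits / trials

# Even n: X_n = Y_n, so the probability is exactly 0.
# Odd n: P(2|Z| >= 1), which is about 0.6 for a standard normal Z.
print([round(prob_far_apart(n), 2) for n in (1, 2, 3, 4)])
```

The estimated probabilities oscillate between 0 and a value bounded away from 0, so they have no limit; in particular they do not tend to 0, confirming the failure of convergence in probability.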