
=== Probability distributions of order statistics ===

==== Order statistics sampled from a uniform distribution ====

In this section we show that the order statistics of the [[uniform distribution (continuous)|uniform distribution]] on the [[unit interval]] have [[marginal distribution]]s belonging to the [[beta distribution]] family. We also give a simple method to derive the joint distribution of any number of order statistics, and finally translate these results to arbitrary continuous distributions using the [[cumulative distribution function|cdf]].

We assume throughout this section that <math>X_1, X_2, \ldots, X_n</math> is a [[random sample]] drawn from a continuous distribution with cdf <math>F_X</math>. Denoting <math>U_i=F_X(X_i)</math>, we obtain the corresponding random sample <math>U_1,\ldots,U_n</math> from the standard [[uniform distribution (continuous)|uniform distribution]]. Note that the order statistics also satisfy <math>U_{(i)}=F_X(X_{(i)})</math>.

The probability density function of the order statistic <math>U_{(k)}</math> is equal to<ref name="gentle">{{citation|title=Computational Statistics|first=James E.|last=Gentle|publisher=Springer|year=2009|isbn=9780387981444|page=63|url=https://books.google.com/books?id=mQ5KAAAAQBAJ&pg=PA63}}.</ref>

:<math>f_{U_{(k)}}(u)={n!\over (k-1)!(n-k)!}u^{k-1}(1-u)^{n-k}</math>

that is, the ''k''th order statistic of the uniform distribution is a [[beta distribution|beta-distributed]] random variable.<ref name="gentle"/><ref>{{citation|title=Kumaraswamy's distribution: A beta-type distribution with some tractability advantages|first=M. C.|last=Jones|journal=Statistical Methodology|volume=6|issue=1|year=2009|pages=70–81|doi=10.1016/j.stamet.2008.04.001|quote=As is well known, the beta distribution is the distribution of the ''m'' ’th order statistic from a random sample of size ''n'' from the uniform distribution (on (0,1)).}}</ref>

:<math>U_{(k)} \sim \operatorname{Beta}(k,n+1-k).</math>

The proof of these statements is as follows. For <math>U_{(k)}</math> to be between ''u'' and ''u''&nbsp;+&nbsp;''du'', it is necessary that exactly ''k''&nbsp;−&nbsp;1 elements of the sample are smaller than ''u'', and that at least one is between ''u'' and ''u''&nbsp;+&nbsp;''du''. The probability that more than one is in this latter interval is already <math>O(du^2)</math>, so we have to calculate the probability that exactly ''k''&nbsp;−&nbsp;1, 1 and ''n''&nbsp;−&nbsp;''k'' observations fall in the intervals <math>(0,u)</math>, <math>(u,u+du)</math> and <math>(u+du,1)</math> respectively. This equals (refer to [[multinomial distribution]] for details)

:<math>{n!\over (k-1)!(n-k)!}u^{k-1}\cdot du\cdot(1-u-du)^{n-k}</math>

and the result follows.

The mean of this distribution is ''k'' / (''n'' + 1).
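
The beta result can be checked informally by simulation. The following Python sketch evaluates the density given above and compares the empirical mean of simulated values of <math>U_{(k)}</math> with ''k'' / (''n'' + 1). The function names, the seed and the number of trials are illustrative assumptions and do not come from the cited sources.

```python
import math
import random


def uniform_order_stat_pdf(u, k, n):
    """Density of the k-th order statistic of n iid U(0,1) variables."""
    if not 0.0 <= u <= 1.0:
        return 0.0
    # n! / ((k-1)! (n-k)!) is always an integer, so integer division is exact.
    coeff = math.factorial(n) // (math.factorial(k - 1) * math.factorial(n - k))
    return coeff * u ** (k - 1) * (1.0 - u) ** (n - k)


def simulate_order_stat_mean(k, n, trials=20000, seed=0):
    """Empirical mean of the k-th smallest of n uniform draws (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        total += sample[k - 1]  # k is 1-based, list indices are 0-based
    return total / trials


if __name__ == "__main__":
    k, n = 2, 5
    print("density at u = 0.5:", uniform_order_stat_pdf(0.5, k, n))
    print("simulated mean:", simulate_order_stat_mean(k, n))
    print("k / (n + 1):", k / (n + 1))
```

With a fixed seed the simulated mean should agree with ''k'' / (''n'' + 1) only up to Monte Carlo error, so any comparison needs a tolerance.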

==== The joint distribution of the order statistics of the uniform distribution ====

Similarly, for ''i''&nbsp;<&nbsp;''j'', the [[joint probability distribution|joint probability density function]] of the two order statistics ''U''<sub>(''i'')</sub>&nbsp;<&nbsp;''U''<sub>(''j'')</sub> can be shown to be

:<math>f_{U_{(i)},U_{(j)}}(u,v) = n!{u^{i-1}\over (i-1)!}{(v-u)^{j-i-1}\over(j-i-1)!}{(1-v)^{n-j}\over (n-j)!}</math>

which is (up to terms of higher order than <math>O(du\,dv)</math>) the probability that ''i''&nbsp;−&nbsp;1, 1, ''j''&nbsp;−&nbsp;1&nbsp;−&nbsp;''i'', 1 and ''n''&nbsp;−&nbsp;''j'' sample elements fall in the intervals <math>(0,u)</math>, <math>(u,u+du)</math>, <math>(u+du,v)</math>, <math>(v,v+dv)</math>, <math>(v+dv,1)</math> respectively.
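
As a rough numerical sanity check, the joint density above should integrate to 1 over the region <math>0<u<v<1</math>. The Python sketch below evaluates the formula and approximates that integral with a simple midpoint rule; the function names and the grid size are assumptions made for illustration only.

```python
import math


def joint_uniform_order_stat_pdf(u, v, i, j, n):
    """Joint density of (U_(i), U_(j)) for i < j and a U(0,1) sample of size n."""
    if not (0.0 <= u < v <= 1.0):
        return 0.0
    return (
        math.factorial(n)
        * u ** (i - 1) / math.factorial(i - 1)
        * (v - u) ** (j - i - 1) / math.factorial(j - i - 1)
        * (1.0 - v) ** (n - j) / math.factorial(n - j)
    )


def integrate_joint_pdf(i, j, n, m=200):
    """Midpoint-rule approximation of the integral over 0 < u < v < 1.

    Only grid cells lying strictly above the diagonal are summed, so the
    result slightly underestimates the true value (error of order 1/m).
    """
    h = 1.0 / m
    total = 0.0
    for a in range(m):
        u = (a + 0.5) * h
        for b in range(a + 1, m):
            v = (b + 0.5) * h
            total += joint_uniform_order_stat_pdf(u, v, i, j, n)
    return total * h * h


if __name__ == "__main__":
    print(integrate_joint_pdf(1, 3, 3))
    print(integrate_joint_pdf(2, 4, 6))
```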

One reasons in an entirely analogous way to derive the higher-order joint distributions. Perhaps surprisingly, the joint density of the ''n'' order statistics turns out to be ''constant'':

:<math>f_{U_{(1)},U_{(2)},\ldots,U_{(n)}}(u_{1},u_{2},\ldots,u_{n}) = n!.</math>

One way to understand this is that the unordered sample has constant density equal to 1, and that there are ''n''! different permutations of the sample corresponding to the same sequence of order statistics. This is related to the fact that 1/''n''! is the volume of the region <math>0<u_1<\cdots<u_n<1</math>. It is also related to another property of order statistics of uniform random variables: it follows from the [[BRS-inequality]] that the maximum expected number of uniform ''U''(0,1] random variables one can choose from a sample of size ''n'' with a sum not exceeding ''s'', where <math>0 <s <n/2</math>, is bounded above by <math> \sqrt{2sn} </math>, which is thus invariant on the set of all <math> s, n </math> with constant product <math> s n </math>.
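
The volume argument can be illustrated numerically: a point drawn uniformly from the unit cube lies in the region <math>0<u_1<\cdots<u_n<1</math> with probability 1/''n''!. The following Python sketch estimates that probability by simulation; the function name, seed and trial count are illustrative assumptions.

```python
import math
import random


def fraction_already_sorted(n, trials=60000, seed=1):
    """Estimate P(U_1 < U_2 < ... < U_n) for iid U(0,1) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        point = [rng.random() for _ in range(n)]
        # Count the draw if its coordinates are already in increasing order.
        if all(point[i] < point[i + 1] for i in range(n - 1)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 3
    print("estimate:", fraction_already_sorted(n))
    print("1 / n!:", 1 / math.factorial(n))
```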

Using the above formulas, one can derive the distribution of the range of the order statistics, that is, the distribution of <math>U_{(n)}-U_{(1)}</math>, the maximum minus the minimum. More generally, for <math>n\geq k>j\geq 1</math>, the difference <math>U_{(k)}-U_{(j)}</math> also has a beta distribution:
<math display="block">U_{(k)}-U_{(j)}\sim \operatorname{Beta}(k-j, n-(k-j)+1).</math>
From these formulas we can derive the covariance between two order statistics:
<math display="block">\operatorname{Cov}(U_{(k)},U_{(j)})=\frac{j(n-k+1)}{(n+1)^2(n+2)}.</math>
The formula follows from noting that
<math display="block">\operatorname{Var}(U_{(k)}-U_{(j)})=\operatorname{Var}(U_{(k)}) + \operatorname{Var}(U_{(j)})-2\cdot \operatorname{Cov}(U_{(k)},U_{(j)})
=\frac{k(n-k+1)}{(n+1)^2(n+2)}+\frac{j(n-j+1)}{(n+1)^2(n+2)}-2\cdot \operatorname{Cov}(U_{(k)},U_{(j)})</math>
and comparing that with
<math display="block">\operatorname{Var}(U)=\frac{(k-j)(n-(k-j)+1)}{(n+1)^2(n+2)},</math>
where <math>U\sim \operatorname{Beta}(k-j,n-(k-j)+1)</math>, which is the actual distribution of the difference.
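
The algebra behind the covariance formula can be verified with exact rational arithmetic. The Python sketch below solves the variance identity for the covariance, using the standard variance of a Beta(''a'', ''b'') variable, and compares the result with the closed form; the function names are assumptions introduced for this illustration.

```python
from fractions import Fraction


def beta_variance(a, b):
    """Variance of a Beta(a, b) variable: ab / ((a+b)^2 (a+b+1))."""
    a, b = Fraction(a), Fraction(b)
    return a * b / ((a + b) ** 2 * (a + b + 1))


def covariance_from_identity(j, k, n):
    """Solve Var(U_(k) - U_(j)) = Var(U_(k)) + Var(U_(j)) - 2 Cov for Cov."""
    var_k = beta_variance(k, n - k + 1)              # U_(k) ~ Beta(k, n+1-k)
    var_j = beta_variance(j, n - j + 1)              # U_(j) ~ Beta(j, n+1-j)
    var_diff = beta_variance(k - j, n - (k - j) + 1)  # difference is Beta too
    return (var_k + var_j - var_diff) / 2


def covariance_closed_form(j, k, n):
    """Closed form j(n-k+1) / ((n+1)^2 (n+2)), valid for j < k."""
    return Fraction(j * (n - k + 1), (n + 1) ** 2 * (n + 2))


if __name__ == "__main__":
    n = 5
    for j in range(1, n):
        for k in range(j + 1, n + 1):
            assert covariance_from_identity(j, k, n) == covariance_closed_form(j, k, n)
    print(covariance_closed_form(2, 4, 5))
```

Because <code>Fraction</code> arithmetic is exact, the two expressions can be compared with equality rather than a floating-point tolerance.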

==== Order statistics sampled from an exponential distribution ====

For <math>X_1, X_2, \ldots, X_n</math> a random sample of size ''n'' from an [[exponential distribution]] with parameter ''λ'', the order statistics ''X''<sub>(''i'')</sub> for ''i'' = 1, 2, 3, ..., ''n'' each have the distribution

:<math>X_{(i)} \stackrel{d}{=} \frac{1}{\lambda}\left( \sum_{j=1}^i \frac{Z_j}{n-j+1} \right)</math>

where the ''Z''<sub>''j''</sub> are iid standard exponential random variables (i.e. with rate parameter 1). This result was first published by [[Alfréd Rényi]].<ref>{{Citation
 | last1 = David
 | first1 = H. A.
 | last2 = Nagaraja
 | first2 = H. N.
 | title = Order Statistics
 | page = 9
 | year = 2003
 | chapter = Chapter 2. Basic Distribution Theory
 | doi = 10.1002/0471722162.ch2 | series = Wiley Series in Probability and Statistics
 | isbn = 9780471722168
 }}</ref><ref>{{cite journal
 |last        = Rényi
 |first       = Alfréd | author-link = Alfréd Rényi
 |title       = On the theory of order statistics
 |journal     = [[Acta Mathematica Hungarica]]
 |volume      = 4
 |issue       = 3
 |pages       = 191–231
 |date        = 1953
 |doi         = 10.1007/BF02127580 | doi-access=free
}}</ref>
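
Since each ''Z''<sub>''j''</sub> has mean 1, the representation gives the expected value of ''X''<sub>(''i'')</sub> as <math>\tfrac{1}{\lambda}\sum_{j=1}^i \tfrac{1}{n-j+1}</math>. The Python sketch below computes this mean exactly and draws ''X''<sub>(''i'')</sub> in two ways, through the representation and by sorting a simulated sample, so that the two can be compared. All function names, seeds and trial counts are illustrative assumptions.

```python
import random
from fractions import Fraction


def expected_exponential_order_stat(i, n, rate=1):
    """E[X_(i)] = (1/rate) * sum_{j=1}^{i} 1/(n-j+1), because E[Z_j] = 1."""
    return sum(Fraction(1, n - j + 1) for j in range(1, i + 1)) / Fraction(rate)


def sample_via_renyi(i, n, rate, rng):
    """Draw X_(i) as a weighted sum of iid standard exponentials Z_j."""
    return sum(rng.expovariate(1.0) / (n - j + 1) for j in range(1, i + 1)) / rate


def sample_via_sorting(i, n, rate, rng):
    """Draw X_(i) directly: simulate n exponentials and take the i-th smallest."""
    return sorted(rng.expovariate(rate) for _ in range(n))[i - 1]


if __name__ == "__main__":
    i, n, rate, trials = 3, 3, 1.0, 20000
    rng = random.Random(0)
    mean_renyi = sum(sample_via_renyi(i, n, rate, rng) for _ in range(trials)) / trials
    mean_sorted = sum(sample_via_sorting(i, n, rate, rng) for _ in range(trials)) / trials
    print("exact mean:", expected_exponential_order_stat(i, n))
    print("via representation:", mean_renyi)
    print("via sorting:", mean_sorted)
```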

==== Order statistics sampled from an Erlang distribution ====

The [[Laplace transform]]s of the order statistics of a sample drawn from an [[Erlang distribution]] may be obtained via a path counting method.<ref>{{Cite journal | last1 = Hlynka | first1 = M. | last2 = Brill | first2 = P. H. | last3 = Horn | first3 = W. | title = A method for obtaining Laplace transforms of order statistics of Erlang random variables | doi = 10.1016/j.spl.2009.09.006 | journal = Statistics & Probability Letters | volume = 80 | pages = 9–18 | year = 2010 }}</ref>

==== The joint distribution of the order statistics of an absolutely continuous distribution ====

If ''F''<sub>''X''</sub> is [[absolute continuity|absolutely continuous]], it has a density such that <math>dF_X(x)=f_X(x)\,dx</math>, and we can use the substitutions

:<math>u=F_X(x)</math>

and

:<math>du=f_X(x)\,dx</math>

to derive the following probability density functions for the order statistics of a sample of size ''n'' drawn from the distribution of ''X'':

:<math>f_{X_{(k)}}(x) =\frac{n!}{(k-1)!(n-k)!}[F_X(x)]^{k-1}[1-F_X(x)]^{n-k} f_X(x)</math>

:<math>f_{X_{(j)},X_{(k)}}(x,y) = \frac{n!}{(j-1)!(k-j-1)!(n-k)!}[F_X(x)]^{j-1}[F_X(y)-F_X(x)]^{k-1-j}[1-F_X(y)]^{n-k}f_X(x)f_X(y)</math> where <math>x\le y</math>

:<math>f_{X_{(1)},\ldots,X_{(n)}}(x_1,\ldots,x_n)=n!f_X(x_1)\cdots f_X(x_n)</math> where <math>x_1\le x_2\le \dots \le x_n.</math>
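
The marginal density of ''X''<sub>(''k'')</sub> can be evaluated for any distribution whose cdf and density are available as functions. The Python sketch below does this and, as an example, uses the exponential distribution, for which the minimum of ''n'' observations is again exponential with rate ''nλ''. The function names and the choice of example are assumptions made for illustration.

```python
import math


def order_stat_pdf(x, k, n, cdf, pdf):
    """Density of the k-th order statistic of n iid draws with the given cdf/pdf."""
    coeff = math.factorial(n) // (math.factorial(k - 1) * math.factorial(n - k))
    big_f = cdf(x)
    return coeff * big_f ** (k - 1) * (1.0 - big_f) ** (n - k) * pdf(x)


def exponential_cdf(rate):
    """Return the cdf of an exponential distribution with the given rate."""
    return lambda x: 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def exponential_pdf(rate):
    """Return the density of an exponential distribution with the given rate."""
    return lambda x: rate * math.exp(-rate * x) if x >= 0 else 0.0


if __name__ == "__main__":
    n, rate, x = 4, 1.0, 0.3
    value = order_stat_pdf(x, 1, n, exponential_cdf(rate), exponential_pdf(rate))
    # The minimum of n exponentials has density n * rate * exp(-n * rate * x).
    print(value, n * rate * math.exp(-n * rate * x))
```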