Order statistic
== Application: Non-parametric density estimation ==
Moments of the distribution of the first order statistic can be used to develop a non-parametric density estimator.<ref>{{cite journal |last1= Garg|first1= Vikram V.|last2= Tenorio|first2= Luis|last3= Willcox|first3= Karen|author3-link=Karen Willcox| date= 2017|title= Minimum local distance density estimation.|journal= Communications in Statistics - Theory and Methods|volume= 46|issue= 1|pages= 148–164|doi= 10.1080/03610926.2014.988260|arxiv= 1412.2851|s2cid= 14334678}}</ref> Suppose we want to estimate the density <math>f_{X}</math> at the point <math>x^*</math>. Consider the random variables <math>Y_i = |X_i - x^*|</math>, which are i.i.d. with density <math>g_Y(y) = f_X(y + x^*) + f_X(x^* - y)</math> for <math>y \ge 0</math>. In particular, <math>f_X(x^*) = \frac{g_Y(0)}{2}</math>. Given a sample of <math>N</math> total observations, the expected value of the first order statistic <math>Y_{(1)}</math> is
: <math> E(Y_{(1)}) = \frac{1}{(N+1) g_Y(0)} + \frac{1}{(N+1)(N+2)} \int_{0}^{1} Q''(z) \delta_{N+1}(z) \, dz</math>
where <math>Q</math> is the quantile function of the distribution with density <math>g_{Y}</math>, and <math>\delta_N(z) = (N+1)(1-z)^N</math>. Because <math>\delta_{N+1}</math> integrates to one and concentrates near <math>z = 0</math> as <math>N</math> grows, the integral term is <math>O(N^{-2})</math> when <math>Q''</math> is well behaved near zero, so that <math>f_X(x^*) \approx \frac{1}{2(N+1)E(Y_{(1)})}</math>. This approximation, in combination with a [[Jackknife resampling|jackknifing]] technique, becomes the basis for the following density estimation algorithm:
: Input: a sample of <math>N</math> observations; <math>M</math> points of density evaluation <math>\{x_\ell\}_{\ell=1}^M</math>; a tuning parameter <math>a \in (0,1)</math> (usually 1/3).
: Output: <math>\{\hat{f}_\ell\}_{\ell=1}^M</math>, the estimated density at the points of evaluation.
: 1: Set <math>m_N = \operatorname{round}(N^{1-a})</math>
: 2: Set <math>s_N = \frac{N}{m_N}</math>
: 3: Create an <math>s_N \times m_N</math> matrix <math>M_{ij}</math> which holds <math>m_N</math> subsets with <math>s_N</math> observations each.
: 4: Create a vector <math>\hat{f}</math> to hold the density evaluations.
: 5: '''for''' <math>\ell = 1 \to M</math> '''do'''
:: 6: '''for''' <math>k = 1 \to m_N</math> '''do'''
::: 7: Find the distance <math>d_{\ell k}</math> from the current point <math>x_\ell</math> to the nearest observation in the <math>k</math>th subset
:: 8: '''end for'''
:: 9: Compute the subset average of distances to <math>x_\ell</math>: <math>d_\ell = \sum_{k=1}^{m_N} \frac{d_{\ell k}}{m_N}</math>
:: 10: Compute the density estimate at <math>x_\ell</math>: <math>\hat{f}_\ell = \frac{1}{2 (1+ s_N) d_\ell}</math>
: 11: '''end for'''
: 12: '''return''' <math>\hat{f}</math>
In contrast to the bandwidth- or length-based tuning parameters of [[histogram]]- and [[Kernel density estimation|kernel]]-based approaches, the tuning parameter of the order-statistic-based density estimator is the size of the sample subsets. Such an estimator is more robust than histogram- and kernel-based approaches: densities such as that of the Cauchy distribution, which lacks finite moments, can be estimated without specialized modifications such as [[Freedman-Diaconis rule|IQR-based bandwidths]]. This is because the first moment of the order statistic always exists if the expected value of the underlying distribution does, but the converse is not necessarily true.<ref>{{Citation | last1 = David | first1 = H. A. | last2 = Nagaraja | first2 = H. N. | title = Order Statistics | pages = 34 | year = 2003 | chapter = Chapter 3. Expected Values and Moments | doi = 10.1002/0471722162.ch3 | series = Wiley Series in Probability and Statistics | isbn = 9780471722168 }}</ref>
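The expansion for <math>E(Y_{(1)})</math> can be checked on a case where everything is known in closed form. As an assumed example, take <math>X \sim \text{Laplace}(0, 1)</math> and <math>x^* = 0</math>. Then <math>Y = |X|</math> is exponential with rate 1, so <math>g_Y(0) = 1</math>, <math>Q(z) = -\ln(1-z)</math> and <math>Q''(z) = (1-z)^{-2}</math>. The two terms evaluate to <math>\tfrac{1}{N+1}</math> and <math>\tfrac{1}{N(N+1)}</math>, which sum to <math>1/N</math>, the mean of the minimum of <math>N</math> independent Exp(1) variables. The sketch below is illustrative only: the helper names are hypothetical, the integral is approximated with a plain midpoint rule, and the result is compared against a Monte Carlo average.

```python
import random
from typing import Callable


def expected_min_expansion(n: int, g0: float, q2: Callable[[float], float],
                           steps: int = 100_000) -> float:
    """Evaluate 1/((N+1) g_Y(0)) + 1/((N+1)(N+2)) * int_0^1 Q''(z) delta_{N+1}(z) dz,
    where delta_{N+1}(z) = (N+2)(1-z)^(N+1), approximating the integral by a midpoint rule."""
    h = 1.0 / steps
    integral = 0.0
    for i in range(steps):
        z = (i + 0.5) * h  # midpoints never reach z = 1, where Q'' may blow up
        integral += q2(z) * (n + 2) * (1.0 - z) ** (n + 1)
    integral *= h
    return 1.0 / ((n + 1) * g0) + integral / ((n + 1) * (n + 2))


def monte_carlo_expected_min(n: int, trials: int = 20_000, seed: int = 0) -> float:
    """Average of Y_(1) = min_i |X_i - x*| over repeated samples of size n,
    assuming X ~ Laplace(0, 1) and x* = 0."""
    rng = random.Random(seed)
    x_star = 0.0
    total = 0.0
    for _ in range(trials):
        # A Laplace(0, 1) draw is an Exp(1) magnitude with a random sign.
        xs = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(n)]
        total += min(abs(x - x_star) for x in xs)
    return total / trials


if __name__ == "__main__":
    n = 10
    # Laplace(0, 1) at x* = 0: g_Y(0) = 1 and Q''(z) = 1 / (1 - z)^2.
    formula = expected_min_expansion(n, 1.0, lambda z: 1.0 / (1.0 - z) ** 2)
    simulated = monte_carlo_expected_min(n)
    print(formula, simulated, 1.0 / n)  # quadrature, simulation, exact 1/N
```

For <math>N = 10</math> the exact value is 0.1. Plugging <math>d_\ell = 0.1</math> and <math>s_N = 10</math> into the step-10 formula gives <math>1/(2 \cdot 11 \cdot 0.1) \approx 0.455</math> rather than the true <math>f_X(0) = 0.5</math>. That formula keeps only the leading term, so the integral term is not negligible when the subsets are this small.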
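The algorithm above can be sketched in Python as follows. This is a minimal illustration, not the authors' reference implementation, and the function name <code>order_statistic_density</code> is hypothetical. Step 2 does not say how to handle a non-integer <math>N/m_N</math>, and step 3 does not say how observations are assigned to subsets. The sketch therefore assumes <math>s_N = \lfloor N/m_N \rfloor</math>, drops the leftover observations, and builds the subsets from a random shuffle of the sample. It also returns infinity in the degenerate case <math>d_\ell = 0</math>.

```python
import math
import random
from typing import List, Optional, Sequence


def order_statistic_density(
    sample: Sequence[float],
    points: Sequence[float],
    a: float = 1 / 3,
    seed: Optional[int] = None,
) -> List[float]:
    """Estimate the density at each of `points` from subset nearest-neighbour distances."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    if not 0 < a < 1:
        raise ValueError("tuning parameter a must lie in (0, 1)")

    # Step 1: number of subsets, m_N = round(N^(1-a)).
    m = max(1, round(n ** (1 - a)))
    # Step 2: subset size s_N = N / m_N; assumption: floor it and drop leftover points.
    s = n // m
    # Step 3: assumption: split a random shuffle of the sample into m_N disjoint subsets.
    shuffled = list(sample)
    random.Random(seed).shuffle(shuffled)
    subsets = [shuffled[k * s:(k + 1) * s] for k in range(m)]

    estimates: List[float] = []  # Step 4
    for x in points:  # Step 5
        # Steps 6-8: distance from x to the nearest observation in each subset,
        # i.e. the first order statistic of |X_i - x| within that subset.
        nearest = [min(abs(v - x) for v in subset) for subset in subsets]
        # Step 9: average the nearest distances over the m_N subsets.
        d = sum(nearest) / m
        # Step 10: f_hat = 1 / (2 (1 + s_N) d), guarding the degenerate case d = 0.
        estimates.append(math.inf if d == 0 else 1.0 / (2.0 * (1 + s) * d))
    return estimates  # Step 12


if __name__ == "__main__":
    rng = random.Random(0)
    # Standard Cauchy draws via the inverse CDF: tan(pi * (U - 1/2)).
    data = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20_000)]
    est = order_statistic_density(data, [0.0], seed=1)
    print(est[0], 1 / math.pi)  # estimate vs. true Cauchy density at 0
```

The only inputs are nearest-neighbour distances and the subset-size parameter <math>a</math>. No bandwidth is chosen, so the same code runs unchanged on the heavy-tailed Cauchy sample in the usage example.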