===Exhaustive cross-validation===
Exhaustive cross-validation methods are cross-validation methods which learn and test on all possible ways to divide the original sample into a training and a validation set.

====Leave-p-out cross-validation====
Leave-''p''-out cross-validation ('''LpO CV''') involves using ''p'' observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample into a validation set of ''p'' observations and a training set.<ref>{{cite journal |last1=Celisse |first1=Alain |title=Optimal cross-validation in density estimation with the L<sup>2</sup>-loss |journal=The Annals of Statistics |date=October 2014 |volume=42 |issue=5 |doi=10.1214/14-AOS1240 |arxiv=0811.0802 }}</ref>

LpO cross-validation requires training and validating the model <math>C^n_p</math> times, where ''n'' is the number of observations in the original sample and <math>C^n_p</math> is the [[binomial coefficient]]. For ''p'' > 1 and for even moderately large ''n'', LpO CV can become computationally infeasible.
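The combinatorics can be sketched with Python's standard library; the helper name <code>lpo_splits</code> is illustrative, not part of any established API:

```python
from itertools import combinations
from math import comb

def lpo_splits(n, p):
    """Yield every (validation, training) index split for leave-p-out CV."""
    indices = set(range(n))
    for val in combinations(range(n), p):
        yield list(val), sorted(indices - set(val))

# Each of the C(n, p) splits requires one train/validate pass.
splits = list(lpo_splits(4, 2))
print(len(splits), comb(4, 2))   # 6 6
print(splits[0])                 # ([0, 1], [2, 3])

# The count explodes for even moderate n when p > 1:
print(comb(100, 30))             # roughly 3e25 model fits
```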
For example, with ''n'' = 100 and ''p'' = 30, <math>C^{100}_{30} \approx 3\times 10^{25}.</math>

A variant of LpO cross-validation with ''p'' = 2, known as leave-pair-out cross-validation, has been recommended as a nearly unbiased method for estimating the area under the [[ROC curve]] of binary classifiers.<ref>{{cite journal |last1=Airola |first1=Antti |last2=Pahikkala |first2=Tapio |last3=Waegeman |first3=Willem |last4=De Baets |first4=Bernard |last5=Salakoski |first5=Tapio |title=An experimental comparison of cross-validation techniques for estimating the area under the ROC curve |journal=Computational Statistics & Data Analysis |date=April 2011 |volume=55 |issue=4 |pages=1828–1844 |doi=10.1016/j.csda.2010.11.018 }}</ref>

====Leave-one-out cross-validation====<!-- This section is linked from [[Data mining]] -->
[[File:LOOCV.gif|right|thumb|300x300px|Illustration of leave-one-out cross-validation (LOOCV) when n = 8 observations. A total of 8 models will be trained and tested.|alt=]]
Leave-''one''-out cross-validation ('''LOOCV''') is a particular case of leave-''p''-out cross-validation with ''p'' = 1. The process looks similar to [[Jackknife resampling|jackknife]]; however, with cross-validation one computes a statistic on the left-out sample(s), while with jackknifing one computes a statistic from the kept samples only.

LOO cross-validation requires less computation time than LpO cross-validation because there are only <math>C^n_1=n</math> passes rather than <math>C^n_p</math>. However, <math>n</math> passes may still require quite a large computation time, in which case other approaches such as ''k''-fold cross-validation may be more appropriate.<ref>{{cite journal |last1=Molinaro |first1=Annette M. |last2=Simon |first2=Richard |last3=Pfeiffer |first3=Ruth M. |title=Prediction error estimation: a comparison of resampling methods |journal=Bioinformatics |date=August 2005 |volume=21 |issue=15 |pages=3301–3307 |doi=10.1093/bioinformatics/bti499 |pmid=15905277 |doi-access=free }}</ref>

'''Pseudo-code algorithm:'''

'''Input:'''
 <code>x</code>, {vector of length <code>N</code> with x-values of incoming points}
 <code>y</code>, {vector of length <code>N</code> with y-values of the expected result}
 <code>interpolate( x_in, y_in, x_out )</code>, {returns the estimation for point <code>x_out</code> after the model is trained with <code>x_in</code>–<code>y_in</code> pairs}

'''Output:'''
 <code>err</code>, {estimate for the prediction error}

'''Steps:'''
 err ← 0
 for i ← 1, ..., N do
   // define the cross-validation subsets
   x_in  ← (x[1], ..., x[i − 1], x[i + 1], ..., x[N])
   y_in  ← (y[1], ..., y[i − 1], y[i + 1], ..., y[N])
   x_out ← x[i]
   y_out ← interpolate(x_in, y_in, x_out)
   err   ← err + (y[i] − y_out)^2
 end for
 err ← err/N
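The steps above can be transcribed directly into Python. As a minimal sketch, an ordinary least-squares line fit stands in for the <code>interpolate</code> model (any trainable predictor with the same signature would do):

```python
def fit_line(x_in, y_in):
    # ordinary least-squares line through the training points
    n = len(x_in)
    mx = sum(x_in) / n
    my = sum(y_in) / n
    sxx = sum((xi - mx) ** 2 for xi in x_in)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x_in, y_in))
    slope = sxy / sxx
    return slope, my - slope * mx

def interpolate(x_in, y_in, x_out):
    # stand-in model: train a least-squares line, predict at x_out
    slope, intercept = fit_line(x_in, y_in)
    return slope * x_out + intercept

def loocv_error(x, y, interpolate):
    # direct transcription of the pseudo-code: leave each point out once,
    # train on the rest, and accumulate the squared prediction error
    err = 0.0
    n = len(x)
    for i in range(n):
        x_in = x[:i] + x[i + 1:]
        y_in = y[:i] + y[i + 1:]
        y_out = interpolate(x_in, y_in, x[i])
        err += (y[i] - y_out) ** 2
    return err / n
```

On data the model family fits exactly (e.g. <code>x = [1, 2, 3, 4, 5]</code>, <code>y = [2, 4, 6, 8, 10]</code>), every left-out point is predicted perfectly and the estimated error is 0; noisier data yields a positive estimate of the out-of-sample squared error.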