====Leave-one-out cross-validation====
<!-- This section is linked from [[Data mining]] -->
[[File:LOOCV.gif|right|thumb|300x300px|Illustration of leave-one-out cross-validation (LOOCV) when n = 8 observations. A total of 8 models will be trained and tested.|alt=]]
Leave-''one''-out cross-validation ('''LOOCV''') is a particular case of leave-''p''-out cross-validation with ''p'' = 1. The process looks similar to [[Jackknife resampling|jackknife]]; however, with cross-validation one computes a statistic on the left-out sample(s), while with jackknifing one computes a statistic from the kept samples only.

LOO cross-validation requires less computation time than LpO cross-validation because there are only <math>C^n_1=n</math> passes rather than <math>C^n_p</math>. However, <math>n</math> passes may still require quite a large computation time, in which case other approaches such as k-fold cross-validation may be more appropriate.<ref>{{cite journal |last1=Molinaro |first1=Annette M. |last2=Simon |first2=Richard |last3=Pfeiffer |first3=Ruth M. |title=Prediction error estimation: a comparison of resampling methods |journal=Bioinformatics |date=August 2005 |volume=21 |issue=15 |pages=3301–3307 |doi=10.1093/bioinformatics/bti499 |pmid=15905277 |doi-access=free }}</ref>

'''Pseudo-code algorithm:'''

'''Input:'''

<code>x</code>, {vector of length <code>N</code> with x-values of incoming points}

<code>y</code>, {vector of length <code>N</code> with y-values of the expected result}

<code>interpolate( x_in, y_in, x_out )</code>, {returns the estimation for point <code>x_out</code> after the model is trained with <code>x_in</code>–<code>y_in</code> pairs}

'''Output:'''

<code>err</code>, {estimate for the prediction error}

'''Steps:'''

 err ← 0
 for i ← 1, ..., N do
   // define the cross-validation subsets
   x_in ← (x[1], ..., x[i − 1], x[i + 1], ..., x[N])
   y_in ← (y[1], ..., y[i − 1], y[i + 1], ..., y[N])
   x_out ← x[i]
   y_out ← interpolate(x_in, y_in, x_out)
   err ← err + (y[i] − y_out)^2
 end for
 err ← err/N
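The pseudo-code above translates directly into a numerical language. The following is a minimal sketch in Python with NumPy; the names <code>loocv_error</code> and <code>linear_fit_predict</code> are illustrative rather than taken from any library, and an ordinary least-squares line stands in for the generic <code>interpolate</code> model.

<syntaxhighlight lang="python">
import numpy as np

def loocv_error(x, y, fit_predict):
    """Estimate prediction error by leave-one-out cross-validation.

    fit_predict(x_in, y_in, x_out) plays the role of interpolate() in the
    pseudo-code: it trains on (x_in, y_in) and returns a prediction at x_out.
    """
    n = len(x)
    err = 0.0
    for i in range(n):
        # Hold out the i-th observation; train on the remaining n - 1 points.
        x_in = np.delete(x, i)
        y_in = np.delete(y, i)
        y_out = fit_predict(x_in, y_in, x[i])
        err += (y[i] - y_out) ** 2
    return err / n

def linear_fit_predict(x_in, y_in, x_out):
    # A least-squares straight line as a stand-in for the trained model.
    slope, intercept = np.polyfit(x_in, y_in, deg=1)
    return slope * x_out + intercept

# Example with n = 8 observations, matching the illustration above.
x = np.arange(8, dtype=float)
y = 2.0 * x + 1.0 + np.array([0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.15])
print(loocv_error(x, y, linear_fit_predict))  # mean squared LOO error
</syntaxhighlight>

As in the pseudo-code, the model is retrained from scratch for each of the <math>n</math> held-out points, which is what makes LOOCV expensive for slow-to-train models.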