===Exhaustive cross-validation===
Exhaustive cross-validation methods are cross-validation methods which learn and test on all possible ways to divide the original sample into a training and a validation set.

====Leave-p-out cross-validation====
Leave-''p''-out cross-validation ('''LpO CV''') involves using ''p'' observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample into a validation set of ''p'' observations and a training set.<ref>{{cite journal |last1=Celisse |first1=Alain |title=Optimal cross-validation in density estimation with the L<sup>2</sup>-loss |journal=The Annals of Statistics |date=October 2014 |volume=42 |issue=5 |doi=10.1214/14-AOS1240 |arxiv=0811.0802 }}</ref>

LpO cross-validation requires training and validating the model <math>C^n_p</math> times, where ''n'' is the number of observations in the original sample and <math>C^n_p</math> is the [[binomial coefficient]]. For ''p'' > 1 and for even moderately large ''n'', LpO CV can become computationally infeasible.
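The combinatorics can be sketched with Python's standard library; the helper name <code>lpo_splits</code> is illustrative, not part of any established API:

```python
from itertools import combinations
from math import comb

def lpo_splits(n, p):
    """Yield every (validation, training) index split for leave-p-out CV."""
    indices = set(range(n))
    for val in combinations(range(n), p):
        yield list(val), sorted(indices - set(val))

# Each of the C(n, p) splits requires one train/validate pass.
splits = list(lpo_splits(4, 2))
print(len(splits), comb(4, 2))   # 6 6
print(splits[0])                 # ([0, 1], [2, 3])

# The count explodes for even moderate n when p > 1:
print(comb(100, 30))             # roughly 3e25 model fits
```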
For example, with ''n'' = 100 and ''p'' = 30, <math>C^{100}_{30} \approx 3\times 10^{25}.</math>

A variant of LpO cross-validation with ''p'' = 2, known as leave-pair-out cross-validation, has been recommended as a nearly unbiased method for estimating the area under the [[ROC curve]] of binary classifiers.<ref>{{cite journal |last1=Airola |first1=Antti |last2=Pahikkala |first2=Tapio |last3=Waegeman |first3=Willem |last4=De Baets |first4=Bernard |last5=Salakoski |first5=Tapio |title=An experimental comparison of cross-validation techniques for estimating the area under the ROC curve |journal=Computational Statistics & Data Analysis |date=April 2011 |volume=55 |issue=4 |pages=1828–1844 |doi=10.1016/j.csda.2010.11.018 }}</ref>

====Leave-one-out cross-validation====<!-- This section is linked from [[Data mining]] -->
[[File:LOOCV.gif|right|thumb|300x300px|Illustration of leave-one-out cross-validation (LOOCV) when n = 8 observations. A total of 8 models will be trained and tested.|alt=]]
Leave-''one''-out cross-validation ('''LOOCV''') is a particular case of leave-''p''-out cross-validation with ''p'' = 1. The process looks similar to [[Jackknife resampling|jackknife]]; however, with cross-validation one computes a statistic on the left-out sample(s), while with jackknifing one computes a statistic from the kept samples only.

LOO cross-validation requires less computation time than LpO cross-validation because there are only <math>C^n_1=n</math> passes rather than <math>C^n_p</math>. However, <math>n</math> passes may still require quite a large computation time, in which case other approaches such as ''k''-fold cross-validation may be more appropriate.<ref>{{cite journal |last1=Molinaro |first1=Annette M. |last2=Simon |first2=Richard |last3=Pfeiffer |first3=Ruth M. |title=Prediction error estimation: a comparison of resampling methods |journal=Bioinformatics |date=August 2005 |volume=21 |issue=15 |pages=3301–3307 |doi=10.1093/bioinformatics/bti499 |pmid=15905277 |doi-access=free }}</ref>

'''Pseudo-code algorithm:'''

'''Input:'''
 <code>x</code>, {vector of length <code>N</code> with x-values of incoming points}
 <code>y</code>, {vector of length <code>N</code> with y-values of the expected result}
 <code>interpolate( x_in, y_in, x_out )</code>, {returns the estimation for point <code>x_out</code> after the model is trained with <code>x_in</code>–<code>y_in</code> pairs}

'''Output:'''
 <code>err</code>, {estimate for the prediction error}

'''Steps:'''
 err ← 0
 for i ← 1, ..., N do
   // define the cross-validation subsets
   x_in  ← (x[1], ..., x[i − 1], x[i + 1], ..., x[N])
   y_in  ← (y[1], ..., y[i − 1], y[i + 1], ..., y[N])
   x_out ← x[i]
   y_out ← interpolate(x_in, y_in, x_out)
   err   ← err + (y[i] − y_out)^2
 end for
 err ← err/N
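The steps above can be transcribed directly into Python. As a minimal sketch, an ordinary least-squares line fit stands in for the <code>interpolate</code> model (any trainable predictor with the same signature would do):

```python
def fit_line(x_in, y_in):
    # ordinary least-squares line through the training points
    n = len(x_in)
    mx = sum(x_in) / n
    my = sum(y_in) / n
    sxx = sum((xi - mx) ** 2 for xi in x_in)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x_in, y_in))
    slope = sxy / sxx
    return slope, my - slope * mx

def interpolate(x_in, y_in, x_out):
    # stand-in model: train a least-squares line, predict at x_out
    slope, intercept = fit_line(x_in, y_in)
    return slope * x_out + intercept

def loocv_error(x, y, interpolate):
    # direct transcription of the pseudo-code: leave each point out once,
    # train on the rest, and accumulate the squared prediction error
    err = 0.0
    n = len(x)
    for i in range(n):
        x_in = x[:i] + x[i + 1:]
        y_in = y[:i] + y[i + 1:]
        y_out = interpolate(x_in, y_in, x[i])
        err += (y[i] - y_out) ** 2
    return err / n
```

On data the model family fits exactly (e.g. <code>x = [1, 2, 3, 4, 5]</code>, <code>y = [2, 4, 6, 8, 10]</code>), every left-out point is predicted perfectly and the estimated error is 0; noisier data yields a positive estimate of the out-of-sample squared error.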