====Holdout method====
In the holdout method, data points are randomly assigned to two sets, ''d''<sub>0</sub> and ''d''<sub>1</sub>, usually called the training set and the test set, respectively. The size of each set is arbitrary, although the test set is typically smaller than the training set. A model is then trained (built) on ''d''<sub>0</sub> and tested (its performance evaluated) on ''d''<sub>1</sub>.

In typical cross-validation, the results of multiple runs of model-testing are averaged together; in contrast, the holdout method, in isolation, involves a single run. It should be used with caution because, without such averaging over multiple runs, the results can be highly misleading. The indicator of predictive accuracy ([[#Statistical properties|''F''<sup>*</sup>]]) will tend to be unstable since it is not smoothed out by multiple iterations (see below). Similarly, indicators of the specific role played by various predictor variables (e.g., values of regression coefficients) will tend to be unstable.

While the holdout method can be framed as "the simplest kind of cross-validation",<ref>{{cite web|title=Cross Validation|url=https://www.cs.cmu.edu/~schneide/tut5/node42.html|access-date=11 November 2012}}{{self-published inline|date=November 2024}}</ref> many sources instead classify holdout as a type of simple validation, rather than a simple or degenerate form of cross-validation.<ref name="Kohavi95" /><ref>{{cite journal |last1=Arlot |first1=Sylvain |first2=Alain |last2=Celisse |title=A survey of cross-validation procedures for model selection |journal=Statistics Surveys |volume=4 |year=2010 |pages=40–79 |quote=In brief, CV consists in averaging several hold-out estimators of the risk corresponding to different data splits. |doi=10.1214/09-SS054 |arxiv=0907.4728 }}</ref>
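
The procedure can be illustrated with a minimal Python sketch. The following is not taken from the cited sources; the synthetic data, the 30% test fraction, the straight-line model, and all function names (<code>make_data</code>, <code>holdout_split</code>, <code>fit_line</code>, <code>holdout_mse</code>) are assumptions chosen purely for illustration. It performs a single random split into ''d''<sub>0</sub> and ''d''<sub>1</sub>, fits on ''d''<sub>0</sub>, and reports the mean squared error on ''d''<sub>1</sub>; repeating the split with different seeds gives a rough sense of how much a single holdout estimate can vary.

```python
import random
import statistics


def make_data(n=40, seed=42):
    """Synthetic (x, y) pairs from y = 1 + 2x + noise (illustrative assumption)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 1.0 + 2.0 * x + rng.gauss(0, 1.0)))
    return data


def holdout_split(data, test_fraction=0.3, seed=0):
    """Randomly assign points to a training set d0 and a test set d1."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]  # (d0, d1)


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def mse(model, points):
    """Mean squared prediction error of the fitted line on the given points."""
    a, b = model
    return statistics.fmean([(y - (a + b * x)) ** 2 for x, y in points])


def holdout_mse(data, test_fraction=0.3, seed=0):
    """One holdout run: train on d0, evaluate on d1."""
    d0, d1 = holdout_split(data, test_fraction, seed)
    return mse(fit_line(d0), d1)


data = make_data()
# Each seed gives a different random split, hence a different estimate.
estimates = [holdout_mse(data, seed=s) for s in range(10)]
print("single holdout runs:", [round(e, 2) for e in estimates])
print("mean over runs:", round(statistics.fmean(estimates), 2))
```

Each entry in the first printed list is what one holdout run would report on its own; the spread among the entries reflects the instability described above, and averaging them is a step toward the repeated-split procedures that cross-validation formalizes.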