====Repeated random sub-sampling validation====
This method, also known as [[Monte Carlo method|Monte Carlo]] cross-validation,<ref>{{cite journal |last1=Xu |first1=Qing-Song |last2=Liang |first2=Yi-Zeng |title=Monte Carlo cross validation |journal=Chemometrics and Intelligent Laboratory Systems |date=April 2001 |volume=56 |issue=1 |pages=1–11 |doi=10.1016/S0169-7439(00)00122-2 }}</ref><ref name="mccv">{{cite book |doi=10.1007/978-0-387-47509-7_8 |chapter=Resampling Strategies for Model Assessment and Selection |title=Fundamentals of Data Mining in Genomics and Proteomics |date=2007 |last1=Simon |first1=Richard |pages=173–186 |isbn=978-0-387-47508-0 }}</ref> creates multiple random splits of the dataset into training and validation data.<ref>{{cite book |doi=10.1007/978-1-4614-6849-3 |title=Applied Predictive Modeling |date=2013 |last1=Kuhn |first1=Max |last2=Johnson |first2=Kjell |isbn=978-1-4614-6848-6 }}{{pn|date=November 2024}}</ref> For each such split, the model is fit to the training data, and predictive accuracy is assessed using the validation data. The results are then averaged over the splits. The advantage of this method (over ''k''-fold cross-validation) is that the proportion of the training/validation split is not dependent on the number of iterations (i.e., the number of partitions). The disadvantage of this method is that some observations may never be selected in the validation subsample, whereas others may be selected more than once. In other words, validation subsets may overlap. This method also exhibits [[Monte Carlo method|Monte Carlo]] variation, meaning that the results will vary if the analysis is repeated with different random splits. As the number of random splits approaches infinity, the result of repeated random sub-sampling validation tends towards that of leave-''p''-out cross-validation.

In a stratified variant of this approach, the random samples are generated in such a way that the mean response value (i.e., the dependent variable in the regression) is equal in the training and testing sets. This is particularly useful if the responses are [[dichotomous]] with an unbalanced representation of the two response values in the data.

A method that applies repeated random sub-sampling is [[RANSAC]].<ref>{{cite report |last1=Cantzler |first1=H |title=Random Sample Consensus (RANSAC) |url=https://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/CANTZLER2/ransac.pdf }}{{self-published inline|date=November 2024}}</ref>
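The following is a minimal sketch of the procedure in Python, assuming the scikit-learn library; the dataset, model (ordinary least squares), number of splits, and 25% validation fraction are illustrative choices rather than part of the method itself.

<syntaxhighlight lang="python">
# Repeated random sub-sampling (Monte Carlo) cross-validation:
# draw many independent random train/validation splits, fit on the
# training part, score on the validation part, and average the scores.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ShuffleSplit

X, y = load_diabetes(return_X_y=True)

# 100 independent random splits; each validation set holds 25% of the data.
# Unlike k-fold, the validation fraction is chosen independently of the
# number of repetitions, and validation sets may overlap across splits.
splitter = ShuffleSplit(n_splits=100, test_size=0.25, random_state=0)

scores = []
for train_idx, val_idx in splitter.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    predictions = model.predict(X[val_idx])
    scores.append(mean_squared_error(y[val_idx], predictions))

# The cross-validation estimate is the average over the random splits.
print(f"Mean validation MSE over {len(scores)} splits: {np.mean(scores):.2f}")
</syntaxhighlight>

For the stratified variant with a dichotomous response, the same loop can be driven by a splitter that preserves the class proportions in each random split (for example, scikit-learn's <code>StratifiedShuffleSplit</code>), so that the mean response is approximately equal in the training and validation sets.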