Cross-validation (statistics)
==Applications==
Cross-validation can be used to compare the performance of different predictive modeling procedures. For example, suppose we are interested in [[optical character recognition]], and we are considering using either a [[Support Vector Machine]] (SVM) or [[K-nearest neighbors algorithm|''k''-nearest neighbors]] (KNN) to predict the true character from an image of a handwritten character. Using cross-validation, we can obtain empirical estimates comparing these two methods in terms of their respective fractions of misclassified characters. In contrast, the in-sample estimate will not represent the quantity of interest (i.e. the generalization error).<ref>{{cite book |doi=10.1007/978-0-387-84858-7 |title=The Elements of Statistical Learning |series=Springer Series in Statistics |date=2009 |isbn=978-0-387-84857-0 }}{{pn|date=November 2024}}</ref>

Cross-validation can also be used in [[Feature selection|''variable selection'']].<ref name="Picard84">{{cite journal |last1=Picard |first1=Richard |last2=Cook |first2=Dennis |year=1984 |title=Cross-Validation of Regression Models |journal=Journal of the American Statistical Association |jstor=2288403 |volume=79 |pages=575–583 |doi=10.2307/2288403 |issue=387 }}</ref> Suppose we are using the [[gene expression|expression]] levels of 20 [[proteins]] to predict whether a [[cancer]] patient will respond to a [[drug]]. A practical goal would be to determine which subset of the 20 features should be used to produce the best predictive model. For most modeling procedures, if we compare feature subsets using the in-sample error rates, the best performance will occur when all 20 features are used. However, under cross-validation, the model with the best fit will generally include only a subset of the features that are deemed truly informative.

A recent development in medical statistics is the use of cross-validation in meta-analysis. It forms the basis of the validation statistic, Vn, which is used to test the statistical validity of meta-analysis summary estimates.<ref>{{cite journal |last1=Willis |first1=Brian H. |last2=Riley |first2=Richard D. |title=Measuring the statistical validity of summary meta-analysis and meta-regression results for use in clinical practice |journal=Statistics in Medicine |date=20 September 2017 |volume=36 |issue=21 |pages=3283–3301 |doi=10.1002/sim.7372 |pmid=28620945 |pmc=5575530 }}</ref> Cross-validation has also been used in a more conventional sense in meta-analysis to estimate the likely prediction error of meta-analysis results.<ref>{{cite journal |last1=Riley |first1=Richard D. |last2=Ahmed |first2=Ikhlaaq |last3=Debray |first3=Thomas P. A. |last4=Willis |first4=Brian H. |last5=Noordzij |first5=J. Pieter |last6=Higgins |first6=Julian P.T. |last7=Deeks |first7=Jonathan J. |title=Summarising and validating test accuracy results across multiple studies for use in clinical practice |journal=Statistics in Medicine |date=15 June 2015 |volume=34 |issue=13 |pages=2081–2103 |doi=10.1002/sim.6471 |pmid=25800943 |pmc=4973708 }}</ref>
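
The comparison of two modeling procedures described at the start of this section can be illustrated with a minimal sketch. It is not taken from the cited sources and rests on several assumptions: the data are synthetic two-class points rather than character images, a nearest-centroid classifier stands in for the SVM to keep the example dependency-free, and all function names (<code>make_data</code>, <code>knn_fit</code>, <code>centroid_fit</code>, <code>cv_error</code>) are hypothetical. The sketch computes both the in-sample and the 5-fold cross-validated misclassification rate for each procedure.

```python
import random


def make_data(n, seed):
    """Synthetic two-class data in 2-D (an assumption; not real OCR data)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so every fold contains both classes
        center = 0.0 if label == 0 else 1.5
        x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((x, label))
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_fit(train, k=1):
    """Return a k-nearest-neighbours predictor (majority vote, binary labels)."""
    def predict(x):
        nearest = sorted(train, key=lambda row: sq_dist(row[0], x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if 2 * votes > k else 0
    return predict


def centroid_fit(train):
    """Return a nearest-centroid predictor (simple stand-in for an SVM)."""
    centroids = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))

    def predict(x):
        return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))
    return predict


def error_rate(predict, rows):
    """Fraction of rows that the predictor misclassifies."""
    return sum(predict(x) != y for x, y in rows) / len(rows)


def cv_error(fit, data, n_folds=5):
    """Average held-out error over n_folds folds (k-fold cross-validation)."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors = []
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        errors.append(error_rate(fit(train), test))
    return sum(errors) / n_folds


data = make_data(200, seed=0)
results = {}
for name, fit in [("1-NN", knn_fit), ("nearest centroid", centroid_fit)]:
    in_sample = error_rate(fit(data), data)  # evaluated on the training data
    cross_validated = cv_error(fit, data)    # evaluated on held-out folds
    results[name] = (in_sample, cross_validated)
    print(f"{name}: in-sample={in_sample:.3f}, cross-validated={cross_validated:.3f}")
```

In this sketch the 1-nearest-neighbour classifier has an in-sample error of zero, because every training point is its own nearest neighbour, while its cross-validated error is not zero. This illustrates why the in-sample estimate does not represent the generalization error, and why the two procedures should be ranked by their cross-validated error rates instead.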