{{Short description|Flaw in mathematical modelling}}
{{Refimprove|date=August 2017}}
{{Machine learning}}
[[Image:Overfitting.svg|thumb|300px|Figure 1. The green line represents an overfitted model and the black line represents a regularized model. While the green line best follows the training data, it is too dependent on that data and is likely to have a higher error rate on new unseen data, illustrated by black-outlined dots, compared to the black line.]]
[[File:Pyplot overfitting.png|thumb|300x300px|Figure 2. Noisy (roughly linear) data is fitted to a linear function and a [[polynomial]] function. Although the polynomial function is a perfect fit, the linear function can be expected to generalize better: if the two functions were used to extrapolate beyond the fitted data, the linear function should make better predictions.]]
[[Image:Parabola_on_line.png|thumb|300px|Figure 3. The blue dashed line represents an underfitted model. A straight line can never fit a parabola. This model is too simple.]]

In mathematical modeling, '''overfitting''' is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably".<ref>Definition of "[https://web.archive.org/web/20171107014257/https://en.oxforddictionaries.com/definition/overfitting overfitting]" at [[OxfordDictionaries.com]]: this definition is specifically for statistics.</ref> An '''overfitted model''' is a [[mathematical model]] that contains more [[parameter]]s than can be justified by the data.<ref name=CDS/> In the special case where the model consists of a polynomial function, these parameters represent the [[degree of a polynomial]]. The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e., the [[Statistical noise|noise]]) as if that variation represented underlying model structure.<ref name="BA2002" />{{rp|45}}

'''Underfitting''' occurs when a mathematical model cannot adequately capture the underlying structure of the data. An '''under-fitted model''' is a model in which some parameters or terms that would appear in a correctly specified model are missing.<ref name=CDS/> Underfitting would occur, for example, when fitting a linear model to nonlinear data. Such a model will tend to have poor predictive performance.

The possibility of overfitting exists because the criterion used for [[model selection|selecting the model]] is not the same as the criterion used to judge the suitability of a model. For example, a model might be selected by maximizing its performance on some set of [[training data]], and yet its suitability might be determined by its ability to perform well on unseen data; overfitting occurs when a model begins to "memorize" training data rather than "learning" to generalize from a trend. As an extreme example, if the number of parameters is the same as or greater than the number of observations, then a model can perfectly predict the training data simply by memorizing the data in its entirety. (For an illustration, see Figure 2.) Such a model, though, will typically fail severely when making predictions.
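This extreme case can be reproduced with a short numerical experiment. The sketch below is an illustration added here rather than something drawn from the cited sources; it assumes only the NumPy library, and the data, noise level, and polynomial degrees are arbitrary choices. It fits noisy, roughly linear data with a straight line and with a degree-9 polynomial; the degree-9 fit has ten coefficients for ten training points, so it can reproduce the training data essentially exactly, yet it typically does worse on fresh data from the same process.

<syntaxhighlight lang="python">
# Minimal, self-contained sketch of overfitting (illustrative only).
# Roughly linear data with noise is fitted by a straight line and by a
# degree-9 polynomial; the latter has as many coefficients (10) as there
# are training points, so it can reproduce the training data exactly.
import numpy as np

rng = np.random.default_rng(seed=0)

x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(scale=0.2, size=x_train.size)

# Fresh data from the same underlying process, not used for fitting.
x_test = np.linspace(0.0, 1.0, 200)
y_test = 2.0 * x_test + rng.normal(scale=0.2, size=x_test.size)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: training MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
</syntaxhighlight>

The high-degree fit drives its training error to nearly zero, while its error on the fresh sample is typically the larger of the two, which is the pattern described above and pictured in Figure 2.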
Overfitting is directly related to the approximation error of the selected function class and the optimization error of the optimization procedure. A function class that is too large, in a suitable sense, relative to the dataset size is likely to overfit.<ref>{{Citation |last1=Bottou |first1=Léon |last2=Bousquet |first2=Olivier |title=The Tradeoffs of Large-Scale Learning |date=2011-09-30 |url=http://dx.doi.org/10.7551/mitpress/8996.003.0015 |work=Optimization for Machine Learning |pages=351–368 |access-date=2023-12-08 |publisher=The MIT Press |isbn=978-0-262-29877-3 |doi=10.7551/mitpress/8996.003.0015}}</ref> Even when the fitted model does not have an excessive number of parameters, it is to be expected that the fitted relationship will appear to perform less well on a new dataset than on the dataset used for fitting (a phenomenon sometimes known as ''shrinkage'').<ref name="CDS">Everitt B.S., Skrondal A. (2010), ''Cambridge Dictionary of Statistics'', [[Cambridge University Press]].</ref> In particular, the value of the [[coefficient of determination]] will [[Shrinkage (statistics)|shrink]] relative to the original data.

To lessen the chance or amount of overfitting, several techniques are available (e.g., [[Model selection|model comparison]], [[cross-validation (statistics)|cross-validation]], [[regularization (mathematics)|regularization]], [[early stopping]], [[pruning (algorithm)|pruning]], [[Prior distribution|Bayesian priors]], or [[Dropout (neural networks)|dropout]]). The basis of some techniques is to either (1) explicitly penalize overly complex models or (2) test the model's ability to generalize by evaluating its performance on a set of data not used for training, which is assumed to approximate the typical unseen data that a model will encounter.
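As a purely illustrative example of these two ideas, the sketch below combines an explicit complexity penalty (L2, or ridge, regularization) with 5-fold cross-validation on data held out from fitting. It is not taken from the cited sources; it assumes the NumPy and scikit-learn libraries, and the data, polynomial degree, and penalty strengths are arbitrary choices.

<syntaxhighlight lang="python">
# Illustrative sketch (not from the cited sources): combining a complexity
# penalty (ridge / L2 regularization) with k-fold cross-validation.
# Assumes NumPy and scikit-learn are installed.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 1.0, size=30).reshape(-1, 1)
y = 2.0 * x.ravel() + rng.normal(scale=0.2, size=x.shape[0])

# alpha controls the strength of the penalty on large coefficients;
# a near-zero alpha is close to an unpenalized, overfitting-prone fit.
for alpha in (1e-6, 1.0):
    model = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=alpha))
    # 5-fold cross-validation: each fold is scored on data not used to fit it,
    # approximating performance on typical unseen data.
    scores = cross_val_score(model, x, y, cv=5, scoring="neg_mean_squared_error")
    print(f"alpha={alpha:g}: cross-validated MSE {-scores.mean():.4f}")
</syntaxhighlight>

Here the penalized model is typically the one with the lower cross-validated error: the penalty discourages overly complex fits, and scoring on held-out folds exposes a model that has merely memorized its training data.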