Coefficient of determination
{{Short description|Indicator for how well data points fit a line or curve}} {{Distinguish|Coefficient of variation}} {{expand German|date=September 2019|Bestimmtheitsmaß}} [[File:Okuns law quarterly differences.svg|300px|thumb|[[Ordinary least squares]] regression of [[Okun's law]]. Since the regression line does not miss any of the points by very much, the ''R''<sup>2</sup> of the regression is relatively high.]] In [[statistics]], the '''coefficient of determination''', denoted ''R''<sup>2</sup> or ''r''<sup>2</sup> and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s). It is a [[statistic]] used in the context of [[statistical model]]s whose main purpose is either the [[Prediction#Statistics|prediction]] of future outcomes or the testing of [[hypotheses]], on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.<ref>{{cite book|last1=Steel|first1=R. G. D.|last2=Torrie|first2=J. H.|year=1960|title=Principles and Procedures of Statistics with Special Reference to the Biological Sciences|publisher=[[McGraw Hill]]}}</ref><ref>{{cite book |last1=Glantz |first1=Stanton A. |last2=Slinker |first2=B. K. |year=1990 |title=Primer of Applied Regression and Analysis of Variance |publisher=McGraw-Hill |isbn=978-0-07-023407-9}}</ref><ref>{{cite book |last1=Draper |first1=N. R. |last2=Smith |first2=H. |year=1998 |title=Applied Regression Analysis |publisher=Wiley-Interscience |isbn=978-0-471-17082-2}}</ref> There are several definitions of ''R''<sup>2</sup> that are only sometimes equivalent. 
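One common way to compute the "proportion of total variation explained" is <math>R^2 = 1 - SS_\text{res}/SS_\text{tot}</math>. Here <math>SS_\text{tot}</math> is the sum of squared deviations of the observed outcomes from their mean, and <math>SS_\text{res}</math> is the sum of squared differences between the observed outcomes and the model's predictions. The Python sketch below assumes this definition, which is only one of the several definitions mentioned above. The function name <code>r_squared</code> and the sample values are hypothetical and are included only for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot.

    Hypothetical helper for illustration; other definitions of R^2 exist.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two observations")
    mean_obs = sum(observed) / len(observed)
    # Total sum of squares: variation of the outcomes around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    # Residual sum of squares: variation the predictions leave unexplained.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Toy data, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, [1.1, 1.9, 3.2, 3.8]))  # approximately 0.98
print(r_squared(observed, [2.5] * 4))             # 0.0: predicting the mean explains nothing
print(r_squared(observed, observed))              # 1.0: perfect predictions
```

Predicting the sample mean for every observation gives <math>SS_\text{res} = SS_\text{tot}</math> and therefore <math>R^2 = 0</math>. Any predictions that fit worse than the mean push the value below zero.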
In [[simple linear regression]] (which includes an [[regression intercept|intercept]]), ''r''<sup>2</sup> is the square of the sample [[Pearson product-moment correlation coefficient|''correlation coefficient'']] (''r'') between the observed outcomes and the observed predictor values.<ref name=Devore>{{cite book |last1 = Devore|first1 = Jay L.|title = Probability and Statistics for Engineering and the Sciences| edition=8th |publisher = Cengage Learning |location = Boston, MA | year = 2011 |isbn =978-0-538-73352-6 |pages=508–510}}</ref> If additional [[regressor]]s are included, ''R''<sup>2</sup> is the square of the ''[[coefficient of multiple correlation]]''. In both cases, the coefficient of determination normally ranges from 0 to 1. However, ''R''<sup>2</sup> can be negative in some cases. This can happen when the predictions being compared with the corresponding outcomes were not derived from a model-fitting procedure using those data. Even if a model-fitting procedure has been used, ''R''<sup>2</sup> may still be negative, for example when linear regression is conducted without including an intercept,<ref>{{cite book |last=Barten |first=Anton P. |author-link=Anton Barten |chapter=The Coefficient of Determination for Regression without a Constant Term |editor-first=Risto |editor-last=Heijmans |editor2-first=Heinz |editor2-last=Neudecker |title=The Practice of Econometrics |location=Dordrecht |publisher=Kluwer |year=1987 |isbn=90-247-3502-5 |pages=181–189 }}</ref> or when a non-linear function is used to fit the data.<ref>{{cite journal |doi=10.1016/S0304-4076(96)01818-0 |title=An R-squared measure of goodness of fit for some common nonlinear regression models |year=1997 |last1=Cameron |first1=A. Colin |last2=Windmeijer |first2=Frank A.G. 
|journal=Journal of Econometrics |volume=77 |issue=2 |pages=1790–2 }}</ref> When negative values arise, the mean of the data fits the outcomes better than the fitted function values do, according to this particular criterion. The coefficient of determination can be more intuitively informative than [[mean absolute error|MAE]], [[MAPE]], [[mean square error|MSE]], and [[RMSE]] in [[regression analysis]] evaluation, because it can be expressed as a percentage, whereas the other measures have arbitrary ranges. It also proved more robust than [[SMAPE]] for poor fits on certain test datasets.<ref>{{cite journal | doi=10.7717/peerj-cs.623| title= The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation| year=2021 | last1= Chicco | first1=Davide | last2= Warrens | first2=Matthijs J. | last3=Jurman| first3=Giuseppe| journal= PeerJ Computer Science | volume=7 | pages=e623| pmid= 34307865| pmc= 8279135| doi-access=free}}</ref> When evaluating the goodness of fit of simulated (''Y''<sub>pred</sub>) versus measured (''Y''<sub>obs</sub>) values, it is not appropriate to use the ''R''<sup>2</sup> of the linear regression (i.e., ''Y''<sub>obs</sub> = ''m''·''Y''<sub>pred</sub> + ''b'').{{cn|date=August 2021}} That ''R''<sup>2</sup> quantifies the degree of any linear correlation between ''Y''<sub>obs</sub> and ''Y''<sub>pred</sub>. A goodness-of-fit evaluation, however, should consider only one specific linear relationship: ''Y''<sub>obs</sub> = 1·''Y''<sub>pred</sub> + 0 (i.e., the 1:1 line).<ref>{{cite journal |doi=10.1029/1998WR900018 |title= Evaluating the use of "goodness-of-fit" measures in hydrologic and hydroclimatic model validation |year=1999 |last1= Legates |first1=D.R. |last2= McCabe |first2=G.J. |journal= Water Resour. Res. 
|volume=35 |issue=1 |pages=233–241 |bibcode= 1999WRR....35..233L |s2cid= 128417849 }}</ref><ref>{{cite journal |doi=10.1016/j.jhydrol.2012.12.004|title= Performance evaluation of hydrological models: statistical significance for reducing subjectivity in goodness-of-fit assessments |year=2013 |last1= Ritter |first1=A. |last2= Muñoz-Carpena |first2=R. |journal= Journal of Hydrology |volume=480 |issue=1 |pages=33–45 |bibcode= 2013JHyd..480...33R }}</ref>
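Three of the claims above can be checked numerically with the small sketch below:
* With an intercept, the ''R''<sup>2</sup> of simple linear regression equals ''r''<sup>2</sup>.
* Forcing the regression line through the origin can make ''R''<sup>2</sup> negative.
* The ''R''<sup>2</sup> of a regression of measured on simulated values can differ sharply from the ''R''<sup>2</sup> measured against the 1:1 line.

The sketch assumes the definition <math>R^2 = 1 - SS_\text{res}/SS_\text{tot}</math> and uses plain Python lists. The helper names (<code>r_squared</code>, <code>pearson_r</code>, <code>ols_with_intercept</code>, <code>ols_through_origin</code>) are hypothetical, and all data are toy values chosen for illustration, not measurements.

```python
import math

# Hypothetical helpers and toy data, for illustration only.


def mean(v):
    return sum(v) / len(v)


def r_squared(y, f):
    """R^2 = 1 - SS_res / SS_tot for observations y and predictions f."""
    my = mean(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def ols_with_intercept(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


def ols_through_origin(x, y):
    """Least-squares fit of y = b*x with the intercept forced to zero."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# 1. Simple linear regression with an intercept: R^2 equals r^2.
x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
a, b = ols_with_intercept(x, y)
r2_fit = r_squared(y, [a + b * xi for xi in x])
r2_corr = pearson_r(x, y) ** 2
print(round(r2_fit, 6), round(r2_corr, 6))  # 0.6 0.6

# 2. Regression without an intercept can give a negative R^2.
x2, y2 = [1, 2, 3, 4, 5], [10, 9, 8, 7, 6]
b0 = ols_through_origin(x2, y2)
r2_no_intercept = r_squared(y2, [b0 * xi for xi in x2])
print(r2_no_intercept)  # -10.0

# 3. Simulated vs measured values: regression R^2 vs the 1:1 line.
y_pred, y_obs = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
print(pearson_r(y_pred, y_obs) ** 2)  # 1.0 (the line Y_obs = 2*Y_pred fits exactly)
print(r_squared(y_obs, y_pred))       # -0.375 (scored against the 1:1 line)
```

In the third case the simulated values are systematically off by a factor of two. Regressing ''Y''<sub>obs</sub> on ''Y''<sub>pred</sub> still yields a perfect ''R''<sup>2</sup>, because the fitted slope absorbs the bias. Scoring the same values against the 1:1 line exposes the poor fit.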