== Definitions ==

[[File:Coefficient of Determination.svg|thumb|400px|<math>R^2 = 1 - \frac{\color{blue}{SS_\text{res}}}{\color{red}{SS_\text{tot}}}</math>{{br}} The better the linear regression (on the right) fits the data in comparison to the simple average (on the left graph), the closer the value of ''R''<sup>2</sup> is to 1. The areas of the blue squares represent the squared residuals with respect to the linear regression. The areas of the red squares represent the squared residuals with respect to the average value.]]

A [[data set]] has ''n'' values marked ''y''<sub>1</sub>, ..., ''y''<sub>''n''</sub> (collectively known as ''y''<sub>''i''</sub> or as a vector '''''y''''' = [''y''<sub>1</sub>, ..., ''y''<sub>''n''</sub>]<sup>T</sup>), each associated with a fitted (or modeled, or predicted) value ''f''<sub>1</sub>, ..., ''f''<sub>''n''</sub> (known as ''f''<sub>''i''</sub>, or sometimes ''ŷ''<sub>''i''</sub>, as a vector '''''f'''''). Define the [[Residuals (statistics)|residuals]] as {{nowrap|1=''e''<sub>''i''</sub> = ''y''<sub>''i''</sub> − ''f''<sub>''i''</sub>}} (forming a vector '''''e''''').

If <math>\bar{y}</math> is the mean of the observed data:
<math display="block">\bar{y}=\frac{1}{n}\sum_{i=1}^n y_i </math>
then the variability of the data set can be measured with two [[Mean squared error|sums of squares]] formulas:
* The sum of squares of residuals, also called the [[residual sum of squares]]: <math display="block">SS_\text{res}=\sum_i (y_i - f_i)^2=\sum_i e_i^2\,</math>
* The [[total sum of squares]] (proportional to the [[variance]] of the data): <math display="block">SS_\text{tot}=\sum_i (y_i - \bar{y})^2</math>

The most general definition of the coefficient of determination is
<math display="block">R^2 = 1 - {SS_{\rm res}\over SS_{\rm tot}} </math>

In the best case, the modeled values exactly match the observed values, which results in <math>SS_\text{res}=0</math> and {{nowrap|1=''R''<sup>2</sup> = 1}}. A baseline model, which always predicts {{overline|''y''}}, will have {{nowrap|1=''R''<sup>2</sup> = 0}}.

=== Relation to unexplained variance ===
{{Main|Fraction of variance unexplained}}

In a general form, ''R''<sup>2</sup> can be seen to be related to the fraction of variance unexplained (FVU), since the second term compares the unexplained variance (variance of the model's errors) with the total variance (of the data):
<math display="block">R^2 = 1 - \text{FVU}</math>

=== As explained variance ===

A larger value of ''R''<sup>2</sup> implies a more successful regression model.<ref name=Devore/>{{rp|463}} Suppose {{nowrap|1=''R''<sup>2</sup> = 0.49}}. This implies that 49% of the variability of the dependent variable in the data set has been accounted for, and the remaining 51% of the variability is still unaccounted for.

For regression models, the regression sum of squares, also called the [[explained sum of squares]], is defined as
: <math>SS_\text{reg}=\sum_i (f_i -\bar{y})^2</math>

In some cases, as in [[simple linear regression]], the [[total sum of squares]] equals the sum of the two other sums of squares defined above:
: <math>SS_\text{res}+SS_\text{reg}=SS_\text{tot}</math>

See [[Explained sum of squares#Partitioning in the general ordinary least squares model|Partitioning in the general OLS model]] for a derivation of this result for one case where the relation holds. A numerical illustration of these quantities is sketched below.
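The following Python sketch is purely illustrative and is not drawn from the article's sources: it uses a small made-up data set and NumPy's <code>polyfit</code> to fit a least-squares straight line, then computes ''SS''<sub>res</sub>, ''SS''<sub>tot</sub>, ''SS''<sub>reg</sub> and ''R''<sup>2</sup> as defined above, and checks the partition of the sum of squares for that case.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative (made-up) data: observations y with a single explanatory variable x
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Least-squares straight-line fit f_i = alpha + beta * x_i
beta, alpha = np.polyfit(x, y, 1)   # polyfit returns [slope, intercept] for degree 1
f = alpha + beta * x

y_bar = y.mean()
ss_res = np.sum((y - f) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y_bar) ** 2)    # total sum of squares
ss_reg = np.sum((f - y_bar) ** 2)    # regression (explained) sum of squares

r_squared = 1 - ss_res / ss_tot
print("R^2 =", r_squared)

# For a least-squares linear fit with an intercept, the partition holds,
# so R^2 can equivalently be computed as SS_reg / SS_tot
print(np.isclose(ss_res + ss_reg, ss_tot))      # True
print(np.isclose(r_squared, ss_reg / ss_tot))   # True
</syntaxhighlight>

For the baseline model that always predicts <math>\bar{y}</math>, ''SS''<sub>res</sub> equals ''SS''<sub>tot</sub>, and the same computation gives {{nowrap|1=''R''<sup>2</sup> = 0}}.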
When this relation does hold, the above definition of ''R''<sup>2</sup> is equivalent to
: <math>R^2 = \frac{SS_\text{reg}}{SS_\text{tot}} = \frac{SS_\text{reg}/n}{SS_\text{tot}/n}</math>
where ''n'' is the number of observations (cases) on the variables. In this form ''R''<sup>2</sup> is expressed as the ratio of the [[explained variation|explained variance]] (variance of the model's predictions, which is {{nowrap|''SS''<sub>reg</sub> / ''n''}}) to the total variance (sample variance of the dependent variable, which is {{nowrap|''SS''<sub>tot</sub> / ''n''}}).

This partition of the sum of squares holds for instance when the model values ''f''<sub>''i''</sub> have been obtained by [[linear regression]]. A milder [[sufficient condition]] reads as follows: The model has the form
: <math>f_i=\widehat\alpha+\widehat\beta q_i</math>
where the ''q''<sub>''i''</sub> are arbitrary values that may or may not depend on ''i'' or on other free parameters (the common choice ''q''<sub>''i''</sub> = ''x''<sub>''i''</sub> is just one special case), and the coefficient estimates <math>\widehat\alpha</math> and <math>\widehat\beta</math> are obtained by minimizing the residual sum of squares.

This set of conditions is important, and it has a number of implications for the properties of the fitted [[Errors and residuals in statistics|residuals]] and the modelled values. In particular, under these conditions:
: <math>\bar{f}=\bar{y}.\,</math>

=== As squared correlation coefficient ===

In linear least squares [[multiple regression]] (with a fitted intercept and slope), ''R''<sup>2</sup> equals <math>\rho^2(y,f)</math>, the square of the [[Pearson correlation coefficient]] between the observed <math>y</math> and modeled (predicted) <math>f</math> data values of the dependent variable.

In a [[simple regression|linear least squares regression with a single explanator]] (with fitted intercept and slope), this is also equal to <math>\rho^2(y,x)</math>, the squared Pearson correlation coefficient between the dependent variable <math>y</math> and explanatory variable <math>x</math>. It should not be confused with the correlation coefficient between the two coefficient estimates, defined as
: <math>\rho_{\widehat\alpha,\widehat\beta} = {\operatorname{cov}\left(\widehat\alpha,\widehat\beta\right) \over \sigma_{\widehat\alpha} \sigma_{\widehat\beta}},</math>
where the covariance between the two coefficient estimates, as well as their [[standard deviation]]s, are obtained from the [[Ordinary least squares#Covariance matrix|covariance matrix]] of the coefficient estimates, <math>(X^T X)^{-1}</math>.

Under more general modeling conditions, where the predicted values might be generated from a model different from linear least squares regression, an ''R''<sup>2</sup> value can be calculated as the square of the [[Pearson product-moment correlation coefficient|correlation coefficient]] between the original <math>y</math> and modeled <math>f</math> data values. In this case, the value is not directly a measure of how good the modeled values are, but rather a measure of how good a predictor might be constructed from the modeled values (by creating a revised predictor of the form {{nowrap|''α'' + ''βf''<sub>''i''</sub>}}).{{Citation needed|reason=The citation for the next sentence does not discuss the information in this sentence.|date=March 2017}} According to Everitt,<ref>{{cite book |last=Everitt |first=B. S. |page=78 |year=2002 |title=Cambridge Dictionary of Statistics |edition=2nd |publisher=CUP |isbn=978-0-521-81099-9}}</ref> this usage is specifically the definition of the term "coefficient of determination": the square of the correlation between two (general) variables.
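As a numerical check of these statements, the following Python sketch (again illustrative only, reusing the made-up data from the earlier example) verifies that ''R''<sup>2</sup> from the least-squares fit coincides with the squared Pearson correlation between ''y'' and ''f'', and, for a single explanator with a fitted intercept, also between ''y'' and ''x''.

<syntaxhighlight lang="python">
import numpy as np

# Same made-up data and least-squares fit as in the earlier sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
beta, alpha = np.polyfit(x, y, 1)
f = alpha + beta * x

ss_res = np.sum((y - f) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

rho_yf = np.corrcoef(y, f)[0, 1]   # Pearson correlation between observed y and fitted f
rho_yx = np.corrcoef(y, x)[0, 1]   # Pearson correlation between y and the explanator x

print(np.isclose(r_squared, rho_yf ** 2))  # True for a least-squares fit with intercept
print(np.isclose(r_squared, rho_yx ** 2))  # True for a single explanator with intercept
</syntaxhighlight>

For model values produced by something other than linear least squares regression, only the first identity would be the relevant one, and squaring the correlation then measures how well a rescaled predictor of the form {{nowrap|''α'' + ''βf''<sub>''i''</sub>}} could fit the data, as described above.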