=== Inflation of ''R''<sup>2</sup> ===

In [[least squares]] regression using typical data, ''R''<sup>2</sup> is non-decreasing as the number of regressors in the model increases. Because adding regressors can never decrease ''R''<sup>2</sup>, ''R''<sup>2</sup> alone cannot be used as a meaningful comparison of models with very different numbers of independent variables. For a meaningful comparison between two models, an [[F-test]] can be performed on the [[residual sum of squares]]{{Citation needed|date=October 2021}}, similar to the F-tests in [[Granger causality]], though this is not always appropriate{{Explain|date=October 2021}}. As a reminder of this, some authors denote ''R''<sup>2</sup> by ''R''<sub>''q''</sub><sup>2</sup>, where ''q'' is the number of columns in ''X'' (the number of explanatory variables, including the constant).

To demonstrate this property, first recall that the objective of least squares linear regression is

: <math>\min_b SS_\text{res}(b) \Rightarrow \min_b \sum_i (y_i - X_ib)^2\,</math>

where ''X<sub>i</sub>'' is a row vector of values of explanatory variables for case ''i'' and ''b'' is a column vector of coefficients of the respective elements of ''X<sub>i</sub>''.

The optimal value of the objective is weakly smaller as additional columns of <math>X</math> (the explanatory data matrix whose ''i''th row is ''X<sub>i</sub>'') are added, because a less constrained minimization attains an optimal cost that is weakly smaller than a more constrained one. Given this conclusion, and noting that <math>SS_\text{tot}</math> depends only on ''y'', the non-decreasing property of ''R''<sup>2</sup> follows directly from the definition above.

The intuitive reason that using an additional explanatory variable cannot lower ''R''<sup>2</sup> is this: minimizing <math>SS_\text{res}</math> is equivalent to maximizing ''R''<sup>2</sup>. When the extra variable is included, the fit always has the option of giving it an estimated coefficient of zero, leaving the predicted values and the ''R''<sup>2</sup> unchanged. The only way the optimization will assign it a non-zero coefficient is if doing so improves the ''R''<sup>2</sup>. The above gives an analytical explanation of the inflation of ''R''<sup>2</sup>.
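This property can be checked numerically. The following is a minimal sketch in NumPy using simulated, purely illustrative data: an extra regressor that is unrelated to the response is added to a nested model, and ''R''<sup>2</sup> still cannot decrease.

<syntaxhighlight lang="python">
# Illustrative sketch only: simulated data, hypothetical variable names.
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)                  # extra regressor, unrelated to y
y = 2.0 + 3.0 * X1 + rng.normal(size=n)  # the data-generating model uses X1 only

def r_squared(X, y):
    """Fit OLS by least squares and return R^2 = 1 - SS_res / SS_tot."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

ones = np.ones(n)
r2_small = r_squared(np.column_stack([ones, X1]), y)       # one regressor
r2_large = r_squared(np.column_stack([ones, X1, X2]), y)   # two regressors

print(r2_small, r2_large)
assert r2_large >= r2_small  # adding X2 can only weakly increase R^2
</syntaxhighlight>

Even though <code>X2</code> carries no information about <code>y</code>, the fitted coefficient on it is merely close to zero rather than exactly zero, so ''R''<sup>2</sup> typically rises slightly rather than staying constant.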
Next, an example based on ordinary least squares from a geometric perspective is shown below.<ref>{{cite book |last1=Faraway |first1=Julian James |title=Linear models with R |date=2005 |publisher=Chapman & Hall/CRC |isbn=9781584884255 |url=https://www.utstat.toronto.edu/~brunner/books/LinearModelsWithR.pdf}}</ref>

[[File:Screen shot proj fig.jpg|thumb|400x266px|right|An example of the residuals of regression models in smaller and larger spaces based on ordinary least squares regression.]]

A simple case to be considered first:

: <math>Y=\beta_0+\beta_1\cdot X_1+\varepsilon\,</math>

This equation describes the [[ordinary least squares regression]] model with one regressor. The prediction is shown as the red vector in the figure on the right. Geometrically, it is the projection of the observed values onto a smaller model space in <math>\mathbb{R}</math> (without intercept). The residual is shown as the red line.

: <math>Y=\beta_0+\beta_1\cdot X_1+\beta_2\cdot X_2 + \varepsilon\,</math>

This equation corresponds to the ordinary least squares regression model with two regressors. The prediction is shown as the blue vector in the figure on the right. Geometrically, it is the projection of the observed values onto a larger model space in <math>\mathbb{R}^2</math> (without intercept). Noticeably, the fitted values of <math>\beta_0</math> and <math>\beta_1</math> are in general not the same as in the smaller model, so the two models are expected to yield different predictions (i.e., the blue vector is expected to differ from the red vector).

The least squares criterion ensures that the residual is minimized: in the figure, the blue line representing the residual is orthogonal to the model space in <math>\mathbb{R}^2</math>, giving the minimal distance from the space. Because the smaller model space is a subspace of the larger one, the residual of the smaller model is guaranteed to be at least as large: the red fitted vector also lies in the larger space, so the red residual connects ''Y'' to a point of that space and cannot be shorter than the blue residual, which is the orthogonal and hence shortest one. Considering the calculation for ''R''<sup>2</sup>, the smaller value of <math>SS_\text{res}</math> leads to a larger value of ''R''<sup>2</sup>, meaning that adding regressors will result in inflation of ''R''<sup>2</sup>.
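The subspace argument can likewise be illustrated numerically. The following sketch (simulated data, hypothetical names) projects the observations onto nested column spaces via the hat matrix <math>H = X(X^\mathsf{T}X)^{-1}X^\mathsf{T}</math> and compares the residual lengths, corresponding to the red and blue residuals in the figure.

<syntaxhighlight lang="python">
# Illustrative sketch only: simulated data, hypothetical variable names.
import numpy as np

rng = np.random.default_rng(1)
n = 50
ones = np.ones(n)
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
y = 1.0 + 0.5 * X1 + rng.normal(size=n)

def residual_norm(X, y):
    """Length of the residual after orthogonally projecting y onto col(X)."""
    H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix H = X (X'X)^{-1} X'
    return np.linalg.norm(y - H @ y)

red_residual = residual_norm(np.column_stack([ones, X1]), y)       # smaller space
blue_residual = residual_norm(np.column_stack([ones, X1, X2]), y)  # larger space

print(red_residual, blue_residual)
assert blue_residual <= red_residual  # nested spaces: larger model's residual is no longer
</syntaxhighlight>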