==In least squares regression analysis==
{{For|more general, non-linear dependency|Coefficient of determination#In a multiple linear model}}

The square of the sample correlation coefficient is typically denoted ''r''<sup>2</sup> and is a special case of the [[coefficient of determination]]. In this case, it estimates the fraction of the variance in ''Y'' that is explained by ''X'' in a [[simple linear regression]]. So if we have the observed dataset <math>Y_1, \dots , Y_n</math> and the fitted dataset <math>\hat Y_1, \dots , \hat Y_n</math>, then as a starting point the total variation in the ''Y''<sub>''i''</sub> around their average value can be decomposed as follows:

:<math>\sum_i (Y_i - \bar{Y})^2 = \sum_i (Y_i-\hat{Y}_i)^2 + \sum_i (\hat{Y}_i-\bar{Y})^2,</math>

where the <math>\hat{Y}_i</math> are the fitted values from the regression analysis. This can be rearranged to give

:<math>1 = \frac{\sum_i (Y_i-\hat{Y}_i)^2}{\sum_i (Y_i - \bar{Y})^2} + \frac{\sum_i (\hat{Y}_i-\bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2}.</math>

The two summands above are the fraction of variance in ''Y'' that is unexplained by ''X'' (the first term) and the fraction that is explained by ''X'' (the second term). Next, we apply a property of [[least squares]] regression models: the sample covariance between <math>\hat{Y}_i</math> and <math>Y_i-\hat{Y}_i</math> is zero. Thus, the sample correlation coefficient between the observed and fitted response values in the regression can be written

:<math>
\begin{align}
r(Y,\hat{Y}) &= \frac{\sum_i(Y_i-\bar{Y})(\hat{Y}_i-\bar{Y})}{\sqrt{\sum_i(Y_i-\bar{Y})^2\cdot \sum_i(\hat{Y}_i-\bar{Y})^2}}\\[6pt]
&= \frac{\sum_i(Y_i-\hat{Y}_i+\hat{Y}_i-\bar{Y})(\hat{Y}_i-\bar{Y})}{\sqrt{\sum_i(Y_i-\bar{Y})^2\cdot \sum_i(\hat{Y}_i-\bar{Y})^2}}\\[6pt]
&= \frac{ \sum_i [(Y_i-\hat{Y}_i)(\hat{Y}_i-\bar{Y}) +(\hat{Y}_i-\bar{Y})^2 ]}{\sqrt{\sum_i(Y_i-\bar{Y})^2\cdot \sum_i(\hat{Y}_i-\bar{Y})^2}}\\[6pt]
&= \frac{ \sum_i (\hat{Y}_i-\bar{Y})^2 }{\sqrt{\sum_i(Y_i-\bar{Y})^2\cdot \sum_i(\hat{Y}_i-\bar{Y})^2}}\\[6pt]
&= \sqrt{\frac{\sum_i(\hat{Y}_i-\bar{Y})^2}{\sum_i(Y_i-\bar{Y})^2}}.
\end{align}
</math>

Thus

:<math>r(Y,\hat{Y})^2 = \frac{\sum_i(\hat{Y}_i-\bar{Y})^2}{\sum_i(Y_i-\bar{Y})^2},</math>

where <math>r(Y,\hat{Y})^2</math> is the proportion of variance in ''Y'' explained by a linear function of ''X''. In the derivation above, the fact that

:<math>\sum_i (Y_i-\hat{Y}_i)(\hat{Y}_i-\bar{Y}) = 0</math>

can be proved by noticing that the partial derivatives of the [[residual sum of squares]] ({{math|RSS}}) with respect to ''β''<sub>0</sub> and ''β''<sub>1</sub> are equal to 0 in the least squares model, where

:<math>\text{RSS} = \sum_i (Y_i - \hat{Y}_i)^2.</math>

In the end, the equation can be written as

:<math>r(Y,\hat{Y})^2 = \frac{\text{SS}_\text{reg}}{\text{SS}_\text{tot}},</math>

where
*<math>\text{SS}_\text{reg} = \sum_i (\hat{Y}_i-\bar{Y})^2</math>
*<math>\text{SS}_\text{tot} = \sum_i (Y_i-\bar{Y})^2</math>.

The quantity <math>\text{SS}_\text{reg}</math> is called the regression sum of squares, also known as the [[explained sum of squares]], and <math>\text{SS}_\text{tot}</math> is the [[total sum of squares]] (proportional to the [[variance]] of the data).
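The identity can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data and regression coefficients are arbitrary values chosen only for illustration. It fits a simple linear regression by ordinary least squares and compares the squared sample correlation between the observed and fitted responses with the ratio <math>\text{SS}_\text{reg}/\text{SS}_\text{tot}</math>.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative data: arbitrary linear trend plus noise (values chosen for demonstration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

# Ordinary least squares fit of y = beta0 + beta1 * x.
beta1, beta0 = np.polyfit(x, y, deg=1)
y_hat = beta0 + beta1 * x

# Squared sample correlation between observed and fitted values.
r_squared = np.corrcoef(y, y_hat)[0, 1] ** 2

# Coefficient of determination from the sums of squares.
ss_reg = np.sum((y_hat - y.mean()) ** 2)   # regression (explained) sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares

# The two printed values agree up to floating-point rounding.
print(r_squared)
print(ss_reg / ss_tot)
</syntaxhighlight>

Because the fitted values come from a least squares fit with an intercept, the residuals are uncorrelated with the fitted values, which is exactly the property used in the derivation above.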