==Interpretation==
The correlation coefficient ranges from −1 to 1. An absolute value of exactly 1 implies that a linear equation describes the relationship between ''X'' and ''Y'' perfectly, with all data points lying on a [[line (mathematics)|line]]. The correlation sign is determined by the [[regression slope]]: a value of +1 implies that all data points lie on a line for which ''Y'' increases as ''X'' increases, whereas a value of −1 implies a line for which ''Y'' decreases as ''X'' increases.<ref name="STAT 462">{{cite web | title=2.6 - (Pearson) Correlation Coefficient r | website=STAT 462 | url=https://online.stat.psu.edu/stat462/node/96/ | access-date=2021-07-10}}</ref> A value of 0 implies that there is no linear dependency between the variables.<ref>{{Cite web|title=Introductory Business Statistics: The Correlation Coefficient r|url=https://opentextbc.ca/introbusinessstatopenstax/chapter/the-correlation-coefficient-r/|access-date=21 August 2020|website=opentextbc.ca}}</ref>

More generally, {{math|(''X''<sub>''i''</sub> − {{overline|''X''}})(''Y''<sub>''i''</sub> − {{overline|''Y''}})}} is positive if and only if ''X''<sub>''i''</sub> and ''Y''<sub>''i''</sub> lie on the same side of their respective means. Thus the correlation coefficient is positive if ''X''<sub>''i''</sub> and ''Y''<sub>''i''</sub> tend to be simultaneously greater than, or simultaneously less than, their respective means. The correlation coefficient is negative ([[anti-correlation]]) if ''X''<sub>''i''</sub> and ''Y''<sub>''i''</sub> tend to lie on opposite sides of their respective means. Moreover, the stronger either tendency is, the larger is the [[absolute value]] of the correlation coefficient.

Rodgers and Nicewander<ref>{{cite journal |author1=Rodgers |author2=Nicewander |year=1988 |title=Thirteen ways to look at the correlation coefficient |journal=The American Statistician |volume=42 |issue=1 |pages=59–66 |url=https://www.stat.berkeley.edu/~rabbee/correlation.pdf |doi=10.2307/2685263 |jstor=2685263}}</ref> cataloged thirteen ways of interpreting correlation or simple functions of it:
* Function of raw scores and means
* Standardized covariance
* Standardized slope of the regression line
* Geometric mean of the two regression slopes
* Square root of the ratio of two variances
* Mean cross-product of standardized variables
* Function of the angle between two standardized regression lines
* Function of the angle between two variable vectors
* Rescaled variance of the difference between standardized scores
* Estimated from the balloon rule
* Related to the bivariate ellipses of isoconcentration
* Function of test statistics from designed experiments
* Ratio of two means
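The sign behavior described above can be illustrated numerically. The following minimal Python sketch (assuming the NumPy library; the data sets are invented purely for illustration) computes the deviation products {{math|(''X''<sub>''i''</sub> − {{overline|''X''}})(''Y''<sub>''i''</sub> − {{overline|''Y''}})}} and the resulting correlation coefficient for one positively and one negatively associated data set:

<syntaxhighlight lang="python">
import numpy as np

# Invented data for illustration: the first response tends to rise with x,
# the second tends to fall as x rises.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y_up = np.array([1.1, 2.0, 2.8, 4.1, 5.0])
y_down = np.array([5.2, 3.9, 3.1, 2.2, 0.8])

for y in (y_up, y_down):
    # (x_i - mean(x)) * (y_i - mean(y)) is positive exactly when x_i and y_i
    # lie on the same side of their respective means.
    products = (x - x.mean()) * (y - y.mean())
    # Pearson's r: sum of deviation products, standardized by both spreads.
    r = products.sum() / np.sqrt(((x - x.mean())**2).sum() * ((y - y.mean())**2).sum())
    print(products, r, np.corrcoef(x, y)[0, 1])
</syntaxhighlight>

For the first data set the deviation products are all non-negative and ''r'' is close to +1; for the second they are all non-positive and ''r'' is close to −1.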
===Geometric interpretation===
[[File:Regression lines.png|thumb|upright=1.5|Regression lines for {{math|1=''y'' = ''g''<sub>''X''</sub>(''x'')}} [{{color|red|red}}] and {{math|1=''x'' = ''g''<sub>''Y''</sub>(''y'')}} [{{color|blue|blue}}]]]
For uncentered data, there is a relation between the correlation coefficient and the angle ''φ'' between the two regression lines, {{nowrap|1=''y'' = ''g''<sub>''X''</sub>(''x'')}} and {{nowrap|1=''x'' = ''g''<sub>''Y''</sub>(''y'')}}, obtained by regressing ''y'' on ''x'' and ''x'' on ''y'' respectively. (Here, ''φ'' is measured counterclockwise within the first quadrant formed around the lines' intersection point if {{math|''r'' > 0}}, or counterclockwise from the fourth to the second quadrant if {{nowrap|''r'' < 0}}.) One can show<ref>{{cite journal |last=Schmid |first=John Jr. |title=The relationship between the coefficient of correlation and the angle included between regression lines |journal=The Journal of Educational Research |date=December 1947 |volume=41 |issue=4 |pages=311–313 |jstor=27528906 |doi=10.1080/00220671.1947.10881608}}</ref> that if the standard deviations are equal, then {{nowrap|1=''r'' = sec ''φ'' − tan ''φ''}}, where sec and tan are [[trigonometric functions]].

For centered data (i.e., data which have been shifted by the sample means of their respective variables so as to have an average of zero for each variable), the correlation coefficient can also be viewed as the [[cosine]] of the [[angle]] ''θ'' between the two observed [[Vector (geometry)|vectors]] in ''N''-dimensional space (for ''N'' observations of each variable).<ref>{{cite web |last=Rummel |first=R.J. |title=Understanding Correlation |year=1976 |url=http://www.hawaii.edu/powerkills/UC.HTM |at=ch. 5 (as illustrated for a special case in the next paragraph)}}</ref>

Both the uncentered (non-Pearson-compliant) and centered correlation coefficients can be determined for a dataset. As an example, suppose five countries are found to have gross national products of 1, 2, 3, 5, and 8 billion dollars, respectively. Suppose these same five countries (in the same order) are found to have 11%, 12%, 13%, 15%, and 18% poverty. Then let '''x''' and '''y''' be ordered 5-element vectors containing the above data: {{nowrap|1='''x''' = (1, 2, 3, 5, 8)}} and {{nowrap|1='''y''' = (0.11, 0.12, 0.13, 0.15, 0.18)}}.

By the usual procedure for finding the angle ''θ'' between two vectors (see [[dot product]]), the ''uncentered'' correlation coefficient is
:<math> \cos \theta = \frac { \mathbf{x} \cdot \mathbf{y} } { \left\| \mathbf{x} \right\| \left\| \mathbf{y} \right\| } = \frac {2.93} { \sqrt{103} \sqrt{0.0983} } = 0.920814711. </math>
This uncentered correlation coefficient is identical with the [[cosine similarity]]. The above data were deliberately chosen to be perfectly correlated: {{math|1=''y'' = 0.10 + 0.01 ''x''}}. The Pearson correlation coefficient must therefore be exactly one. Centering the data (shifting '''x''' by {{math|1=ℰ('''x''') = 3.8}} and '''y''' by {{math|1=ℰ('''y''') = 0.138}}) yields {{math|1='''x''' = (−2.8, −1.8, −0.8, 1.2, 4.2)}} and {{math|1='''y''' = (−0.028, −0.018, −0.008, 0.012, 0.042)}}, from which
:<math> \cos \theta = \frac{\mathbf{x} \cdot \mathbf{y}} {\left\| \mathbf{x} \right\| \left\| \mathbf{y} \right\|} = \frac {0.308}{\sqrt{30.8}\sqrt{0.00308}} = 1 = \rho_{xy}, </math>
as expected.
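The calculation above can be checked with a short script. The following minimal Python sketch (assuming the NumPy library; the helper function <code>cosine</code> is defined only for this illustration) reproduces the uncentered cosine similarity and the centered cosine, the latter being the Pearson correlation coefficient:

<syntaxhighlight lang="python">
import numpy as np

# Data from the example above: GNP in billions and poverty as a fraction.
x = np.array([1.0, 2.0, 3.0, 5.0, 8.0])
y = np.array([0.11, 0.12, 0.13, 0.15, 0.18])

def cosine(u, v):
    """Cosine of the angle between two vectors (see dot product)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(x, y))                  # uncentered: ~0.9208 (cosine similarity)

xc, yc = x - x.mean(), y - y.mean()  # center each variable on its mean
print(cosine(xc, yc))                # ~1.0, the Pearson correlation
print(np.corrcoef(x, y)[0, 1])       # agrees with the built-in computation
</syntaxhighlight>

Centering each variable on its mean is what makes the cosine coincide with Pearson's ''r'', which is why only the centered version reaches exactly 1 for these perfectly linear data.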
===Interpretation of the size of a correlation===
[[File:Pearson correlation and prediction intervals.svg|thumb|200px|right|This figure gives a sense of how the usefulness of a Pearson correlation for predicting values varies with its magnitude. Given jointly normal ''X'', ''Y'' with correlation ''ρ'', <math>1 - \sqrt{1 - \rho^2}</math> (plotted here as a function of ''ρ'') is the factor by which a given [[w:Prediction interval|prediction interval]] for ''Y'' may be reduced given the corresponding value of ''X''. For example, if ''ρ'' = 0.5, then the 95% prediction interval of ''Y''{{pipe}}''X'' will be about 13% smaller than the 95% prediction interval of ''Y''.]]
Several authors have offered guidelines for the interpretation of a correlation coefficient.<ref name="Buda">{{cite book |last1=Buda |first1=Andrzej |last2=Jarynowski |first2=Andrzej |title=Life Time of Correlations and its Applications |date=December 2010 |publisher=Wydawnictwo Niezależne |isbn=9788391527290 |pages=5–21}}</ref><ref name="Cohen88"/> However, all such criteria are in some ways arbitrary.<ref name="Cohen88">{{cite book |last=Cohen |first=J. |year=1988 |title=Statistical Power Analysis for the Behavioral Sciences |edition=2nd}}</ref> The interpretation of a correlation coefficient depends on the context and purposes. A correlation of 0.8 may be very low if one is verifying a physical law using high-quality instruments, but may be regarded as very high in the social sciences, where there may be a greater contribution from complicating factors.
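The reduction factor <math>1 - \sqrt{1 - \rho^2}</math> shown in the figure above can be tabulated with a minimal Python sketch (the interpretation in terms of prediction intervals assumes the jointly normal model described in the caption):

<syntaxhighlight lang="python">
import numpy as np

# The width of a prediction interval for Y is proportional to its standard
# deviation; under the jointly normal model, conditioning on X multiplies
# that by sqrt(1 - rho**2), so the relative narrowing is 1 - sqrt(1 - rho**2).
for rho in (0.2, 0.5, 0.8, 0.9, 0.99):
    reduction = 1 - np.sqrt(1 - rho**2)
    print(f"rho = {rho:4.2f}: interval narrows by {reduction:.1%}")
</syntaxhighlight>

With ''ρ'' = 0.5 the factor is about 0.134, the roughly 13% narrowing quoted in the caption.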