==Other measures of dependence among random variables==
{{See also|Pearson product-moment correlation coefficient#Variants}}
The information given by a correlation coefficient is not enough to define the dependence structure between random variables. The correlation coefficient completely defines the dependence structure only in very particular cases, for example when the distribution is a [[multivariate normal distribution]]. (See diagram above.) In the case of [[elliptical distribution]]s it characterizes the (hyper-)ellipses of equal density; however, it does not completely characterize the dependence structure (for example, a [[multivariate t-distribution]]'s degrees of freedom determine the level of tail dependence).

For continuous variables, multiple alternative measures of dependence have been introduced to address the deficiency of Pearson's correlation that it can be zero even for dependent random variables (see <ref name = karch>{{cite journal | last1 = Karch | first1 = Julian D. | last2 = Perez-Alonso | first2 = Andres F. | last3 = Bergsma | first3 = Wicher P. | title = Beyond Pearson's Correlation: Modern Nonparametric Independence Tests for Psychological Research | journal = Multivariate Behavioral Research | doi = 10.1080/00273171.2024.2347960 | date = 2024-08-04 | volume = 59 | issue = 5 | pages = 957–977 | pmid = 39097830 | hdl = 1887/4108931 | hdl-access = free }}</ref> and the references therein for an overview). They all share the important property that a value of zero implies independence. This led some authors<ref name = karch/><ref>{{cite arXiv | last1 = Simon | first1 = Noah | last2 = Tibshirani | first2 = Robert | title = Comment on "Detecting Novel Associations In Large Data Sets" by Reshef Et Al, Science Dec 16, 2011 | year = 2014 | eprint = 1401.7645 | class = stat.ME | pages = 3 }}</ref> to recommend their routine usage, particularly of [[distance correlation]].<ref>{{cite journal | last1 = Székely | first1 = G. J. | last2 = Rizzo | first2 = M. L. | last3 = Bakirov | first3 = N. K. | year = 2007 | title = Measuring and testing independence by correlation of distances | journal = [[Annals of Statistics]] | volume = 35 | issue = 6 | pages = 2769–2794 | doi = 10.1214/009053607000000505 | arxiv = 0803.4101 | s2cid = 5661488 }}</ref><ref>{{cite journal | last1 = Székely | first1 = G. J. | last2 = Rizzo | first2 = M. L. | year = 2009 | title = Brownian distance covariance | journal = Annals of Applied Statistics | volume = 3 | issue = 4 | pages = 1233–1303 | doi = 10.1214/09-AOAS312 | pmid = 20574547 | pmc = 2889501 | arxiv = 1010.0297 }}</ref> Another alternative measure is the Randomized Dependence Coefficient.<ref>Lopez-Paz D. and Hennig P. and Schölkopf B. (2013). "The Randomized Dependence Coefficient", ''[[Conference on Neural Information Processing Systems]]'' [http://papers.nips.cc/paper/5138-the-randomized-dependence-coefficient.pdf Reprint]</ref> The RDC is a computationally efficient, [[Copula (probability theory)|copula]]-based measure of dependence between multivariate random variables and is invariant with respect to non-linear scalings of random variables.
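The deficiency that Pearson's correlation can be zero for dependent variables can be seen concretely in a small simulation. The following sketch is an illustration added here, not taken from the cited sources; the sample size, seed, and the relationship <math>y = x^2</math> are arbitrary choices. It implements the empirical distance correlation directly from its definition and contrasts it with Pearson's ''r'':

<syntaxhighlight lang="python">
import numpy as np

def distance_correlation(x, y):
    """Empirical distance correlation of two 1-D samples (Székely et al., 2007)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise distance matrices.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()  # squared sample distance variance of x
    dvar_y = (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = x**2                                # deterministic, but non-linear
print(np.corrcoef(x, y)[0, 1])          # Pearson's r: close to 0
print(distance_correlation(x, y))       # distance correlation: clearly positive
</syntaxhighlight>

Pearson's ''r'' is near zero here because <math>\operatorname{cov}(X, X^2) = 0</math> for a distribution symmetric about zero, while the distance correlation is zero only under independence.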
One important disadvantage of the alternative, more general measures is that, when used to test whether two variables are associated, they tend to have lower power compared to Pearson's correlation when the data follow a multivariate normal distribution.<ref name=karch/> This is an implication of the [[no free lunch theorem]]: to detect all kinds of relationships, these measures have to sacrifice power on other relationships, particularly for the important special case of a linear relationship with Gaussian marginals, for which Pearson's correlation is optimal. Another problem concerns interpretation: while Pearson's correlation can be interpreted for all values, the alternative measures can generally only be interpreted meaningfully at the extremes.<ref>{{cite journal | last1 = Reimherr | first1 = Matthew | last2 = Nicolae | first2 = Dan L. | title = On Quantifying Dependence: A Framework for Developing Interpretable Measures | journal = Statistical Science | volume = 28 | issue = 1 | pages = 116–130 | year = 2013 | doi = 10.1214/12-STS405 | arxiv = 1302.5233 }}</ref>

For two [[binary data|binary variables]], the [[odds ratio]] measures their dependence and takes values in the non-negative numbers, possibly infinity: {{tmath|[0, +\infty]}}. Related statistics such as [[Yule's Y|Yule's ''Y'']] and [[Yule's Q|Yule's ''Q'']] normalize this to the correlation-like range {{tmath|[-1, 1]}} (a short worked example follows at the end of this section). The odds ratio is generalized by the [[logistic regression|logistic model]] to cases where the dependent variable is discrete and there may be one or more independent variables.

The [[correlation ratio]], [[Entropy (information theory)|entropy]]-based [[mutual information]], [[total correlation]], [[dual total correlation]] and [[polychoric correlation]] are all also capable of detecting more general dependencies, as is consideration of the [[copula (statistics)|copula]] between them, while the [[coefficient of determination]] generalizes the correlation coefficient to [[multiple regression]].
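Yule's ''Q'' and ''Y'' are obtained from the odds ratio ''OR'' by the standard identities <math>Q = (OR - 1)/(OR + 1)</math> and <math>Y = (\sqrt{OR} - 1)/(\sqrt{OR} + 1)</math>. As a worked illustration (the 2×2 counts below are hypothetical, chosen to give round numbers):

<syntaxhighlight lang="python">
from math import sqrt

# Hypothetical 2x2 contingency table for binary variables X and Y:
#           Y=1   Y=0
#   X=1    a=30  b=10
#   X=0    c=15  d=45
a, b, c, d = 30, 10, 15, 45

odds_ratio = (a * d) / (b * c)                # in [0, +inf]; 1 means no association
yule_q = (odds_ratio - 1) / (odds_ratio + 1)  # rescaled to [-1, 1]
yule_y = (sqrt(odds_ratio) - 1) / (sqrt(odds_ratio) + 1)

print(odds_ratio, yule_q, yule_y)             # 9.0 0.8 0.5
</syntaxhighlight>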