Pearson correlation coefficient
===Robustness===
Like many commonly used statistics, the sample [[statistic]] ''r'' is not [[robust statistics|robust]],<ref name="wilcox">{{Cite book| title=Introduction to robust estimation and hypothesis testing | last = Wilcox | first = Rand R. | publisher= Academic Press | year=2005}}</ref> so its value can be misleading if [[outlier]]s are present.<ref>{{Cite journal |title=Robust estimation and outlier detection with correlation coefficients |author1=Devlin, Susan J. |author1-link=Susan J. Devlin |author2=Gnanadesikan, R. |author3=Kettenring, J. R. |journal=Biometrika |volume=62 |issue=3 |year=1975 |pages=531–545 |doi=10.1093/biomet/62.3.531 |jstor=2335508}}</ref><ref>{{Cite book| title=Robust Statistics | last = Huber | first = Peter J.| publisher= Wiley | year=2004}}{{Page needed|date=September 2010}}</ref> Specifically, the Pearson correlation coefficient is neither distributionally robust<ref>{{Cite book |last=Vaart |first=A. W. van der |url=http://dx.doi.org/10.1017/cbo9780511802256 |title=Asymptotic Statistics |date=1998-10-13 |publisher=Cambridge University Press |doi=10.1017/cbo9780511802256 |isbn=978-0-511-80225-6}}</ref> nor outlier resistant<ref name="wilcox"/> (see ''{{section link|Robust statistics#Definition}}''). Inspection of the [[scatterplot]] of ''X'' against ''Y'' will typically reveal situations where lack of robustness might be an issue, and in such cases it may be advisable to use a robust measure of association. Note, however, that while most robust estimators of association measure [[statistical dependence]] in some way, they are generally not interpretable on the same scale as the Pearson correlation coefficient.

Statistical inference for Pearson's correlation coefficient is sensitive to the data distribution. Exact tests and asymptotic tests based on the [[Fisher transformation]] can be applied if the data are approximately normally distributed, but may be misleading otherwise.
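
The sensitivity to outliers and the Fisher-transformation interval described above can be illustrated with a minimal Python sketch. It is not taken from the cited sources: the data are made up for illustration, and the helper names <code>pearson_r</code> and <code>fisher_ci</code> are hypothetical. The interval uses the usual large-sample approximation with standard error 1/sqrt(''n'' − 3), which assumes approximately bivariate normal data.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% interval for the correlation via the Fisher
    transformation. Assumes approximately bivariate normal data and
    requires n > 3 and |r| < 1."""
    z = math.atanh(r)                # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Made-up data: ten points close to the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
r_clean = pearson_r(x, y)

# The same data with one gross outlier appended.
x_out = x + [11.0]
y_out = y + [-40.0]
r_outlier = pearson_r(x_out, y_out)

print("r without outlier:", round(r_clean, 4))
print("r with one outlier:", round(r_outlier, 4))
print("Fisher interval (clean data):", fisher_ci(r_clean, len(x)))
```

In this made-up example a single extreme point is enough to change the sign of ''r'', which is the kind of behaviour a scatterplot would reveal at a glance.
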
In some situations, the [[bootstrapping (statistics)|bootstrap]] can be applied to construct confidence intervals, and [[permutation test]]s can be applied to carry out hypothesis tests. These [[non-parametric statistics|non-parametric]] approaches may give more meaningful results when bivariate normality does not hold. However, the standard versions of these approaches rely on [[exchangeable random variables|exchangeability]] of the data, meaning that there is no ordering or grouping of the data pairs being analyzed that might affect the behavior of the correlation estimate.

A stratified analysis is one way to either accommodate a lack of bivariate normality, or to isolate the correlation resulting from one factor while controlling for another. If ''W'' represents cluster membership or another factor that it is desirable to control for, we can [[Stratified sampling|stratify]] the data based on the value of ''W'', then calculate a correlation coefficient within each stratum. The stratum-level estimates can then be combined to estimate the overall correlation while controlling for ''W''.<ref>Katz, Mitchell H. (2006) ''Multivariable Analysis – A Practical Guide for Clinicians''. 2nd Edition. Cambridge University Press. {{isbn|978-0-521-54985-1}}. {{isbn|0-521-54985-X}}</ref>
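
The bootstrap interval and permutation test mentioned above can be sketched in Python as follows. This is one possible illustration under stated assumptions, not a method from the cited sources: it uses a percentile bootstrap over resampled pairs and a two-sided permutation test that shuffles ''Y'' relative to ''X'', both of which assume the pairs are exchangeable. The data, the helper names <code>pearson_r</code>, <code>bootstrap_ci</code> and <code>permutation_p_value</code>, and the resample counts are hypothetical choices.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for r, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples where r is undefined (zero variance).
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        stats.append(pearson_r(xs, ys))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null of no association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the pairing between x and y
        if abs(pearson_r(x, y_perm)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


# Made-up data with a strong positive association.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [2.3, 1.9, 4.8, 3.7, 6.9, 5.2, 8.8, 7.1, 9.6, 11.9, 10.2, 13.4]

r_obs = pearson_r(x, y)
ci_low, ci_high = bootstrap_ci(x, y)
p_value = permutation_p_value(x, y)

print("observed r:", round(r_obs, 3))
print("bootstrap 95% interval:", (round(ci_low, 3), round(ci_high, 3)))
print("permutation p-value:", round(p_value, 4))
```

If the pairs are ordered or grouped (for example by a factor ''W''), resampling or shuffling across the whole data set would violate the exchangeability assumption; one option in that case is to resample or permute within strata.
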