===Robustness===
Like many commonly used statistics, the sample [[statistic]] ''r'' is not [[robust statistics|robust]],<ref name="wilcox">{{Cite book| title=Introduction to robust estimation and hypothesis testing | last = Wilcox | first = Rand R. | publisher= Academic Press | year=2005}}</ref> so its value can be misleading if [[outlier]]s are present.<ref>{{Cite journal |title=Robust estimation and outlier detection with correlation coefficients |author1=Devlin, Susan J. |author1-link=Susan J. Devlin |author2=Gnanadesikan, R. |author3=Kettenring, J.R. |journal=Biometrika |volume=62 |issue=3 |year=1975 |pages=531–545 |doi=10.1093/biomet/62.3.531 |jstor=2335508}}</ref><ref>{{Cite book| title=Robust Statistics | last = Huber | first = Peter J.| publisher= Wiley | year=2004}}{{Page needed|date=September 2010}}</ref> Specifically, the Pearson correlation coefficient is neither distributionally robust<ref>{{Cite book |last=Vaart |first=A. W. van der |url=http://dx.doi.org/10.1017/cbo9780511802256 |title=Asymptotic Statistics |date=1998-10-13 |publisher=Cambridge University Press |doi=10.1017/cbo9780511802256 |isbn=978-0-511-80225-6}}</ref> nor outlier resistant<ref name="wilcox"/> (see ''{{section link|Robust statistics#Definition}}''). Inspecting the [[scatterplot]] of ''Y'' against ''X'' will typically reveal situations where lack of robustness might be an issue, and in such cases it may be advisable to use a robust measure of association. Note, however, that while most robust estimators of association measure [[statistical dependence]] in some way, they are generally not interpretable on the same scale as the Pearson correlation coefficient.
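
The sensitivity to outliers can be illustrated with a small numerical sketch. The data below are hypothetical and chosen only for illustration, and <code>pearson_r</code> is a helper written for this example rather than a library function: ten points lie exactly on a straight line, and a single gross outlier is then appended.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: ten points lying exactly on the line y = 2x + 1.
x_clean = list(range(1, 11))
y_clean = [2 * x + 1 for x in x_clean]

# The same data with one gross outlier appended.
x_outlier = x_clean + [30]
y_outlier = y_clean + [-40]

r_clean = pearson_r(x_clean, y_clean)        # perfect linear relation
r_outlier = pearson_r(x_outlier, y_outlier)  # dominated by the single outlier

print(r_clean, r_outlier)
```

In this example one added point is enough to change the sign of ''r'', even though the other ten points are perfectly linearly related.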

Statistical inference for Pearson's correlation coefficient is sensitive to the data distribution. Exact tests, and asymptotic tests based on the [[Fisher transformation]], can be applied if the data are approximately normally distributed, but may be misleading otherwise. In some situations, the [[bootstrapping (statistics)|bootstrap]] can be applied to construct confidence intervals, and [[permutation test]]s can be applied to carry out hypothesis tests. These [[non-parametric statistics|non-parametric]] approaches may give more meaningful results when bivariate normality does not hold. However, the standard versions of these approaches rely on [[exchangeable random variables|exchangeability]] of the data, meaning that there is no ordering or grouping of the data pairs being analyzed that might affect the behavior of the correlation estimate.
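
The three approaches mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the simulated data, the percentile form of the bootstrap interval, the number of resamples and the add-one correction in the permutation p-value are all choices made for this example. The bootstrap resamples (''x'', ''y'') pairs jointly, whereas the permutation test shuffles ''y'' alone to mimic the null hypothesis of no association; both assume exchangeable pairs.

```python
import math
import random
from statistics import NormalDist

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, alpha=0.05):
    """Asymptotic interval via the Fisher transformation (assumes bivariate normality)."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(pearson_r([xs[i] for i in idx], [ys[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with zero variance
    stats.sort()
    m = len(stats)
    return stats[int(alpha / 2 * m)], stats[min(m - 1, int((1 - alpha / 2) * m))]

def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of no association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    ys_perm = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ys_perm)  # breaks the pairing between x and y
        if abs(pearson_r(xs, ys_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Simulated (hypothetical) data with a strong linear relationship.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(40)]
ys = [x + 0.5 * rng.gauss(0, 1) for x in xs]

r = pearson_r(xs, ys)
ci_fisher = fisher_ci(r, len(xs))
ci_boot = bootstrap_ci(xs, ys)
p_perm = permutation_p_value(xs, ys)
print(r, ci_fisher, ci_boot, p_perm)
```

If the pairs were ordered in time or grouped into clusters, neither the joint resampling nor the shuffling used here would respect that structure, and the results could be misleading.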

A stratified analysis is one way either to accommodate a lack of bivariate normality or to isolate the correlation resulting from one factor while controlling for another. If ''W'' represents cluster membership or another factor to be controlled for, the data can be [[Stratified sampling|stratified]] based on the value of ''W'', and a correlation coefficient calculated within each stratum. The stratum-level estimates can then be combined to estimate the overall correlation while controlling for ''W''.<ref>Katz, Mitchell H. (2006) ''Multivariable Analysis – A Practical Guide for Clinicians''. 2nd Edition. Cambridge University Press. {{isbn|978-0-521-54985-1}}. {{isbn|0-521-54985-X}}</ref>
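
A stratified analysis of this kind can be sketched as follows. The text above does not specify how the stratum-level estimates are combined; this example assumes one common choice, averaging the Fisher-transformed within-stratum correlations with weights ''n<sub>k</sub>'' − 3 and transforming back. The function names and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def stratified_correlation(xs, ys, ws):
    """Combine within-stratum correlations on the Fisher z scale.

    Assumption of this sketch: strata are weighted by n_k - 3, so strata
    with fewer than four observations are skipped. A stratum with a
    perfect correlation (|r| = 1) would make atanh fail and is not handled.
    """
    strata = {}
    for x, y, w in zip(xs, ys, ws):
        sx, sy = strata.setdefault(w, ([], []))
        sx.append(x)
        sy.append(y)

    per_stratum = {}
    weighted_z = 0.0
    total_weight = 0.0
    for w, (sx, sy) in strata.items():
        n_k = len(sx)
        if n_k < 4:
            continue
        r_k = pearson_r(sx, sy)
        per_stratum[w] = r_k
        weighted_z += (n_k - 3) * math.atanh(r_k)
        total_weight += n_k - 3
    return math.tanh(weighted_z / total_weight), per_stratum

# Hypothetical data: within each stratum y rises with x, but stratum "b"
# has larger x and much smaller y, so the pooled correlation is negative.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [10, 11, 10.5, 12, 12.5, 1, 2, 1.5, 3, 3.5]
w = ["a"] * 5 + ["b"] * 5

r_pooled = pearson_r(x, y)
r_combined, r_by_stratum = stratified_correlation(x, y, w)
print(r_pooled, r_combined, r_by_stratum)
```

Here the pooled correlation is negative while the combined within-stratum estimate is strongly positive, which shows how ignoring ''W'' can mask or reverse the association within strata.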