==Correlation matrices==

The correlation matrix of <math>n</math> random variables <math>X_1,\ldots,X_n</math> is the <math>n \times n</math> matrix <math>C</math> whose <math>(i,j)</math> entry is
:<math>c_{ij}:=\operatorname{corr}(X_i,X_j)=\frac{\operatorname{cov}(X_i,X_j)}{\sigma_{X_i}\sigma_{X_j}},\quad \text{if}\ \sigma_{X_i}\sigma_{X_j}>0.</math>
Thus the diagonal entries are all equal to [[unity (number)|one]]. If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the [[covariance matrix]] of the [[standardized variable|standardized random variables]] <math>X_i / \sigma_{X_i}</math> for <math>i = 1, \dots, n</math>. This applies both to the matrix of population correlations (in which case <math>\sigma</math> is the population standard deviation) and to the matrix of sample correlations (in which case <math>\sigma</math> denotes the sample standard deviation). Consequently, each is necessarily a [[positive-semidefinite matrix]]. Moreover, the correlation matrix is strictly [[positive definite matrix|positive definite]] if no variable can have all its values exactly generated as a linear function of the values of the others.
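
As an illustration of the relationship above, the following sketch computes a sample correlation matrix as the covariance matrix of standardized columns and compares it with NumPy's <code>corrcoef</code>. The function name <code>correlation_matrix</code> and the simulated data are assumptions made for this example only; they are not taken from any cited source.

```python
import numpy as np

def correlation_matrix(data):
    """Sample correlation matrix of the columns of `data` (rows are observations).

    Hypothetical helper for illustration: standardize each column with the
    sample standard deviation, then take the sample covariance of the result.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    std = centered.std(axis=0, ddof=1)  # sample standard deviation
    if np.any(std == 0):
        # The definition requires a strictly positive standard deviation.
        raise ValueError("correlation is undefined for a constant variable")
    z = centered / std
    return z.T @ z / (data.shape[0] - 1)

# Simulated data (an assumption for the example): the third variable is
# close to, but not exactly, a linear function of the first two.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
x[:, 2] = x[:, 0] + 0.5 * x[:, 1] + 0.1 * x[:, 2]

C = correlation_matrix(x)
eigenvalues = np.linalg.eigvalsh(C)  # all non-negative up to rounding error
```

Under these assumptions, <code>C</code> has unit diagonal, is symmetric, and has no negative eigenvalues beyond floating-point rounding, in line with the positive-semidefiniteness property stated above.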

The correlation matrix is symmetric because the correlation between <math>X_i</math> and <math>X_j</math> is the same as the correlation between <math>X_j</math> and <math>X_i</math>.

A correlation matrix appears, for example, in one formula for the [[coefficient of multiple determination#Computation|coefficient of multiple determination]], a measure of goodness of fit in [[multiple regression]].

In [[statistical modelling]], correlation matrices representing the relationships between variables are categorized into different correlation structures, which are distinguished by factors such as the number of parameters required to estimate them. For example, in an [[Exchangeability|exchangeable]] correlation matrix, all pairs of variables are modeled as having the same correlation, so all non-diagonal elements of the matrix are equal to each other. On the other hand, an [[Autoregressive model|autoregressive]] matrix is often used when variables represent a time series, since correlations are likely to be greater when measurements are closer in time. Other examples include independent, unstructured, M-dependent, and [[Toeplitz matrix|Toeplitz]].
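
A minimal sketch of two of these structures follows. It assumes the common first-order autoregressive form, in which the correlation between measurements <math>i</math> and <math>j</math> is <math>\rho^{|i-j|}</math>; the function names <code>exchangeable_corr</code> and <code>ar1_corr</code> are hypothetical and chosen for this example. Each structure is determined by a single parameter <math>\rho</math>.

```python
import numpy as np

def exchangeable_corr(n, rho):
    """n x n exchangeable structure: every off-diagonal entry equals rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def ar1_corr(n, rho):
    """n x n first-order autoregressive structure: entry (i, j) is rho**|i - j|."""
    idx = np.arange(n)
    lag = np.abs(idx[:, None] - idx[None, :])  # time separation between i and j
    return float(rho) ** lag

E = exchangeable_corr(4, 0.3)
A = ar1_corr(4, 0.5)
```

In the autoregressive sketch, correlations shrink as the lag grows, which matches the motivation that measurements closer in time tend to be more strongly correlated.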

In [[exploratory data analysis]], the [[iconography of correlations]] consists in replacing a correlation matrix by a diagram where the "remarkable" correlations are represented by a solid line (positive correlation), or a dotted line (negative correlation).

===Nearest valid correlation matrix===
In some applications (e.g., building data models from only partially observed data), one wants to find the "nearest" correlation matrix to an "approximate" correlation matrix (e.g., a matrix that fails to be positive semidefinite because of the way it has been computed).

In 2002, Higham<ref>{{cite journal|title=Computing the nearest correlation matrix—a problem from finance|journal=IMA Journal of Numerical Analysis|date=2002|first=Nicholas J.|last=Higham|volume=22|issue=3|pages=329–343|doi=10.1093/imanum/22.3.329|citeseerx=10.1.1.661.2180}}</ref> formalized the notion of nearness using the [[Frobenius norm]] and provided a method for computing the nearest correlation matrix using [[Dykstra's projection algorithm]], an implementation of which is available as an online Web API.<ref>{{Cite web|url=https://portfoliooptimizer.io/|title=Portfolio Optimizer |website=portfoliooptimizer.io|access-date=2021-01-30}}</ref>

This sparked interest in the subject, with new theoretical (e.g., computing the nearest correlation matrix with factor structure<ref>{{cite journal|title=Computing a Nearest Correlation Matrix with Factor Structure|journal=SIAM J. Matrix Anal. Appl.|date=2010|first1=Rudiger|last1=Borsdorf|first2=Nicholas J.|last2=Higham|first3=Marcos|last3=Raydan|volume=31|issue=5|pages=2603–2622|doi=10.1137/090776718|url=http://eprints.maths.manchester.ac.uk/1523/1/SML002603.pdf}}</ref>) and numerical (e.g., the use of [[Newton's method]] for computing the nearest correlation matrix<ref>{{cite journal|title=A quadratically convergent Newton method for computing the nearest correlation matrix|journal=SIAM J. Matrix Anal. Appl.|date=2006|first1=Houduo|last1=Qi|first2=Defeng|last2=Sun|volume=28|issue=2|pages=360–385|doi=10.1137/050624509}}</ref>) results obtained in the subsequent years.
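
The alternating-projections approach with Dykstra's correction mentioned above can be sketched as follows. This is a simplified, unweighted illustration and not a reference implementation: the function name <code>nearest_correlation</code>, the stopping rule, and the default tolerance are assumptions made for this example. The iteration alternates between projecting onto the positive-semidefinite matrices (by setting negative eigenvalues to zero) and projecting onto the symmetric matrices with unit diagonal.

```python
import numpy as np

def nearest_correlation(a, tol=1e-8, max_iter=1000):
    """Approximate the nearest correlation matrix to `a` in the Frobenius norm.

    Illustrative sketch of alternating projections with Dykstra's correction.
    The returned matrix has an exactly unit diagonal and is positive
    semidefinite only up to the convergence tolerance.
    """
    a = np.asarray(a, dtype=float)
    y = (a + a.T) / 2.0           # work with the symmetric part of the input
    correction = np.zeros_like(y)
    for _ in range(max_iter):
        r = y - correction        # apply Dykstra's correction
        w, v = np.linalg.eigh(r)
        x = (v * np.maximum(w, 0.0)) @ v.T  # project onto the PSD matrices
        correction = x - r
        y_new = x.copy()
        np.fill_diagonal(y_new, 1.0)        # project onto unit-diagonal matrices
        change = np.linalg.norm(y_new - y, "fro")
        y = y_new
        if change <= tol * max(1.0, np.linalg.norm(y, "fro")):
            break
    return y

# An indefinite symmetric matrix with unit diagonal (not a valid correlation matrix).
approx = np.array([[1.0, 1.0, 0.0],
                   [1.0, 1.0, 1.0],
                   [0.0, 1.0, 1.0]])
nearest = nearest_correlation(approx)
```

Because the final step of each iteration is the unit-diagonal projection, the smallest eigenvalue of the result may be very slightly negative; an application that needs strict positive semidefiniteness would have to handle that separately.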