{{Machine learning|Dimensionality reduction}}
{{short description|Way of inferring information from cross-covariance matrices}}
In [[statistics]], '''canonical-correlation analysis''' ('''CCA'''), also called '''canonical variates analysis''', is a way of inferring information from [[cross-covariance matrix|cross-covariance matrices]]. If we have two vectors ''X'' = (''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub>) and ''Y'' = (''Y''<sub>1</sub>, ..., ''Y''<sub>''m''</sub>) of [[random variable]]s, and there are [[correlation]]s among the variables, then canonical-correlation analysis will find [[linear combinations]] of ''X'' and ''Y'' that have maximum correlation with each other.<ref>{{Cite book | doi = 10.1007/978-3-540-72244-1_14 | chapter = Canonical Correlation Analysis | title = Applied Multivariate Statistical Analysis | pages = 321–330 | year = 2007 | isbn = 978-3-540-72243-4 | first1 = Wolfgang | last1 = Härdle| first2 = Léopold | last2 = Simar| citeseerx = 10.1.1.324.403 }}</ref> T. R. Knapp notes that "virtually all of the commonly encountered [[parametric statistics|parametric test]]s of significance can be treated as special cases of canonical-correlation analysis, which is the general procedure for investigating the relationships between two sets of variables."<ref>{{Cite journal | last1 = Knapp | first1 = T. R. | title = Canonical correlation analysis: A general parametric significance-testing system | doi = 10.1037/0033-2909.85.2.410 | journal = Psychological Bulletin | volume = 85 | issue = 2 | pages = 410–416 | year = 1978 }}</ref> The method was first introduced by [[Harold Hotelling]] in 1936,<ref>{{Cite journal | last1 = Hotelling | first1 = H. | author-link1 = Harold Hotelling| title = Relations Between Two Sets of Variates | doi = 10.1093/biomet/28.3-4.321 | journal = Biometrika | volume = 28 | issue = 3–4 | pages = 321–377 | year = 1936 | jstor = 2333955}}</ref> although in the context of [[angles between flats]] the mathematical concept was published by [[Camille Jordan]] in 1875.<ref name="jordan">{{cite journal |last=Jordan |first=C. |author-link=Camille Jordan |date=1875 |title=Essai sur la géométrie à <math>n</math> dimensions |journal=Bull. Soc. Math. France |volume=3 |pages=103 |url=http://www.numdam.org/item?id=BSMF_1875__3__103_2 }}</ref>

CCA is now a cornerstone of multivariate statistics and multi-view learning, and many interpretations and extensions have been proposed, such as probabilistic CCA, sparse CCA, multi-view CCA, deep CCA,<ref>{{Cite journal |last1=Andrew |first1=Galen |last2=Arora |first2=Raman |last3=Bilmes |first3=Jeff |last4=Livescu |first4=Karen |date=2013-05-26 |title=Deep Canonical Correlation Analysis |url=https://proceedings.mlr.press/v28/andrew13.html |journal=Proceedings of the 30th International Conference on Machine Learning |language=en |publisher=PMLR |pages=1247–1255}}</ref> and DeepGeoCCA.<ref>{{Cite book |last1=Ju |first1=Ce |url=https://openreview.net/pdf?id=PnR1MNen7u |title=Deep Geodesic Canonical Correlation Analysis for Covariance-Based Neuroimaging Data |last2=Kobler |first2=Reinmar J |last3=Tang |first3=Liyao |last4=Guan |first4=Cuntai |last5=Kawanabe |first5=Motoaki |publisher=The Twelfth International Conference on Learning Representations (ICLR 2024, spotlight) |year=2024}}</ref> Unfortunately, perhaps because of its popularity, the literature is inconsistent in its notation; we highlight such inconsistencies in this article to help the reader make the best use of the existing literature and techniques available.

Like its sister method [[Principal component analysis|PCA]], CCA can be viewed in ''population'' form (corresponding to random vectors and their covariance matrices) or in ''sample'' form (corresponding to datasets and their sample covariance matrices). The two forms are almost exact analogues of each other, which is why their distinction is often overlooked, but they can behave very differently in high-dimensional settings.<ref>{{Cite web |title=Statistical Learning with Sparsity: the Lasso and Generalizations |url=https://hastie.su.domains/StatLearnSparsity/ |access-date=2023-09-12 |website=hastie.su.domains}}</ref> We next give explicit mathematical definitions for the population problem and highlight the different objects in the so-called ''canonical decomposition''; understanding the differences between these objects is crucial for interpreting the technique.
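
The sample version of the problem has a closed-form solution: whiten each set of variables using its own covariance matrix and take a singular-value decomposition of the whitened cross-covariance. The following is a minimal sketch of this computation, assuming data matrices with more observations than variables and nonsingular within-set sample covariances; the function <code>cca</code> and its interface are illustrative, not the API of any particular library.

<syntaxhighlight lang="python">
import numpy as np

def cca(X, Y):
    """Sample canonical correlations between the columns of X (n x p) and Y (n x q)."""
    # Center each variable
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariance blocks
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    # Whiten each block via its Cholesky factor: K = Lx^{-1} Sxy Ly^{-T}
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, M.T).T
    # Singular values of K are the canonical correlations
    U, rho, Vt = np.linalg.svd(K, full_matrices=False)
    # Map singular vectors back to canonical weight vectors,
    # normalized so that a_i' Sxx a_i = 1 and b_i' Syy b_i = 1
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return rho, A, B

# Toy usage: Y is a noisy linear function of X, so correlations are high
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Y = X @ rng.standard_normal((3, 2)) + 0.5 * rng.standard_normal((500, 2))
rho, A, B = cca(X, Y)
print(rho)  # canonical correlations, sorted in decreasing order
</syntaxhighlight>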