==Computation==

===Derivation===
Let <math>\Sigma _{XY}</math> be the [[cross-covariance matrix]] for any pair of (vector-shaped) random variables <math>X</math> and <math>Y</math>. The target function to maximize is

:<math>
\rho = \frac{a^T \Sigma _{XY} b}{\sqrt{a^T \Sigma _{XX} a} \sqrt{b^T \Sigma _{YY} b}}.
</math>

The first step is a [[change of basis]], defining

:<math>
c = \Sigma _{XX} ^{1/2} a,
</math>

:<math>
d = \Sigma _{YY} ^{1/2} b,
</math>
where <math>\Sigma_{XX}^{1/2}</math> and <math>\Sigma_{YY}^{1/2}</math> can be obtained from the eigen-decomposition (or by [[Square root of a matrix#By diagonalization|diagonalization]]):

:<math>
\Sigma _{XX} ^{1/2} = V_X D_X^{1/2} V_X^\top,\qquad V_X D_X V_X^\top = \Sigma_{XX},
</math>
and
:<math>
\Sigma _{YY} ^{1/2} = V_Y D_Y^{1/2} V_Y^\top,\qquad V_Y D_Y V_Y^\top = \Sigma_{YY}.
</math>
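
As an illustration, the symmetric square root can be sketched with NumPy. This is a minimal sketch, not a reference implementation: the helper name <code>sym_sqrt</code>, the random test data and the clipping of slightly negative eigenvalues are assumptions made for this example.

```python
import numpy as np

def sym_sqrt(sigma):
    """Symmetric square root V D^{1/2} V^T of a symmetric PSD matrix (hypothetical helper)."""
    # eigh is intended for symmetric matrices: it returns real eigenvalues
    # in ascending order and orthonormal eigenvectors as the columns of V.
    eigvals, V = np.linalg.eigh(sigma)
    # Assumption: tiny negative eigenvalues are round-off and are clipped to 0.
    eigvals = np.clip(eigvals, 0.0, None)
    # Scaling the columns of V by sqrt(eigvals) gives V D^{1/2}.
    return (V * np.sqrt(eigvals)) @ V.T

# Synthetic data, used only to obtain some covariance matrix.
rng = np.random.default_rng(0)
samples = rng.standard_normal((200, 3))
sigma_xx = np.cov(samples, rowvar=False)

root = sym_sqrt(sigma_xx)
# Squaring the root should reproduce the covariance matrix.
print(np.allclose(root @ root, sigma_xx))
```

The inverse square root used later in the derivation can be obtained in the same way by dividing by the square roots of the eigenvalues, provided the covariance matrix is non-singular.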

Thus
:<math>
\rho = \frac{c^T \Sigma _{XX} ^{-1/2} \Sigma _{XY} \Sigma _{YY} ^{-1/2} d}{\sqrt{c^Tc} \sqrt{d^Td}}.
</math>

By the [[Cauchy–Schwarz inequality]],
:<math>
\left(c^T \Sigma _{XX} ^{-1/2} \Sigma _{XY} \Sigma _{YY} ^{-1/2} \right) d \leq \left(c^T \Sigma _{XX} ^{-1/2} \Sigma _{XY} \Sigma _{YY} ^{-1/2} \Sigma _{YY} ^{-1/2} \Sigma _{YX} \Sigma _{XX} ^{-1/2} c \right)^{1/2} \left(d^T d \right)^{1/2},
</math>
so that, since <math>\Sigma_{YY}^{-1/2} \Sigma_{YY}^{-1/2} = \Sigma_{YY}^{-1}</math>, dividing both sides by <math>\sqrt{c^T c} \sqrt{d^T d}</math> gives
:<math>
\rho \leq \frac{\left(c^T \Sigma _{XX}^{-1/2} \Sigma _{XY} \Sigma _{YY}^{-1} \Sigma _{YX} \Sigma_{XX}^{-1/2} c \right)^{1/2}}{\left(c^T c \right)^{1/2}}.
</math>

Equality holds if the vectors <math>d</math> and <math>\Sigma_{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1/2} c</math> are collinear. In addition, the maximum of correlation is attained if <math>c</math> is the [[eigenvector]] with the maximum eigenvalue for the matrix <math>\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1/2}</math> (see [[Rayleigh quotient]]). The subsequent pairs are found by using the eigenvectors associated with [[eigenvalues]] of decreasing magnitude. Orthogonality of the successive vectors <math>c</math> is guaranteed by the symmetry of this matrix.
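
The bound and its equality case can be checked numerically. In the following sketch the matrix <code>K</code> is a small random stand-in for <math>\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2}</math>, not one estimated from data, and the names <code>rho</code>, <code>c_star</code> and <code>d_star</code> are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
# Assumption: K stands in for Sigma_XX^{-1/2} Sigma_XY Sigma_YY^{-1/2}.
K = rng.uniform(-0.5, 0.5, size=(3, 2))

def rho(c, d):
    """Objective in the transformed coordinates: c^T K d / (|c| |d|)."""
    return (c @ K @ d) / (np.linalg.norm(c) * np.linalg.norm(d))

# K K^T is symmetric, so eigh applies; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(K @ K.T)
c_star = eigvecs[:, -1]          # eigenvector of the largest eigenvalue
d_star = K.T @ c_star            # collinear with K^T c, the equality case
bound = np.sqrt(eigvals[-1])     # square root of the largest eigenvalue

# Random directions should never exceed the bound.
trials = [rho(rng.standard_normal(3), rng.standard_normal(2)) for _ in range(1000)]
print(max(trials) <= bound, np.isclose(rho(c_star, d_star), bound))
```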

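Another way of viewing this computation is that <math>c</math> and <math>d</math> are the left and right [[Singular value decomposition|singular vectors]] of the matrix <math>\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2}</math> (the correlation matrix of <math>X</math> and <math>Y</math> after each has been whitened) corresponding to the highest singular value.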
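
The singular value view can be sketched as follows, assuming sample covariance matrices estimated from synthetic data. The helper <code>inv_sqrt</code> and all variable names are illustrative assumptions, and the sketch does not guard against singular covariance matrices.

```python
import numpy as np

def inv_sqrt(sigma):
    """Inverse symmetric square root (hypothetical helper; sigma must be positive definite)."""
    eigvals, V = np.linalg.eigh(sigma)
    return (V / np.sqrt(eigvals)) @ V.T

# Synthetic data: X and Y share a two-dimensional latent signal.
rng = np.random.default_rng(1)
n = 500
Z = rng.standard_normal((n, 2))
X = Z @ rng.standard_normal((2, 3)) + 0.5 * rng.standard_normal((n, 3))
Y = Z @ rng.standard_normal((2, 2)) + 0.5 * rng.standard_normal((n, 2))

# Sample covariance and cross-covariance matrices.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
S_xx = Xc.T @ Xc / (n - 1)
S_yy = Yc.T @ Yc / (n - 1)
S_xy = Xc.T @ Yc / (n - 1)

# Whitened cross-covariance; its singular values are the canonical correlations.
K = inv_sqrt(S_xx) @ S_xy @ inv_sqrt(S_yy)
C, rho, Dt = np.linalg.svd(K, full_matrices=False)
c, d = C[:, 0], Dt[0]            # leading left and right singular vectors

# Cross-check against the eigenvalue formulation of the derivation.
eig_top = np.linalg.eigvalsh(K @ K.T)[::-1][:rho.size]
print(np.allclose(eig_top, rho**2))
```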

===Solution===
The solution is therefore:
* <math>c</math> is an eigenvector of <math>\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1/2}</math>
* <math>d</math> is proportional to <math>\Sigma _{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1/2} c</math>

Reciprocally, there is also:
* <math>d</math> is an eigenvector of <math>\Sigma_{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1/2}</math>
* <math>c</math> is proportional to <math>\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2} d</math>

Reversing the change of coordinates, we have that
* <math>a</math> is an eigenvector of <math>\Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}</math>
* <math>b</math> is proportional to <math>\Sigma_{YY}^{-1} \Sigma_{YX} a</math>
* <math>b</math> is an eigenvector of <math>\Sigma _{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}</math>
* <math>a</math> is proportional to <math>\Sigma_{XX}^{-1} \Sigma_{XY} b</math>

The canonical variables are defined by:

:<math>U = c^T \Sigma_{XX}^{-1/2} X = a^T X</math>

:<math>V = d^T \Sigma_{YY}^{-1/2} Y = b^T Y</math>
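
The solution in the original coordinates can be illustrated with a short NumPy sketch. It assumes synthetic data and sample covariance estimates, and all variable names are hypothetical. It takes <math>a</math> as the leading eigenvector of <math>\Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}</math>, sets <math>b</math> proportional to <math>\Sigma_{YY}^{-1} \Sigma_{YX} a</math>, and compares the correlation of the resulting canonical variables with the square root of the largest eigenvalue.

```python
import numpy as np

# Synthetic data: X and Y share a one-dimensional latent signal.
rng = np.random.default_rng(2)
n = 1000
latent = rng.standard_normal((n, 1))
X = latent @ rng.standard_normal((1, 3)) + rng.standard_normal((n, 3))
Y = latent @ rng.standard_normal((1, 2)) + rng.standard_normal((n, 2))

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
S_xx = Xc.T @ Xc / (n - 1)
S_yy = Yc.T @ Yc / (n - 1)
S_xy = Xc.T @ Yc / (n - 1)

# M = S_xx^{-1} S_xy S_yy^{-1} S_yx, formed with solves rather than explicit inverses.
M = np.linalg.solve(S_xx, S_xy) @ np.linalg.solve(S_yy, S_xy.T)
# M is not symmetric, so the general eigensolver is used; its eigenvalues
# are real in exact arithmetic, hence only the real parts are kept.
eigvals, eigvecs = np.linalg.eig(M)
eigvals, eigvecs = eigvals.real, eigvecs.real
top = np.argmax(eigvals)

a = eigvecs[:, top]                      # first canonical weight vector for X
b = np.linalg.solve(S_yy, S_xy.T @ a)    # proportional to S_yy^{-1} S_yx a

U = Xc @ a                               # first pair of canonical variables
V = Yc @ b
rho = np.corrcoef(U, V)[0, 1]
print(np.isclose(rho, np.sqrt(eigvals[top])))
```

Forming <math>b</math> from <math>a</math> in this way fixes the sign so that the correlation of <math>U</math> and <math>V</math> is non-negative.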

===Implementation===
CCA can be computed using [[singular value decomposition]] on a correlation matrix.<ref>{{Cite journal | last1 = Hsu | first1 = D. | last2 = Kakade | first2 = S. M. | last3 = Zhang | first3 = T. | doi = 10.1016/j.jcss.2011.12.025 | title = A spectral algorithm for learning Hidden Markov Models | journal = Journal of Computer and System Sciences | volume = 78 | issue = 5 | pages = 1460 | year = 2012 | url = http://www.cs.mcgill.ca/~colt2009/papers/011.pdf| arxiv = 0811.4413| s2cid = 220740158 }}</ref> It is available as a function in<ref>{{Cite journal | last1 = Huang | first1 = S. Y. | last2 = Lee | first2 = M. H. | last3 = Hsiao | first3 = C. K. | doi = 10.1016/j.jspi.2008.10.011 | title = Nonlinear measures of association with kernel canonical correlation analysis and applications | journal = Journal of Statistical Planning and Inference | volume = 139 | issue = 7 | pages = 2162 | year = 2009 | url = http://www.stat.sinica.edu.tw/syhuang/papersdownload/KCCA-080906.pdf | access-date = 2015-09-04 | archive-date = 2017-03-13 | archive-url = https://web.archive.org/web/20170313203427/http://www.stat.sinica.edu.tw/syhuang/papersdownload/KCCA-080906.pdf | url-status = dead }}</ref>

* [[MATLAB]] as [http://www.mathworks.co.uk/help/stats/canoncorr.html canoncorr] ([https://sourceforge.net/p/octave/statistics/ci/default/tree/inst/canoncorr.m also] in [[GNU Octave|Octave]]) 
* [[R (programming language)|R]] as the standard function [http://stat.ethz.ch/R-manual/R-devel/library/stats/html/cancor.html cancor] and in several other packages, including [https://cran.r-project.org/package=candisc candisc], [https://cran.r-project.org/web/packages/CCA/index.html CCA] and [https://cran.r-project.org/web/packages/vegan/index.html vegan]. The [https://cran.r-project.org/web/packages/CCP/index.html CCP] package is intended for statistical hypothesis testing in canonical correlation analysis.
* [[SAS language|SAS]] as [https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_cancorr_toc.htm proc cancorr]
* [[Python (programming language)|Python]] in the library [[scikit-learn]], as [http://scikit-learn.org/stable/modules/cross_decomposition.html cross decomposition] and in statsmodels, as [https://devdocs.io/statsmodels/generated/statsmodels.multivariate.cancorr.cancorr CanCorr]. The CCA-Zoo library <ref>{{Cite journal |last1=Chapman |first1=James |last2=Wang |first2=Hao-Ting |date=2021-12-18 |title=CCA-Zoo: A collection of Regularized, Deep Learning based, Kernel, and Probabilistic CCA methods in a scikit-learn style framework |journal=Journal of Open Source Software |language=en |volume=6 |issue=68 |pages=3823 |doi=10.21105/joss.03823 |issn=2475-9066|doi-access=free |bibcode=2021JOSS....6.3823C }}</ref> implements CCA extensions, such as probabilistic CCA, sparse CCA, multi-view CCA, and deep CCA.
* [[SPSS]] as macro CanCorr shipped with the main software
* [[Julia (programming language)|Julia]] in the [https://github.com/JuliaStats/MultivariateStats.jl MultivariateStats.jl] package.

Computing CCA via [[singular value decomposition]] of a correlation matrix amounts to computing the [[cosine]]s of the [[angles between flats]]. Because the cosine function is [[ill-conditioned]] for small angles, highly correlated principal vectors are computed very inaccurately in finite-[[Precision (computer science)|precision]] [[computer arithmetic]]. To [[Angles_between_flats#Computation|address this problem]], alternative algorithms<ref name="KA02">{{Citation
  | last1 = Knyazev
  | first1 = A.V.
  | last2 = Argentati
  | first2 = M.E.
  | title =  Principal Angles between Subspaces in an A-Based Scalar Product: Algorithms and Perturbation Estimates
  | journal =  SIAM Journal on Scientific Computing
  | volume = 23
  | issue = 6
  | pages = 2009–2041
  | year = 2002
  | doi = 10.1137/S1064827500377332
  | bibcode = 2002SJSC...23.2008K
 | citeseerx = 10.1.1.73.2914
  }}</ref> are available in

* [[SciPy]] as [https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.subspace_angles.html linear-algebra function subspace_angles] 
* [[MATLAB]] as [https://www.mathworks.com/matlabcentral/fileexchange/55-subspacea-m FileExchange function subspacea]
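
As a rough illustration of the angle-based view, the sample canonical correlations are the cosines of the principal angles between the column spaces of the centred data matrices. The sketch below uses synthetic data and hypothetical variable names, and compares SciPy's <code>subspace_angles</code> with the cosine-based computation from orthonormal bases. On well-separated data such as this the two agree; the difference between the approaches matters for angles close to zero.

```python
import numpy as np
from scipy.linalg import subspace_angles

# Synthetic data: X and Y share a two-dimensional latent signal.
rng = np.random.default_rng(3)
n = 400
latent = rng.standard_normal((n, 2))
X = latent @ rng.standard_normal((2, 3)) + 0.3 * rng.standard_normal((n, 3))
Y = latent @ rng.standard_normal((2, 2)) + 0.3 * rng.standard_normal((n, 2))
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# Principal angles (in radians) between the column spaces of Xc and Yc.
angles = subspace_angles(Xc, Yc)
rho_angles = np.sort(np.cos(angles))[::-1]   # canonical correlations, descending

# Cosine-based reference: singular values of Qx^T Qy for orthonormal bases.
Qx, _ = np.linalg.qr(Xc)
Qy, _ = np.linalg.qr(Yc)
rho_svd = np.linalg.svd(Qx.T @ Qy, compute_uv=False)

print(np.allclose(rho_angles, rho_svd))
```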