== Computation using the covariance method ==
The following is a detailed description of PCA using the covariance method<ref>See also the tutorial [http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf here]</ref> as opposed to the correlation method.<ref>{{cite web|title=Engineering Statistics Handbook Section 6.5.5.2|url=http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc552.htm|access-date=19 January 2015}}</ref>

The goal is to transform a given data set '''X''' of dimension ''p'' to an alternative data set '''Y''' of smaller dimension ''L''. Equivalently, we are seeking to find the matrix '''Y''', where '''Y''' is the [[Karhunen–Loève theorem|Karhunen–Loève]] transform (KLT) of matrix '''X''':
<math display="block"> \mathbf{Y} = \mathbb{KLT} \{ \mathbf{X} \} </math>
<ol>
<li>
'''Organize the data set'''
{{pb}}
Suppose you have data comprising a set of observations of ''p'' variables, and you want to reduce the data so that each observation can be described with only ''L'' variables, ''L'' < ''p''. Suppose further that the data are arranged as a set of ''n'' data vectors <math>\mathbf{x}_1 \ldots \mathbf{x}_n</math> with each <math>\mathbf{x}_i</math> representing a single grouped observation of the ''p'' variables.
* Write <math>\mathbf{x}_1 \ldots \mathbf{x}_n</math> as row vectors, each with ''p'' elements.
* Place the row vectors into a single matrix '''X''' of dimensions ''n'' × ''p''.
</li>
<li>
'''Calculate the empirical mean'''
* Find the empirical mean along each column ''j'' = 1, ..., ''p''.
* Place the calculated mean values into an empirical mean vector '''u''' of dimensions ''p'' × 1:
<math display="block">u_j = \frac{1}{n} \sum_{i=1}^n X_{ij} </math>
</li>
<li>
'''Calculate the deviations from the mean'''
{{pb}}
Mean subtraction is an integral part of the solution towards finding a principal component basis that minimizes the mean square error of approximating the data.<ref>A.A. Miranda, Y.-A. Le Borgne, and G. Bontempi. [http://www.ulb.ac.be/di/map/yleborgn/pub/NPL_PCA_07.pdf New Routes from Minimal Approximation Error to Principal Components], Volume 27, Number 3 / June, 2008, Neural Processing Letters, Springer</ref> Hence we proceed by centering the data as follows:
* Subtract the empirical mean vector <math> \mathbf{u}^{T} </math> from each row of the data matrix '''X'''.
* Store the mean-subtracted data in the ''n'' × ''p'' matrix '''B''':
<math display="block">\mathbf{B} = \mathbf{X} - \mathbf{h}\mathbf{u}^T </math>
where '''h''' is an {{math|''n'' × 1}} column vector of all 1s:
<math display="block">h_i = 1 \qquad \text{for } i = 1, \ldots, n </math>
In some applications, each variable (column of '''B''') may also be scaled to have a variance equal to 1 (see [[Z-score]]).<ref>{{Cite journal|author1=Abdi. H. |author2=Williams, L.J. |name-list-style=amp | author-link=AbdiWilliams | year = 2010 | title = Principal component analysis | journal = Wiley Interdisciplinary Reviews: Computational Statistics | volume = 2 | issue=4 | pages = 433–459 | doi = 10.1002/wics.101 |arxiv=1108.4372 |s2cid=122379222 }}</ref> This step affects the calculated principal components, but makes them independent of the units used to measure the different variables.
</li>
<li>
'''Find the covariance matrix'''
* Find the ''p'' × ''p'' empirical [[covariance matrix]] '''C''' from matrix '''B''':
<math display="block">\mathbf{C} = { 1 \over {n-1} } \mathbf{B}^{*} \mathbf{B}</math>
where <math>*</math> is the [[conjugate transpose]] operator.
If '''B''' consists entirely of real numbers, which is the case in many applications, the "conjugate transpose" is the same as the regular [[transpose]].
* The reasoning behind using {{math|''n'' − 1}} instead of ''n'' to calculate the covariance is [[Bessel's correction]].
</li>
<li>
'''Find the eigenvectors and eigenvalues of the covariance matrix'''
* Compute the matrix '''V''' of [[eigenvector]]s which [[diagonalizable matrix|diagonalizes]] the covariance matrix '''C''':
<math display="block">\mathbf{V}^{-1} \mathbf{C} \mathbf{V} = \mathbf{D} </math>
where '''D''' is the [[diagonal matrix]] of [[eigenvalue]]s of '''C'''. This step will typically involve the use of a computer-based algorithm for [[Eigendecomposition of a matrix|computing eigenvectors and eigenvalues]]. These algorithms are readily available as sub-components of most [[matrix algebra]] systems, such as [[SAS (software)|SAS]],<ref>{{Cite web | url=http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_princomp_sect001.htm | title=SAS/STAT(R) 9.3 User's Guide}}</ref> [[R (programming language)|R]], [[MATLAB]],<ref>[http://www.mathworks.com/access/helpdesk/help/techdoc/ref/eig.html#998306 eig function] Matlab documentation</ref><ref>{{Cite web|url=https://www.mathworks.com/matlabcentral/fileexchange/24634-face-recognition-system-pca-based|title=Face Recognition System-PCA based|website=www.mathworks.com|date=19 June 2023 }}</ref> [[Mathematica]],<ref>[http://reference.wolfram.com/mathematica/ref/Eigenvalues.html Eigenvalues function] Mathematica documentation</ref> [[SciPy]], [[IDL (programming language)|IDL]] ([[Interactive Data Language]]), or [[GNU Octave]] as well as [[OpenCV]].
* Matrix '''D''' will take the form of a ''p'' × ''p'' diagonal matrix, where
<math display="block">D_{k\ell} = \lambda_k \qquad \text{for } k = \ell</math>
is the ''k''th eigenvalue of the covariance matrix '''C''', and
<math display="block">D_{k\ell} = 0 \qquad \text{for } k \ne \ell.</math>
* Matrix '''V''', also of dimension ''p'' × ''p'', contains ''p'' column vectors, each of length ''p'', which represent the ''p'' eigenvectors of the covariance matrix '''C'''.
* The eigenvalues and eigenvectors are ordered and paired. The ''j''th eigenvalue corresponds to the ''j''th eigenvector.
* Matrix '''V''' denotes the matrix of ''right'' eigenvectors (as opposed to ''left'' eigenvectors). In general, the matrix of right eigenvectors need ''not'' be the (conjugate) transpose of the matrix of left eigenvectors.
</li>
<li>
'''Rearrange the eigenvectors and eigenvalues'''
* Sort the columns of the eigenvector matrix '''V''' and eigenvalue matrix '''D''' in order of ''decreasing'' eigenvalue.
* Make sure to maintain the correct pairings between the columns in each matrix.
</li>
<li>
'''Compute the cumulative energy content for each eigenvector'''
* The eigenvalues represent the distribution of the source data's energy{{Clarify|date=March 2011}} among each of the eigenvectors, where the eigenvectors form a [[basis (linear algebra)|basis]] for the data.
The cumulative energy content ''g'' for the ''j''th eigenvector is the sum of the energy content across all of the eigenvalues from 1 through ''j'':{{Citation needed|date=March 2011}}
<math display="block">g_j = \sum_{k=1}^j D_{kk} \qquad \text{for } j = 1,\dots,p </math>
</li>
<li>
'''Select a subset of the eigenvectors as basis vectors'''
* Save the first ''L'' columns of '''V''' as the ''p'' × ''L'' matrix '''W''':
<math display="block"> W_{k\ell} = V_{k\ell} \qquad \text{for } k = 1,\dots,p \qquad \ell = 1,\dots,L </math>
where
<math display="block">1 \leq L \leq p.</math>
* Use the vector '''g''' as a guide in choosing an appropriate value for ''L''. The goal is to choose a value of ''L'' as small as possible while achieving a reasonably high value of ''g'' on a percentage basis. For example, you may want to choose ''L'' so that the cumulative energy ''g'' is above a certain threshold, like 90 percent. In this case, choose the smallest value of ''L'' such that
<math display="block"> \frac{g_L}{g_p} \ge 0.9 </math>
</li>
<li>
'''Project the data onto the new basis'''
* The projected data points are the rows of the matrix
<math display="block"> \mathbf{T} = \mathbf{B} \cdot \mathbf{W}</math>
That is, the first column of <math>\mathbf{T}</math> is the projection of the data points onto the first principal component, the second column is the projection onto the second principal component, and so on. A short numerical sketch of the full procedure is given after this list.
</li>
</ol>
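The following is a minimal NumPy sketch of the steps above, not part of the original description. The variable names ('''X''', '''B''', '''C''', '''V''', '''W''', '''T''') follow the notation used in this section; the random example matrix, the seed, and the 90 percent energy threshold are arbitrary choices made only for illustration.

<syntaxhighlight lang="python">
import numpy as np

# 1. Organize the data set: n = 10 observations (rows) of p = 3 variables (columns).
#    The values here are arbitrary, used only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
n, p = X.shape

# 2. Calculate the empirical mean of each column.
u = X.mean(axis=0)                     # vector of length p

# 3. Calculate the deviations from the mean (broadcasting subtracts u from every row).
B = X - u

# 4. Find the p x p empirical covariance matrix, dividing by n - 1 (Bessel's correction).
C = (B.conj().T @ B) / (n - 1)         # matches np.cov(X, rowvar=False) for this data

# 5. Find the eigenvalues and eigenvectors of C.
#    eigh is used because C is Hermitian (symmetric for real data).
eigenvalues, V = np.linalg.eigh(C)     # columns of V are eigenvectors

# 6. Rearrange in order of decreasing eigenvalue, keeping eigenvalue/eigenvector pairs aligned.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
V = V[:, order]

# 7. Compute the cumulative energy content g_j.
g = np.cumsum(eigenvalues)

# 8. Select the smallest L whose cumulative energy fraction reaches the chosen threshold
#    (90 percent here, an arbitrary choice), and keep the first L eigenvectors as W.
L = int(np.searchsorted(g / g[-1], 0.9) + 1)
W = V[:, :L]                           # p x L basis matrix

# 9. Project the data onto the new basis.
T = B @ W                              # n x L matrix of principal component scores
print(L, T.shape)
</syntaxhighlight>

Running the sketch prints the selected ''L'' and the shape of the score matrix '''T'''; the covariance step can be cross-checked against <code>np.cov(X, rowvar=False)</code>, which uses the same {{math|''n'' − 1}} normalization by default.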