=== Singular value decomposition ===
{{Main|Singular value decomposition}}

The principal components transformation can also be associated with another matrix factorization, the [[singular value decomposition]] (SVD) of '''X''',

:<math>\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{W}^\mathsf{T}</math>

Here '''Σ''' is an ''n''-by-''p'' [[Diagonal matrix|rectangular diagonal matrix]] of positive numbers ''σ''<sub>(''k'')</sub>, called the singular values of '''X'''; '''U''' is an ''n''-by-''n'' matrix whose columns are orthogonal unit vectors of length ''n'', called the left singular vectors of '''X'''; and '''W''' is a ''p''-by-''p'' matrix whose columns are orthogonal unit vectors of length ''p'', called the right singular vectors of '''X'''.

In terms of this factorization, the matrix '''X'''<sup>T</sup>'''X''' can be written

:<math>\begin{align} \mathbf{X}^\mathsf{T}\mathbf{X} & = \mathbf{W}\mathbf{\Sigma}^\mathsf{T} \mathbf{U}^\mathsf{T} \mathbf{U}\mathbf{\Sigma}\mathbf{W}^\mathsf{T} \\ & = \mathbf{W}\mathbf{\Sigma}^\mathsf{T} \mathbf{\Sigma} \mathbf{W}^\mathsf{T} \\ & = \mathbf{W}\mathbf{\hat{\Sigma}}^2 \mathbf{W}^\mathsf{T} \end{align}</math>

where <math>\mathbf{\hat{\Sigma}}</math> is the square diagonal matrix with the singular values of '''X''' on its diagonal and the excess zeros chopped off, so that <math>\mathbf{\hat{\Sigma}}^2 = \mathbf{\Sigma}^\mathsf{T} \mathbf{\Sigma}</math>. Comparison with the eigenvector factorization of '''X'''<sup>T</sup>'''X''' establishes that the right singular vectors '''W''' of '''X''' are equivalent to the eigenvectors of '''X'''<sup>T</sup>'''X''', while the singular values ''σ''<sub>(''k'')</sub> of <math>\mathbf{X}</math> are equal to the square roots of the eigenvalues ''λ''<sub>(''k'')</sub> of '''X'''<sup>T</sup>'''X'''.

Using the singular value decomposition, the score matrix '''T''' can be written

:<math>\begin{align} \mathbf{T} & = \mathbf{X} \mathbf{W} \\ & = \mathbf{U}\mathbf{\Sigma}\mathbf{W}^\mathsf{T} \mathbf{W} \\ & = \mathbf{U}\mathbf{\Sigma} \end{align}</math>

so each column of '''T''' is given by one of the left singular vectors of '''X''' multiplied by the corresponding singular value. This form is also the [[polar decomposition]] of '''T'''.

Efficient algorithms exist to calculate the SVD of '''X''' without having to form the matrix '''X'''<sup>T</sup>'''X''', so computing the SVD is now the standard way to calculate a principal components analysis from a data matrix,<ref>{{Cite book |last1=Boyd |first1=Stephen |url=http://dx.doi.org/10.1017/cbo9780511804441 |title=Convex Optimization |last2=Vandenberghe |first2=Lieven |date=2004-03-08 |publisher=Cambridge University Press |doi=10.1017/cbo9780511804441 |isbn=978-0-521-83378-3}}</ref> unless only a handful of components are required.

As with the eigendecomposition, a truncated {{math|''n'' × ''L''}} score matrix '''T'''<sub>''L''</sub> can be obtained by considering only the first ''L'' largest singular values and their singular vectors:

:<math>\mathbf{T}_L = \mathbf{U}_L\mathbf{\Sigma}_L = \mathbf{X} \mathbf{W}_L </math>

Truncating a matrix '''M''' or '''T''' using a truncated singular value decomposition in this way produces a matrix that is the nearest possible matrix of [[Rank (linear algebra)|rank]] ''L'' to the original, in the sense that the difference between the two has the smallest possible [[Frobenius norm]], a result known as the [[Low-rank approximation#Proof of Eckart–Young–Mirsky theorem (for Frobenius norm)|Eckart–Young theorem]] (1936).
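The relations above can be checked numerically. The following is a minimal sketch, not drawn from the article's sources, assuming NumPy and a small random, mean-centered data matrix used purely for illustration; it verifies that the SVD-based scores '''T''' = '''XW''' = '''UΣ''' agree with the eigendecomposition of '''X'''<sup>T</sup>'''X''' and forms the truncated scores '''T'''<sub>''L''</sub>.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical mean-centered data matrix X (n samples, p variables), for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X -= X.mean(axis=0)

# Thin SVD: X = U @ diag(s) @ Wt, where the rows of Wt are the right singular vectors.
U, s, Wt = np.linalg.svd(X, full_matrices=False)

# Scores from the SVD: T = X W = U Sigma.
T_svd = U * s                # same as U @ np.diag(s)
T_proj = X @ Wt.T            # projection of the data onto the right singular vectors
assert np.allclose(T_svd, T_proj)

# The right singular vectors are eigenvectors of X^T X, and the squared
# singular values equal its eigenvalues (compared here in descending order).
eigvals, _ = np.linalg.eigh(X.T @ X)
assert np.allclose(eigvals[::-1], s**2)

# Truncated n-by-L score matrix: keep only the first L components.
L = 2
T_L = U[:, :L] * s[:L]
assert np.allclose(T_L, X @ Wt[:L].T)
</syntaxhighlight>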
<blockquote> '''Theorem (optimal ''k''-dimensional fit).''' Let '''P''' be an ''n''×''m'' data matrix whose columns have been mean-centered and scaled, and let <math>\mathbf{P} = \mathbf{U}\,\mathbf{\Sigma}\,\mathbf{V}^\mathsf{T}</math> be its singular value decomposition. Then the best rank-''k'' approximation to '''P''' in the least-squares (Frobenius-norm) sense is <math>\mathbf{P}_{k} = \mathbf{U}_{k}\,\mathbf{\Sigma}_{k}\,\mathbf{V}_{k}^\mathsf{T}</math>, where '''V'''<sub>''k''</sub> consists of the first ''k'' columns of '''V'''. Moreover, the relative residual variance is <math>R(k)=\frac{\sum_{j=k+1}^{m}\sigma_{j}^{2}}{\sum_{j=1}^{m}\sigma_{j}^{2}}</math>. </blockquote><ref name="Holmes2023" />
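As a further illustration of the theorem, again assuming NumPy and a random matrix standing in for '''P''', the sketch below forms the rank-''k'' approximation from the truncated SVD and checks that the squared Frobenius norm of the residual equals the sum of the discarded squared singular values, from which the relative residual variance ''R''(''k'') follows.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical mean-centered, column-scaled data matrix P, for illustration only.
rng = np.random.default_rng(1)
P = rng.normal(size=(50, 8))
P -= P.mean(axis=0)
P /= P.std(axis=0)

U, sigma, Vt = np.linalg.svd(P, full_matrices=False)

# Best rank-k approximation in the Frobenius-norm sense (Eckart-Young).
k = 3
P_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k]

# Squared Frobenius norm of the residual equals the sum of the discarded
# squared singular values ...
residual = np.linalg.norm(P - P_k, 'fro')**2
assert np.isclose(residual, np.sum(sigma[k:]**2))

# ... so the relative residual variance R(k) is:
R_k = np.sum(sigma[k:]**2) / np.sum(sigma**2)
print(f"R({k}) = {R_k:.3f}")
</syntaxhighlight>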