== Covariance-free computation ==
In practical implementations, especially with [[high dimensional data]] (large {{mvar|p}}), the naive covariance method is rarely used, because the computational and memory costs of explicitly forming the covariance matrix are too high. The covariance-free approach avoids the {{math|''np''<sup>2</sup>}} operations of explicitly calculating and storing the covariance matrix {{math|'''X<sup>T</sup>X'''}}, instead using one of the [[matrix-free methods]], for example, one based on a function evaluating the product {{math|'''X<sup>T</sup>(X r)'''}} at the cost of {{math|2''np''}} operations.

=== Iterative computation ===
One way to compute the first principal component efficiently<ref name="roweis">Roweis, Sam. "EM Algorithms for PCA and SPCA." Advances in Neural Information Processing Systems. Ed. Michael I. Jordan, Michael J. Kearns, and [[Sara A. Solla]]. The MIT Press, 1998.</ref> is shown in the following pseudo-code, for a data matrix {{math|'''X'''}} with zero mean, without ever computing its covariance matrix.

 {{math|'''r'''}} = a random vector of length {{mvar|p}}
 '''r''' = '''r''' / norm('''r''')
 do {{mvar|c}} times:
       {{math|1='''s''' = 0}} (a vector of length {{mvar|p}})
       {{nowrap|for each row '''x''' in '''X'''}}
             {{nowrap|1='''s''' = '''s''' + ('''x''' ⋅ '''r''') '''x'''}}
       {{nowrap|1=λ = '''r'''<sup>T</sup>'''s'''}} {{nowrap|// λ is the eigenvalue}}
       {{nowrap|1=error = {{!}}λ ⋅ '''r''' − '''s'''{{!}}}}
       {{nowrap|1='''r''' = '''s''' / norm('''s''')}}
       {{nowrap|exit if error < tolerance}}
 return {{nowrap|λ, '''r'''}}

This [[power iteration]] algorithm simply calculates the vector {{math|'''X<sup>T</sup>(X r)'''}}, normalizes, and places the result back in {{math|'''r'''}}. The eigenvalue is approximated by {{math|'''r<sup>T</sup> (X<sup>T</sup>X) r'''}}, which is the [[Rayleigh quotient]] on the unit vector {{math|'''r'''}} for the covariance matrix {{math|'''X<sup>T</sup>X'''}}. If the largest singular value is well separated from the next largest one, the vector {{math|'''r'''}} gets close to the first principal component of {{math|'''X'''}} within the number of iterations {{mvar|c}}, which is small relative to {{mvar|p}}, at the total cost {{math|2''cnp''}}. The [[power iteration]] convergence can be accelerated without noticeably sacrificing the small cost per iteration by using more advanced [[matrix-free methods]], such as the [[Lanczos algorithm]] or the Locally Optimal Block Preconditioned Conjugate Gradient ([[LOBPCG]]) method.

Subsequent principal components can be computed one-by-one via deflation or simultaneously as a block. In the former approach, imprecisions in already computed approximate principal components additively affect the accuracy of the subsequently computed principal components, thus increasing the error with every new computation. The latter approach in the block power method replaces the single vectors {{math|'''r'''}} and {{math|'''s'''}} with block vectors, the matrices {{math|'''R'''}} and {{math|'''S'''}}. Every column of {{math|'''R'''}} approximates one of the leading principal components, while all columns are iterated simultaneously. The main calculation is the evaluation of the product {{math|'''X<sup>T</sup>(X R)'''}}. Implemented, for example, in [[LOBPCG]], efficient blocking eliminates the accumulation of errors, allows the use of high-level [[BLAS]] matrix-matrix product functions, and typically leads to faster convergence than the single-vector one-by-one technique.
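For illustration, the single-vector loop above can be written compactly with a dense-matrix product in place of the explicit row loop. The following is a minimal sketch assuming [[NumPy]]; the function name <code>power_iteration_pca</code> and its default tolerance and iteration count are illustrative choices, not part of any standard library.

<syntaxhighlight lang="python">
import numpy as np

def power_iteration_pca(X, c=100, tol=1e-10, seed=None):
    """Illustrative sketch (not a library routine): approximate the first
    principal component of a zero-mean data matrix X (n x p) without ever
    forming the covariance matrix X^T X explicitly."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    r = rng.standard_normal(p)
    r /= np.linalg.norm(r)
    lam = 0.0
    for _ in range(c):
        # Matrix-free evaluation of (X^T X) r as X^T (X r), costing about 2np operations.
        s = X.T @ (X @ r)
        lam = r @ s                          # Rayleigh quotient; approximates the eigenvalue
        error = np.linalg.norm(lam * r - s)
        r = s / np.linalg.norm(s)
        if error < tol:
            break
    return lam, r
</syntaxhighlight>

For centred data, <code>lam / (n - 1)</code> then estimates the variance explained by the first component, and <code>X @ r</code> gives the corresponding scores.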
=== The NIPALS method ===
''Non-linear iterative partial least squares (NIPALS)'' is a variant of the classical [[power iteration]] with matrix deflation by subtraction, implemented for computing the first few components in a principal component or [[partial least squares]] analysis. For very-high-dimensional datasets, such as those generated in the *omics sciences (for example, [[genomics]], [[metabolomics]]), it is usually only necessary to compute the first few PCs. The [[non-linear iterative partial least squares]] (NIPALS) algorithm updates iterative approximations to the leading scores and loadings '''t'''<sub>1</sub> and '''r'''<sub>1</sub><sup>T</sup> by the [[power iteration]], multiplying by '''X''' on the left and on the right at every iteration; that is, calculation of the covariance matrix is avoided, just as in the matrix-free implementation of the power iterations applied to {{math|'''X<sup>T</sup>X'''}}, based on a function evaluating the product {{math|1='''X<sup>T</sup>(X r)''' = '''((X r)<sup>T</sup>X)<sup>T</sup>'''}}. The matrix deflation by subtraction is performed by subtracting the outer product '''t'''<sub>1</sub>'''r'''<sub>1</sub><sup>T</sup> from '''X''', leaving the deflated residual matrix used to calculate the subsequent leading PCs.<ref>{{Cite journal | last1 = Geladi | first1 = Paul | last2 = Kowalski | first2 = Bruce | title = Partial Least Squares Regression: A Tutorial | journal = Analytica Chimica Acta | volume = 185 | pages = 1–17 | year = 1986 | doi = 10.1016/0003-2670(86)80028-9 | bibcode = 1986AcAC..185....1G }}</ref> A minimal code sketch of this deflation loop is given at the end of the section.

For large data matrices, or matrices that have a high degree of column collinearity, NIPALS suffers from loss of orthogonality of PCs due to machine-precision [[round-off errors]] accumulated in each iteration and matrix deflation by subtraction.<ref>{{cite book |last=Kramer |first=R. |year=1998 |title=Chemometric Techniques for Quantitative Analysis |publisher=CRC Press |location=New York |isbn=9780203909805 |url=https://books.google.com/books?id=iBpOzwAOfHYC}}</ref> A [[Gram–Schmidt]] re-orthogonalization algorithm is applied to both the scores and the loadings at each iteration step to eliminate this loss of orthogonality.<ref>{{cite journal |first=M. |last=Andrecut |title=Parallel GPU Implementation of Iterative PCA Algorithms |journal=Journal of Computational Biology |volume=16 |issue=11 |year=2009 |pages=1593–1599 |doi=10.1089/cmb.2008.0221 |pmid=19772385 |arxiv=0811.1081 |s2cid=1362603 }}</ref> The reliance of NIPALS on single-vector multiplications means that it cannot take advantage of high-level [[BLAS]] and that it converges slowly for clustered leading singular values; both of these deficiencies are resolved in more sophisticated matrix-free block solvers, such as the Locally Optimal Block Preconditioned Conjugate Gradient ([[LOBPCG]]) method.

=== Online/sequential estimation ===
In an "online" or "streaming" situation with data arriving piece by piece rather than being stored in a single batch, it is useful to make an estimate of the PCA projection that can be updated sequentially. This can be done efficiently, but requires different algorithms.<ref>{{Cite journal | last1 = Warmuth | first1 = M. K. | last2 = Kuzmin | first2 = D. | title = Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension | journal = Journal of Machine Learning Research | volume = 9 | pages = 2287–2320 | year = 2008 | url = http://www.jmlr.org/papers/volume9/warmuth08a/warmuth08a.pdf}}</ref>
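Returning to the NIPALS iteration described above, the following is a minimal sketch of the single-component loop with deflation by subtraction, again assuming [[NumPy]]. The function name <code>nipals_pca</code>, the starting vector, and the tolerances are illustrative choices; a production implementation would add the Gram–Schmidt re-orthogonalization noted above.

<syntaxhighlight lang="python">
import numpy as np

def nipals_pca(X, n_components=2, max_iter=500, tol=1e-9):
    """Illustrative sketch (not a library routine): NIPALS-style estimation of
    the first few principal components of a zero-mean matrix X (n x p),
    returning scores as columns of T and loadings as columns of R."""
    X = np.asarray(X, dtype=float).copy()    # deflation modifies X, so work on a copy
    n, p = X.shape
    T = np.zeros((n, n_components))
    R = np.zeros((p, n_components))
    for k in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))]   # start from the highest-variance column
        for _ in range(max_iter):
            r = X.T @ t / (t @ t)            # loadings: regress the columns of X on the scores t
            r /= np.linalg.norm(r)
            t_new = X @ r                    # scores: project the rows of X onto r
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        T[:, k], R[:, k] = t, r
        X -= np.outer(t, r)                  # deflation by subtraction of the rank-one term t r^T
    return T, R
</syntaxhighlight>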