=== Iterative computation ===
One way to compute the first principal component efficiently<ref name="roweis">Roweis, Sam. "EM Algorithms for PCA and SPCA." Advances in Neural Information Processing Systems. Ed. Michael I. Jordan, Michael J. Kearns, and [[Sara A. Solla]]. The MIT Press, 1998.</ref> is shown in the following pseudo-code, for a data matrix {{math|'''X'''}} with zero mean, without ever computing its covariance matrix.

 {{math|'''r'''}} = a random vector of length {{mvar|p}}
 '''r''' = '''r''' / norm('''r''')
 do {{mvar|c}} times:
       {{math|1='''s''' = 0}} (a vector of length {{mvar|p}})
       {{nowrap|for each row '''x''' in '''X'''}}
             {{nowrap|1='''s''' = '''s''' + ('''x''' ⋅ '''r''') '''x'''}}
       {{nowrap|1=λ = '''r'''<sup>T</sup>'''s'''}} {{nowrap|// λ is the eigenvalue}}
       {{nowrap|1=error = {{!}}λ ⋅ '''r''' − '''s'''{{!}}}}
       {{nowrap|1='''r''' = '''s''' / norm('''s''')}}
       {{nowrap|exit if error < tolerance}}
 return {{nowrap|λ, '''r'''}}

This [[power iteration]] algorithm simply calculates the vector {{math|'''X<sup>T</sup>(X r)'''}}, normalizes, and places the result back in {{math|'''r'''}}. The eigenvalue is approximated by {{math|'''r<sup>T</sup> (X<sup>T</sup>X) r'''}}, which is the [[Rayleigh quotient]] on the unit vector {{math|'''r'''}} for the covariance matrix {{math|'''X<sup>T</sup>X'''}}. If the largest singular value is well separated from the next largest one, the vector {{math|'''r'''}} gets close to the first principal component of {{math|'''X'''}} within the number of iterations {{mvar|c}}, which is small relative to {{mvar|p}}, at the total cost {{math|''2cnp''}}. The [[power iteration]] convergence can be accelerated without noticeably sacrificing the small cost per iteration using more advanced [[matrix-free methods]], such as the [[Lanczos algorithm]] or the Locally Optimal Block Preconditioned Conjugate Gradient ([[LOBPCG]]) method.

Subsequent principal components can be computed one-by-one via deflation or simultaneously as a block. In the former approach, imprecisions in already computed approximate principal components additively affect the accuracy of the subsequently computed principal components, thus increasing the error with every new computation. The latter approach in the block power method replaces single vectors {{math|'''r'''}} and {{math|'''s'''}} with block vectors, matrices {{math|'''R'''}} and {{math|'''S'''}}. Every column of {{math|'''R'''}} approximates one of the leading principal components, while all columns are iterated simultaneously. The main calculation is evaluation of the product {{math|'''X<sup>T</sup>(X R)'''}}. Implemented, for example, in [[LOBPCG]], efficient blocking eliminates the accumulation of the errors, allows using high-level [[BLAS]] matrix-matrix product functions, and typically leads to faster convergence, compared to the single-vector one-by-one technique.
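A minimal runnable sketch of the single-vector pseudo-code above, written here in Python with NumPy (an illustrative choice, not part of this article), might look as follows. It assumes the data matrix has already been centered, and it expresses the row-by-row accumulation of {{math|'''s'''}} as the two matrix-vector products {{math|'''X<sup>T</sup>(X r)'''}}; the function name and default parameters are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def first_principal_component(X, c=100, tolerance=1e-10, rng=None):
    """Covariance-free power iteration for the leading principal component.

    Illustrative sketch: X is an n-by-p data matrix whose columns are assumed
    to have zero mean; the covariance matrix X^T X is never formed explicitly.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    r = rng.standard_normal(p)          # r = a random vector of length p
    r /= np.linalg.norm(r)              # r = r / norm(r)
    lam = 0.0
    for _ in range(c):                  # do c times
        s = X.T @ (X @ r)               # s = X^T (X r), two matrix-vector products
        lam = r @ s                     # Rayleigh quotient: eigenvalue estimate
        error = np.linalg.norm(lam * r - s)
        r = s / np.linalg.norm(s)       # normalize and place the result back in r
        if error < tolerance:           # exit if error < tolerance
            break
    return lam, r
</syntaxhighlight>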
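For the block approach, a similarly simplified sketch of plain block power (orthogonal) iteration is given below. It is not the LOBPCG method itself, which adds preconditioning and a Rayleigh-Ritz step, but it shows the central product {{math|'''X<sup>T</sup>(X R)'''}} and the simultaneous iteration of the columns of {{math|'''R'''}}; the function name and parameters are again hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def leading_principal_components(X, k, c=100, tolerance=1e-10, rng=None):
    """Block power (orthogonal) iteration for the k leading principal components.

    Illustrative sketch only: all k column vectors of R are iterated at once,
    and the block is re-orthonormalized with a QR factorization each step.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    # R holds k orthonormal columns that are iterated simultaneously.
    R, _ = np.linalg.qr(rng.standard_normal((p, k)))
    lams = np.zeros(k)
    for _ in range(c):
        S = X.T @ (X @ R)                      # main cost: X^T (X R), via BLAS-3 products
        lams = np.einsum('ij,ij->j', R, S)     # per-column Rayleigh quotients
        error = np.linalg.norm(lams * R - S)   # block residual
        R, _ = np.linalg.qr(S)                 # re-orthonormalize the block
        if error < tolerance:
            break
    return lams, R
</syntaxhighlight>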