In [[numerical analysis]], '''inverse iteration''' (also known as the ''inverse power method'') is an [[Iterative method|iterative]] [[eigenvalue algorithm]]. It allows one to find an approximate [[eigenvector]] when an approximation to a corresponding [[eigenvalue]] is already known. The method is conceptually similar to the [[power method]]. It appears to have originally been developed to compute resonance frequencies in the field of structural mechanics.<ref name=Pohlhausen>Ernst Pohlhausen, ''Berechnung der Eigenschwingungen statisch-bestimmter Fachwerke'', ZAMM - Zeitschrift für Angewandte Mathematik und Mechanik 1, 28-42 (1921).</ref>

The inverse power iteration algorithm starts with an approximation <math>\mu</math> for the [[eigenvalue]] corresponding to the desired [[eigenvector]] and a vector <math>b_0</math>, either a randomly selected vector or an approximation to the eigenvector. The method is described by the iteration
<math display="block"> b_{k+1} = \frac{(A - \mu I)^{-1}b_k}{C_k}, </math>
where <math>C_k</math> are some constants usually chosen as <math>C_k= \|(A - \mu I)^{-1}b_k \|. </math> Since eigenvectors are defined only up to multiplication by a constant, the choice of <math>C_k</math> can be arbitrary in theory; practical aspects of the choice of <math>C_k</math> are discussed below.

At every iteration, the vector <math>b_k</math> is multiplied by the matrix <math>(A - \mu I)^{-1}</math> and normalized. It is exactly the same formula as in the [[power method]], except that the matrix <math>A</math> is replaced by <math>(A - \mu I)^{-1}.</math> The closer the approximation <math>\mu</math> is to the eigenvalue, the faster the algorithm converges; however, an incorrect choice of <math>\mu</math> can lead to slow convergence or to convergence to an eigenvector other than the one desired. In practice, the method is used when a good approximation to the eigenvalue is known, and hence only a few iterations (quite often just one) are needed.
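A minimal sketch of the iteration in Python with [[NumPy]] may make the formula concrete. It is only an illustration under simple assumptions (a fixed number of iterations, <math>C_k</math> taken as the Euclidean norm); the matrix, shift and starting vector are arbitrary examples:

<syntaxhighlight lang="python">
import numpy as np

def inverse_iteration(A, mu, b0, num_iterations=10):
    """Approximate the eigenvector of A whose eigenvalue is closest to mu.

    Each step computes b_{k+1} = (A - mu*I)^{-1} b_k / C_k,
    with C_k = ||(A - mu*I)^{-1} b_k|| (Euclidean norm).
    """
    n = A.shape[0]
    M = A - mu * np.eye(n)          # shifted matrix A - mu*I
    b = b0 / np.linalg.norm(b0)
    for _ in range(num_iterations):
        b = np.linalg.solve(M, b)   # apply (A - mu*I)^{-1} without forming the inverse
        b = b / np.linalg.norm(b)   # normalize, i.e. divide by C_k
    return b

# Example: symmetric matrix with eigenvalues 1 and 3; mu = 2.9 targets the eigenvalue 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = inverse_iteration(A, mu=2.9, b0=np.array([1.0, 0.0]))
print(v)  # approximately (1, 1)/sqrt(2), the eigenvector for the eigenvalue 3
</syntaxhighlight>

Solving the shifted linear system, as above, avoids forming <math>(A - \mu I)^{-1}</math> explicitly; the trade-offs between the two approaches are discussed in the implementation options below.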
== Theory and convergence ==
The basic idea of the [[power iteration]] is choosing an initial vector <math>b</math> (either an [[eigenvector]] approximation or a [[random]] vector) and iteratively calculating <math>Ab, A^{2}b, A^{3}b, \ldots</math> Except for a set of zero [[Measure (mathematics)|measure]], for any initial vector the result will converge to an [[eigenvector]] corresponding to the dominant [[eigenvalue]].

The inverse iteration does the same for the matrix <math>(A - \mu I)^{-1}</math>, so it converges to the eigenvector corresponding to the dominant eigenvalue of the matrix <math>(A - \mu I)^{-1}</math>. The eigenvalues of this matrix are <math>(\lambda_1 - \mu)^{-1},\ldots,(\lambda_n - \mu)^{-1},</math> where the <math>\lambda_i</math> are the eigenvalues of <math>A</math>. The largest of these numbers corresponds to the smallest of <math>(\lambda_1 - \mu),\ldots,(\lambda_n - \mu).</math> The eigenvectors of <math>A</math> and of <math>(A - \mu I)^{-1}</math> are the same, since
<math display="block"> Av=\lambda v \Leftrightarrow (A-\mu I)v = \lambda v - \mu v \Leftrightarrow (\lambda - \mu)^{-1} v = (A-\mu I)^{-1} v. </math>

'''Conclusion''': The method converges to the eigenvector of the matrix <math>A</math> corresponding to the eigenvalue closest to <math>\mu.</math>

In particular, taking <math>\mu=0</math> we see that <math>A^{-1}b_k</math> converges to the eigenvector corresponding to the eigenvalue of <math>A^{-1}</math> with the largest magnitude, <math>\frac{1}{\lambda _N},</math> and thus can be used to determine the smallest-magnitude eigenvalue of <math>A</math>, since the two are inversely related.

=== Speed of convergence ===
Let us analyze the [[rate of convergence]] of the method.

The [[power method]] is known to [[Rate of convergence#Convergence speed for iterative methods|converge linearly]] to the limit, more precisely:
<math display="block"> \mathrm{Distance}( b^\mathrm{ideal}, b^{k}_\mathrm{Power~Method})=O \left( \left| \frac{\lambda_\mathrm{subdominant} }{\lambda_\mathrm{dominant} } \right|^k \right), </math>
hence the analogous result for the inverse iteration method is:
<math display="block"> \mathrm{Distance}( b^\mathrm{ideal}, b^{k}_\mathrm{Inverse~iteration})=O \left( \left| \frac{\mu -\lambda_{\mathrm{closest~ to~ }\mu} }{\mu - \lambda_{\mathrm{second~ closest~ to~} \mu} } \right|^k \right). </math>

This is the key formula for understanding the method's convergence. It shows that if <math>\mu</math> is chosen close enough to some eigenvalue <math>\lambda</math>, for example with <math>\mu - \lambda = \epsilon</math>, then each iteration multiplies the error by roughly <math> |\epsilon| /|\lambda +\epsilon - \lambda_{\mathrm{closest~ to~} \lambda} | </math>. (Here we use that, for small enough <math>\epsilon</math>, "closest to <math>\mu</math>" and "closest to <math>\lambda</math>" coincide.) For small enough <math> |\epsilon|</math> this factor is approximately <math> |\epsilon| /|\lambda - \lambda_{\text{closest to } \lambda}| </math>. Hence, if one can find a <math>\mu</math> for which <math>\epsilon</math> is small enough, then very few iterations may be satisfactory.
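The predicted linear rate can be checked numerically. The sketch below is only an illustration: it uses an arbitrary diagonal matrix with known eigenvalues 1, 2 and 4 and the shift <math>\mu = 2.1</math>, so the error in the eigenvector should shrink at each step by roughly <math>|\mu - 2|/|\mu - 1| = 0.1/1.1</math>:

<syntaxhighlight lang="python">
import numpy as np

A = np.diag([1.0, 2.0, 4.0])           # eigenvalues 1, 2, 4
mu = 2.1                               # closest eigenvalue: 2; second closest: 1
target = np.array([0.0, 1.0, 0.0])     # exact eigenvector for the eigenvalue 2
predicted_ratio = abs(mu - 2.0) / abs(mu - 1.0)

M = A - mu * np.eye(3)
b = np.ones(3) / np.sqrt(3.0)
for k in range(6):
    b = np.linalg.solve(M, b)
    b = b / np.linalg.norm(b)
    b = np.sign(b[1]) * b              # fix the overall sign for the comparison
    print(k, np.linalg.norm(b - target))   # error drops by about predicted_ratio each step
print("predicted ratio per iteration:", predicted_ratio)
</syntaxhighlight>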
=== Complexity ===
The inverse iteration algorithm requires solving a [[System of linear equations|linear system]] or calculating the inverse of a matrix. For unstructured matrices (not sparse, not Toeplitz, ...) this requires <math>O(n^{3})</math> operations.

== Implementation options ==
The method is defined by the formula
<math display="block"> b_{k+1} = \frac{(A - \mu I)^{-1}b_k}{C_k}. </math>
There are, however, multiple options for its implementation.

=== Calculate inverse matrix or solve system of linear equations ===
We can rewrite the formula in the following way:
<math display="block"> (A - \mu I) b_{k+1} = \frac{b_k}{C_k}, </math>
emphasizing that to find the next approximation <math> b_{k+1} </math> we may solve a system of linear equations. There are two options: one may choose an algorithm that solves a linear system, or one may calculate the inverse <math>(A - \mu I)^{-1}</math> and then apply it to the vector. Both options have complexity ''O''(''n''<sup>3</sup>); the exact number of operations depends on the chosen method.

The choice also depends on the number of iterations. Naively, if at each iteration one solves a linear system, the complexity will be ''k'' ''O''(''n''<sup>3</sup>), where ''k'' is the number of iterations; similarly, calculating the inverse matrix and applying it at each iteration is of complexity ''k'' ''O''(''n''<sup>3</sup>). Note, however, that if the eigenvalue estimate <math>\mu</math> remains constant, then the complexity can be reduced to ''O''(''n''<sup>3</sup>) + ''k'' ''O''(''n''<sup>2</sup>) with either method. Calculating the inverse matrix once and storing it to apply at each iteration is of complexity ''O''(''n''<sup>3</sup>) + ''k'' ''O''(''n''<sup>2</sup>). Storing an [[LU decomposition]] of <math>(A - \mu I)</math> and using [[Triangular matrix#Forward and back substitution|forward and back substitution]] to solve the system of equations at each iteration is also of complexity ''O''(''n''<sup>3</sup>) + ''k'' ''O''(''n''<sup>2</sup>).

Inverting the matrix will typically have a greater initial cost, but a lower cost at each iteration; conversely, solving systems of linear equations will typically have a lower initial cost, but require more operations for each iteration.
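When the shift <math>\mu</math> is held fixed, the factorization can be computed once and reused at every iteration. The sketch below illustrates this with SciPy's LU routines; the function name and iteration count are arbitrary choices for the example:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration_lu(A, mu, b0, num_iterations=10):
    """Inverse iteration reusing one LU factorization of (A - mu*I).

    The factorization costs O(n^3) and is done once; each iteration then
    needs only an O(n^2) forward and back substitution.
    """
    n = A.shape[0]
    lu, piv = lu_factor(A - mu * np.eye(n))   # one-time O(n^3) factorization
    b = b0 / np.linalg.norm(b0)
    for _ in range(num_iterations):
        b = lu_solve((lu, piv), b)            # O(n^2) solve per iteration
        b = b / np.linalg.norm(b)
    return b
</syntaxhighlight>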
=== Tridiagonalization, [[Hessenberg form]] ===
If it is necessary to perform many iterations (or few iterations, but for many eigenvectors), then it might be wise to bring the matrix to the upper [[Hessenberg form]] first (for a symmetric matrix this will be [[tridiagonal matrix|tridiagonal form]]), which costs <math display="inline">\frac{10}{3} n^3 + O(n^2)</math> arithmetic operations using a technique based on [[Householder transformation|Householder reduction]], with a finite sequence of orthogonal similarity transforms, somewhat like a two-sided QR decomposition.<ref name=Demmel>{{citation | last = Demmel | first = James W. | authorlink = James Demmel | mr = 1463942 | isbn = 0-89871-389-7 | location = Philadelphia, PA | publisher = [[Society for Industrial and Applied Mathematics]] | title = Applied Numerical Linear Algebra | year = 1997}}.</ref><ref name=Trefethen>Lloyd N. Trefethen and David Bau, ''Numerical Linear Algebra'' (SIAM, 1997).</ref> (For QR decomposition, the Householder rotations are multiplied only on the left, but for the Hessenberg case they are multiplied on both left and right.) For [[symmetric matrix|symmetric matrices]] this procedure costs <math display="inline">\frac{4}{3} n^3 + O(n^2)</math> arithmetic operations using a technique based on Householder reduction.<ref name=Demmel/><ref name=Trefethen/>

Solving the system of linear equations for a [[tridiagonal matrix]] costs <math>O(n)</math> operations, so the complexity grows like <math>O(n^3) + kO(n)</math>, where <math>k</math> is the iteration number, which is better than for direct inversion. However, for few iterations such a transformation may not be practical. Also, transformation to the [[Hessenberg form]] involves square roots and division operations, which are not universally supported by hardware.

=== Choice of the normalization constant {{math|''C<sub>k</sub>''}} ===
On general-purpose processors (e.g. those produced by Intel) the execution time of addition, multiplication and division is approximately equal. But on embedded and/or low-power hardware ([[digital signal processor]]s, [[FPGA]]s, [[Application-specific integrated circuit|ASIC]]s) division may not be supported in hardware, and so should be avoided. Choosing <math>C_k = 2^{n_k}</math> allows fast division without explicit hardware support, as division by a power of 2 may be implemented as either a [[Bitwise operation#Bit shifts|bit shift]] (for [[fixed-point arithmetic]]) or subtraction of <math>n_k</math> from the exponent (for [[Floating point|floating-point arithmetic]]).

When implementing the algorithm using [[fixed-point arithmetic]], the choice of the constant <math>C_k</math> is especially important. Small values will lead to fast growth of the norm of <math>b_k</math> and to [[Integer overflow|overflow]]; large values of <math>C_k</math> will cause the vector <math>b_k</math> to tend toward zero.

== Usage ==
The main application of the method is the situation when an approximation to an eigenvalue is found and one needs to find the corresponding approximate eigenvector. In such a situation the inverse iteration is the main and probably the only method to use.

=== Methods to find approximate eigenvalues ===
Typically, the method is used in combination with some other method which finds approximate eigenvalues: the standard example is the [[bisection eigenvalue algorithm]]; another example is [[Rayleigh quotient iteration]], which is in fact the same inverse iteration with the approximate eigenvalue chosen as the [[Rayleigh quotient]] of the vector obtained on the previous step of the iteration.

There are some situations where the method can be used by itself; however, they are quite marginal.

=== Norm of matrix as approximation to the ''dominant'' eigenvalue ===
The dominant eigenvalue can be easily estimated for any matrix. For any [[Matrix norm#Induced norm|induced norm]] it is true that <math>\left \| A \right \| \ge |\lambda| </math> for any eigenvalue <math>\lambda</math>. So, taking the norm of the matrix as an approximate eigenvalue, one can see that the method will converge to the dominant eigenvector.

=== Estimates based on statistics ===
In some real-time applications one needs to find eigenvectors for matrices at a speed of millions of matrices per second. In such applications the statistics of the matrices is typically known in advance, and one can take as an approximate eigenvalue the average eigenvalue over some large sample of matrices. Better, one may calculate the mean ratio of the eigenvalues to the trace or norm of the matrix and estimate the average eigenvalue as the trace or norm multiplied by the average value of that ratio. Clearly such a method can be used only with discretion and only when high precision is not critical. This approach of estimating an average eigenvalue can be combined with other methods to avoid excessively large error.

== See also ==
* [[Power iteration]]
* [[Rayleigh quotient iteration]]
* [[List of numerical analysis topics#Eigenvalue algorithms|List of eigenvalue algorithms]]

== References ==
{{reflist}}

{{Numerical linear algebra}}

[[Category:Numerical linear algebra]]