==The practical QR algorithm==
Formally, let {{math|''A''}} be a real matrix of which we want to compute the eigenvalues, and let {{math|1=''A''<sub>0</sub> := ''A''}}. At the {{mvar|k}}-th step (starting with {{math|1=''k'' = 0}}), we compute the [[QR decomposition]] {{math|1=''A''<sub>''k''</sub> = ''Q''<sub>''k''</sub> ''R''<sub>''k''</sub>}}, where {{math|''Q''<sub>''k''</sub>}} is an [[orthogonal matrix]] (i.e., {{math|1=''Q''<sup>T</sup> = ''Q''<sup>−1</sup>}}) and {{math|''R''<sub>''k''</sub>}} is an upper triangular matrix. We then form {{math|1=''A''<sub>''k''+1</sub> = ''R''<sub>''k''</sub> ''Q''<sub>''k''</sub>}}. Note that
<math display="block"> A_{k+1} = R_k Q_k = Q_k^{-1} Q_k R_k Q_k = Q_k^{-1} A_k Q_k = Q_k^{\mathsf{T}} A_k Q_k, </math>
so all the {{math|''A''<sub>''k''</sub>}} are [[Similar matrix|similar]] and hence have the same eigenvalues. The algorithm is [[numerical stability|numerically stable]] because it proceeds by ''orthogonal'' similarity transforms.

Under certain conditions,<ref name="golubvanloan">{{cite book |last1=Golub |first1=G. H. |last2=Van Loan |first2=C. F. |title=Matrix Computations |edition=3rd |publisher=Johns Hopkins University Press |location=Baltimore |year=1996 |isbn=0-8018-5414-8 }}</ref> the matrices {{math|''A''<sub>''k''</sub>}} converge to a triangular matrix, the [[Schur form]] of {{mvar|A}}. The eigenvalues of a triangular matrix are listed on the diagonal, and the eigenvalue problem is solved. In testing for convergence it is impractical to require exact zeros,{{citation needed|date=July 2020}} but the [[Gershgorin circle theorem]] provides a bound on the error.

If the matrices converge, then the eigenvalues along the diagonal will appear according to their geometric multiplicity. To guarantee convergence, {{mvar|A}} must be a symmetric matrix, and for every nonzero eigenvalue <math>\lambda</math> there must not be a corresponding eigenvalue <math>-\lambda</math>.<ref>{{Cite book |last=Holmes |first=Mark H. |title=Introduction to scientific computing and data analysis |date=2023 |publisher=Springer |isbn=978-3-031-22429-4 |edition=Second |series=Texts in computational science and engineering |location=Cham}}</ref> Because a single QR iteration costs <math>\mathcal{O}(n^3)</math> operations and the convergence is only linear, this basic form of the QR algorithm is extremely expensive, especially since it is not guaranteed to converge.<ref>{{Cite book |last1=Golub |first1=Gene H. |last2=Van Loan |first2=Charles F. |title=Matrix Computations |date=2013 |publisher=The Johns Hopkins University Press |isbn=978-1-4214-0794-4 |edition=Fourth |series=Johns Hopkins studies in the mathematical sciences |location=Baltimore}}</ref>
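The basic iteration can be sketched in a few lines of NumPy (an illustrative sketch only: the function name, test matrix, tolerance, and iteration cap are arbitrary choices, not part of the algorithm itself):

<syntaxhighlight lang="python">
import numpy as np

def qr_algorithm_basic(A, max_iter=1000, tol=1e-12):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k = Q_k^T A_k Q_k."""
    Ak = np.array(A, dtype=float)
    for _ in range(max_iter):
        Q, R = np.linalg.qr(Ak)   # A_k = Q_k R_k
        Ak = R @ Q                # A_{k+1} = R_k Q_k
        # Stop once the part below the diagonal is negligible.
        if np.linalg.norm(np.tril(Ak, -1)) < tol:
            break
    return Ak                     # approximately triangular; eigenvalues on the diagonal

# A symmetric test matrix with well-separated eigenvalues, for which convergence is expected:
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(np.sort(np.diag(qr_algorithm_basic(A))))  # compare with np.linalg.eigvalsh(A)
</syntaxhighlight>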
===Using Hessenberg form===
In the above crude form the iterations are relatively expensive. This can be mitigated by first bringing the matrix {{mvar|A}} to upper [[Hessenberg form]] (which costs <math display="inline">\tfrac{10}{3} n^3 + \mathcal{O}(n^2)</math> arithmetic operations using a technique based on [[Householder transformation|Householder reduction]]), with a finite sequence of orthogonal similarity transforms, somewhat like a two-sided QR decomposition.<ref name=Demmel>{{cite book |first=James W. |last=Demmel |author-link=James W. Demmel |title=Applied Numerical Linear Algebra |publisher=SIAM |year=1997 }}</ref><ref name=Trefethen>{{cite book |first1=Lloyd N. |last1=Trefethen |author-link=Lloyd N. Trefethen |first2=David |last2=Bau |title=Numerical Linear Algebra |publisher=SIAM |year=1997 }}</ref> (For QR decomposition, the Householder reflectors are multiplied only on the left, but for the Hessenberg case they are multiplied on both left and right.) Determining the QR decomposition of an upper Hessenberg matrix costs <math display="inline">6 n^2 + \mathcal{O}(n)</math> arithmetic operations. Moreover, because the Hessenberg form is already nearly upper-triangular (it has at most one nonzero entry below the diagonal in each column), using it as a starting point reduces the number of steps required for convergence of the QR algorithm.

If the original matrix is [[symmetric matrix|symmetric]], then the upper Hessenberg matrix is also symmetric and thus [[tridiagonal matrix|tridiagonal]], and so are all the {{math|''A''<sub>''k''</sub>}}. In this case reaching Hessenberg form costs <math display="inline">\tfrac{4}{3} n^3 + \mathcal{O}(n^2)</math> arithmetic operations using a technique based on Householder reduction.<ref name=Demmel/><ref name=Trefethen/> Determining the QR decomposition of a symmetric tridiagonal matrix costs <math>\mathcal{O}(n)</math> operations.<ref>{{cite journal |first1=James M. |last1=Ortega |first2=Henry F. |last2=Kaiser |title=The ''LL<sup>T</sup>'' and ''QR'' methods for symmetric tridiagonal matrices |journal=The Computer Journal |volume=6 |issue=1 |pages=99–101 |year=1963 |doi=10.1093/comjnl/6.1.99 |doi-access=free }}</ref>
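As an illustration (a sketch assuming SciPy is available; the random test matrix is arbitrary), the reduction can be delegated to <code>scipy.linalg.hessenberg</code>, and one QR step applied to the Hessenberg matrix again yields a Hessenberg matrix:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))

# Orthogonal reduction to upper Hessenberg form: A = Q H Q^T.
H, Q = hessenberg(A, calc_q=True)
print(np.allclose(Q @ H @ Q.T, A))

# One (unshifted) QR step applied to H is again upper Hessenberg, so every
# subsequent step costs only O(n^2) operations instead of O(n^3).
Q1, R1 = np.linalg.qr(H)
H1 = R1 @ Q1
print(np.allclose(np.tril(H1, -2), 0.0))  # below the first subdiagonal: (numerically) zero
</syntaxhighlight>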
===Iteration phase===
If a Hessenberg matrix <math>A</math> has element <math> a_{k,k-1} = 0 </math> for some <math>k</math>, i.e., if one of the elements just below the diagonal is in fact zero, then it decomposes into blocks whose eigenproblems may be solved separately; an eigenvalue is either an eigenvalue of the submatrix of the first <math>k-1</math> rows and columns, or an eigenvalue of the submatrix of the remaining rows and columns. The purpose of the QR iteration step is to shrink one of these <math>a_{k,k-1}</math> elements so that effectively a small block along the diagonal is split off from the bulk of the matrix. In the case of a real eigenvalue that is usually the <math> 1 \times 1 </math> block in the lower right corner (in which case element <math> a_{nn} </math> holds that eigenvalue), whereas in the case of a pair of conjugate complex eigenvalues it is the <math> 2 \times 2 </math> block in the lower right corner.

The [[rate of convergence]] depends on the separation between eigenvalues, so a practical algorithm will use shifts, either explicit or implicit, to increase separation and accelerate convergence. A typical symmetric QR algorithm isolates each eigenvalue (then reduces the size of the matrix) with only one or two iterations, making it efficient as well as robust.{{clarify|date=June 2012}}

====A single iteration with explicit shift====
The steps of a QR iteration with explicit shift on a real Hessenberg matrix <math>A</math> are as follows (a code sketch is given below):
# Pick a shift <math>\mu</math> and subtract it from all diagonal elements, producing the matrix <math> A - \mu I </math>. A basic strategy is to use <math> \mu = a_{n,n} </math>, but there are more refined strategies that further accelerate convergence. The idea is that <math> \mu </math> should be close to an eigenvalue, since making this shift will accelerate convergence to that eigenvalue.
# Perform a sequence of [[Givens rotation]]s <math> G_1, G_2, \dots, G_{n-1} </math> on <math> A - \mu I </math>, where <math> G_i </math> acts on rows <math>i</math> and <math>i+1</math>, and <math> G_i </math> is chosen to zero out position <math>(i+1,i)</math> of <math> G_{i-1} \dotsb G_1 (A - \mu I) </math>. This produces the upper triangular matrix <math> R = G_{n-1} \dotsb G_1 (A - \mu I) </math>. The orthogonal factor <math> Q </math> would be <math> G_1^\mathrm{T} G_2^\mathrm{T} \dotsb G_{n-1}^\mathrm{T} </math>, but it is neither necessary nor efficient to produce it explicitly.
# Now multiply <math> R </math> by the Givens matrices <math> G_1^\mathrm{T} </math>, <math> G_2^\mathrm{T} </math>, ..., <math> G_{n-1}^\mathrm{T} </math> on the right, where <math> G_i^\mathrm{T} </math> instead acts on columns <math>i</math> and <math> i+1 </math>. This produces the matrix <math> RQ = R G_1^\mathrm{T} G_2^\mathrm{T} \dotsb G_{n-1}^\mathrm{T} </math>, which is again in Hessenberg form.
# Finally undo the shift by adding <math> \mu </math> to all diagonal entries. The result is <math> A' = RQ + \mu I </math>. Since <math>Q</math> commutes with <math>I</math>, we have that <math> A' = Q^\mathrm{T} (A-\mu I) Q + \mu I = Q^\mathrm{T} A Q </math>.

The purpose of the shift is to change which Givens rotations are chosen. In more detail, the structure of one of these <math> G_i </math> matrices is
<math display="block"> G_i = \begin{bmatrix} I & 0 & 0 & 0 \\ 0 & c & -s & 0 \\ 0 & s & c & 0 \\ 0 & 0 & 0 & I \end{bmatrix} </math>
where the <math>I</math> in the upper left corner is an <math> (i-1) \times (i-1) </math> identity matrix, the <math>I</math> in the lower right corner is an <math> (n-i-1) \times (n-i-1) </math> identity matrix, and the two scalars <math> c = \cos\theta </math> and <math> s = \sin\theta </math> are determined by what rotation angle <math> \theta </math> is appropriate for zeroing out position <math>(i+1,i)</math>. It is not necessary to exhibit <math> \theta </math>; the factors <math> c </math> and <math> s </math> can be determined directly from the elements of the matrix that <math> G_i </math> should act on. Nor is it necessary to produce the whole matrix; multiplication (from the left) by <math> G_i </math> only affects rows <math> i </math> and <math> i+1 </math>, so it is easier to just update those two rows in place. Likewise, for the Step 3 multiplication by <math> G_i^\mathrm{T} </math> from the right, it is sufficient to remember <math>i</math>, <math>c</math>, and <math>s</math>.
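The four steps above can be written out directly. The following is a minimal NumPy sketch for exposition only (the function name is illustrative): it applies each rotation to entire rows and columns rather than performing the trimmed in-place updates just described.

<syntaxhighlight lang="python">
import numpy as np

def qr_step_explicit_shift(A):
    """One explicit-shift QR step on a real upper Hessenberg matrix, with mu = a_{n,n}."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    mu = H[-1, -1]
    H -= mu * np.eye(n)                        # Step 1: shift the diagonal.
    rotations = []
    for i in range(n - 1):                     # Step 2: R = G_{n-1} ... G_1 (A - mu I).
        a, b = H[i, i], H[i + 1, i]
        r = np.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0.0 else (a / r, -b / r)
        rotations.append((c, s))               # remember i, c, s instead of forming G_i
        H[[i, i + 1], :] = np.array([[c, -s], [s, c]]) @ H[[i, i + 1], :]
    for i, (c, s) in enumerate(rotations):     # Step 3: RQ = R G_1^T ... G_{n-1}^T.
        H[:, [i, i + 1]] = H[:, [i, i + 1]] @ np.array([[c, s], [-s, c]])
    H += mu * np.eye(n)                        # Step 4: undo the shift; result is Q^T A Q.
    return H
</syntaxhighlight>

Repeating this step drives the <math>(n,n-1)</math> entry toward zero; once it is negligible, <math>a_{n,n}</math> can be accepted as an eigenvalue and the iteration continued on the leading <math>(n-1) \times (n-1)</math> block (deflation).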
If using the simple <math> \mu = a_{n,n} </math> strategy, then at the beginning of Step 2 we have a matrix
<math display="block"> A - a_{n,n} I = \begin{pmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & 0 \end{pmatrix} </math>
where <math>\times</math> denotes an arbitrary value. The first Givens rotation <math> G_1 </math> zeroes out the <math> (2,1) </math> position of this, producing
<math display="block"> G_1 (A - a_{n,n} I) = \begin{pmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & 0 \end{pmatrix} \text{.} </math>
Each new rotation zeroes out another subdiagonal element, thus increasing the number of known zeroes until we are at
<math display="block"> H = G_{n-2} \dotsb G_1 (A - a_{n,n} I) = \begin{pmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & h_{n-1,n-1} & h_{n-1,n} \\ 0 & 0 & 0 & h_{n,n-1} & 0 \end{pmatrix} \text{.} </math>
The final rotation <math>G_{n-1}</math> has <math>(c,s)</math> chosen so that <math> s h_{n-1,n-1} + c h_{n,n-1} = 0 </math>. If <math> |h_{n-1,n-1}| \gg |h_{n,n-1}| </math>, as is typically the case when we approach convergence, then <math> c \approx 1 </math> and <math> |s| \ll 1 </math>. Making this rotation produces
<math display="block"> R = G_{n-1} G_{n-2} \dotsb G_1 (A - a_{n,n} I) = \begin{pmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & c h_{n-1,n} \\ 0 & 0 & 0 & 0 & s h_{n-1,n} \end{pmatrix} \text{,} </math>
which is our upper triangular matrix.

But now we reach Step 3, and need to start rotating data between columns. The first rotation acts on columns <math>1</math> and <math>2</math>, producing
<math display="block"> R G_1^\mathrm{T} = \begin{pmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & c h_{n-1,n} \\ 0 & 0 & 0 & 0 & s h_{n-1,n} \end{pmatrix} \text{.} </math>
The expected pattern is that each rotation moves some nonzero value from the diagonal out to the subdiagonal, returning the matrix to Hessenberg form. This ends at
<math display="block"> R G_1^\mathrm{T} \dotsb G_{n-1}^\mathrm{T} = \begin{pmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & -s^2 h_{n-1,n} & cs h_{n-1,n} \end{pmatrix} \text{.} </math>
Algebraically the form is unchanged, but numerically the element in position <math> (n,n-1) </math> has gotten much closer to zero: there used to be a factor <math> s </math> gap between it and the diagonal element above, but now the gap is more like a factor <math> s^2 </math>, and another iteration would make it a factor <math> s^4 </math>; we have quadratic convergence. Practically that means <math>O(1)</math> iterations per eigenvalue suffice for convergence, and thus overall the algorithm completes in <math> O(n) </math> QR steps, each of which requires a mere <math> O(n^2) </math> arithmetic operations (or as little as <math> O(n) </math> operations in the case that <math> A </math> is symmetric).
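Putting the pieces together, the overall procedure (Hessenberg reduction, shifted QR steps, deflation) can be sketched as follows. This is a deliberately simple illustration, not a robust implementation: it deflates only <math>1 \times 1</math> blocks, so it assumes all eigenvalues are real (for example, <math>A</math> symmetric) and omits the <math>2 \times 2</math> handling needed for complex conjugate pairs; for brevity it also calls a general QR factorization instead of the <math>O(n^2)</math> Givens sweep described above, and the function name and tolerances are arbitrary.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import hessenberg

def eigvals_shifted_qr(A, tol=1e-12, max_steps=100):
    """Real eigenvalues via Hessenberg reduction, explicit-shift QR steps, and deflation."""
    H = hessenberg(np.array(A, dtype=float))
    eigenvalues = []
    while H.shape[0] > 1:
        n = H.shape[0]
        for _ in range(max_steps):
            mu = H[-1, -1]                       # simple shift mu = a_{n,n}
            Q, R = np.linalg.qr(H - mu * np.eye(n))
            H = R @ Q + mu * np.eye(n)           # one shifted step: Q^T H Q
            if abs(H[-1, -2]) < tol * (abs(H[-2, -2]) + abs(H[-1, -1])):
                break                            # the (n, n-1) entry is negligible
        eigenvalues.append(H[-1, -1])            # deflate the converged 1x1 block
        H = H[:-1, :-1]
    eigenvalues.append(H[0, 0])
    return np.array(eigenvalues)

A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 1.0]])
print(np.sort(eigvals_shifted_qr(A)))
print(np.sort(np.linalg.eigvalsh(A)))            # reference values
</syntaxhighlight>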