===Using Householder reflections===

{{see also|Householder transformation}}

[[File:Householder.svg|thumb|Householder reflection for QR-decomposition: The goal is to find a linear transformation that changes the vector <math>\mathbf x</math> into a vector of the same length which is collinear to <math>\mathbf e_1</math>. We could use an orthogonal projection (Gram-Schmidt) but this will be numerically unstable if the vectors <math>\mathbf x</math> and <math>\mathbf e_1</math> are close to orthogonal. Instead, the Householder reflection reflects through the dotted line (chosen to bisect the angle between <math>\mathbf x</math> and {{nowrap|<math>\mathbf e_1</math>).}} The maximum angle with this transform is 45 degrees.]]

A Householder reflection (or ''Householder transformation'') is a transformation that takes a vector and reflects it about some [[plane (mathematics)|plane]] or [[hyperplane]]. We can use this operation to calculate the ''QR'' factorization of an ''m''-by-''n'' matrix <math>A</math> with {{nowrap|''m'' ≥ ''n''}}.

The reflection matrix ''Q'' constructed below can be used to reflect a vector in such a way that all coordinates but one vanish.

Let <math>\mathbf{x}</math> be an arbitrary real ''m''-dimensional column vector of <math>A</math> such that <math>\|\mathbf{x}\| = |\alpha|</math> for a scalar ''α''. If the algorithm is implemented using [[floating-point arithmetic]], then, to avoid [[loss of significance]], ''α'' should be given the sign opposite to that of the ''k''-th coordinate of {{nowrap|<math>\mathbf{x}</math>,}} where <math>x_k</math> is the pivot coordinate, below which all entries are 0 in the final upper triangular form of ''A''. In the complex case, set<ref>{{citation | first1=Josef | last1=Stoer | first2=Roland | last2=Bulirsch | year=2002 | title=Introduction to Numerical Analysis | edition=3rd | publisher=Springer | isbn=0-387-95452-X |page=225}}</ref>
:<math>\alpha = -e^{i \arg x_k} \|\mathbf{x}\|</math>
and substitute transposition by conjugate transposition in the construction of ''Q'' below.

Then, where <math>\mathbf{e}_1</math> is the vector {{math|[1 0 ⋯ 0]<sup>T</sup>}}, {{math|{{!}}{{!}} · {{!}}{{!}}}} is the [[Euclidean space#Euclidean norm|Euclidean norm]] and <math>I</math> is an {{math|''m''×''m''}} identity matrix, set
: <math>\begin{align}
  \mathbf{u} &= \mathbf{x} - \alpha\mathbf{e}_1, \\
  \mathbf{v} &= \frac{\mathbf{u}}{\|\mathbf{u}\|}, \\
           Q &= I - 2 \mathbf{v}\mathbf{v}^\textsf{T}.
\end{align}</math>

Or, if <math>A</math> is complex,
: <math>Q = I - 2\mathbf{v}\mathbf{v}^\dagger.</math>

<math>Q</math> is an ''m''-by-''m'' Householder matrix, which is both symmetric and orthogonal (Hermitian and unitary in the complex case), and
: <math>Q\mathbf{x} = \begin{bmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix}.</math>
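The construction above can be sketched in a few lines of Python for the real case. This is only an illustration, not a reference implementation: the function name <code>householder_matrix</code> and the test vector are hypothetical, the pivot is assumed to be the first coordinate, and ''α'' is given the sign opposite to that coordinate as recommended for floating-point arithmetic.

```python
import math

def householder_matrix(x):
    """Return (Q, alpha) with Q = I - 2 v v^T and Q x = alpha * e_1 (real case)."""
    m = len(x)
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    # alpha takes the sign opposite to x[0], so u[0] = x[0] - alpha adds
    # two numbers of equal sign and no cancellation occurs.
    alpha = -math.copysign(norm_x, x[0])
    u = list(x)
    u[0] -= alpha                                  # u = x - alpha * e_1
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    if norm_u == 0.0:                              # x is the zero vector
        return [[float(i == j) for j in range(m)] for i in range(m)], 0.0
    v = [ui / norm_u for ui in u]                  # v = u / ||u||
    Q = [[float(i == j) - 2.0 * v[i] * v[j] for j in range(m)]
         for i in range(m)]
    return Q, alpha

x = [3.0, 4.0, 12.0]                               # illustrative vector with norm 13
Q, alpha = householder_matrix(x)
Qx = [sum(Q[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
print(alpha, Qx)                                   # Qx is alpha * e_1 up to rounding
```

Forming ''Q'' explicitly costs O(''m''<sup>2</sup>) storage; practical codes usually keep only '''v''' and apply the reflection as <math>\mathbf{y} - 2\mathbf{v}(\mathbf{v}^\textsf{T}\mathbf{y})</math>.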

This can be used to gradually transform an ''m''-by-''n'' matrix ''A'' to upper [[Triangular matrix|triangular]] form. First, we multiply ''A'' by the Householder matrix ''Q''<sub>1</sub> obtained by choosing the first column of ''A'' as '''x'''. This results in a matrix ''Q''<sub>1</sub>''A'' with zeros in the first column (except for the first row).
: <math>Q_1A = \begin{bmatrix}
  \alpha_1 & \star & \cdots & \star \\
         0 &       &        &       \\
    \vdots &       &     A' &       \\
         0 &       &        &
\end{bmatrix}</math>

This can be repeated for ''A''′ (obtained from ''Q''<sub>1</sub>''A'' by deleting the first row and first column), resulting in a Householder matrix ''Q''′<sub>2</sub>. Note that ''Q''′<sub>2</sub> is smaller than ''Q''<sub>1</sub>. Since it must operate on ''Q''<sub>1</sub>''A'' rather than on ''A''′, we expand it to the upper left by filling in a 1, or in general, for the ''k''-th step, an identity block:
:<math>Q_k = \begin{bmatrix}
  I_{k-1} & 0    \\
       0  & Q_k'
\end{bmatrix}.</math>

After <math>t</math> iterations of this process, where {{nowrap|<math>t = \min(m - 1, n)</math>,}}
:<math>R = Q_t \cdots Q_2 Q_1 A</math>

is an upper triangular matrix. So, with
:<math>\begin{align}
Q^\textsf{T} &= Q_t \cdots Q_2 Q_1, \\
Q &= Q_1^\textsf{T} Q_2^\textsf{T} \cdots Q_t^\textsf{T}
\end{align}</math>

<math>A = QR</math> is a QR decomposition of <math>A</math>.
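The whole procedure can be sketched as follows, assuming a real matrix stored as a list of rows. The function name <code>householder_qr</code> is hypothetical, and the sketch applies each reflection through its vector '''v''' instead of forming the padded matrices ''Q''<sub>''k''</sub> explicitly, which gives the same product. Because ''α'' is chosen with the sign opposite to the pivot, the diagonal of the computed ''R'' may be negative.

```python
import math

def householder_qr(A):
    """Return (Q, R) with A = Q R; Q is m-by-m orthogonal, R is m-by-n upper triangular."""
    m, n = len(A), len(A[0])
    R = [[float(a) for a in row] for row in A]
    # Qt accumulates Q_t ... Q_2 Q_1, which equals Q^T; start from the identity.
    Qt = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]         # column k from the diagonal down
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue                               # nothing to eliminate in this column
        alpha = -math.copysign(norm_x, x[0])
        u = x[:]
        u[0] -= alpha                              # u = x - alpha * e_1
        norm_u = math.sqrt(sum(ui * ui for ui in u))
        v = [ui / norm_u for ui in u]
        # Left-multiply R and Qt by Q_k = diag(I_k, I - 2 v v^T):
        # only rows k..m-1 change, column by column.
        for M in (R, Qt):
            for j in range(len(M[0])):
                dot = sum(v[i] * M[k + i][j] for i in range(len(v)))
                for i in range(len(v)):
                    M[k + i][j] -= 2.0 * dot * v[i]
        for i in range(k + 1, m):
            R[i][k] = 0.0                          # clear rounding residue below the pivot
    Q = [[Qt[j][i] for j in range(m)] for i in range(m)]   # Q = (Q^T)^T
    return Q, R

A = [[12, -51, 4], [6, 167, -68], [-4, 24, -41]]
Q, R = householder_qr(A)
```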

This method has greater [[numerical stability]] than the Gram–Schmidt method above.<!--See the below example, and compare above-->

In numerical tests the computed factors <math>Q_c</math> and <math>R_c</math> satisfy
<math>\frac{\|Q R - Q_c R_c\|_\infty}{\|A\|_\infty} = O(\varepsilon)</math>
at machine precision. Also, orthogonality is preserved: <math>\|Q_c^\mathsf{T} Q_c - I\|_\infty = O(\varepsilon)</math>. However, the accuracy of <math>Q_c</math> and <math>R_c</math> decreases with the condition number:
<math>\|Q - Q_c\|_\infty = O(\varepsilon\,\kappa_\infty(A)),\quad
\frac{\|R - R_c\|_\infty}{\|R\|_\infty} = O(\varepsilon\,\kappa_\infty(A)).</math>

For a well-conditioned example (<math>n=4000</math>, <math>\kappa_\infty(A)\approx3\times10^{3}</math>):
<math>\frac{\|Q R - Q_c R_c\|_\infty}{\|A\|_\infty} \approx 1.6\times10^{-15},</math>
<math>\|Q - Q_c\|_\infty \approx 1.6\times10^{-15},</math>
<math>\frac{\|R - R_c\|_\infty}{\|R\|_\infty} \approx 4.3\times10^{-14},</math>
<math>\|Q_c^\mathsf{T}Q_c - I\|_\infty \approx 1.1\times10^{-13}.</math>

In an ill-conditioned test (<math>n=4000</math>, <math>\kappa_\infty(A)\approx4\times10^{18}</math>):
<math>\frac{\|Q R - Q_c R_c\|_\infty}{\|A\|_\infty} \approx 1.3\times10^{-15},</math>
<math>\|Q - Q_c\|_\infty \approx 5.2\times10^{-4},</math>
<math>\frac{\|R - R_c\|_\infty}{\|R\|_\infty} \approx 1.2\times10^{-4},</math>
<math>\|Q_c^\mathsf{T}Q_c - I\|_\infty \approx 1.1\times10^{-13}.</math><ref>{{Cite book | author1=Holmes, M.| title=Introduction to Scientific Computing and Data Analysis, 2nd Ed | year=2023 | publisher=Springer | isbn=978-3-031-22429-4}} </ref>

The following table gives the number of operations in the ''k''-th step of the QR-decomposition by the Householder transformation, assuming a square matrix with size ''n''.
{| class="wikitable"
|-
! Operation
! Number of operations in the ''k''-th step
|-
| Multiplications
| <math>2(n - k + 1)^2</math>
|-
| Additions
| <math>(n - k + 1)^2 + (n - k + 1)(n - k) + 2 </math>
|-
| Division
| <math>1</math>
|-
| Square root
| <math>1</math>
|}

Summing the multiplication counts over the {{nowrap|''n'' − 1}} steps (for a square matrix of size ''n'') gives the complexity of the algorithm in terms of floating-point multiplications:
:<math>\frac{2}{3}n^3 + n^2 + \frac{1}{3}n - 2 = O\left(n^3\right).</math>
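As a quick check of this closed form, the per-step multiplication count from the table can be summed directly and compared with the polynomial. The function names below are illustrative only.

```python
def multiplications(n):
    """Sum the table entry 2(n - k + 1)^2 over the steps k = 1, ..., n - 1."""
    return sum(2 * (n - k + 1) ** 2 for k in range(1, n))

def closed_form(n):
    """(2/3)n^3 + n^2 + (1/3)n - 2, written over the common denominator 3."""
    # 2n^3 + 3n^2 + n = n(n + 1)(2n + 1) is always divisible by 3,
    # so the integer division is exact.
    return (2 * n**3 + 3 * n**2 + n) // 3 - 2

for n in range(1, 11):
    print(n, multiplications(n), closed_form(n))
```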

====Example====
Let us calculate the decomposition of
: <math>A = \begin{bmatrix}
  12 & -51 &   4 \\
   6 & 167 & -68 \\
  -4 &  24 & -41
\end{bmatrix}.</math>

First, we need to find a reflection that transforms the first column of matrix ''A'', vector {{nowrap|<math>\mathbf{a}_1 = \begin{bmatrix} 12 & 6 & -4 \end{bmatrix}^\textsf{T}</math>,}} into {{nowrap|<math>\left\|\mathbf{a}_1\right\| \mathbf{e}_1 = \begin{bmatrix} \alpha & 0 & 0\end{bmatrix}^\textsf{T}</math>.}}

Now,
: <math>\mathbf{u} = \mathbf{x} - \alpha\mathbf{e}_1,</math>

and
: <math>\mathbf{v} = \frac{\mathbf{u}}{\|\mathbf{u}\|}.</math>

Here,
: <math>\alpha = 14</math> and <math>\mathbf{x} = \mathbf{a}_1 = \begin{bmatrix} 12 & 6 & -4 \end{bmatrix}^\textsf{T}</math>

Therefore
: <math>\mathbf{u} = \begin{bmatrix} -2 & 6 & -4 \end{bmatrix}^\textsf{T} = 2 \begin{bmatrix} -1 & 3 & -2 \end{bmatrix}^\textsf{T}</math> and {{nowrap|<math>\mathbf{v} = \frac{1}{\sqrt{14}}\begin{bmatrix} -1 & 3 & -2 \end{bmatrix}^\textsf{T}</math>,}} and then
: <math>\begin{align}
      Q_1
  ={} &I - \frac{2}{\sqrt{14}\sqrt{14}}
         \begin{bmatrix} -1 \\ 3 \\ -2 \end{bmatrix}
         \begin{bmatrix} -1 &  3 &  -2 \end{bmatrix} \\
  ={} &I - \frac{1}{7}\begin{bmatrix}
          1 & -3 &  2 \\
         -3 &  9 & -6 \\
          2 & -6 &  4
       \end{bmatrix} \\
  ={} &\begin{bmatrix}
          6/7 &  3/7 & -2/7 \\
          3/7 & -2/7 &  6/7 \\
         -2/7 &  6/7 &  3/7 \\
       \end{bmatrix}.
\end{align}</math>

Now observe:
:<math>Q_1A = \begin{bmatrix}
  14 &  21 & -14 \\
   0 & -49 & -14 \\
   0 & 168 & -77
\end{bmatrix},</math>

so the matrix is already almost triangular. We only need to zero the (3, 2) entry.

Take the (1, 1) [[minor (linear algebra)|minor]], and then apply the process again to
:<math>A' = M_{11} = \begin{bmatrix}
  -49 & -14 \\
  168 & -77
\end{bmatrix}.</math>

By the same method as above, we obtain the matrix of the Householder transformation
:<math>Q_2 = \begin{bmatrix}
  1 &     0 &  0 \\
  0 & -7/25 & 24/25 \\
  0 & 24/25 &  7/25
\end{bmatrix}</math>

after performing a direct sum with 1, so that it has the same dimensions as ''Q''<sub>1</sub> and can be applied to ''Q''<sub>1</sub>''A''.

Now, we find
:<math>Q = Q_1^\textsf{T} Q_2^\textsf{T} = \begin{bmatrix}
   6/7 & -69/175 & 58/175 \\
   3/7 & 158/175 & -6/175 \\
  -2/7 &   6/35  & 33/35
\end{bmatrix}. </math>

Or, to four decimal digits,
:<math>\begin{align}
  Q &= Q_1^\textsf{T} Q_2^\textsf{T} = \begin{bmatrix}
     0.8571 & -0.3943 &  0.3314 \\
     0.4286 &  0.9029 & -0.0343 \\
    -0.2857 &  0.1714 &  0.9429
  \end{bmatrix} \\
  R &= Q_2 Q_1 A = Q^\textsf{T} A = \begin{bmatrix}
    14 &  21 & -14 \\
     0 & 175 & -70 \\
     0 &   0 & -35
  \end{bmatrix}.
\end{align}</math>

The matrix ''Q'' is orthogonal and ''R'' is upper triangular, so {{nowrap|1=''A'' = ''QR''}} is the required QR decomposition.
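The arithmetic of this example can be checked exactly with rational numbers. The sketch below uses Python's <code>fractions</code> module and two small helper functions (<code>matmul</code> and <code>transpose</code>, named here only for illustration); the matrices ''Q''<sub>1</sub> and ''Q''<sub>2</sub> are copied from the example.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[12, -51, 4], [6, 167, -68], [-4, 24, -41]]
Q1 = [[F(6, 7), F(3, 7), F(-2, 7)],
      [F(3, 7), F(-2, 7), F(6, 7)],
      [F(-2, 7), F(6, 7), F(3, 7)]]
Q2 = [[1, 0, 0],
      [0, F(-7, 25), F(24, 25)],
      [0, F(24, 25), F(7, 25)]]

Q = matmul(transpose(Q1), transpose(Q2))   # Q = Q1^T Q2^T
R = matmul(Q2, matmul(Q1, A))              # R = Q2 Q1 A

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(transpose(Q), Q) == identity) # Q is orthogonal
print(matmul(Q, R) == A)                   # A = QR holds exactly
```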

====Advantages and disadvantages====

Householder transformations give the simplest of the numerically stable QR decomposition algorithms, because they use reflections as the mechanism for producing zeroes in the ''R'' matrix. However, the Householder reflection algorithm is bandwidth-heavy and difficult to parallelize, as every reflection that produces a new zero element changes the entirety of both the ''Q'' and ''R'' matrices.

====Parallel implementation of Householder QR====

The Householder QR method can be implemented in parallel with algorithms such as the TSQR algorithm (which stands for ''Tall Skinny QR''). This algorithm can be applied when the matrix ''A'' has {{nowrap|''m'' ≫ ''n''}}.<ref>James Demmel and Laura Grigori (2008), "Communication-optimal parallel and sequential QR and LU factorizations: theory and practice", https://arxiv.org/abs/0806.2159</ref> This algorithm uses a binary reduction tree to compute a local Householder QR decomposition at each node in the forward pass, and reconstructs the ''Q'' matrix in the backward pass. The [[binary tree]] structure aims to reduce the amount of communication between processors in order to increase performance.
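The forward pass of such a reduction tree can be illustrated with a simplified, sequential sketch. It is not the algorithm of the cited paper in full: the function names <code>householder_r</code> and <code>tsqr_r</code> are hypothetical, only the ''R'' factor is computed (the backward pass that reconstructs ''Q'' is omitted), every row block is assumed to have at least ''n'' rows, and the blocks are processed one after another where a parallel code would assign them to different processors.

```python
import math

def householder_r(A):
    """n-by-n upper triangular factor R of an m-by-n matrix (m >= n), via Householder."""
    m, n = len(A), len(A[0])
    R = [[float(a) for a in row] for row in A]
    for k in range(min(m - 1, n)):
        u = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in u))
        if norm_x == 0.0:
            continue
        u[0] += math.copysign(norm_x, u[0])        # u = x - alpha e_1, alpha = -sign(x_0)||x||
        norm_u = math.sqrt(sum(t * t for t in u))
        v = [t / norm_u for t in u]
        for j in range(k, n):                      # reflect the remaining columns
            dot = sum(v[i] * R[k + i][j] for i in range(len(v)))
            for i in range(len(v)):
                R[k + i][j] -= 2.0 * dot * v[i]
    return [[R[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]

def tsqr_r(A, num_blocks):
    """R factor of a tall matrix A, reduced pairwise over num_blocks row blocks."""
    m = len(A)
    size = m // num_blocks
    bounds = [b * size for b in range(num_blocks)] + [m]
    # Leaves of the tree: a local QR of each row block.
    level = [householder_r(A[bounds[b]:bounds[b + 1]]) for b in range(num_blocks)]
    # Inner nodes: stack two R factors and factor the 2n-by-n result again.
    while len(level) > 1:
        merged = [householder_r(level[i] + level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])               # odd block passes through unchanged
        level = merged
    return level[0]

A = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 17]]
R_tree = tsqr_r(A, 4)
R_direct = householder_r(A)
```

For a matrix of full column rank, the two results agree up to the signs of the rows of ''R'', since the ''R'' factor is unique only up to those signs.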