==Numerical linear algebra==

===Benefits===
[[Numerical analysis]] takes advantage of many of the properties of orthogonal matrices for numerical linear algebra, and they arise naturally. For example, it is often desirable to compute an orthonormal basis for a space, or an orthogonal change of basis; both take the form of orthogonal matrices. Having determinant ±1 and all eigenvalues of magnitude 1 is of great benefit for [[numerical stability]]. One implication is that the [[condition number]] is 1 (which is the minimum), so errors are not magnified when multiplying with an orthogonal matrix. Many algorithms use orthogonal matrices like [[Householder transformation|Householder reflections]] and [[Givens rotation]]s for this reason. It is also helpful that, not only is an orthogonal matrix invertible, but its inverse is available essentially free, by exchanging indices.

Permutations are essential to the success of many algorithms, including the workhorse [[Gaussian elimination]] with [[Pivot element#Partial and complete pivoting|partial pivoting]] (where permutations do the pivoting). However, they rarely appear explicitly as matrices; their special form allows more efficient representation, such as a list of {{mvar|n}} indices.

Likewise, algorithms using Householder and Givens matrices typically use specialized methods of multiplication and storage. For example, a Givens rotation affects only two rows of a matrix it multiplies, changing a full [[matrix multiplication|multiplication]] of order {{math|''n''<sup>3</sup>}} to a much more efficient order {{mvar|n}}. When uses of these reflections and rotations introduce zeros in a matrix, the space vacated is enough to store sufficient data to reproduce the transform, and to do so robustly. (Following {{harvtxt|Stewart|1976}}, we do ''not'' store a rotation angle, which is both expensive and badly behaved.)

===Decompositions===
A number of important [[matrix decomposition]]s {{harv|Golub|Van Loan|1996}} involve orthogonal matrices, including especially:

;[[QR decomposition|{{mvar|QR}} decomposition]]
: {{math|1=''M'' = ''QR''}}, {{mvar|Q}} orthogonal, {{mvar|R}} upper triangular
;[[Singular value decomposition]]
: {{math|1=''M'' = ''U''Σ''V''<sup>T</sup>}}, {{mvar|U}} and {{mvar|V}} orthogonal, {{math|Σ}} diagonal matrix
;[[Eigendecomposition of a matrix|Eigendecomposition of a symmetric matrix]] (decomposition according to the [[spectral theorem]])
: {{math|1=''S'' = ''Q''Λ''Q''<sup>T</sup>}}, {{mvar|S}} symmetric, {{mvar|Q}} orthogonal, {{math|Λ}} diagonal
;[[Polar decomposition]]
: {{math|1=''M'' = ''QS''}}, {{mvar|Q}} orthogonal, {{mvar|S}} symmetric positive-semidefinite

====Examples====
Consider an [[overdetermined system of linear equations]], as might occur with repeated measurements of a physical phenomenon to compensate for experimental errors. Write {{math|1=''A'''''x''' = '''b'''}}, where {{mvar|A}} is {{math|''m'' × ''n''}}, {{math|''m'' > ''n''}}. A {{mvar|QR}} decomposition reduces {{mvar|A}} to upper triangular {{mvar|R}}. For example, if {{mvar|A}} is {{nowrap|5 × 3}} then {{mvar|R}} has the form
<math display="block">R = \begin{bmatrix} \cdot & \cdot & \cdot \\ 0 & \cdot & \cdot \\ 0 & 0 & \cdot \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.</math>

The [[linear least squares (mathematics)|linear least squares]] problem is to find the {{math|'''x'''}} that minimizes {{math|{{norm|''A'''''x''' − '''b'''}}}}, which is equivalent to projecting {{math|'''b'''}} to the subspace spanned by the columns of {{mvar|A}}. Assuming the columns of {{mvar|A}} (and hence {{mvar|R}}) are independent, the projection solution is found from {{math|1=''A''<sup>T</sup>''A'''''x''' = ''A''<sup>T</sup>'''b'''}}. Now {{math|''A''<sup>T</sup>''A''}} is square ({{math|''n'' × ''n''}}) and invertible, and also equal to {{math|''R''<sup>T</sup>''R''}}. But the lower rows of zeros in {{mvar|R}} are superfluous in the product, which is thus already in lower-triangular upper-triangular factored form, as in [[Gaussian elimination]] ([[Cholesky decomposition]]). Here orthogonality is important not only for reducing {{math|1=''A''<sup>T</sup>''A'' = (''R''<sup>T</sup>''Q''<sup>T</sup>)''QR''}} to {{math|''R''<sup>T</sup>''R''}}, but also for allowing solution without magnifying numerical problems.
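For illustration, the following is a minimal Python (NumPy) sketch of this approach; the random test matrix and fixed sizes are placeholders rather than data from any measured system. It computes the least-squares solution from the {{mvar|QR}} factors without ever forming {{math|''A''<sup>T</sup>''A''}}.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # overdetermined: m = 5 > n = 3
b = rng.standard_normal(5)

# Reduced QR: Q has orthonormal columns (5 x 3), R is upper triangular (3 x 3).
Q, R = np.linalg.qr(A)

# Least-squares solution from R x = Q^T b; no normal equations are formed.
x = np.linalg.solve(R, Q.T @ b)

# Agrees with the library's least-squares solver.
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
</syntaxhighlight>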
In the case of a linear system which is underdetermined, or an otherwise non-[[invertible matrix]], singular value decomposition (SVD) is equally useful. With {{mvar|A}} factored as {{math|''U''Σ''V''<sup>T</sup>}}, a satisfactory solution uses the Moore–Penrose [[pseudoinverse]], {{math|''V''Σ<sup>+</sup>''U''<sup>T</sup>}}, where {{math|Σ<sup>+</sup>}} merely replaces each non-zero diagonal entry with its reciprocal. Set {{math|'''x'''}} to {{math|''V''Σ<sup>+</sup>''U''<sup>T</sup>'''b'''}}.

The case of a square invertible matrix also holds interest. Suppose, for example, that {{mvar|A}} is a {{nowrap|3 × 3}} rotation matrix which has been computed as the composition of numerous twists and turns. Floating point does not match the mathematical ideal of real numbers, so {{mvar|A}} has gradually lost its true orthogonality. A [[Gram–Schmidt process]] could [[orthogonalization|orthogonalize]] the columns, but it is not the most reliable, nor the most efficient, nor the most invariant method. The [[polar decomposition]] factors a matrix into a pair, one of which is the unique ''closest'' orthogonal matrix to the given matrix, or one of the closest if the given matrix is singular. (Closeness can be measured by any [[matrix norm]] invariant under an orthogonal change of basis, such as the spectral norm or the Frobenius norm.)

For a near-orthogonal matrix, rapid convergence to the orthogonal factor can be achieved by a "[[Newton's method]]" approach due to {{harvtxt|Higham|1986}} ([[#CITEREFHigham1990|1990]]), repeatedly averaging the matrix with its inverse transpose. {{harvtxt|Dubrulle|1999}} has published an accelerated method with a convenient convergence test.

For example, consider a non-orthogonal matrix for which the simple averaging algorithm takes seven steps
<math display="block">\begin{bmatrix}3 & 1\\7 & 5\end{bmatrix} \rightarrow \begin{bmatrix}1.8125 & 0.0625\\3.4375 & 2.6875\end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix}0.8 & -0.6\\0.6 & 0.8\end{bmatrix}</math>
and which acceleration trims to two steps (with {{mvar|γ}} = 0.353553, 0.565685).
<math display="block">\begin{bmatrix}3 & 1\\7 & 5\end{bmatrix} \rightarrow \begin{bmatrix}1.41421 & -1.06066\\1.06066 & 1.41421\end{bmatrix} \rightarrow \begin{bmatrix}0.8 & -0.6\\0.6 & 0.8\end{bmatrix}</math>
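The simple averaging step is easy to state in code. The following Python (NumPy) fragment is an illustrative sketch only; it uses a fixed iteration count where a production routine would use a convergence test such as Dubrulle's.

<syntaxhighlight lang="python">
import numpy as np

Q = np.array([[3.0, 1.0],
              [7.0, 5.0]])

# Repeatedly average the matrix with its inverse transpose (unaccelerated
# iteration); the iterates approach the orthogonal factor [[0.8, -0.6], [0.6, 0.8]].
for _ in range(10):
    Q = 0.5 * (Q + np.linalg.inv(Q).T)

print(np.round(Q, 6))
</syntaxhighlight>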
<math display="block">\begin{bmatrix}3 & 1\\7 & 5\end{bmatrix} \rightarrow \begin{bmatrix}0.393919 & -0.919145\\0.919145 & 0.393919\end{bmatrix}</math> ===Randomization=== Some numerical applications, such as [[Monte Carlo method]]s and exploration of high-dimensional data spaces, require generation of [[uniform distribution (continuous)|uniformly distributed]] random orthogonal matrices. In this context, "uniform" is defined in terms of [[Haar measure]], which essentially requires that the distribution not change if multiplied by any freely chosen orthogonal matrix. Orthogonalizing matrices with [[statistical independence|independent]] uniformly distributed random entries does not result in uniformly distributed orthogonal matrices{{Citation needed|date=June 2009}}, but the [[QR decomposition|{{mvar|QR}} decomposition]] of independent [[normal distribution|normally distributed]] random entries does, as long as the diagonal of {{mvar|R}} contains only positive entries {{harv|Mezzadri|2006}}. {{harvtxt|Stewart|1980}} replaced this with a more efficient idea that {{harvtxt|Diaconis|Shahshahani|1987}} later generalized as the "subgroup algorithm" (in which form it works just as well for permutations and rotations). To generate an {{math|(''n'' + 1) × (''n'' + 1)}} orthogonal matrix, take an {{math|''n'' × ''n''}} one and a uniformly distributed unit vector of dimension {{nowrap|''n'' + 1}}. Construct a Householder reflection from the vector, then apply it to the smaller matrix (embedded in the larger size with a 1 at the bottom right corner). ===Nearest orthogonal matrix=== The problem of finding the orthogonal matrix {{mvar|Q}} nearest a given matrix {{mvar|M}} is related to the [[Orthogonal Procrustes problem]]. There are several different ways to get the unique solution, the simplest of which is taking the [[singular value decomposition]] of {{mvar|M}} and replacing the singular values with ones. Another method expresses the {{mvar|R}} explicitly but requires the use of a [[matrix square root]]:<ref>[http://people.csail.mit.edu/bkph/articles/Nearest_Orthonormal_Matrix.pdf "Finding the Nearest Orthonormal Matrix"], [[Berthold K.P. Horn]], [[MIT]].</ref> <math display="block">Q = M \left(M^\mathrm{T} M\right)^{-\frac 1 2}</math> This may be combined with the Babylonian method for extracting the square root of a matrix to give a recurrence which converges to an orthogonal matrix quadratically: <math display="block">Q_{n + 1} = 2 M \left(Q_n^{-1} M + M^\mathrm{T} Q_n\right)^{-1}</math> where {{math|1=''Q''<sub>0</sub> = ''M''}}. These iterations are stable provided the [[condition number]] of {{mvar|M}} is less than three.<ref>[http://www.maths.manchester.ac.uk/~nareports/narep91.pdf "Newton's Method for the Matrix Square Root"] {{Webarchive|url=https://web.archive.org/web/20110929131330/http://www.maths.manchester.ac.uk/~nareports/narep91.pdf |date=2011-09-29 }}, Nicholas J. Higham, Mathematics of Computation, Volume 46, Number 174, 1986.</ref> Using a first-order approximation of the inverse and the same initialization results in the modified iteration: <math display="block">N_{n} = Q_n^\mathrm{T} Q_n</math> <math display="block">P_{n} = \frac 1 2 Q_n N_{n}</math> <math display="block">Q_{n + 1} = 2 Q_n + P_n N_n - 3 P_n</math>