Moore–Penrose inverse

Template:Short description In mathematics, and in particular linear algebra, the Moore–Penrose inverse Template:Tmath of a matrix Template:Tmath, often called the pseudoinverse, is the most widely known generalization of the inverse matrix.<ref>Template:Multiref</ref> It was independently described by E. H. Moore in 1920,<ref name="Moore1920">Template:Cite journal</ref> Arne Bjerhammar in 1951,<ref name="Bjerhammar1951">Template:Cite journal</ref> and Roger Penrose in 1955.<ref name="Penrose1955">Template:Cite journal</ref> Earlier, Erik Ivar Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. The terms pseudoinverse and generalized inverse are sometimes used as synonyms for the Moore–Penrose inverse of a matrix, but sometimes applied to other elements of algebraic structures which share some but not all properties expected for an inverse element.

A common use of the pseudoinverse is to compute a "best fit" (least squares) approximate solution to a system of linear equations that lacks an exact solution (see below under § Applications). Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the statement and proof of results in linear algebra.

The pseudoinverse is defined for all rectangular matrices whose entries are real or complex numbers. Given a rectangular matrix with real or complex entries, its pseudoinverse is unique. It can be computed using the singular value decomposition. In the special case where Template:Tmath is a normal matrix (for example, a Hermitian matrix), the pseudoinverse Template:Tmath annihilates the kernel of Template:Tmath and acts as a traditional inverse of Template:Tmath on the subspace orthogonal to the kernel.

NotationEdit

In the following discussion, the following conventions are adopted.

Template:Tmath will denote one of the fields of real or complex numbers, denoted Template:Tmath, Template:Tmath, respectively. The vector space of Template:Tmath matrices over Template:Tmath is denoted by Template:Tmath.
For Template:Tmath, the transpose is denoted Template:Tmath and the Hermitian transpose (also called conjugate transpose) is denoted Template:Tmath. If <math>\mathbb{K} = \mathbb{R}</math>, then <math>A^* = A^\mathsf{T}</math>.
For Template:Tmath, Template:Tmath (standing for "range") denotes the column space (image) of Template:Tmath (the space spanned by the column vectors of Template:Tmath) and Template:Tmath denotes the kernel (null space) of Template:Tmath.
For any positive integer Template:Tmath, the Template:Tmath identity matrix is denoted Template:Tmath.

DefinitionEdit

For <math>A \in \mathbb{K}^{m\times n}</math>, a pseudoinverse of Template:Mvar is defined as a matrix Template:Tmath satisfying all of the following four criteria, known as the Moore–Penrose conditions:<ref name="Penrose1955"/><ref name="GvL1996">Template:Cite book</ref>

Template:Tmath need not be the general identity matrix, but it maps all column vectors of Template:Mvar to themselves: <math display="block">A A^+ A = \; A.</math>
Template:Tmath acts like a weak inverse: <math display="block">A^+ A A^+ = \; A^+.</math>
Template:Tmath is Hermitian: <math display="block">\left(A A^+\right)^* = \; A A^+.</math>
Template:Tmath is also Hermitian: <math display="block">\left(A^+ A\right)^* = \; A^+ A.</math>

Note that <math>A^+A</math> and <math>AA^+</math> are idempotent operators, as follows from <math>(AA^+)^2=A A^+</math> and <math>(A^+ A)^2=A^+ A</math>. More specifically, <math>A^+A</math> projects onto the image of <math>A^T</math> (equivalently, the span of the rows of <math>A</math>), and <math>AA^+</math> projects onto the image of <math>A</math> (equivalently, the span of the columns of <math>A</math>). In fact, the above four conditions are fully equivalent to <math>A^+A</math> and <math>AA^+</math> being such orthogonal projections: <math>AA^+</math> projecting onto the image of <math>A</math> implies <math>(A A^+)A=A</math>, and <math>A^+A</math> projecting onto the image of <math>A^T</math> implies <math>(A^+A)A^T=A^T</math>.

The pseudoinverse <math>A^+</math> exists for any matrix <math>A \in \mathbb{K}^{m\times n}</math>. If furthermore <math>A</math> is full rank, that is, its rank is Template:Tmath, then Template:Tmath can be given a particularly simple algebraic expression. In particular:

When Template:Tmath has linearly independent columns (equivalently, Template:Tmath is injective, and thus Template:Tmath is invertible), Template:Tmath can be computed as<math display="block">A^+ = \left(A^* A\right)^{-1} A^*.</math>This particular pseudoinverse is a left inverse, that is, <math>A^+A = I</math>.
If, on the other hand, <math>A</math> has linearly independent rows (equivalently, <math>A</math> is surjective, and thus Template:Tmath is invertible), Template:Tmath can be computed as<math display="block">A^+ = A^* \left(A A^*\right)^{-1}.</math>This is a right inverse, as <math>A A^+ = I</math>.

In the more general case, the pseudoinverse can be expressed leveraging the singular value decomposition. Any matrix can be decomposed as <math> A=UDV^*</math> for some isometries <math>U,V</math> and diagonal nonnegative real matrix <math>D</math>. The pseudoinverse can then be written as <math>A^+=V D^{+} U^*</math>, where <math>D^{+}</math> is the pseudoinverse of <math>D</math> and can be obtained by transposing the matrix and replacing the nonzero values with their multiplicative inverses.Template:Sfn That this matrix satisfies the above requirement is directly verified observing that <math>AA^+=UU^*</math> and <math>A^+ A=VV^*</math>, which are the projections onto image and support of <math>A</math>, respectively.

PropertiesEdit

Existence and uniquenessEdit

As discussed above, for any matrix Template:Tmath there is one and only one pseudoinverse Template:Tmath.<ref name="GvL1996"/>

A matrix satisfying only the first of the conditions given above, namely <math display="inline">A A^+ A = A</math>, is known as a generalized inverse. If the matrix also satisfies the second condition, namely <math display="inline">A^+ A A^+ = A^+</math>, it is called a generalized reflexive inverse. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.

Basic propertiesEdit

Proofs for the properties below can be found at b:Topics in Abstract Algebra/Linear algebra.

If Template:Tmath has real entries, then so does Template:Tmath.
If Template:Tmath is invertible, its pseudoinverse is its inverse. That is, <math>A^+ = A^{-1}</math>.<ref name="SB2002">Template:Cite book.</ref>Template:Rp
The pseudoinverse of the pseudoinverse is the original matrix: <math>\left(A^+\right)^+ = A</math>.<ref name="SB2002" />Template:Rp
Pseudoinversion commutes with transposition, complex conjugation, and taking the conjugate transpose:<ref name="SB2002" />Template:Rp <math display="block">\left(A^\mathsf{T}\right)^+ = \left(A^+\right)^\mathsf{T}, \quad \left(\overline{A}\right)^+ = \overline{A^+}, \quad \left(A^*\right)^+ = \left(A^+\right)^* .</math>
The pseudoinverse of a scalar multiple of Template:Tmath is the reciprocal multiple of Template:Tmath:<math display="block">\left(\alpha A\right)^+ = \alpha^{-1} A^+</math> for Template:Tmath; otherwise, <math>\left(0 A\right)^+ = 0 A^+ = 0 A^\mathsf{T}</math>, or <math>0^+=0^\mathsf{T}</math>.
The kernel and image of the pseudoinverse coincide with those of the conjugate transpose: <math>\ker\left(A^+\right) = \ker\left(A^*\right)</math> and <math>\operatorname{ran}\left(A^+\right) = \operatorname{ran}\left(A^*\right)</math>.

IdentitiesEdit

The following identity formula can be used to cancel or expand certain subexpressions involving pseudoinverses: <math display="block"> A = {}A{}A^*{}A^{+*}{} = {}A^{+*}{}A^*{}A. </math> Equivalently, substituting <math>A^+</math> for <math>A</math> gives <math display="block"> A^+ ={}A^+{}A^{+*}{}A^*{} = {}A^*{}A^{+*}{}A^+, </math> while substituting <math>A^*</math> for <math>A</math> gives <math display="block"> A^* ={}A^*{}A{}A^+{}={}A^+{}A{}A^*. </math>

Reduction to Hermitian caseEdit

The computation of the pseudoinverse is reducible to its construction in the Hermitian case. This is possible through the equivalences: <math display="block">A^+ = \left(A^*A\right)^+ A^*,</math> <math display="block">A^+ = A^* \left(A A^*\right)^+,</math>

as Template:Tmath and Template:Tmath are Hermitian.

Pseudoinverse of productsEdit

The equality Template:Tmath does not hold in general. Rather, suppose Template:Tmath. Then the following are equivalent:<ref>Template:Cite journal</ref>

<math display="inline">(AB)^+ = B^+ A^+</math>
<math>A^+ A BB^* A^* = BB^* A^* </math> and <math>BB^+ A^* A B = A^* A B</math>
<math display="inline">\left(A^+ A BB^*\right)^* = A^+ A BB^*</math> and <math>\left(A^* A BB^+\right)^* = A^* A BB^+</math>
<math display="inline">A^+ A BB^* A^* A BB^+ = BB^* A^* A</math>
<math display="inline">A^+ A B = B (AB)^+ AB </math> and <math>BB^+ A^* = A^* A B (AB)^+</math>.

The following are sufficient conditions for Template:Tmath:

Template:Tmath has orthonormal columns (then <math>A^*A = A^+ A = I_n</math>), or
Template:Tmath has orthonormal rows (then <math>BB^* = BB^+ = I_n</math>), or
Template:Tmath has linearly independent columns (then <math>A^+ A = I</math> ) and Template:Tmath has linearly independent rows (then <math>BB^+ = I</math>), or
<math>B = A^*</math>, or
<math>B = A^+</math>.

The following is a necessary condition for Template:Tmath:

<math>(A^+ A) (BB^+) = (BB^+) (A^+ A)</math>

The fourth sufficient condition yields the equalities <math display="block">\begin{align} \left(A A^*\right)^+ &= A^{+*} A^+, \\ \left(A^* A\right)^+ &= A^+ A^{+*}. \end{align}</math>

Here is a counterexample where Template:Tmath:

<math display="block">\Biggl( \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} \Biggr)^+ = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}^+ = \begin{pmatrix}

\tfrac12 & 0 \\ \tfrac12 & 0 \end{pmatrix} \quad \neq \quad \begin{pmatrix}
\tfrac14 & 0 \\ \tfrac14 & 0 \end{pmatrix} = \begin{pmatrix} 0 & \tfrac12 \\ 0 & \tfrac12 \end{pmatrix} \begin{pmatrix} \tfrac12 & 0 \\ \tfrac12 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}^+ \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}^+</math>

ProjectorsEdit

<math>P = A A^+</math> and <math>Q = A^+A</math> are orthogonal projection operators, that is, they are Hermitian (<math>P = P^*</math>, <math>Q = Q^*</math>) and idempotent (<math>P^2 = P</math> and <math>Q^2 = Q</math>). The following hold:

<math>PA = AQ = A</math> and <math>A^+ P = QA^+ = A^+</math>
Template:Tmath is the orthogonal projector onto the range of Template:Tmath (which equals the orthogonal complement of the kernel of Template:Tmath).
Template:Tmath is the orthogonal projector onto the range of Template:Tmath (which equals the orthogonal complement of the kernel of Template:Tmath).
<math>I - Q = I - A^+A</math> is the orthogonal projector onto the kernel of Template:Tmath.
<math>I - P = I - A A^+</math> is the orthogonal projector onto the kernel of Template:Tmath.<ref name="GvL1996"/>

The last two properties imply the following identities:

<math>A\,\ \left(I - A^+ A\right)= \left(I - A A^+\right)A\ \ = 0</math>
<math>A^*\left(I - A A^+\right) = \left(I - A^+A\right)A^* = 0</math>

Another property is the following: if Template:Tmath is Hermitian and idempotent (true if and only if it represents an orthogonal projection), then, for any matrix Template:Tmath the following equation holds:<ref>Template:Cite journal</ref> <math display="block">A(BA)^+ = (BA)^+</math>

This can be proven by defining matrices <math>C = BA</math>, <math>D = A(BA)^+</math>, and checking that Template:Tmath is indeed a pseudoinverse for Template:Tmath by verifying that the defining properties of the pseudoinverse hold, when Template:Tmath is Hermitian and idempotent.

From the last property it follows that, if Template:Tmath is Hermitian and idempotent, for any matrix Template:Tmath <math display="block">(AB)^+A = (AB)^+</math>

Finally, if Template:Tmath is an orthogonal projection matrix, then its pseudoinverse trivially coincides with the matrix itself, that is, <math>A^+ = A</math>.

Geometric constructionEdit

If we view the matrix as a linear map Template:Tmath over the field Template:Tmath then Template:Tmath can be decomposed as follows. We write Template:Tmath for the direct sum, Template:Tmath for the orthogonal complement, Template:Tmath for the kernel of a map, and Template:Tmath for the image of a map. Notice that <math>\mathbb{K}^n = \left(\ker A\right)^\perp \oplus \ker A</math> and <math>\mathbb{K}^m = \operatorname{ran} A \oplus \left(\operatorname{ran} A\right)^\perp</math>. The restriction <math> A: \left(\ker A\right)^\perp \to \operatorname{ran} A</math> is then an isomorphism. This implies that Template:Tmath on Template:Tmath is the inverse of this isomorphism, and is zero on <math>\left(\operatorname{ran} A\right)^\perp .</math>

In other words: To find Template:Tmath for given Template:Tmath in Template:Tmath, first project Template:Tmath orthogonally onto the range of Template:Tmath, finding a point Template:Tmath in the range. Then form Template:Tmath, that is, find those vectors in Template:Tmath that Template:Tmath sends to Template:Tmath. This will be an affine subspace of Template:Tmath parallel to the kernel of Template:Tmath. The element of this subspace that has the smallest length (that is, is closest to the origin) is the answer Template:Tmath we are looking for. It can be found by taking an arbitrary member of Template:Tmath and projecting it orthogonally onto the orthogonal complement of the kernel of Template:Tmath.

This description is closely related to the minimum-norm solution to a linear system.

Limit relationsEdit

The pseudoinverse are limits: <math display="block">A^+ = \lim_{\delta \searrow 0} \left(A^* A + \delta I\right)^{-1} A^* = \lim_{\delta \searrow 0} A^* \left(A A^* + \delta I\right)^{-1} </math> (see Tikhonov regularization). These limits exist even if Template:Tmath or Template:Tmath do not exist.<ref name="GvL1996"/>Template:Rp<ref>Template:Cite journal</ref>

ContinuityEdit

In contrast to ordinary matrix inversion, the process of taking pseudoinverses is not continuous: if the sequence Template:Tmath converges to the matrix Template:Tmath (in the maximum norm or Frobenius norm, say), then Template:Tmath need not converge to Template:Tmath. However, if all the matrices Template:Tmath have the same rank as Template:Tmath, Template:Tmath will converge to Template:Tmath.<ref name="rakocevic1997">Template:Cite journal</ref>

DerivativeEdit

Let <math>x \mapsto A(x)</math> be a real-valued differentiable matrix function with constant rank in a neighborhood of a point Template:Tmath. The derivative of <math>x \mapsto A^+(x)</math> at <math>x_0</math> may be calculated in terms of the derivative of <math>A</math> at <math>x_0</math>:<ref>Template:Cite journal</ref> <math display="block"> \left.\frac{\mathrm d}{\mathrm d x}\right|_{x = x_0\!\!\!\!\!\!\!} A^+ = -A^+ \left( \frac{\mathrm{d} A}{\mathrm d x} \right) A^+ ~+~ A^+ A^{+\top} \left(\frac{\mathrm{d} A^\top}{\mathrm{d} x} \right) \left(I - A A^+\right) ~+~ \left(I - A^+ A\right) \left(\frac{\mathrm{d} A^\top}{\mathrm{d} x} \right) A^{+\top} A^+, </math> where the functions <math>A</math>, <math>A^+</math> and derivatives on the right side are evaluated at <math>x_0</math> (that is, <math>A := A(x_0)</math>, <math>A^+ := A^+(x_0)</math>, etc.). For a complex matrix, the transpose is replaced with the conjugate transpose.<ref>Template:Cite book</ref> For a real-valued symmetric matrix, the Magnus-Neudecker derivative is established.<ref>Template:Cite journal</ref>

ExamplesEdit

Since for invertible matrices the pseudoinverse equals the usual inverse, only examples of non-invertible matrices are considered below.

For <math>A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},</math> the pseudoinverse is <math>A^+ = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.</math> The uniqueness of this pseudoinverse can be seen from the requirement <math>A^+ = A^+ A A^+</math>, since multiplication by a zero matrix would always produce a zero matrix.
For <math>A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix},</math> the pseudoinverse is <math>A^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{pmatrix}</math>.

Indeed, <math>A\,A^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}</math>, and thus <math>A\,A^+ A = \begin{pmatrix} 1 & 0 \\ 1 & 0\end{pmatrix} = A</math>. Similarly, <math>A^+A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}</math>, and thus <math>A^+A\,A^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{pmatrix} = A^+</math>.

Note that Template:Tmath is neither injective nor surjective, and thus the pseudoinverse cannot be computed via <math>A^+ = \left(A^* A\right)^{-1} A^*</math> nor <math>A^+ = A^* \left( A A^*\right)^{-1}</math>, as <math>A^* A</math> and <math>A A^*</math> are both singular, and furthermore <math>A^+</math> is neither a left nor a right inverse.

Nonetheless, the pseudoinverse can be computed via SVD observing that <math>A=\sqrt2 \left(\frac{\mathbf e_1+\mathbf e_2}{\sqrt2}\right) \mathbf e_1^*</math>, and thus <math>A^+=\frac{1}{\sqrt2} \,\mathbf e_1 \left(\frac{\mathbf e_1+\mathbf e_2}{\sqrt2}\right)^*</math>.

For <math>A = \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix},</math> <math>A^+ = \begin{pmatrix} \frac{1}{2} & -\frac{1}{2} \\ 0 & 0 \end{pmatrix}.</math>
For <math>A = \begin{pmatrix} 1 & 0 \\ 2 & 0 \end{pmatrix},</math> <math>A^+ = \begin{pmatrix} \frac{1}{5} & \frac{2}{5} \\ 0 & 0 \end{pmatrix}</math>. The denominators are here <math>5 = 1^2 + 2^2</math>.
For <math>A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},</math> <math>A^+ = \begin{pmatrix} \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} \end{pmatrix}.</math>
For <math>A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix},</math> the pseudoinverse is <math>A^+ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \end{pmatrix}</math>.

For this matrix, the left inverse exists and thus equals <math>A^+</math>, indeed, <math>A^+A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.</math>

Special casesEdit

ScalarsEdit

It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar Template:Tmath is zero if Template:Tmath is zero and the reciprocal of Template:Tmath otherwise: <math display="block">x^+ = \begin{cases} 0, & \mbox{if }x = 0; \\ x^{-1}, & \mbox{otherwise}. \end{cases}</math>

VectorsEdit

The pseudoinverse of the null (all zero) vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:

<math display="block">\vec{x}^+ = \begin{cases} \vec{0}^\mathsf{T}, & \text{if } \vec{x} = \vec{0}; \\[4pt] \dfrac{\vec{x}^*}{(\vec{x}^* \vec{x})}, & \text{otherwise}. \end{cases}</math>

Diagonal matricesEdit

The pseudoinverse of a squared diagonal matrix is obtained by taking the reciprocal of the nonzero diagonal elements. Formally, if <math>D</math> is a squared diagonal matrix with <math>D=\tilde D\oplus \mathbf 0_{k\times k}</math> and <math>\tilde D>0</math>, then <math>D^+=\tilde D^{-1}\oplus \mathbf 0_{k\times k}</math>. More generally, if <math>A</math> is any <math>m\times n</math> rectangular matrix whose only nonzero elements are on the diagonal, meaning <math>A_{ij}=\delta_{ij} a_i</math>, <math>a_i\in\mathbb K</math>, then <math>A^+</math> is a <math>n\times m</math> rectangular matrix whose diagonal elements are the reciprocal of the original ones, that is, <math>A_{ii}\neq 0\implies A^+_{ii}=\frac{1}{A_{ii}}</math>.

Linearly independent columnsEdit

If the rank of Template:Tmath is identical to the number of columns, Template:Tmath, (for Template:Tmath,) there are Template:Tmath linearly independent columns, and Template:Tmath is invertible. In this case, an explicit formula is:Template:Sfn <math display="block">A^+ = \left(A^*A\right)^{-1}A^*.</math>

It follows that Template:Tmath is then a left inverse of Template:Tmath: <math>A^+ A = I_n</math>.

Linearly independent rowsEdit

If the rank of Template:Tmath is identical to the number of rows, Template:Tmath, (for Template:Tmath,) there are Template:Tmath linearly independent rows, and Template:Tmath is invertible. In this case, an explicit formula is: <math display="block">A^+ = A^*\left(A A^*\right)^{-1}.</math>

It follows that Template:Tmath is a right inverse of Template:Tmath: <math>A A^+ = I_m</math>.

Orthonormal columns or rowsEdit

This is a special case of either full column rank or full row rank (treated above). If Template:Tmath has orthonormal columns (<math>A^*A = I_n</math>) or orthonormal rows (<math>A A^* = I_m</math>), then: <math display="block">A^+ = A^* .</math>

Normal matricesEdit

If Template:Tmath is normal, that is, it commutes with its conjugate transpose, then its pseudoinverse can be computed by diagonalizing it, mapping all nonzero eigenvalues to their inverses, and mapping zero eigenvalues to zero. A corollary is that Template:Tmath commuting with its transpose implies that it commutes with its pseudoinverse.

EP matricesEdit

A (square) matrix Template:Tmath is said to be an EP matrix if it commutes with its pseudoinverse. In such cases (and only in such cases), it is possible to obtain the pseudoinverse as a polynomial in Template:Tmath. A polynomial <math>p(t)</math> such that <math>A^+=p(A)</math> can be easily obtained from the characteristic polynomial of Template:Tmath or, more generally, from any annihilating polynomial of Template:Tmath.<ref name="Bajo">Template:Cite journal</ref>

Orthogonal projection matricesEdit

This is a special case of a normal matrix with eigenvalues 0 and 1. If Template:Tmath is an orthogonal projection matrix, that is, <math>A = A^*</math> and <math>A^2 = A</math>, then the pseudoinverse trivially coincides with the matrix itself: <math display="block">A^+ = A.</math>

Circulant matricesEdit

For a circulant matrix Template:Tmath, the singular value decomposition is given by the Fourier transform, that is, the singular values are the Fourier coefficients. Let Template:Tmath be the Discrete Fourier Transform (DFT) matrix; then<ref name="Stallings1972">Template:Cite journal</ref> <math display="block">\begin{align} C &= \mathcal{F}\cdot\Sigma\cdot\mathcal{F}^*,\\ C^+ &= \mathcal{F}\cdot\Sigma^+\cdot\mathcal{F}^*. \end{align}</math>

ConstructionEdit

Rank decompositionEdit

Let Template:Tmath denote the rank of Template:Tmath. Then Template:Tmath can be (rank) decomposed as <math>A = BC</math> where Template:Tmath and Template:Tmath are of rank Template:Tmath. Then <math>A^+ = C^+B^+ = C^*\left(CC^*\right)^{-1}\left(B^*B\right)^{-1}B^*</math>.

The QR methodEdit

For <math>\mathbb{K} \in \{ \mathbb{R}, \mathbb{C}\}</math> computing the product Template:Tmath or Template:Tmath and their inverses explicitly is often a source of numerical rounding errors and computational cost in practice. An alternative approach using the QR decomposition of Template:Tmath may be used instead.

Consider the case when Template:Tmath is of full column rank, so that <math>A^+ = \left(A^*A\right)^{-1}A^*</math>. Then the Cholesky decomposition <math>A^*A = R^*R</math>, where Template:Tmath is an upper triangular matrix, may be used. Multiplication by the inverse is then done easily by solving a system with multiple right-hand sides, <math display="block">A^+ = \left(A^*A\right)^{-1}A^* \quad \Leftrightarrow \quad \left(A^*A\right)A^+ = A^* \quad \Leftrightarrow \quad R^*RA^+ = A^*</math>

which may be solved by forward substitution followed by back substitution.

The Cholesky decomposition may be computed without forming Template:Tmath explicitly, by alternatively using the QR decomposition of <math>A = Q R</math>, where <math>Q</math> has orthonormal columns, <math>Q^*Q = I</math>, and Template:Tmath is upper triangular. Then <math display="block">A^*A\, =\, (Q R)^*(Q R) \,=\, R^*Q^*Q R \,=\, R^*R ,</math>

so Template:Tmath is the Cholesky factor of Template:Tmath.

The case of full row rank is treated similarly by using the formula <math>A^+ = A^*\left(A A^*\right)^{-1}</math> and using a similar argument, swapping the roles of Template:Tmath and Template:Tmath.

Using polynomials in matricesEdit

For an arbitrary Template:Tmath, one has that <math>A^*A</math> is normal and, as a consequence, an EP matrix. One can then find a polynomial <math>p(t)</math> such that <math>(A^*A)^+=p(A^*A)</math>. In this case one has that the pseudoinverse of Template:Tmath is given by<ref name="Bajo" /> <math display="block">A^+= p(A^*A)A^*= A^*p(AA^*).</math>

Singular value decomposition (SVD)Edit

A computationally simple and accurate way to compute the pseudoinverse is by using the singular value decomposition.Template:Sfn<ref name="GvL1996"/><ref name="SLEandPI">Linear Systems & Pseudo-Inverse</ref> If <math>A = U\Sigma V^*</math> is the singular value decomposition of Template:Tmath, then <math>A^+ = V\Sigma^+ U^*</math>. For a rectangular diagonal matrix such as Template:Tmath, we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, leaving the zeros in place. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the MATLAB or GNU Octave function Template:Mono, the tolerance is taken to be Template:Math, where ε is the machine epsilon.

The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix–matrix multiplication, even if a state-of-the art implementation (such as that of LAPACK) is used.

The above procedure shows why taking the pseudoinverse is not a continuous operation: if the original matrix Template:Tmath has a singular value 0 (a diagonal entry of the matrix Template:Tmath above), then modifying Template:Tmath slightly may turn this zero into a tiny positive number, thereby affecting the pseudoinverse dramatically as we now have to take the reciprocal of a tiny number.

Block matricesEdit

Optimized approaches exist for calculating the pseudoinverse of block-structured matrices.

The iterative method of Ben-Israel and CohenEdit

Another method for computing the pseudoinverse (cf. Drazin inverse) uses the recursion <math display="block">A_{i+1} = 2A_i - A_i A A_i,</math>

which is sometimes referred to as hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of Template:Tmath if it is started with an appropriate Template:Tmath satisfying <math>A_0 A = \left(A_0 A\right)^*</math>. The choice <math>A_0 = \alpha A^*</math> (where <math>0 < \alpha < 2/\sigma^2_1(A)</math>, with Template:Tmath denoting the largest singular value of Template:Tmath)<ref>Template:Cite journal pdf</ref> has been argued not to be competitive to the method using the SVD mentioned above, because even for moderately ill-conditioned matrices it takes a long time before Template:Tmath enters the region of quadratic convergence.<ref>Template:Cite journal</ref> However, if started with Template:Tmath already close to the Moore–Penrose inverse and <math>A_0 A = \left(A_0 A\right)^*</math>, for example <math>A_0 := \left(A^* A + \delta I\right)^{-1} A^*</math>, convergence is fast (quadratic).

Updating the pseudoinverseEdit

For the cases where Template:Tmath has full row or column rank, and the inverse of the correlation matrix (Template:Tmath for Template:Tmath with full row rank or Template:Tmath for full column rank) is already known, the pseudoinverse for matrices related to Template:Tmath can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.<ref name="G1992">Template:Cite thesis</ref><ref name="EMTIYAZ2008">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>

Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.<ref>Template:Cite journal</ref><ref>Template:Cite journal</ref>

Software librariesEdit

High-quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK. Writing one's own implementation of SVD is a major programming project that requires a significant numerical expertise. In special circumstances, such as parallel computing or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.

The Python package NumPy provides a pseudoinverse calculation through its functions matrix.I and linalg.pinv; its pinv uses the SVD-based algorithm. SciPy adds a function scipy.linalg.pinv that uses a least-squares solver.

The MASS package for R provides a calculation of the Moore–Penrose inverse through the ginv function.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> The ginv function calculates a pseudoinverse using the singular value decomposition provided by the svd function in the base R package. An alternative is to employ the pinv function available in the pracma package.

The Octave programming language provides a pseudoinverse through the standard package function pinv and the pseudo_inverse() method.

In Julia (programming language), the LinearAlgebra package of the standard library provides an implementation of the Moore–Penrose inverse pinv() implemented via singular-value decomposition.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>

ApplicationsEdit

Linear least-squaresEdit

Template:See also

The pseudoinverse provides a least squares solution to a system of linear equations.<ref name="Penrose1956">Template:Cite journal</ref> For Template:Tmath, given a system of linear equations <math display="block">A x = b,</math>

in general, a vector Template:Tmath that solves the system may not exist, or if one does exist, it may not be unique. More specifically, a solution exists if and only if <math>b</math> is in the image of <math>A</math>, and is unique if and only if <math>A</math> is injective. The pseudoinverse solves the "least-squares" problem as follows:

Template:Tmath, we have <math>\left\|Ax - b\right\|_2 \ge \left\|Az - b\right\|_2</math> where <math>z = A^+b</math> and <math>\|\cdot\|_2</math> denotes the Euclidean norm. This weak inequality holds with equality if and only if <math>x = A^+b + \left(I - A^+A\right)w</math> for any vector Template:Tmath; this provides an infinitude of minimizing solutions unless Template:Tmath has full column rank, in which case Template:Tmath is a zero matrix.<ref name=Planitz>Template:Cite journal</ref> The solution with minimum Euclidean norm is Template:Tmath<ref name=Planitz/>

This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let Template:Tmath.

Template:Tmath, we have <math>\|AX - B\|_{\mathrm{F}} \ge \|AZ -B\|_{\mathrm{F}}</math> where <math>Z = A^+B</math> and <math>\|\cdot\|_{\mathrm{F}}</math> denotes the Frobenius norm.

Obtaining all solutions of a linear systemEdit

If the linear system

has any solutions, they are all given by<ref name=James>Template:Cite journal</ref>

<math display="block">x = A^+ b + \left[I - A^+ A\right]w</math>

for arbitrary vector Template:Tmath. Solution(s) exist if and only if <math>A A^+ b = b</math>.<ref name=James/> If the latter holds, then the solution is unique if and only if Template:Tmath has full column rank, in which case Template:Tmath is a zero matrix. If solutions exist but Template:Tmath does not have full column rank, then we have an indeterminate system, all of whose infinitude of solutions are given by this last equation.

Minimum norm solution to a linear systemEdit

For linear systems <math>Ax = b,</math> with non-unique solutions (such as under-determined systems), the pseudoinverse may be used to construct the solution of minimum Euclidean norm <math>\|x\|_2</math> among all solutions.

If <math>Ax = b</math> is satisfiable, the vector <math>z = A^+b</math> is a solution, and satisfies <math>\|z\|_2 \le \|x\|_2</math> for all solutions.

This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let Template:Tmath.

If <math>AX = B</math> is satisfiable, the matrix <math>Z = A^+B</math> is a solution, and satisfies <math>\|Z\|_{\mathrm{F}} \le \|X\|_{\mathrm{F}}</math> for all solutions.

Condition numberEdit

Using the pseudoinverse and a matrix norm, one can define a condition number for any matrix: <math display="block">\mbox{cond}(A) = \|A\| \left\|A^+\right\|.</math>

A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of Template:Tmath can lead to huge errors in the entries of the solution.<ref name=hagen/>

GeneralizationsEdit

The weighted pseudoinverse <ref>Template:Cite journal</ref> generalizes the Moore-Penrose inverse between metric spaces with weight matrices in the domain and range. These weights are the identity for the standard Moore-Penrose inverse, which assumes an orthonormal basis in both spaces.

In order to solve more general least-squares problems, one can define Moore–Penrose inverses for all continuous linear operators Template:Tmath between two Hilbert spaces Template:Tmath and Template:Tmath, using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudoinverse in this sense.<ref name="hagen">Template:Cite book</ref> Those that do are precisely the ones whose range is closed in Template:Tmath.

A notion of pseudoinverse exists for matrices over an arbitrary field equipped with an arbitrary involutive automorphism. In this more general setting, a given matrix doesn't always have a pseudoinverse. The necessary and sufficient condition for a pseudoinverse to exist is that <math>\operatorname{rank}(A) = \operatorname{rank}\left(A^* A\right) = \operatorname{rank}\left(A A^*\right)</math>, where <math>A^*</math> denotes the result of applying the involution operation to the transpose of <math>A</math>. When it does exist, it is unique.<ref>Template:Cite journal</ref> Example: Consider the field of complex numbers equipped with the identity involution (as opposed to the involution considered elsewhere in the article); do there exist matrices that fail to have pseudoinverses in this sense? Consider the matrix <math>A = \begin{bmatrix}1 & i\end{bmatrix}^\mathsf{T}</math>. Observe that <math>\operatorname{rank}\left(A A^\mathsf{T}\right) = 1</math> while <math>\operatorname{rank}\left(A^\mathsf{T} A\right) = 0</math>. So this matrix doesn't have a pseudoinverse in this sense.

In abstract algebra, a Moore–Penrose inverse may be defined on a *-regular semigroup. This abstract definition coincides with the one in linear algebra.

NotesEdit

Template:Reflist

ReferencesEdit

External linksEdit

Template:PlanetMath
Interactive program & tutorial of Moore–Penrose Pseudoinverse
Template:PlanetMath
{{#invoke:Template wrapper|{{#if:|list|wrap}}|_template=cite web

|_exclude=urlname, _debug, id |url = https://mathworld.wolfram.com/{{#if:Pseudoinverse%7CPseudoinverse.html}} |title = Pseudoinverse |author = Weisstein, Eric W. |website = MathWorld |access-date = |ref = Template:SfnRef }}

{{#invoke:Template wrapper|{{#if:|list|wrap}}|_template=cite web

|_exclude=urlname, _debug, id |url = https://mathworld.wolfram.com/{{#if:Moore-PenroseMatrixInverse%7CMoore-PenroseMatrixInverse.html}} |title = Moore–Penrose Inverse |author = Weisstein, Eric W. |website = MathWorld |access-date = |ref = Template:SfnRef }}

Template:Numerical linear algebra Template:Roger Penrose

Moore–Penrose inverse

NotationEdit

DefinitionEdit

PropertiesEdit

Existence and uniquenessEdit

Basic propertiesEdit

IdentitiesEdit

Reduction to Hermitian caseEdit

Pseudoinverse of productsEdit

ProjectorsEdit

Geometric constructionEdit

Limit relationsEdit

ContinuityEdit

DerivativeEdit

ExamplesEdit

Special casesEdit

ScalarsEdit

VectorsEdit

Diagonal matricesEdit

Linearly independent columnsEdit

Linearly independent rowsEdit

Orthonormal columns or rowsEdit

Normal matricesEdit

EP matricesEdit

Orthogonal projection matricesEdit

Circulant matricesEdit

ConstructionEdit

Rank decompositionEdit

The QR methodEdit

Using polynomials in matricesEdit

Singular value decomposition (SVD)Edit

Block matricesEdit

The iterative method of Ben-Israel and CohenEdit

Updating the pseudoinverseEdit

Software librariesEdit

ApplicationsEdit

Linear least-squaresEdit

Obtaining all solutions of a linear systemEdit

Minimum norm solution to a linear systemEdit

Condition numberEdit

GeneralizationsEdit

See alsoEdit

NotesEdit

ReferencesEdit

External linksEdit