==Properties==

===Existence and uniqueness===
As discussed above, for any matrix {{tmath| A }} there is one and only one pseudoinverse {{tmath| A^+ }}.<ref name="GvL1996"/>

A matrix satisfying only the first of the conditions given above, namely <math display="inline">A A^+ A = A</math>, is known as a generalized inverse. If the matrix also satisfies the second condition, namely <math display="inline">A^+ A A^+ = A^+</math>, it is called a [[generalized inverse#Types of generalized inverses|generalized ''reflexive'' inverse]]. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.

===Basic properties===
Proofs for the properties below can be found at [[b:Topics in Abstract Algebra/Linear algebra]].

* If {{tmath| A }} has real entries, then so does {{tmath| A^+ }}.
* If {{tmath| A }} is [[invertible matrix|invertible]], its pseudoinverse is its inverse. That is, <math>A^+ = A^{-1}</math>.<ref name="SB2002">{{Cite book | last1=Stoer | first1=Josef | last2=Bulirsch | first2=Roland | title=Introduction to Numerical Analysis | publisher=[[Springer-Verlag]] | location=Berlin, New York | edition=3rd | isbn=978-0-387-95452-3 | year=2002}}.</ref>{{rp|243}}
* The pseudoinverse of the pseudoinverse is the original matrix: <math>\left(A^+\right)^+ = A</math>.<ref name="SB2002" />{{rp|245}}
* Pseudoinversion commutes with transposition, complex conjugation, and taking the conjugate transpose:<ref name="SB2002" />{{rp|245}} <!-- reference only mentions the last bit --> <math display="block">\left(A^\mathsf{T}\right)^+ = \left(A^+\right)^\mathsf{T}, \quad \left(\overline{A}\right)^+ = \overline{A^+}, \quad \left(A^*\right)^+ = \left(A^+\right)^* .</math>
* The pseudoinverse of a scalar multiple of {{tmath| A }} is the reciprocal multiple of {{tmath| A^+ }}: <math display="block">\left(\alpha A\right)^+ = \alpha^{-1} A^+</math> for {{tmath| \alpha \neq 0 }}; otherwise, <math>\left(0 A\right)^+ = 0 A^+ = 0 A^\mathsf{T}</math>, that is, <math>0^+ = 0^\mathsf{T}</math>.
* The kernel and image of the pseudoinverse coincide with those of the conjugate transpose: <math>\ker\left(A^+\right) = \ker\left(A^*\right)</math> and <math>\operatorname{ran}\left(A^+\right) = \operatorname{ran}\left(A^*\right)</math>.

====Identities====
The following identity can be used to cancel or expand certain subexpressions involving pseudoinverses:
<math display="block">A = A A^* A^{+*} = A^{+*} A^* A.</math>
Substituting <math>A^+</math> for <math>A</math> gives
<math display="block">A^+ = A^+ A^{+*} A^* = A^* A^{+*} A^+,</math>
while substituting <math>A^*</math> for <math>A</math> gives
<math display="block">A^* = A^* A A^+ = A^+ A A^*.</math>

===Reduction to Hermitian case===
The computation of the pseudoinverse is reducible to its construction in the Hermitian case. This is possible through the equivalences
<math display="block">A^+ = \left(A^*A\right)^+ A^*,</math>
<math display="block">A^+ = A^* \left(A A^*\right)^+,</math>
as {{tmath| A^*A }} and {{tmath| A A^* }} are Hermitian.
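These identities lend themselves to a direct numerical check. The following sketch is illustrative only (the sample matrix is an arbitrary choice); it uses NumPy's <code>numpy.linalg.pinv</code>, which computes the Moore–Penrose inverse via the [[singular value decomposition]]:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
A_pinv = np.linalg.pinv(A)
A_star = A.conj().T                      # conjugate transpose A*

# Reduction to the Hermitian case: A^+ = (A* A)^+ A* = A* (A A*)^+
assert np.allclose(A_pinv, np.linalg.pinv(A_star @ A) @ A_star)
assert np.allclose(A_pinv, A_star @ np.linalg.pinv(A @ A_star))

# Basic properties: (A^+)^+ = A and (A*)^+ = (A^+)*
assert np.allclose(np.linalg.pinv(A_pinv), A)
assert np.allclose(np.linalg.pinv(A_star), A_pinv.conj().T)
</syntaxhighlight>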
===Pseudoinverse of products===
The equality {{tmath|1= (AB)^+ = B^+ A^+ }} does not hold in general. Rather, suppose {{tmath| A \in \mathbb{K}^{m\times n},\ B \in \mathbb{K}^{n\times p} }}. Then the following are equivalent:<ref>{{Cite journal |last=Greville |first=T. N. E. |date=1966-10-01 |title=Note on the Generalized Inverse of a Matrix Product |url=https://epubs.siam.org/doi/10.1137/1008107 |journal=SIAM Review |volume=8 |issue=4 |pages=518–521 |doi=10.1137/1008107 |bibcode=1966SIAMR...8..518G |issn=0036-1445}}</ref>
# <math display="inline">(AB)^+ = B^+ A^+</math>
# <math>A^+ A BB^* A^* = BB^* A^*</math> and <math>BB^+ A^* A B = A^* A B</math>
# <math display="inline">\left(A^+ A BB^*\right)^* = A^+ A BB^*</math> and <math>\left(A^* A BB^+\right)^* = A^* A BB^+</math>
# <math display="inline">A^+ A BB^* A^* A BB^+ = BB^* A^* A</math>
# <math display="inline">A^+ A B = B (AB)^+ AB</math> and <math>BB^+ A^* = A^* A B (AB)^+</math>.

The following are sufficient conditions for {{tmath|1= (AB)^+ = B^+ A^+ }}:
# {{tmath| A }} has orthonormal columns (then <math>A^*A = A^+ A = I_n</math>), or
# {{tmath| B }} has orthonormal rows (then <math>BB^* = BB^+ = I_n</math>), or
# {{tmath| A }} has linearly independent columns (then <math>A^+ A = I</math>) and {{tmath| B }} has linearly independent rows (then <math>BB^+ = I</math>), or
# <math>B = A^*</math>, or
# <math>B = A^+</math>.

The following is a necessary condition for {{tmath|1= (AB)^+ = B^+ A^+ }}:
# <math>(A^+ A) (BB^+) = (BB^+) (A^+ A)</math>

The fourth sufficient condition yields the equalities
<math display="block">\begin{align}
\left(A A^*\right)^+ &= A^{+*} A^+, \\
\left(A^* A\right)^+ &= A^+ A^{+*}.
\end{align}</math>

Here is a counterexample where {{tmath|1= (AB)^+ \neq B^+ A^+ }}:
<math display="block">\Biggl( \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} \Biggr)^+ = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}^+ = \begin{pmatrix} \tfrac12 & 0 \\ \tfrac12 & 0 \end{pmatrix} \quad \neq \quad \begin{pmatrix} \tfrac14 & 0 \\ \tfrac14 & 0 \end{pmatrix} = \begin{pmatrix} 0 & \tfrac12 \\ 0 & \tfrac12 \end{pmatrix} \begin{pmatrix} \tfrac12 & 0 \\ \tfrac12 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}^+ \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}^+</math>
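This counterexample is easy to reproduce numerically. The sketch below (illustrative only) also checks the fourth sufficient condition, <math>B = A^*</math>, which here reduces to the transpose since the entries are real:

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 1.0]])

lhs = np.linalg.pinv(A @ B)                  # (AB)^+ = [[1/2, 0], [1/2, 0]]
rhs = np.linalg.pinv(B) @ np.linalg.pinv(A)  # B^+ A^+ = [[1/4, 0], [1/4, 0]]
assert not np.allclose(lhs, rhs)             # the product rule fails here

# With B = A* (= A.T for real A), the product rule does hold:
# (A A*)^+ = A^{+*} A^+
assert np.allclose(np.linalg.pinv(A @ A.T),
                   np.linalg.pinv(A.T) @ np.linalg.pinv(A))
</syntaxhighlight>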
===Projectors===
<math>P = A A^+</math> and <math>Q = A^+A</math> are [[projection (linear algebra)|orthogonal projection operators]], that is, they are Hermitian (<math>P = P^*</math>, <math>Q = Q^*</math>) and idempotent (<math>P^2 = P</math> and <math>Q^2 = Q</math>). The following hold:
* <math>PA = AQ = A</math> and <math>A^+ P = QA^+ = A^+</math>.
* {{tmath| P }} is the [[orthogonal projector]] onto the [[range of a function|range]] of {{tmath| A }} (which equals the [[orthogonal complement]] of the kernel of {{tmath| A^* }}).
* {{tmath| Q }} is the orthogonal projector onto the range of {{tmath| A^* }} (which equals the orthogonal complement of the kernel of {{tmath| A }}).
* <math>I - Q = I - A^+A</math> is the orthogonal projector onto the kernel of {{tmath| A }}.
* <math>I - P = I - A A^+</math> is the orthogonal projector onto the kernel of {{tmath| A^* }}.<ref name="GvL1996"/>

The last two properties imply the following identities:
* <math>A\left(I - A^+ A\right) = \left(I - A A^+\right)A = 0</math>
* <math>A^*\left(I - A A^+\right) = \left(I - A^+A\right)A^* = 0</math>

Another property is the following: if {{tmath| A \in \mathbb{K}^{n\times n} }} is Hermitian and idempotent (true if and only if it represents an orthogonal projection), then, for any matrix {{tmath| B \in \mathbb{K}^{m\times n} }}, the following equation holds:<ref>{{cite journal|first1=Anthony A.|last1=Maciejewski|first2=Charles A.|last2=Klein|title=Obstacle Avoidance for Kinematically Redundant Manipulators in Dynamically Varying Environments|journal=International Journal of Robotics Research|volume=4|issue=3|pages=109–117|year=1985|doi=10.1177/027836498500400308|hdl=10217/536|s2cid=17660144|hdl-access=free}}</ref>
<math display="block">A(BA)^+ = (BA)^+</math>
This can be proven by defining the matrices <math>C = BA</math> and <math>D = A(BA)^+</math>, and checking that {{tmath| D }} is indeed a pseudoinverse for {{tmath| C }} by verifying that the defining properties of the pseudoinverse hold when {{tmath| A }} is Hermitian and idempotent.

From the last property it follows that, if {{tmath| A \in \mathbb{K}^{n\times n} }} is Hermitian and idempotent, then for any matrix {{tmath| B \in \mathbb{K}^{n\times m} }}
<math display="block">(AB)^+A = (AB)^+.</math>

Finally, if {{tmath| A }} is an orthogonal projection matrix, then its pseudoinverse trivially coincides with the matrix itself: <math>A^+ = A</math>.

===Geometric construction===
If we view the matrix as a linear map {{tmath| A: \mathbb{K}^n \to \mathbb{K}^m }} over the field {{tmath| \mathbb{K} }}, then {{tmath| A^+: \mathbb{K}^m \to \mathbb{K}^n }} can be decomposed as follows. We write {{tmath| \oplus }} for the [[direct sum of modules|direct sum]], {{tmath| \perp }} for the [[orthogonal complement]], {{tmath| \ker }} for the [[kernel (linear algebra)|kernel]] of a map, and {{tmath| \operatorname{ran} }} for the image of a map. Notice that <math>\mathbb{K}^n = \left(\ker A\right)^\perp \oplus \ker A</math> and <math>\mathbb{K}^m = \operatorname{ran} A \oplus \left(\operatorname{ran} A\right)^\perp</math>. The restriction <math>A: \left(\ker A\right)^\perp \to \operatorname{ran} A</math> is then an isomorphism. This implies that {{tmath| A^+ }} is the inverse of this isomorphism on {{tmath| \operatorname{ran} A }}, and is zero on <math>\left(\operatorname{ran} A\right)^\perp.</math>

In other words, to find {{tmath| A^+b }} for a given {{tmath| b }} in {{tmath| \mathbb{K}^m }}, first project {{tmath| b }} orthogonally onto the range of {{tmath| A }}, finding a point {{tmath| p(b) }} in the range. Then form {{tmath| A^{-1}(\{p(b)\}) }}, that is, find those vectors in {{tmath| \mathbb{K}^n }} that {{tmath| A }} sends to {{tmath| p(b) }}. This is an affine subspace of {{tmath| \mathbb{K}^n }} parallel to the kernel of {{tmath| A }}. The element of this subspace that has the smallest length (that is, is closest to the origin) is the answer {{tmath| A^+b }} we are looking for. It can be found by taking an arbitrary member of {{tmath| A^{-1}(\{p(b)\}) }} and projecting it orthogonally onto the orthogonal complement of the kernel of {{tmath| A }}. A numerical sketch of this construction follows below.

This description is closely related to the [[#Minimum norm solution to a linear system|minimum-norm solution to a linear system]].
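The projector properties and the geometric construction can be verified numerically. In the following sketch (illustrative only; the real matrix and vector are arbitrary choices), <code>x = A_pinv @ b</code> is checked to be a preimage of the projection <math>p(b) = Pb</math> that lies in <math>\left(\ker A\right)^\perp</math>, the range of {{tmath| Q }}:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

A_pinv = np.linalg.pinv(A)
P = A @ A_pinv        # orthogonal projector onto ran(A)
Q = A_pinv @ A        # orthogonal projector onto ran(A*)

assert np.allclose(P, P.T) and np.allclose(P @ P, P)  # Hermitian, idempotent
assert np.allclose(Q, Q.T) and np.allclose(Q @ Q, Q)
assert np.allclose(P @ A, A) and np.allclose(A @ Q, A)

# Geometric construction: A^+ b is a preimage of p(b) = P b that lies
# in (ker A)^perp = ran(Q), hence has minimum norm among all preimages.
x = A_pinv @ b
assert np.allclose(A @ x, P @ b)
assert np.allclose(Q @ x, x)
</syntaxhighlight>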
===Limit relations===
The pseudoinverse can be obtained as a limit:
<math display="block">A^+ = \lim_{\delta \searrow 0} \left(A^* A + \delta I\right)^{-1} A^* = \lim_{\delta \searrow 0} A^* \left(A A^* + \delta I\right)^{-1}</math>
(see [[Tikhonov regularization]]). These limits exist even if {{tmath| \left(A A^*\right)^{-1} }} or {{tmath| \left(A^*A\right)^{-1} }} do not exist.<ref name="GvL1996"/>{{rp|263}}<ref>{{cite journal | title = The Moore–Penrose Pseudoinverse: A Tutorial Review of the Theory | date = 2012 | doi = 10.1007/s13538-011-0052-z | arxiv = 1110.6882 | last1 = Barata | first1 = João Carlos Alves | last2 = Hussein | first2 = Mahir Saleh | journal = Brazilian Journal of Physics | volume = 42 | issue = 1–2 | pages = 146–165 | bibcode = 2012BrJPh..42..146B }}</ref>

===Continuity===
In contrast to ordinary matrix inversion, the process of taking pseudoinverses is not [[continuous function|continuous]]: if the sequence {{tmath| \left(A_n\right) }} converges to the matrix {{tmath| A }} (in the [[matrix norm|maximum norm or Frobenius norm]], say), then {{tmath| (A_n)^+ }} need not converge to {{tmath| A^+ }}. However, if all the matrices {{tmath| A_n }} have the same rank as {{tmath| A }}, then {{tmath| (A_n)^+ }} converges to {{tmath| A^+ }}.<ref name="rakocevic1997">{{cite journal | last=Rakočević | first=Vladimir | title=On continuity of the Moore–Penrose and Drazin inverses | journal=Matematički Vesnik | volume=49 | pages=163–72 | year=1997 | url=http://elib.mi.sanu.ac.rs/files/journals/mv/209/mv973404.pdf }}</ref>

===Derivative===
Let <math>x \mapsto A(x)</math> be a real-valued differentiable matrix function with constant rank in a neighborhood of a point {{tmath| x_0 }}. The derivative of <math>x \mapsto A^+(x)</math> at <math>x_0</math> may be calculated in terms of the derivative of <math>A</math> at <math>x_0</math>:<ref>{{cite journal|title=The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate|first1=G. H.|last1=Golub |author-link=Gene H. Golub |first2=V.|last2=Pereyra|journal=SIAM Journal on Numerical Analysis|volume=10|number=2|date=April 1973|pages=413–32|jstor=2156365|doi=10.1137/0710036|bibcode=1973SJNA...10..413G}}</ref>
<math display="block">\left.\frac{\mathrm d}{\mathrm d x}\right|_{x = x_0\!\!\!\!\!\!\!} A^+ = -A^+ \left( \frac{\mathrm{d} A}{\mathrm d x} \right) A^+ ~+~ A^+ A^{+\top} \left(\frac{\mathrm{d} A^\top}{\mathrm{d} x} \right) \left(I - A A^+\right) ~+~ \left(I - A^+ A\right) \left(\frac{\mathrm{d} A^\top}{\mathrm{d} x} \right) A^{+\top} A^+,</math>
where the functions <math>A</math>, <math>A^+</math> and the derivatives on the right side are evaluated at <math>x_0</math> (that is, <math>A := A(x_0)</math>, <math>A^+ := A^+(x_0)</math>, etc.). For a complex matrix, the transpose is replaced with the conjugate transpose.<ref>{{cite book |last1=Hjørungnes |first1=Are |title=Complex-valued matrix derivatives: with applications in signal processing and communications |date=2011 |publisher=Cambridge University Press |location=New York |isbn=9780521192644 |page=52}}</ref> For a real-valued symmetric matrix, the [[Magnus–Neudecker derivative]] is established.<ref>{{Cite journal| last1=Liu|first1=Shuangzhe| last2=Trenkler|first2=Götz| last3=Kollo|first3=Tõnu| last4=von Rosen|first4=Dietrich| last5=Baksalary|first5=Oskar Maria| date=2023| title=Professor Heinz Neudecker and matrix differential calculus| journal=Statistical Papers|volume=65 |issue=4 |pages=2605–2639 |language=en |doi=10.1007/s00362-023-01499-w}}</ref>
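The derivative formula above can be sanity-checked against a finite difference. The sketch below is illustrative only: the matrix path <math>A(x) = A_0 + xB</math>, the random matrices, and the step size are arbitrary choices, and {{tmath| A_0 }} has full column rank (almost surely), so the rank is constant near <math>x_0 = 0</math> as the formula requires:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
A0 = rng.standard_normal((4, 3))  # full column rank, so locally constant rank
B = rng.standard_normal((4, 3))   # dA/dx for the path A(x) = A0 + x*B

Ap = np.linalg.pinv(A0)
I_m, I_n = np.eye(4), np.eye(3)

# Golub–Pereyra formula at x0 = 0 (real case, so plain transposes)
dAp = (-Ap @ B @ Ap
       + Ap @ Ap.T @ B.T @ (I_m - A0 @ Ap)
       + (I_n - Ap @ A0) @ B.T @ Ap.T @ Ap)

# Central finite-difference approximation of d(A^+)/dx at x0 = 0
h = 1e-6
dAp_fd = (np.linalg.pinv(A0 + h * B) - np.linalg.pinv(A0 - h * B)) / (2 * h)
print(np.max(np.abs(dAp - dAp_fd)))  # small: O(h^2) truncation plus roundoff
</syntaxhighlight>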