
== Properties and classification ==
[[File:Oblique projection.svg|frame|right|The transformation ''T'' is the projection along ''k'' onto ''m''. The range of ''T'' is ''m'' and the kernel is ''k''.]]

===Idempotence===
By definition, a projection <math>P</math> is [[idempotent]] (i.e. <math>P^2 = P</math>).

===Open map===
Every projection is an [[open map]] onto its image, meaning that it maps each [[open set]] in the [[Domain of a function|domain]] to an open set in the [[subspace topology]] of the [[image (mathematics)|image]].{{cn|date=November 2022}}  That is, for any vector <math>\mathbf{x}</math> and any ball <math>B_\mathbf{x}</math> (with positive radius) centered on <math>\mathbf{x}</math>, there exists a ball <math>B_{P\mathbf{x}}</math> (with positive radius) centered on <math>P\mathbf{x}</math> that is wholly contained in the image <math>P(B_\mathbf{x})</math>.

===Complementarity of image and kernel===
Let <math>W</math> be a finite-dimensional vector space and <math>P</math> be a projection on <math>W</math>. Suppose the [[Linear subspace|subspace]]s <math>U</math> and <math>V</math> are the [[Image (mathematics)|image]] and [[Kernel (linear algebra)|kernel]] of <math>P</math> respectively. Then <math>P</math> has the following properties:

# <math>P</math> is the [[identity operator]] <math>I</math> on <math>U</math>: <math display="block">\forall \mathbf x \in U: P \mathbf x = \mathbf x.</math>
# We have a [[direct sum of vector spaces|direct sum]] <math>W = U \oplus V</math>. Every vector <math>\mathbf x \in W</math> may be decomposed uniquely as <math>\mathbf x = \mathbf u + \mathbf v</math> with <math>\mathbf u = P \mathbf x</math> and <math>\mathbf v = \mathbf x - P \mathbf x = \left(I-P\right) \mathbf x</math>, and where <math>\mathbf u \in U, \mathbf v \in V.</math>

The image and kernel of a projection are ''complementary'', as are <math>P</math> and <math>Q = I - P</math>. The operator <math>Q</math> is also a projection as the image and kernel of <math>P</math> become the kernel and image of <math>Q</math> and vice versa. We say <math>P</math> is a projection along <math>V</math> onto <math>U</math> (kernel/image) and <math>Q</math> is a projection along <math>U</math> onto <math>V</math>.
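
As a minimal sketch of these properties, the following Python snippet checks them for one illustrative oblique projection on <math>\mathbb{R}^2</math>. The matrix <code>P</code>, the vector <code>x</code> and the helper <code>matmul</code> are assumptions chosen for this example, not taken from the cited sources; integer arithmetic is used so that equalities can be tested exactly.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

I = [[1, 0], [0, 1]]
# Illustrative projection: image U = span{(1, 0)}, kernel V = span{(1, -1)}.
P = [[1, 1], [0, 0]]
# Complementary projection Q = I - P.
Q = [[I[i][j] - P[i][j] for j in range(2)] for i in range(2)]

is_idempotent_P = matmul(P, P) == P   # P^2 = P
is_idempotent_Q = matmul(Q, Q) == Q   # Q^2 = Q
PQ = matmul(P, Q)                     # zero matrix: the image of Q is the kernel of P

x = [[3], [5]]                        # an arbitrary column vector
u = matmul(P, x)                      # component in U
v = matmul(Q, x)                      # component in V
x_rebuilt = [[u[i][0] + v[i][0]] for i in range(2)]   # u + v recovers x
```

Here <code>u</code> is fixed by <code>P</code> and <code>v</code> is annihilated by it, which is the decomposition <math>\mathbf x = \mathbf u + \mathbf v</math> described above.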

===Spectrum===
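In infinite-dimensional vector spaces, the [[Spectrum (functional analysis)|spectrum]] of a projection is contained in <math>\{ 0, 1 \}</math>, since for every <math>\lambda \notin \{ 0, 1 \}</math> the inverse of <math>\lambda I - P</math> exists and is given by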
<math display="block">(\lambda I - P)^{-1} = \frac 1 \lambda I + \frac 1 {\lambda(\lambda-1)} P.</math>
Only 0 or 1 can be an [[eigenvalue]] of a projection. This implies that an orthogonal projection <math>P</math> is always a [[positive semi-definite matrix]]. In general, the corresponding [[eigenspace]]s are (respectively) the kernel and range of the projection. Decomposition of a vector space into direct sums is not unique. Therefore, given a subspace <math>V</math>, there may be many projections whose range (or kernel) is <math>V</math>.

If a projection is nontrivial it has [[minimal polynomial (linear algebra)|minimal polynomial]] <math>x^2 - x = x (x-1)</math>, which factors into distinct linear factors, and thus <math>P</math> is [[diagonalizable]].
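
The formula for <math>(\lambda I - P)^{-1}</math> given above can be checked by direct multiplication, using only <math>P^2 = P</math> and assuming <math>\lambda \notin \{0, 1\}</math>:
<math display="block">\begin{align}
(\lambda I - P)\left(\frac 1 \lambda I + \frac 1 {\lambda(\lambda-1)} P\right)
&= I + \frac{1}{\lambda-1} P - \frac 1 \lambda P - \frac{1}{\lambda(\lambda-1)} P^2 \\
&= I + \frac{\lambda - (\lambda - 1) - 1}{\lambda(\lambda-1)} P = I.
\end{align}</math>
Since the two factors are polynomials in <math>P</math>, they commute, so the same expression is also a left inverse.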

===Product of projections===
The product of projections is not in general a projection, even if they are orthogonal. If two projections [[commuting matrices|commute]] then their product is a projection, but the [[converse (logic)|converse]] is false: the product of two non-commuting projections may be a projection. 

If two orthogonal projections commute then their product is an orthogonal projection. If the product of two orthogonal projections is an orthogonal projection, then the two orthogonal projections commute (more generally: two self-adjoint [[endomorphism]]s commute if and only if their product is self-adjoint).
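
The three situations described above can be illustrated with small matrices. The following Python sketch uses exact rational arithmetic; all matrices and helper names are assumptions chosen for illustration.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def is_projection(M):
    """A matrix is a projection exactly when it is idempotent."""
    return matmul(M, M) == M

# Two orthogonal projections on R^2 that do not commute.
P1 = [[F(1), F(0)], [F(0), F(0)]]                # onto the x-axis
P2 = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]    # onto the line y = x
prod_orth = matmul(P1, P2)
orth_product_is_projection = is_projection(prod_orth)            # False

# Two commuting orthogonal projections on R^3.
D1 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
D2 = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
commuting_product_is_projection = is_projection(matmul(D1, D2))  # True

# Two non-commuting projections whose product is still a projection.
R = [[1, 1], [0, 0]]   # oblique projection
S = [[1, 0], [0, 0]]   # orthogonal projection
RS, SR = matmul(R, S), matmul(S, R)   # RS equals S, SR equals R
```

In the last case <code>RS</code> and <code>SR</code> differ, yet both are idempotent, which shows that commutativity is sufficient but not necessary for the product of general projections to be a projection.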

===Orthogonal projections===
{{main article|Hilbert projection theorem|Complemented subspace}}

When the vector space <math>W</math> has an [[inner product]] and is complete (is a [[Hilbert space]]) the concept of [[orthogonality]] can be used. An '''orthogonal projection''' is a projection for which the range <math>U</math> and the kernel <math>V</math> are [[orthogonality|orthogonal subspaces]]. Thus, for every <math>\mathbf x</math> and <math>\mathbf y</math> in <math>W</math>, <math> \langle P \mathbf x, (\mathbf y - P \mathbf y) \rangle = \langle (\mathbf x - P \mathbf x) , P \mathbf y \rangle = 0</math>. Equivalently:
<math display="block"> \langle \mathbf x, P \mathbf y \rangle = \langle P \mathbf x, P \mathbf y \rangle = \langle P \mathbf x, \mathbf y \rangle. </math>

A projection is orthogonal if and only if it is [[self-adjoint operator|self-adjoint]]. Using the self-adjoint and idempotent properties of <math>P</math>, for any <math>\mathbf x</math> and <math>\mathbf y</math> in <math>W</math> we have <math>P\mathbf{x} \in U</math>, <math>\mathbf{y} - P\mathbf{y} \in V</math>, and
<math display="block"> \langle P \mathbf x, \mathbf y - P \mathbf y \rangle = \langle \mathbf x, \left(P-P^2\right) \mathbf y \rangle = 0</math>
where <math>\langle \cdot, \cdot \rangle</math> is the inner product associated with <math>W</math>. Therefore, <math>P </math> and <math>I - P </math> are orthogonal projections.<ref>Meyer, p. 433</ref> The other direction, namely that if <math>P</math> is orthogonal then it is self-adjoint, follows from the implication from <math>\langle (\mathbf x - P \mathbf x) , P \mathbf y \rangle =  \langle P \mathbf x, (\mathbf y - P \mathbf y) \rangle = 0</math> to
<math display="block"> \langle \mathbf x, P \mathbf y \rangle = \langle P \mathbf x, P\mathbf y \rangle = \langle P \mathbf x, \mathbf y \rangle = \langle \mathbf x, P^* \mathbf y \rangle </math>
for every <math>\mathbf x</math> and <math>\mathbf y</math> in <math>W</math>; thus <math>P=P^*</math>.

The existence of an orthogonal projection onto a closed subspace follows from the [[Hilbert projection theorem]].

====Properties and special cases====
An orthogonal projection is a [[bounded operator]]. This is because for every <math>\mathbf v</math> in the vector space we have, by the [[Cauchy–Schwarz inequality]]:
<math display="block">\left \| P \mathbf v\right\|^2 = \langle P \mathbf v, P \mathbf v \rangle = \langle P \mathbf v, \mathbf v \rangle \leq \left\|P \mathbf v\right\| \cdot \left\|\mathbf v\right\|</math>
Thus <math>\left\|P \mathbf v\right\| \leq \left\|\mathbf v\right\|</math>.

For finite-dimensional complex or real vector spaces, the [[standard inner product]] can be substituted for <math>\langle \cdot, \cdot \rangle</math>.

=====Formulas=====
A simple case occurs when the orthogonal projection is onto a line. If <math>\mathbf u</math> is a [[unit vector]] on the line, then the projection is given by the [[outer product]]
<math display="block"> P_\mathbf{u} = \mathbf u \mathbf u^\mathsf{T}.</math>
(If <math>\mathbf u</math> is complex-valued, the transpose in the above equation is replaced by a Hermitian transpose). This operator leaves '''u''' invariant, and it annihilates all vectors orthogonal to <math>\mathbf u</math>, proving that it is indeed the orthogonal projection onto the line containing '''u'''.<ref>Meyer, p. 431</ref> A simple way to see this is to consider an arbitrary vector <math>\mathbf x</math> as the sum of a component on the line (i.e. the projected vector we seek) and another perpendicular to it, <math>\mathbf x = \mathbf x_\parallel + \mathbf x_\perp</math>. Applying projection, we get
<math display="block">
  P_{\mathbf u} \mathbf x =
  \mathbf u \mathbf u^\mathsf{T} \mathbf x_\parallel + \mathbf u \mathbf u^\mathsf{T} \mathbf x_\perp =
  \mathbf u \left( \sgn\left(\mathbf u^\mathsf{T} \mathbf x_\parallel\right) \left \| \mathbf x_\parallel \right \| \right) + \mathbf u \cdot \mathbf 0 = \mathbf x_\parallel
</math>
by the properties of the [[dot product]] of parallel and perpendicular vectors.
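
A minimal numerical sketch of the projection onto a line, using exact fractions; the unit vector <code>u</code>, the test vector <code>x</code> and the helper <code>apply</code> are illustrative assumptions.

```python
from fractions import Fraction as F

def apply(M, x):
    """Apply the matrix M (list of rows) to the vector x (flat list)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

u = [F(3, 5), F(4, 5)]                       # unit vector: (3/5)^2 + (4/5)^2 = 1
P = [[ui * uj for uj in u] for ui in u]      # outer product u u^T

x = [F(5), F(0)]
x_par = apply(P, x)                          # component of x along the line
x_perp = [a - b for a, b in zip(x, x_par)]   # remaining component
dot_with_u = sum(a * b for a, b in zip(x_perp, u))   # zero: x_perp is orthogonal to u
```

Applying <code>P</code> to <code>u</code> or to <code>x_par</code> returns the same vector, matching the statement that the operator leaves the line invariant.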

This formula can be generalized to orthogonal projections on a subspace of arbitrary [[dimension (vector space)|dimension]]. Let <math>\mathbf u_1, \ldots, \mathbf u_k</math> be an [[orthonormal basis]] of the subspace <math>U</math>, with the assumption that the integer <math>k \geq 1</math>, and let <math>A</math> denote the <math>n \times k</math> matrix whose columns are <math>\mathbf u_1, \ldots, \mathbf u_k</math>, i.e., <math>A = \begin{bmatrix} \mathbf u_1 & \cdots & \mathbf u_k \end{bmatrix}</math>. Then the projection is given by:<ref>Meyer, equation (5.13.4)</ref>
<math display="block">P_A = A A^\mathsf{T}</math>
which can be rewritten as
<math display="block">P_A = \sum_i \langle \mathbf u_i, \cdot \rangle \mathbf u_i.</math>

The matrix <math>A^\mathsf{T}</math> is the [[partial isometry]] that vanishes on the [[orthogonal complement]] of <math>U</math>, and <math>A</math> is the isometry that embeds <math>U</math> into the underlying vector space. The range of <math>P_A</math> is therefore the ''final space'' of <math>A</math>. It is also clear that <math>A A^{\mathsf T}</math> is the identity operator on <math>U</math>.

The orthonormality condition can also be dropped. If <math>\mathbf u_1, \ldots, \mathbf u_k</math> is a (not necessarily orthonormal) [[basis (linear algebra)|basis]] with <math>k \geq 1</math>, and <math>A</math> is the matrix with these vectors as columns, then the projection is:<ref>{{Citation | last1 = Banerjee | first1 = Sudipto | last2 = Roy | first2 = Anindya | date = 2014 | title = Linear Algebra and Matrix Analysis for Statistics | series = Texts in Statistical Science | publisher = Chapman and Hall/CRC | edition =  1st | isbn =  978-1420095388 | url=https://books.google.com/books?id=iIOhAwAAQBAJ&q=projection}}</ref><ref>Meyer, equation (5.13.3)</ref>
<math display="block">P_A = A \left(A^\mathsf{T} A\right)^{-1} A^\mathsf{T}.</math>

The matrix <math>A</math> still embeds <math>U</math> into the underlying vector space but is no longer an isometry in general. The matrix <math>\left(A^\mathsf{T}A\right)^{-1}</math> is a "normalizing factor" that recovers the norm. For example, the [[rank of a linear operator|rank]]-1 operator <math>\mathbf u \mathbf u^\mathsf{T}</math> is not a projection if <math>\left\|\mathbf u \right\| \neq 1.</math> After dividing by <math>\mathbf u^\mathsf{T} \mathbf u = \left\| \mathbf u \right\|^2,</math> we obtain the projection <math>\mathbf u \left(\mathbf u^\mathsf{T} \mathbf u \right)^{-1} \mathbf u^\mathsf{T}</math> onto the subspace spanned by <math>\mathbf u</math>.
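
The role of the normalizing factor can be seen in a small example. The sketch below builds <math>A \left(A^\mathsf{T} A\right)^{-1} A^\mathsf{T}</math> for a non-orthonormal basis of a plane in <math>\mathbb{R}^3</math>; the matrix <code>A</code> and the helpers (including the 2×2 inverse <code>inv2</code>) are assumptions made for illustration.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Columns (1, 1, 0) and (0, 1, 1): a basis that is neither orthogonal nor normalized.
A = [[F(1), F(0)], [F(1), F(1)], [F(0), F(1)]]
At = transpose(A)
gram = matmul(At, A)                       # A^T A = [[2, 1], [1, 2]]
P = matmul(matmul(A, inv2(gram)), At)      # A (A^T A)^{-1} A^T
naive = matmul(A, At)                      # A A^T without the normalizing factor
```

Here <code>P</code> is symmetric, idempotent and fixes the columns of <code>A</code>, whereas <code>naive</code> is not idempotent because the columns of <code>A</code> are not orthonormal.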

In the general case, we can have an arbitrary [[positive definite]] matrix <math>D</math> defining an inner product <math>\langle x, y \rangle_D = y^\dagger Dx</math>, and the projection <math>P_A</math> is given by <math display="inline">P_A x = \operatorname{argmin}_{y \in \operatorname{range}(A)} \left\|x - y\right\|^2_D</math>. Then
<math display="block">P_A = A \left(A^\mathsf{T} D A\right)^{-1} A^\mathsf{T} D.</math>
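
A short derivation of this formula, sketched under the assumptions that the space is real, <math>D</math> is symmetric positive definite, and <math>A</math> has full column rank: writing <math>y = A w</math>, the minimizer of <math>\left\|x - A w\right\|^2_D = (x - A w)^\mathsf{T} D (x - A w)</math> satisfies the normal equations obtained by setting the gradient with respect to <math>w</math> to zero,
<math display="block">A^\mathsf{T} D (x - A w) = 0 \quad\Longrightarrow\quad w = \left(A^\mathsf{T} D A\right)^{-1} A^\mathsf{T} D x \quad\Longrightarrow\quad P_A x = A w = A \left(A^\mathsf{T} D A\right)^{-1} A^\mathsf{T} D x.</math>
Setting <math>D = I</math> recovers <math>P_A = A \left(A^\mathsf{T} A\right)^{-1} A^\mathsf{T}</math>.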

When the range space of the projection is generated by a [[Frame of a vector space|frame]] (i.e. the number of generators is greater than its dimension), the formula for the projection takes the form: <math>P_A = A A^+</math>. Here <math>A^+</math> stands for the [[Moore–Penrose pseudoinverse]]. This is just one of many ways to construct the projection operator.

If <math>\begin{bmatrix} A & B \end{bmatrix}</math> is a non-singular matrix and <math>A^\mathsf{T}B = 0</math> (i.e., <math>B</math> is the [[null space]] matrix of <math>A</math>),<ref>See also [[Linear least squares (mathematics)#Properties of the least-squares estimators|Linear least squares (mathematics) § Properties of the least-squares estimators]].</ref> the following holds: 
<math display="block">\begin{align}
I &= \begin{bmatrix} A & B \end{bmatrix}
\begin{bmatrix} A & B \end{bmatrix}^{-1}\begin{bmatrix} A^\mathsf{T} \\ B^\mathsf{T} \end{bmatrix}^{-1}
\begin{bmatrix} A^\mathsf{T} \\ B^\mathsf{T} \end{bmatrix} \\
  &= \begin{bmatrix} A & B \end{bmatrix}
\left(
\begin{bmatrix} A^\mathsf{T} \\ B^\mathsf{T} \end{bmatrix}
\begin{bmatrix} A & B \end{bmatrix}
\right )^{-1}
\begin{bmatrix} A^\mathsf{T} \\B^\mathsf{T} \end{bmatrix} \\
  &= \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix}A^\mathsf{T}A&O\\O&B^\mathsf{T}B\end{bmatrix}^{-1}
\begin{bmatrix} A^\mathsf{T} \\ B^\mathsf{T} \end{bmatrix}\\[4pt]
  &= A \left(A^\mathsf{T}A\right)^{-1} A^\mathsf{T} + B \left(B^\mathsf{T}B\right)^{-1} B^\mathsf{T}
\end{align}</math>

If the orthogonality condition is generalized to <math>A^\mathsf{T}W B = A^\mathsf{T}W^\mathsf{T}B = 0</math> with <math>W</math> non-singular<!-- and symmetric-->, the following holds:
<math display="block">I = \begin{bmatrix}A & B\end{bmatrix} \begin{bmatrix}\left(A^\mathsf{T} W A\right)^{-1} A^\mathsf{T} \\ \left(B^\mathsf{T} W B\right)^{-1} B^\mathsf{T} \end{bmatrix} W.</math>

All these formulas also hold for complex inner product spaces, provided that the [[conjugate transpose]] is used instead of the transpose. Further details on sums of projectors can be found in Banerjee and Roy (2014).<ref>{{Citation | last1 = Banerjee | first1 = Sudipto | last2 = Roy | first2 = Anindya | date = 2014 | title = Linear Algebra and Matrix Analysis for Statistics | series = Texts in Statistical Science | publisher = Chapman and Hall/CRC | edition =  1st | isbn =  978-1420095388 | url=https://books.google.com/books?id=iIOhAwAAQBAJ&q=projection}}</ref> Also see Banerjee (2004)<ref>{{Citation | last = Banerjee | first = Sudipto | date = 2004 | title = Revisiting Spherical Trigonometry with Orthogonal Projectors | journal = The College Mathematics Journal | volume = 35 | issue = 5 | pages =  375–381 | doi=10.1080/07468342.2004.11922099 | s2cid = 122277398 }}</ref> for application of sums of projectors in basic [[spherical trigonometry]].

=== Oblique projections ===
The term ''oblique projections'' is sometimes used to refer to non-orthogonal projections. These projections are also used to represent spatial figures in two-dimensional drawings (see [[oblique projection]]), though not as frequently as orthogonal projections. Whereas calculating the fitted value of an [[ordinary least squares]] regression requires an orthogonal projection, calculating the fitted value of an [[Instrumental_variable|instrumental variables regression]] requires an oblique projection.

A projection is defined by its kernel and the basis vectors used to characterize its range (which is a complement of the kernel). When these basis vectors are orthogonal to the kernel, then the projection is an orthogonal projection. When these basis vectors are not orthogonal to the kernel, the projection is an oblique projection, or just a projection. 

==== A matrix representation formula for a nonzero projection operator ====
Let <math>P \colon V \to V</math> be a linear operator such that <math>P^2 = P</math> and assume that <math>P</math> is not the zero operator. Let the vectors <math>\mathbf u_1, \ldots, \mathbf u_k</math> form a basis for the range of <math>P</math>, and assemble these vectors in the <math>n \times k</math> matrix <math>A</math>. Then <math>k \geq 1</math>, otherwise <math>k = 0</math> and <math>P</math> is the zero operator. The range and the kernel are complementary spaces, so the kernel has dimension <math>n - k</math>. It follows that the [[orthogonal complement]] of the kernel has dimension <math>k</math>. Let <math>\mathbf v_1, \ldots, \mathbf v_k</math> form a basis for the orthogonal complement of the kernel of the projection, and assemble these vectors in the matrix <math>B</math>. Then the projection <math>P</math> (with the condition <math>k \geq 1</math>) is given by
<math display="block"> P = A \left(B^\mathsf{T} A\right)^{-1} B^\mathsf{T}. </math>

This expression generalizes the formula for orthogonal projections given above.<ref>{{Citation | last1 = Banerjee | first1 = Sudipto | last2 = Roy | first2 = Anindya | date = 2014 | title = Linear Algebra and Matrix Analysis for Statistics | series = Texts in Statistical Science | publisher = Chapman and Hall/CRC | edition =  1st | isbn =  978-1420095388 | url=https://books.google.com/books?id=iIOhAwAAQBAJ&q=projection}}</ref><ref>Meyer, equation (7.10.39)</ref> A standard proof of this expression is the following. For any vector <math>\mathbf x</math> in the vector space <math>V</math>, we can decompose <math>\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2</math>, where the vector <math>\mathbf{x}_1 = P(\mathbf{x})</math> is in the image of <math>P</math>, and <math>\mathbf{x}_2 = \mathbf{x} - P(\mathbf{x}).</math> Then <math>P(\mathbf{x}_2) = P(\mathbf{x}) - P^2(\mathbf{x})= \mathbf{0}</math>, so <math>\mathbf{x}_2</math> is in the kernel of <math>P</math>, which is the null space of <math>B^\mathsf{T}</math> by the construction of <math>B</math>; that is, <math>B^\mathsf{T} \mathbf{x}_2=\mathbf{0}</math>. The vector <math>\mathbf{x}_1</math> is in the column space of <math>A,</math> so <math>\mathbf{x}_1 = A \mathbf{w}</math> for some <math>k</math>-dimensional vector <math>\mathbf{w}</math>. Putting these conditions together, <math>\mathbf{w}</math> satisfies <math>B^\mathsf{T} (\mathbf{x}-A\mathbf{w})=\mathbf{0}</math>. The <math>k\times k</math> matrix <math>B^\mathsf{T} A</math> is invertible: if <math>B^\mathsf{T} A \mathbf{w} = \mathbf{0}</math>, then <math>A\mathbf{w}</math> lies in both the range and the kernel of <math>P</math>, which are complementary, so <math>A\mathbf{w} = \mathbf{0}</math> and hence <math>\mathbf{w} = \mathbf{0}</math> because <math>A</math> has full column rank.
Therefore the equation <math>B^\mathsf{T} (\mathbf{x}-A\mathbf{w})=\mathbf{0}</math> gives the vector <math>\mathbf{w}= (B^{\mathsf{T}}A)^{-1} B^{\mathsf{T}} \mathbf{x}.</math> In this way, <math>P\mathbf{x} = \mathbf{x}_1 = A\mathbf{w}= A(B^{\mathsf{T}}A)^{-1} B^{\mathsf{T}} \mathbf{x}</math> for any vector <math>\mathbf{x} \in V</math> and hence <math>P = A(B^{\mathsf{T}}A)^{-1} B^{\mathsf{T}}</math>.
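
For <math>k = 1</math> the matrix <math>B^\mathsf{T} A</math> is a scalar and the formula reduces to <math>P = \mathbf a \mathbf b^\mathsf{T} / (\mathbf b^\mathsf{T} \mathbf a)</math>. The sketch below implements only this special case; the function name <code>oblique_projection_rank1</code> and the example vectors are hypothetical choices made for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def oblique_projection_rank1(a, b):
    """Return a b^T / (b^T a), the k = 1 case of P = A (B^T A)^{-1} B^T."""
    scale = sum(ai * bi for ai, bi in zip(a, b))   # the 1x1 matrix B^T A
    if scale == 0:
        raise ValueError("b^T a must be nonzero: range and kernel must be complementary")
    return [[Fraction(ai * bj, scale) for bj in b] for ai in a]

a = [1, 2]             # spans the range of P
kernel_vec = [1, -1]   # spans the kernel of P
b = [1, 1]             # spans the orthogonal complement of the kernel
P = oblique_projection_rank1(a, b)

Pa = matmul(P, [[a[0]], [a[1]]])                    # equals a as a column vector
Pk = matmul(P, [[kernel_vec[0]], [kernel_vec[1]]])  # equals the zero vector
```

The resulting matrix is idempotent but not symmetric, so it is an oblique rather than an orthogonal projection.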

In the case that <math>P</math> is an orthogonal projection, we can take <math>A = B</math>, and it follows that <math>P=A \left(A^\mathsf{T} A\right)^{-1} A^\mathsf{T}</math>. By using this formula, one can easily check that <math>P=P^\mathsf{T}</math>. More generally, if the vector space is over the field of complex numbers, one uses the [[Hermitian transpose]] <math>A^*</math> and obtains the formula <math>P=A \left(A^* A\right)^{-1} A^*</math>. Recall that the [[Moore–Penrose inverse]] of the matrix <math>A</math> can be expressed as <math>A^{+}= (A^*A)^{-1}A^*</math> since <math>A</math> has full column rank, so <math>P=A A^{+}</math>.

==== Singular values ====
<math>I-P</math> is also an oblique projection. The singular values of <math>P</math> and <math>I-P</math> can be computed from an [[orthonormal basis]] of the column space of <math>A</math>. Let <math>Q_A</math> be a matrix whose columns form an orthonormal basis of the column space of <math>A</math>, and let <math>Q_A^{\perp}</math> be a matrix whose columns form an orthonormal basis of its [[orthogonal complement]]. Denote the singular values of the matrix
<math>Q_A^\mathsf{T} A (B^\mathsf{T} A)^{-1} B^\mathsf{T} Q_A^{\perp} </math> by the positive values <math>\gamma_1 \ge \gamma_2 \ge \ldots \ge \gamma_k </math>. With this, the singular values for <math>P</math> are:<ref>{{Citation | last1 = Brust | first1 = J. J. | last2 = Marcia | first2 = R. F. | last3 = Petra | first3 = C. G. | date = 2020 | title = Computationally Efficient Decompositions of Oblique Projection Matrices | journal = SIAM Journal on Matrix Analysis and Applications | volume = 41 | issue = 2 | pages =  852–870 | doi=10.1137/19M1288115 | osti = 1680061 | s2cid = 219921214 }}</ref>
<math display="block">\sigma_i = 
	\begin{cases}
		\sqrt{1+\gamma_i^2} & 1 \le i \le k \\
		0 & \text{otherwise}
	\end{cases} 
	</math>
and the singular values for <math>I-P</math> are
<math display="block">\sigma_i = 
\begin{cases}
		\sqrt{1+\gamma_i^2} 	& 1 \le i \le k \\
		1 				& k+1 \le i \le n-k \\
		0 & \text{otherwise}
	\end{cases} 
	</math>
This implies that the largest singular values of <math>P</math> and <math>I-P</math> are equal, and thus that the [[matrix norm]]s of the two oblique projections are the same. However, the [[condition number]]s satisfy the relation <math>\kappa(I-P) = \frac{\sigma_1}{1} \ge \frac{\sigma_1}{\sigma_k} = \kappa(P)</math>, and are therefore not necessarily equal.
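
As a small worked example (the matrices are chosen here purely for illustration), take <math>n = 2</math>, <math>k = 1</math>, <math>A = \begin{bmatrix} 1 \\ 0 \end{bmatrix}</math> and <math>B = \begin{bmatrix} 1 \\ -1 \end{bmatrix}</math>. Then
<math display="block">P = A \left(B^\mathsf{T} A\right)^{-1} B^\mathsf{T} = \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}, \qquad I - P = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}.</math>
With <math>Q_A = \begin{bmatrix} 1 & 0 \end{bmatrix}^\mathsf{T}</math> and <math>Q_A^{\perp} = \begin{bmatrix} 0 & 1 \end{bmatrix}^\mathsf{T}</math>, the <math>1 \times 1</math> matrix <math>Q_A^\mathsf{T} P Q_A^{\perp} = -1</math> has the singular value <math>\gamma_1 = 1</math>, so the formulas give <math>\sigma_1 = \sqrt{2}</math> for both <math>P</math> and <math>I-P</math>. This agrees with a direct computation:
<math display="block">P P^\mathsf{T} = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}, \qquad (I-P)(I-P)^\mathsf{T} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},</math>
both of which have eigenvalues 2 and 0.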

===Finding projection with an inner product===
Let <math>V</math> be a vector space (a plane in the accompanying figure) spanned by orthogonal vectors <math>\mathbf u_1, \mathbf u_2, \dots, \mathbf u_p</math>. Let <math>\mathbf y</math> be a vector. One can define a projection of <math>\mathbf y</math> onto <math>V</math> as
<math display="block"> \operatorname{proj}_V \mathbf y = \frac{\mathbf y \cdot \mathbf u^i}{\mathbf u^i \cdot \mathbf u^i } \mathbf u^i </math>
where repeated indices are summed over ([[Einstein notation|Einstein sum notation]]). The vector <math>\mathbf y</math> can be written as an orthogonal sum such that <math>\mathbf y = \operatorname{proj}_V \mathbf y + \mathbf z</math>. <math>\operatorname{proj}_V \mathbf y</math> is sometimes denoted as <math>\hat{\mathbf y}</math>. There is a theorem in linear algebra that states that the norm <math>\left\|\mathbf z\right\|</math> is the smallest distance (the ''[[orthogonal distance]]'') from <math>\mathbf y</math> to <math>V</math>; this fact is commonly used in areas such as [[machine learning]].
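
The sum over the orthogonal spanning vectors can be sketched in a few lines of Python. The function name <code>project</code> and the example vectors are assumptions for illustration; the spanning vectors must be mutually orthogonal for the formula to apply.

```python
from fractions import Fraction as F

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(y, basis):
    """Orthogonal projection of y onto span(basis).

    Assumes the vectors in `basis` are nonzero and mutually orthogonal.
    """
    proj = [F(0)] * len(y)
    for u in basis:
        coeff = F(dot(y, u), dot(u, u))     # (y . u) / (u . u)
        proj = [p + coeff * ui for p, ui in zip(proj, u)]
    return proj

u1, u2 = [1, 1, 0], [1, -1, 0]   # orthogonal vectors spanning the xy-plane
y = [2, 3, 4]
y_hat = project(y, [u1, u2])                 # projection of y onto V
z = [yi - pi for yi, pi in zip(y, y_hat)]    # residual, orthogonal to V
```

In this example the residual <code>z</code> is orthogonal to both spanning vectors, and its length is the distance from <code>y</code> to the plane.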

[[File:Ortho projection.svg|thumb|''y'' is being projected onto the vector space ''V''.]]