=== Oblique projections ===
The term ''oblique projections'' is sometimes used to refer to non-orthogonal projections. These projections are also used to represent spatial figures in two-dimensional drawings (see [[oblique projection]]), though not as frequently as orthogonal projections. Whereas calculating the fitted value of an [[ordinary least squares]] regression requires an orthogonal projection, calculating the fitted value of an [[Instrumental_variable|instrumental variables regression]] requires an oblique projection.

A projection is defined by its kernel and the basis vectors used to characterize its range (which is a complement of the kernel). When these basis vectors are orthogonal to the kernel, the projection is an orthogonal projection. When they are not orthogonal to the kernel, the projection is an oblique projection.

==== A matrix representation formula for a nonzero projection operator ====
Let <math>P \colon V \to V</math> be a linear operator such that <math>P^2 = P</math>, and assume that <math>P</math> is not the zero operator. Let the vectors <math>\mathbf u_1, \ldots, \mathbf u_k</math> form a basis for the range of <math>P</math>, and assemble these vectors in the <math>n \times k</math> matrix <math>A</math>. Then <math>k \geq 1</math>, since otherwise <math>k = 0</math> and <math>P</math> would be the zero operator. The range and the kernel are complementary spaces, so the kernel has dimension <math>n - k</math>. It follows that the [[orthogonal complement]] of the kernel has dimension <math>k</math>. Let <math>\mathbf v_1, \ldots, \mathbf v_k</math> form a basis for the orthogonal complement of the kernel of the projection, and assemble these vectors in the matrix <math>B</math>. Then the projection <math>P</math> (with the condition <math>k \geq 1</math>) is given by
<math display="block"> P = A \left(B^\mathsf{T} A\right)^{-1} B^\mathsf{T}. </math>
This expression generalizes the formula for orthogonal projections given above.<ref>{{Citation | last1 = Banerjee | first1 = Sudipto | last2 = Roy | first2 = Anindya | date = 2014 | title = Linear Algebra and Matrix Analysis for Statistics | series = Texts in Statistical Science | publisher = Chapman and Hall/CRC | edition = 1st | isbn = 978-1420095388 | url=https://books.google.com/books?id=iIOhAwAAQBAJ&q=projection}}</ref><ref>Meyer, equation (7.10.39)</ref>

A standard proof of this expression is the following. For any vector <math>\mathbf x</math> in the vector space <math>V</math>, we can decompose <math>\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2</math>, where the vector <math>\mathbf{x}_1 = P(\mathbf{x})</math> is in the range of <math>P</math> and the vector <math>\mathbf{x}_2 = \mathbf{x} - P(\mathbf{x})</math> satisfies <math>P(\mathbf{x}_2) = P(\mathbf{x}) - P^2(\mathbf{x}) = \mathbf{0}</math>, so <math>\mathbf{x}_2</math> is in the kernel of <math>P</math>, which by the construction of <math>B</math> is the null space of <math>B^\mathsf{T}</math>. Meanwhile, the vector <math>\mathbf{x}_1</math> is in the column space of <math>A</math>, so <math>\mathbf{x}_1 = A \mathbf{w}</math> for some <math>k</math>-dimensional vector <math>\mathbf{w}</math>, and the vector <math>\mathbf{x}_2</math> satisfies <math>B^\mathsf{T} \mathbf{x}_2 = \mathbf{0}</math>. Putting these conditions together, we find a vector <math>\mathbf{w}</math> such that <math>B^\mathsf{T} (\mathbf{x} - A\mathbf{w}) = \mathbf{0}</math>. Since <math>A</math> has full column rank and its column space (the range of <math>P</math>) intersects the null space of <math>B^\mathsf{T}</math> (the kernel of <math>P</math>) only at the origin, the <math>k \times k</math> matrix <math>B^\mathsf{T} A</math> is invertible. So the equation <math>B^\mathsf{T} (\mathbf{x} - A\mathbf{w}) = \mathbf{0}</math> gives the vector <math>\mathbf{w} = (B^\mathsf{T} A)^{-1} B^\mathsf{T} \mathbf{x}.</math> In this way, <math>P\mathbf{x} = \mathbf{x}_1 = A\mathbf{w} = A(B^\mathsf{T} A)^{-1} B^\mathsf{T} \mathbf{x}</math> for every vector <math>\mathbf{x} \in V</math>, and hence <math>P = A(B^\mathsf{T} A)^{-1} B^\mathsf{T}</math>.
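A brief numerical illustration of this formula, using [[NumPy]] (a minimal sketch: the matrices <math>A</math> and <math>B</math> below are arbitrary full-rank examples chosen for illustration, not taken from the cited sources):

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch: an oblique projection P = A (B^T A)^{-1} B^T
# on R^3 with a 2-dimensional range. A and B are arbitrary full-rank
# examples; any choice with B^T A invertible works the same way.
rng = np.random.default_rng(0)
n, k = 3, 2
A = rng.standard_normal((n, k))  # columns: a basis of the range of P
B = rng.standard_normal((n, k))  # columns: a basis of the orthogonal
                                 # complement of the kernel of P

P = A @ np.linalg.inv(B.T @ A) @ B.T

assert np.allclose(P @ P, P)  # P is idempotent: P^2 = P
assert np.allclose(P @ A, A)  # P acts as the identity on its range

# P annihilates its kernel, the null space of B^T.
kernel = np.linalg.svd(B.T)[2][k:].T  # columns span null(B^T)
assert np.allclose(P @ kernel, 0)
</syntaxhighlight>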
In the case that <math>P</math> is an orthogonal projection, we can take <math>A = B</math>, and it follows that <math>P = A \left(A^\mathsf{T} A\right)^{-1} A^\mathsf{T}</math>. Using this formula, one can easily check that <math>P = P^\mathsf{T}</math>. In general, if the vector space is over the complex number field, one uses the [[Hermitian transpose]] <math>A^*</math> instead and has the formula <math>P = A \left(A^* A\right)^{-1} A^*</math>. Recall that the [[Moore–Penrose inverse]] of the matrix <math>A</math> can be expressed as <math>A^{+} = (A^* A)^{-1} A^*</math> since <math>A</math> has full column rank, so <math>P = A A^{+}</math>.

==== Singular values ====
<math>I - P</math> is also an oblique projection. The singular values of <math>P</math> and <math>I - P</math> can be computed from an [[orthonormal basis]] of the range of <math>A</math>. Let <math>Q_A</math> be a matrix whose columns form an orthonormal basis of the range of <math>A</math>, and let <math>Q_A^{\perp}</math> be a matrix whose columns form an orthonormal basis of the [[orthogonal complement]] of that range. Denote the singular values of the matrix <math>Q_A^\mathsf{T} A (B^\mathsf{T} A)^{-1} B^\mathsf{T} Q_A^{\perp}</math> by the positive values <math>\gamma_1 \ge \gamma_2 \ge \ldots \ge \gamma_k</math>. With this, the singular values for <math>P</math> are:<ref>{{Citation | last1 = Brust | first1 = J. J. | last2 = Marcia | first2 = R. F. | last3 = Petra | first3 = C. G. | date = 2020 | title = Computationally Efficient Decompositions of Oblique Projection Matrices | journal = SIAM Journal on Matrix Analysis and Applications | volume = 41 | issue = 2 | pages = 852–870 | doi=10.1137/19M1288115 | osti = 1680061 | s2cid = 219921214 }}</ref>
<math display="block">\sigma_i = \begin{cases} \sqrt{1+\gamma_i^2} & 1 \le i \le k \\ 0 & \text{otherwise} \end{cases}</math>
and the singular values for <math>I - P</math> are
<math display="block">\sigma_i = \begin{cases} \sqrt{1+\gamma_i^2} & 1 \le i \le k \\ 1 & k+1 \le i \le n-k \\ 0 & \text{otherwise} \end{cases}</math>
This implies that the largest singular values of <math>P</math> and <math>I - P</math> are equal, and thus that the [[matrix norm]]s of the two oblique projections are the same. However, the [[condition number]]s satisfy the relation <math>\kappa(I-P) = \frac{\sigma_1}{1} \ge \frac{\sigma_1}{\sigma_k} = \kappa(P)</math>, and are therefore not necessarily equal.
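These singular-value formulas can be checked numerically as well. The following NumPy sketch (again with arbitrary full-rank <math>A</math> and <math>B</math>, chosen only for illustration) computes <math>\gamma_i</math> from a full QR decomposition of <math>A</math> and compares the singular values of <math>P</math> and <math>I - P</math> against <math>\sqrt{1 + \gamma_i^2}</math>:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative check of the singular-value formulas for P and I - P.
rng = np.random.default_rng(1)
n, k = 5, 2
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, k))
P = A @ np.linalg.inv(B.T @ A) @ B.T

# Orthonormal bases of the range of A and of its orthogonal complement,
# taken from a full QR decomposition of A.
Q, _ = np.linalg.qr(A, mode="complete")
Q_A, Q_A_perp = Q[:, :k], Q[:, k:]

gamma = np.linalg.svd(Q_A.T @ A @ np.linalg.inv(B.T @ A) @ B.T @ Q_A_perp,
                      compute_uv=False)

sigma_P = np.linalg.svd(P, compute_uv=False)
sigma_IP = np.linalg.svd(np.eye(n) - P, compute_uv=False)

# Nonzero singular values of P are sqrt(1 + gamma_i^2); I - P shares the
# k largest, followed by n - 2k ones and then k zeros.
assert np.allclose(sigma_P[:k], np.sqrt(1 + gamma**2))
assert np.allclose(sigma_P[:k], sigma_IP[:k])
assert np.allclose(sigma_IP[k:n - k], 1.0)
</syntaxhighlight>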