== Proof of existence ==
An eigenvalue {{tmath|\lambda}} of a matrix {{tmath|\mathbf M}} is characterized by the algebraic relation {{tmath|\mathbf M \mathbf u {{=}} \lambda \mathbf u.}} When {{tmath|\mathbf M}} is [[Hermitian matrix|Hermitian]], a variational characterization is also available. Let {{tmath|\mathbf M}} be a real {{tmath|n \times n}} [[symmetric matrix]]. Define <math display=block> f : \left\{ \begin{align} \R^n &\to \R \\ \mathbf{x} &\mapsto \mathbf{x}^\operatorname{T} \mathbf{M} \mathbf{x} \end{align}\right.</math>

By the [[extreme value theorem]], this continuous function attains a maximum at some {{tmath|\mathbf u}} when restricted to the unit sphere <math>\{\|\mathbf x\| = 1\}.</math> By the [[Lagrange multipliers]] theorem, {{tmath|\mathbf u}} necessarily satisfies <math display=block>\nabla \mathbf{u}^\operatorname{T} \mathbf{M} \mathbf{u} - \lambda \cdot \nabla \mathbf{u}^\operatorname{T} \mathbf{u} = 0</math> for some real number {{tmath|\lambda.}} The nabla symbol, {{tmath|\nabla}}, is the [[del]] operator (differentiation with respect to {{nobr|{{tmath|\mathbf x}}).}} Using the symmetry of {{tmath|\mathbf M}} we obtain <math display=block>\nabla \mathbf{x}^\operatorname{T} \mathbf{M} \mathbf{x} - \lambda \cdot \nabla \mathbf{x}^\operatorname{T} \mathbf{x} = 2(\mathbf{M}-\lambda \mathbf{I})\mathbf{x}.</math>

Therefore {{tmath|\mathbf M \mathbf u {{=}} \lambda \mathbf u,}} so {{tmath|\mathbf u}} is a unit-length eigenvector of {{tmath|\mathbf M.}} For every unit-length eigenvector {{tmath|\mathbf v}} of {{tmath|\mathbf M}} its eigenvalue is {{tmath|f(\mathbf v),}} so {{tmath|\lambda}} is the largest eigenvalue of {{tmath|\mathbf M.}} The same calculation performed on the orthogonal complement of {{tmath|\mathbf u}} gives the next largest eigenvalue, and so on. The complex Hermitian case is similar; there {{tmath|f(\mathbf x) {{=}} \mathbf x^* \mathbf M \mathbf x}} is a real-valued function of {{tmath|2n}} real variables.

Singular values are similar in that they can be described algebraically or from variational principles, although, unlike the eigenvalue case, Hermiticity or symmetry of {{tmath|\mathbf M}} is no longer required. This section gives these two arguments for the existence of the singular value decomposition.

=== Based on the spectral theorem ===
Let <math>\mathbf{M}</math> be an {{tmath|m \times n}} complex matrix. Since <math>\mathbf{M}^* \mathbf{M}</math> is positive semi-definite and Hermitian, by the [[spectral theorem]] there exists an {{tmath|n \times n}} unitary matrix <math>\mathbf{V}</math> such that <math display=block> \mathbf V^* \mathbf M^* \mathbf M \mathbf V = \bar\mathbf{D} = \begin{bmatrix} \mathbf{D} & 0 \\ 0 & 0\end{bmatrix}, </math> where <math>\mathbf{D}</math> is diagonal and positive definite, of dimension <math>\ell\times \ell</math>, with <math>\ell</math> the number of non-zero eigenvalues of <math>\mathbf{M}^* \mathbf{M}</math> (which can be shown to satisfy <math>\ell\le\min(n,m)</math>). Note that <math>\mathbf{V}</math> is here by definition a matrix whose <math>i</math>-th column is the <math>i</math>-th eigenvector of <math>\mathbf{M}^* \mathbf{M}</math>, corresponding to the eigenvalue <math>\bar{\mathbf{D}}_{ii}</math>. Moreover, the <math>j</math>-th column of <math>\mathbf{V}</math>, for <math>j>\ell</math>, is an eigenvector of <math>\mathbf{M}^* \mathbf{M}</math> with eigenvalue <math>\bar{\mathbf{D}}_{jj}=0</math>.
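The construction that follows can be checked numerically. The snippet below is a minimal sketch (not part of the proof), assuming NumPy and an arbitrary example matrix {{tmath|\mathbf M}}: the eigendecomposition of <math>\mathbf{M}^* \mathbf{M}</math> yields a unitary <math>\mathbf{V}</math> and non-negative eigenvalues, as used above.

<syntaxhighlight lang="python">
import numpy as np

# Illustration only: eigendecompose M*M for a hypothetical 4x3 complex matrix M
# and observe the structure used in the proof.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# eigh returns real eigenvalues (ascending) and a unitary eigenvector matrix V.
eigvals, V = np.linalg.eigh(M.conj().T @ M)
eigvals, V = eigvals[::-1], V[:, ::-1]          # reorder to descending, as in D-bar

print(np.allclose(V.conj().T @ V, np.eye(3)))   # V is unitary
print(np.all(eigvals >= -1e-12))                # M*M is positive semi-definite
print(np.allclose(V.conj().T @ (M.conj().T @ M) @ V, np.diag(eigvals)))  # V* M* M V = D-bar
</syntaxhighlight>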
This can be expressed by writing <math>\mathbf{V}</math> as <math>\mathbf{V}=\begin{bmatrix}\mathbf{V}_1 &\mathbf{V}_2\end{bmatrix}</math>, where the columns of <math>\mathbf{V}_1</math> and <math>\mathbf{V}_2</math> therefore contain the eigenvectors of <math>\mathbf{M}^* \mathbf{M}</math> corresponding to non-zero and zero eigenvalues, respectively. Using this rewriting of <math>\mathbf{V}</math>, the equation becomes: <math display=block> \begin{bmatrix} \mathbf{V}_1^* \\ \mathbf{V}_2^* \end{bmatrix} \mathbf{M}^* \mathbf{M}\, \begin{bmatrix} \mathbf{V}_1 & \!\! \mathbf{V}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{V}_1^* \mathbf{M}^* \mathbf{M} \mathbf{V}_1 & \mathbf{V}_1^* \mathbf{M}^* \mathbf{M} \mathbf{V}_2 \\ \mathbf{V}_2^* \mathbf{M}^* \mathbf{M} \mathbf{V}_1 & \mathbf{V}_2^* \mathbf{M}^* \mathbf{M} \mathbf{V}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{D} & 0 \\ 0 & 0 \end{bmatrix}.</math>

This implies that <math display=block> \mathbf{V}_1^* \mathbf{M}^* \mathbf{M} \mathbf{V}_1 = \mathbf{D}, \quad \mathbf{V}_2^* \mathbf{M}^* \mathbf{M} \mathbf{V}_2 = \mathbf{0}. </math> Moreover, the second equation implies <math>\mathbf{M}\mathbf{V}_2 = \mathbf{0}</math>.<ref>To see this, we just have to notice that <math>\operatorname{Tr}(\mathbf{V}_2^* \mathbf{M}^* \mathbf{M} \mathbf{V}_2) = \|\mathbf{M} \mathbf{V}_2\|^2</math>, and remember that <math>\|A\| = 0 \Leftrightarrow A = 0</math>.</ref> Finally, the unitarity of <math>\mathbf{V}</math> translates, in terms of <math>\mathbf{V}_1</math> and <math>\mathbf{V}_2</math>, into the following conditions: <math display=block>\begin{align} \mathbf{V}_1^* \mathbf{V}_1 &= \mathbf{I}_1, \\ \mathbf{V}_2^* \mathbf{V}_2 &= \mathbf{I}_2, \\ \mathbf{V}_1 \mathbf{V}_1^* + \mathbf{V}_2 \mathbf{V}_2^* &= \mathbf{I}_{12}, \end{align}</math> where the subscripts on the identity matrices indicate that they are of different dimensions.

Let us now define <math display=block> \mathbf{U}_1 = \mathbf{M} \mathbf{V}_1 \mathbf{D}^{-\frac{1}{2}}. </math> Then, <math display=block> \mathbf{U}_1 \mathbf{D}^\frac{1}{2} \mathbf{V}_1^* = \mathbf{M} \mathbf{V}_1 \mathbf{D}^{-\frac{1}{2}} \mathbf{D}^\frac{1}{2} \mathbf{V}_1^* = \mathbf{M} (\mathbf{I} - \mathbf{V}_2\mathbf{V}_2^*) = \mathbf{M} - (\mathbf{M}\mathbf{V}_2)\mathbf{V}_2^* = \mathbf{M}, </math> since <math>\mathbf{M}\mathbf{V}_2 = \mathbf{0}. </math> This can also be seen as an immediate consequence of the fact that <math>\mathbf{M}\mathbf{V}_1\mathbf{V}_1^* = \mathbf{M}</math>. This is equivalent to the observation that if <math>\{\boldsymbol v_i\}_{i=1}^\ell</math> is the set of eigenvectors of <math>\mathbf{M}^* \mathbf{M}</math> corresponding to non-vanishing eigenvalues <math>\{\lambda_i\}_{i=1}^\ell</math>, then <math>\{\mathbf M \boldsymbol v_i\}_{i=1}^\ell</math> is a set of orthogonal vectors, and <math>\bigl\{\lambda_i^{-1/2}\mathbf M \boldsymbol v_i\bigr\}\vphantom|_{i=1}^\ell</math> is a (generally not complete) set of ''orthonormal'' vectors. This matches the matrix formalism used above: <math>\mathbf{V}_1</math> denotes the matrix whose columns are <math>\{\boldsymbol v_i\}_{i=1}^\ell</math>, <math>\mathbf{V}_2</math> the matrix whose columns are the eigenvectors of <math>\mathbf{M}^* \mathbf{M}</math> with vanishing eigenvalue, and <math>\mathbf{U}_1</math> the matrix whose columns are the vectors <math>\bigl\{\lambda_i^{-1/2}\mathbf M \boldsymbol v_i\bigr\}\vphantom|_{i=1}^\ell</math>.
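In the same illustrative spirit (assuming NumPy; the example matrix and the rank threshold are arbitrary choices), the definition <math>\mathbf{U}_1 = \mathbf{M} \mathbf{V}_1 \mathbf{D}^{-1/2}</math> and the two identities just derived can be sketched as follows.

<syntaxhighlight lang="python">
import numpy as np

# Illustration only: build U1 = M V1 D^{-1/2} from the non-zero eigenpairs of M*M.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

eigvals, V = np.linalg.eigh(M.conj().T @ M)
eigvals, V = eigvals[::-1], V[:, ::-1]                 # descending order
ell = int(np.sum(eigvals > 1e-12))                     # number of non-zero eigenvalues
V1, D = V[:, :ell], np.diag(eigvals[:ell])

U1 = M @ V1 @ np.diag(eigvals[:ell] ** -0.5)           # U1 = M V1 D^{-1/2}

print(np.allclose(U1.conj().T @ U1, np.eye(ell)))      # columns of U1 are orthonormal
print(np.allclose(U1 @ np.sqrt(D) @ V1.conj().T, M))   # U1 D^{1/2} V1* = M
</syntaxhighlight>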
We see that this is almost the desired result, except that <math>\mathbf{U}_1</math> and <math>\mathbf{V}_1</math> are in general not unitary, since they might not be square. However, we do know that the number of rows of <math>\mathbf{U}_1</math> is no smaller than the number of its columns, since the dimension of <math>\mathbf{D}</math> is no greater than <math>\min(m,n)</math>. Also, since <math display=block> \mathbf{U}_1^*\mathbf{U}_1 = \mathbf{D}^{-\frac{1}{2}}\mathbf{V}_1^*\mathbf{M}^*\mathbf{M} \mathbf{V}_1 \mathbf{D}^{-\frac{1}{2}}=\mathbf{D}^{-\frac{1}{2}}\mathbf{D}\mathbf{D}^{-\frac{1}{2}} = \mathbf{I}_1, </math> the columns of <math>\mathbf{U}_1</math> are orthonormal and can be extended to an orthonormal basis. This means that we can choose <math>\mathbf{U}_2</math> such that <math>\mathbf{U} = \begin{bmatrix} \mathbf{U}_1 & \mathbf{U}_2 \end{bmatrix}</math> is unitary. For {{tmath|\mathbf V_1,}} we already have {{tmath|\mathbf V_2}} to make it unitary.

Now, define <math display=block> \mathbf \Sigma = \begin{bmatrix} \begin{bmatrix} \mathbf{D}^\frac{1}{2} & 0 \\ 0 & 0 \end{bmatrix} \\ 0 \end{bmatrix}, </math> where extra zero rows are added or removed to make the number of zero rows equal the number of columns of {{tmath|\mathbf U_2,}} and hence the overall dimensions of <math>\mathbf \Sigma</math> equal to <math>m\times n</math>. Then <math display=block> \begin{bmatrix} \mathbf{U}_1 & \mathbf{U}_2 \end{bmatrix} \begin{bmatrix} \begin{bmatrix} \mathbf{D}^\frac{1}{2} & 0 \\ 0 & 0 \end{bmatrix} \\ 0 \end{bmatrix} \begin{bmatrix} \mathbf{V}_1 & \mathbf{V}_2 \end{bmatrix}^* = \begin{bmatrix} \mathbf{U}_1 & \mathbf{U}_2 \end{bmatrix} \begin{bmatrix} \mathbf{D}^\frac{1}{2} \mathbf{V}_1^* \\ 0 \end{bmatrix} = \mathbf{U}_1 \mathbf{D}^\frac{1}{2} \mathbf{V}_1^* = \mathbf{M}, </math> which is the desired result: <math display=block> \mathbf{M} = \mathbf{U} \mathbf \Sigma \mathbf{V}^*. </math>

Notice that the argument could also have begun by diagonalizing {{tmath|\mathbf M \mathbf M^*}} rather than {{tmath|\mathbf M^* \mathbf M}} (this shows directly that {{tmath|\mathbf M \mathbf M^*}} and {{tmath|\mathbf M^* \mathbf M}} have the same non-zero eigenvalues).
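The full assembly of <math>\mathbf{M} = \mathbf{U} \mathbf \Sigma \mathbf{V}^*</math> can likewise be sketched numerically (illustration only, assuming NumPy; completing <math>\mathbf{U}_1</math> to a unitary matrix via the eigenvectors of the complementary projector is just one possible choice).

<syntaxhighlight lang="python">
import numpy as np

# Illustration only (hypothetical example): assemble the decomposition M = U Sigma V*.
rng = np.random.default_rng(0)
m, n = 4, 3
M = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

eigvals, V = np.linalg.eigh(M.conj().T @ M)
eigvals, V = eigvals[::-1], V[:, ::-1]                 # descending order
ell = int(np.sum(eigvals > 1e-12))
V1 = V[:, :ell]
U1 = M @ V1 @ np.diag(eigvals[:ell] ** -0.5)           # U1 = M V1 D^{-1/2}, as above

# Extend U1 to a unitary U: eigenvectors of the projector onto the orthogonal
# complement of range(U1) (eigenvalue ~1) supply the missing orthonormal columns U2.
w, Q = np.linalg.eigh(np.eye(m) - U1 @ U1.conj().T)
U2 = Q[:, w > 0.5]
U = np.hstack([U1, U2])

Sigma = np.zeros((m, n))                               # m x n, with D^{1/2} in the top-left block
Sigma[:ell, :ell] = np.diag(np.sqrt(eigvals[:ell]))

print(np.allclose(U.conj().T @ U, np.eye(m)))          # U is unitary
print(np.allclose(U @ Sigma @ V.conj().T, M))          # M = U Sigma V*
</syntaxhighlight>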
=== Based on variational characterization ===
{{anchor|vch}}The singular values can also be characterized as the maxima of {{tmath|\mathbf u^\mathrm{T} \mathbf M \mathbf v,}} considered as a function of {{tmath|\mathbf u}} and {{tmath|\mathbf v,}} over particular subspaces. The singular vectors are the values of {{tmath|\mathbf u}} and {{tmath|\mathbf v}} where these maxima are attained.

Let {{tmath|\mathbf M}} denote an {{tmath|m \times n}} matrix with real entries. Let {{tmath|S^{k-1} }} be the unit <math>(k-1)</math>-sphere in <math> \mathbb{R}^k </math>, and define <math>\sigma(\mathbf{u}, \mathbf{v}) = \mathbf{u}^\operatorname{T} \mathbf{M} \mathbf{v},</math> <math>\mathbf{u} \in S^{m-1},</math> <math>\mathbf{v} \in S^{n-1}.</math>

Consider the function {{tmath|\sigma}} restricted to {{tmath|S^{m-1} \times S^{n-1}.}} Since both {{tmath|S^{m-1} }} and {{tmath|S^{n-1} }} are [[compact space|compact]] sets, their [[Product topology|product]] is also compact. Furthermore, since {{tmath|\sigma}} is continuous, it attains a largest value for at least one pair of vectors {{tmath|\mathbf u}} in {{tmath|S^{m-1} }} and {{tmath|\mathbf v}} in {{tmath|S^{n-1}.}} This largest value is denoted {{tmath|\sigma_1}} and the corresponding vectors are denoted {{tmath|\mathbf u_1}} and {{tmath|\mathbf v_1.}} Since {{tmath|\sigma_1}} is the largest value of {{tmath|\sigma(\mathbf u, \mathbf v)}} it must be non-negative. If it were negative, changing the sign of either {{tmath|\mathbf u_1}} or {{tmath|\mathbf v_1}} would make it positive and therefore larger.

'''Statement.''' {{tmath|\mathbf u_1}} and {{tmath|\mathbf v_1}} are left- and right-singular vectors of {{tmath|\mathbf M}} with corresponding singular value {{tmath|\sigma_1.}}

'''Proof.''' As in the eigenvalue case, by assumption the two vectors satisfy the Lagrange multiplier equation <math display=block> \nabla \mathbf{u}^\operatorname{T} \mathbf{M} \mathbf{v} - \lambda_1 \cdot \nabla \mathbf{u}^\operatorname{T} \mathbf{u} - \lambda_2 \cdot \nabla \mathbf{v}^\operatorname{T} \mathbf{v} = 0. </math> After some algebra, this becomes <math display=block> \begin{align} \mathbf{M} \mathbf{v}_1 &= 2 \lambda_1 \mathbf{u}_1, \\ \mathbf{M}^\operatorname{T} \mathbf{u}_1 &= 2 \lambda_2 \mathbf{v}_1. \end{align}</math> Multiplying the first equation from the left by {{tmath|\mathbf u_1^\textrm{T} }} and the second equation from the left by {{tmath|\mathbf v_1^\textrm{T} }} and taking <math> \| \mathbf u_1 \| = \| \mathbf v_1 \| = 1</math> into account gives <math display=block> \sigma_1 = 2\lambda_1 = 2\lambda_2. </math> Plugging this into the pair of equations above, we have <math display=block>\begin{align} \mathbf{M} \mathbf{v}_1 &= \sigma_1 \mathbf{u}_1, \\ \mathbf{M}^\operatorname{T} \mathbf{u}_1 &= \sigma_1 \mathbf{v}_1. \end{align}</math> This proves the statement.

More singular vectors and singular values can be found by maximizing {{tmath|\sigma(\mathbf u, \mathbf v)}} over normalized {{tmath|\mathbf u}} and {{tmath|\mathbf v}} which are orthogonal to {{tmath|\mathbf u_1}} and {{tmath|\mathbf v_1,}} respectively. The passage from real to complex is similar to the eigenvalue case.
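As a numerical illustration of this characterization (assuming NumPy; the alternating update, iteration count, and example matrix are illustrative choices, not part of the proof): alternately setting <math>\mathbf u \propto \mathbf M \mathbf v</math> and <math>\mathbf v \propto \mathbf M^\operatorname{T} \mathbf u</math>, each step being the exact maximizer of <math>\sigma(\mathbf u, \mathbf v)</math> in one argument, generically converges to <math>\sigma_1,</math> <math>\mathbf u_1,</math> and <math>\mathbf v_1.</math>

<syntaxhighlight lang="python">
import numpy as np

# Illustration only: alternately maximize sigma(u, v) = u^T M v over unit vectors.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3))          # hypothetical real example matrix

v = rng.standard_normal(3)
v /= np.linalg.norm(v)
for _ in range(1000):
    u = M @ v
    u /= np.linalg.norm(u)               # best unit u for fixed v
    v = M.T @ u
    v /= np.linalg.norm(v)               # best unit v for fixed u

sigma1 = u @ M @ v
print(np.isclose(sigma1, np.linalg.svd(M, compute_uv=False)[0]))   # largest singular value
print(np.allclose(M @ v, sigma1 * u), np.allclose(M.T @ u, sigma1 * v))  # singular-vector equations
</syntaxhighlight>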