== Proof of existence ==
An eigenvalue {{tmath|\lambda}} of a matrix {{tmath|\mathbf M}} is characterized by the algebraic relation {{tmath|\mathbf M \mathbf u {{=}} \lambda \mathbf u.}} When {{tmath|\mathbf M}} is [[Hermitian matrix|Hermitian]], a variational characterization is also available. Let {{tmath|\mathbf M}} be a real {{tmath|n \times n}} [[symmetric matrix]]. Define <math display=block> f : \left\{ \begin{align} \R^n &\to \R \\ \mathbf{x} &\mapsto \mathbf{x}^\operatorname{T} \mathbf{M} \mathbf{x} \end{align}\right.</math>

By the [[extreme value theorem]], this continuous function attains a maximum at some {{tmath|\mathbf u}} when restricted to the unit sphere <math>\{\|\mathbf x\| = 1\}.</math> By the [[Lagrange multipliers]] theorem, {{tmath|\mathbf u}} necessarily satisfies <math display=block>\nabla \mathbf{u}^\operatorname{T} \mathbf{M} \mathbf{u} - \lambda \cdot \nabla \mathbf{u}^\operatorname{T} \mathbf{u} = 0</math> for some real number {{tmath|\lambda.}} The nabla symbol, {{tmath|\nabla}}, is the [[del]] operator (differentiation with respect to {{nobr|{{tmath|\mathbf x}}).}} Using the symmetry of {{tmath|\mathbf M}} we obtain <math display=block>\nabla \mathbf{x}^\operatorname{T} \mathbf{M} \mathbf{x} - \lambda \cdot \nabla \mathbf{x}^\operatorname{T} \mathbf{x} = 2(\mathbf{M}-\lambda \mathbf{I})\mathbf{x}.</math>

Therefore {{tmath|\mathbf M \mathbf u {{=}} \lambda \mathbf u,}} so {{tmath|\mathbf u}} is a unit-length eigenvector of {{tmath|\mathbf M.}} For every unit-length eigenvector {{tmath|\mathbf v}} of {{tmath|\mathbf M}} its eigenvalue is {{tmath|f(\mathbf v),}} so {{tmath|\lambda}} is the largest eigenvalue of {{tmath|\mathbf M.}} The same calculation performed on the orthogonal complement of {{tmath|\mathbf u}} gives the next largest eigenvalue, and so on. The complex Hermitian case is similar; there {{tmath|f(\mathbf x) {{=}} \mathbf x^* \mathbf M \mathbf x}} is a real-valued function of {{tmath|2n}} real variables.

Singular values are similar in that they can be described algebraically or from variational principles, although, unlike the eigenvalue case, Hermiticity or symmetry of {{tmath|\mathbf M}} is no longer required. This section gives these two arguments for the existence of the singular value decomposition.

=== Based on the spectral theorem ===
Let <math>\mathbf{M}</math> be an {{tmath|m \times n}} complex matrix. Since <math>\mathbf{M}^* \mathbf{M}</math> is positive semi-definite and Hermitian, by the [[spectral theorem]] there exists an {{tmath|n \times n}} unitary matrix <math>\mathbf{V}</math> such that <math display=block> \mathbf V^* \mathbf M^* \mathbf M \mathbf V = \bar\mathbf{D} = \begin{bmatrix} \mathbf{D} & 0 \\ 0 & 0\end{bmatrix}, </math> where <math>\mathbf{D}</math> is diagonal and positive definite, of dimension <math>\ell\times \ell</math>, with <math>\ell</math> the number of non-zero eigenvalues of <math>\mathbf{M}^* \mathbf{M}</math> (which can be shown to satisfy <math>\ell\le\min(n,m)</math>). Note that <math>\mathbf{V}</math> is here by definition a matrix whose <math>i</math>-th column is the <math>i</math>-th eigenvector of <math>\mathbf{M}^* \mathbf{M}</math>, corresponding to the eigenvalue <math>\bar{\mathbf{D}}_{ii}</math>. Moreover, the <math>j</math>-th column of <math>\mathbf{V}</math>, for <math>j>\ell</math>, is an eigenvector of <math>\mathbf{M}^* \mathbf{M}</math> with eigenvalue <math>\bar{\mathbf{D}}_{jj}=0</math>.
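The construction that follows can be checked numerically. The snippet below is a minimal sketch (not part of the proof), assuming NumPy and an arbitrary example matrix {{tmath|\mathbf M}}: the eigendecomposition of <math>\mathbf{M}^* \mathbf{M}</math> yields a unitary <math>\mathbf{V}</math> and non-negative eigenvalues, as used above.

<syntaxhighlight lang="python">
import numpy as np

# Illustration only: eigendecompose M*M for a hypothetical 4x3 complex matrix M
# and observe the structure used in the proof.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# eigh returns real eigenvalues (ascending) and a unitary eigenvector matrix V.
eigvals, V = np.linalg.eigh(M.conj().T @ M)
eigvals, V = eigvals[::-1], V[:, ::-1]          # reorder to descending, as in D-bar

print(np.allclose(V.conj().T @ V, np.eye(3)))   # V is unitary
print(np.all(eigvals >= -1e-12))                # M*M is positive semi-definite
print(np.allclose(V.conj().T @ (M.conj().T @ M) @ V, np.diag(eigvals)))  # V* M* M V = D-bar
</syntaxhighlight>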
This can be expressed by writing <math>\mathbf{V}</math> as <math>\mathbf{V}=\begin{bmatrix}\mathbf{V}_1 &\mathbf{V}_2\end{bmatrix}</math>, where the columns of <math>\mathbf{V}_1</math> and <math>\mathbf{V}_2</math> therefore contain the eigenvectors of <math>\mathbf{M}^* \mathbf{M}</math> corresponding to non-zero and zero eigenvalues, respectively. Using this rewriting of <math>\mathbf{V}</math>, the equation becomes: <math display=block> \begin{bmatrix} \mathbf{V}_1^* \\ \mathbf{V}_2^* \end{bmatrix} \mathbf{M}^* \mathbf{M}\, \begin{bmatrix} \mathbf{V}_1 & \!\! \mathbf{V}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{V}_1^* \mathbf{M}^* \mathbf{M} \mathbf{V}_1 & \mathbf{V}_1^* \mathbf{M}^* \mathbf{M} \mathbf{V}_2 \\ \mathbf{V}_2^* \mathbf{M}^* \mathbf{M} \mathbf{V}_1 & \mathbf{V}_2^* \mathbf{M}^* \mathbf{M} \mathbf{V}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{D} & 0 \\ 0 & 0 \end{bmatrix}.</math>

This implies that <math display=block> \mathbf{V}_1^* \mathbf{M}^* \mathbf{M} \mathbf{V}_1 = \mathbf{D}, \quad \mathbf{V}_2^* \mathbf{M}^* \mathbf{M} \mathbf{V}_2 = \mathbf{0}. </math> Moreover, the second equation implies <math>\mathbf{M}\mathbf{V}_2 = \mathbf{0}</math>.<ref>To see this, we just have to notice that <math>\operatorname{Tr}(\mathbf{V}_2^* \mathbf{M}^* \mathbf{M} \mathbf{V}_2) = \|\mathbf{M} \mathbf{V}_2\|^2</math>, and remember that <math>\|A\| = 0 \Leftrightarrow A = 0</math>.</ref> Finally, the unitarity of <math>\mathbf{V}</math> translates, in terms of <math>\mathbf{V}_1</math> and <math>\mathbf{V}_2</math>, into the following conditions: <math display=block>\begin{align} \mathbf{V}_1^* \mathbf{V}_1 &= \mathbf{I}_1, \\ \mathbf{V}_2^* \mathbf{V}_2 &= \mathbf{I}_2, \\ \mathbf{V}_1 \mathbf{V}_1^* + \mathbf{V}_2 \mathbf{V}_2^* &= \mathbf{I}_{12}, \end{align}</math> where the subscripts on the identity matrices indicate that they are of different dimensions.

Let us now define <math display=block> \mathbf{U}_1 = \mathbf{M} \mathbf{V}_1 \mathbf{D}^{-\frac{1}{2}}. </math> Then, <math display=block> \mathbf{U}_1 \mathbf{D}^\frac{1}{2} \mathbf{V}_1^* = \mathbf{M} \mathbf{V}_1 \mathbf{D}^{-\frac{1}{2}} \mathbf{D}^\frac{1}{2} \mathbf{V}_1^* = \mathbf{M} (\mathbf{I} - \mathbf{V}_2\mathbf{V}_2^*) = \mathbf{M} - (\mathbf{M}\mathbf{V}_2)\mathbf{V}_2^* = \mathbf{M}, </math> since <math>\mathbf{M}\mathbf{V}_2 = \mathbf{0}. </math> This can also be seen as an immediate consequence of the fact that <math>\mathbf{M}\mathbf{V}_1\mathbf{V}_1^* = \mathbf{M}</math>. This is equivalent to the observation that if <math>\{\boldsymbol v_i\}_{i=1}^\ell</math> is the set of eigenvectors of <math>\mathbf{M}^* \mathbf{M}</math> corresponding to non-vanishing eigenvalues <math>\{\lambda_i\}_{i=1}^\ell</math>, then <math>\{\mathbf M \boldsymbol v_i\}_{i=1}^\ell</math> is a set of orthogonal vectors, and <math>\bigl\{\lambda_i^{-1/2}\mathbf M \boldsymbol v_i\bigr\}\vphantom|_{i=1}^\ell</math> is a (generally not complete) set of ''orthonormal'' vectors. This matches the matrix formalism used above: <math>\mathbf{V}_1</math> denotes the matrix whose columns are <math>\{\boldsymbol v_i\}_{i=1}^\ell</math>, <math>\mathbf{V}_2</math> the matrix whose columns are the eigenvectors of <math>\mathbf{M}^* \mathbf{M}</math> with vanishing eigenvalue, and <math>\mathbf{U}_1</math> the matrix whose columns are the vectors <math>\bigl\{\lambda_i^{-1/2}\mathbf M \boldsymbol v_i\bigr\}\vphantom|_{i=1}^\ell</math>.
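In the same illustrative spirit (assuming NumPy; the example matrix and the rank threshold are arbitrary choices), the definition <math>\mathbf{U}_1 = \mathbf{M} \mathbf{V}_1 \mathbf{D}^{-1/2}</math> and the two identities just derived can be sketched as follows.

<syntaxhighlight lang="python">
import numpy as np

# Illustration only: build U1 = M V1 D^{-1/2} from the non-zero eigenpairs of M*M.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

eigvals, V = np.linalg.eigh(M.conj().T @ M)
eigvals, V = eigvals[::-1], V[:, ::-1]                 # descending order
ell = int(np.sum(eigvals > 1e-12))                     # number of non-zero eigenvalues
V1, D = V[:, :ell], np.diag(eigvals[:ell])

U1 = M @ V1 @ np.diag(eigvals[:ell] ** -0.5)           # U1 = M V1 D^{-1/2}

print(np.allclose(U1.conj().T @ U1, np.eye(ell)))      # columns of U1 are orthonormal
print(np.allclose(U1 @ np.sqrt(D) @ V1.conj().T, M))   # U1 D^{1/2} V1* = M
</syntaxhighlight>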
We see that this is almost the desired result, except that <math>\mathbf{U}_1</math> and <math>\mathbf{V}_1</math> are in general not unitary, since they might not be square. However, we do know that the number of rows of <math>\mathbf{U}_1</math> is no smaller than the number of its columns, since the dimension of <math>\mathbf{D}</math> is no greater than <math>\min(m,n)</math>. Also, since <math display=block> \mathbf{U}_1^*\mathbf{U}_1 = \mathbf{D}^{-\frac{1}{2}}\mathbf{V}_1^*\mathbf{M}^*\mathbf{M} \mathbf{V}_1 \mathbf{D}^{-\frac{1}{2}}=\mathbf{D}^{-\frac{1}{2}}\mathbf{D}\mathbf{D}^{-\frac{1}{2}} = \mathbf{I}_1, </math> the columns of <math>\mathbf{U}_1</math> are orthonormal and can be extended to an orthonormal basis. This means that we can choose <math>\mathbf{U}_2</math> such that <math>\mathbf{U} = \begin{bmatrix} \mathbf{U}_1 & \mathbf{U}_2 \end{bmatrix}</math> is unitary. For {{tmath|\mathbf V_1,}} we already have {{tmath|\mathbf V_2}} to make it unitary.

Now, define <math display=block> \mathbf \Sigma = \begin{bmatrix} \begin{bmatrix} \mathbf{D}^\frac{1}{2} & 0 \\ 0 & 0 \end{bmatrix} \\ 0 \end{bmatrix}, </math> where extra zero rows are added or removed to make the number of zero rows equal the number of columns of {{tmath|\mathbf U_2,}} and hence the overall dimensions of <math>\mathbf \Sigma</math> equal to <math>m\times n</math>. Then <math display=block> \begin{bmatrix} \mathbf{U}_1 & \mathbf{U}_2 \end{bmatrix} \begin{bmatrix} \begin{bmatrix} \mathbf{D}^\frac{1}{2} & 0 \\ 0 & 0 \end{bmatrix} \\ 0 \end{bmatrix} \begin{bmatrix} \mathbf{V}_1 & \mathbf{V}_2 \end{bmatrix}^* = \begin{bmatrix} \mathbf{U}_1 & \mathbf{U}_2 \end{bmatrix} \begin{bmatrix} \mathbf{D}^\frac{1}{2} \mathbf{V}_1^* \\ 0 \end{bmatrix} = \mathbf{U}_1 \mathbf{D}^\frac{1}{2} \mathbf{V}_1^* = \mathbf{M}, </math> which is the desired result: <math display=block> \mathbf{M} = \mathbf{U} \mathbf \Sigma \mathbf{V}^*. </math>

Notice that the argument could also have begun by diagonalizing {{tmath|\mathbf M \mathbf M^*}} rather than {{tmath|\mathbf M^* \mathbf M}} (this shows directly that {{tmath|\mathbf M \mathbf M^*}} and {{tmath|\mathbf M^* \mathbf M}} have the same non-zero eigenvalues).
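The full assembly of <math>\mathbf{M} = \mathbf{U} \mathbf \Sigma \mathbf{V}^*</math> can likewise be sketched numerically (illustration only, assuming NumPy; completing <math>\mathbf{U}_1</math> to a unitary matrix via the eigenvectors of the complementary projector is just one possible choice).

<syntaxhighlight lang="python">
import numpy as np

# Illustration only (hypothetical example): assemble the decomposition M = U Sigma V*.
rng = np.random.default_rng(0)
m, n = 4, 3
M = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

eigvals, V = np.linalg.eigh(M.conj().T @ M)
eigvals, V = eigvals[::-1], V[:, ::-1]                 # descending order
ell = int(np.sum(eigvals > 1e-12))
V1 = V[:, :ell]
U1 = M @ V1 @ np.diag(eigvals[:ell] ** -0.5)           # U1 = M V1 D^{-1/2}, as above

# Extend U1 to a unitary U: eigenvectors of the projector onto the orthogonal
# complement of range(U1) (eigenvalue ~1) supply the missing orthonormal columns U2.
w, Q = np.linalg.eigh(np.eye(m) - U1 @ U1.conj().T)
U2 = Q[:, w > 0.5]
U = np.hstack([U1, U2])

Sigma = np.zeros((m, n))                               # m x n, with D^{1/2} in the top-left block
Sigma[:ell, :ell] = np.diag(np.sqrt(eigvals[:ell]))

print(np.allclose(U.conj().T @ U, np.eye(m)))          # U is unitary
print(np.allclose(U @ Sigma @ V.conj().T, M))          # M = U Sigma V*
</syntaxhighlight>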
=== Based on variational characterization ===
{{anchor|vch}}The singular values can also be characterized as the maxima of {{tmath|\mathbf u^\mathrm{T} \mathbf M \mathbf v,}} considered as a function of {{tmath|\mathbf u}} and {{tmath|\mathbf v,}} over particular subspaces. The singular vectors are the values of {{tmath|\mathbf u}} and {{tmath|\mathbf v}} where these maxima are attained.

Let {{tmath|\mathbf M}} denote an {{tmath|m \times n}} matrix with real entries. Let {{tmath|S^{k-1} }} be the unit <math>(k-1)</math>-sphere in <math> \mathbb{R}^k </math>, and define <math>\sigma(\mathbf{u}, \mathbf{v}) = \mathbf{u}^\operatorname{T} \mathbf{M} \mathbf{v},</math> <math>\mathbf{u} \in S^{m-1},</math> <math>\mathbf{v} \in S^{n-1}.</math>

Consider the function {{tmath|\sigma}} restricted to {{tmath|S^{m-1} \times S^{n-1}.}} Since both {{tmath|S^{m-1} }} and {{tmath|S^{n-1} }} are [[compact space|compact]] sets, their [[Product topology|product]] is also compact. Furthermore, since {{tmath|\sigma}} is continuous, it attains a largest value for at least one pair of vectors {{tmath|\mathbf u}} in {{tmath|S^{m-1} }} and {{tmath|\mathbf v}} in {{tmath|S^{n-1}.}} This largest value is denoted {{tmath|\sigma_1}} and the corresponding vectors are denoted {{tmath|\mathbf u_1}} and {{tmath|\mathbf v_1.}} Since {{tmath|\sigma_1}} is the largest value of {{tmath|\sigma(\mathbf u, \mathbf v)}} it must be non-negative. If it were negative, changing the sign of either {{tmath|\mathbf u_1}} or {{tmath|\mathbf v_1}} would make it positive and therefore larger.

'''Statement.''' {{tmath|\mathbf u_1}} and {{tmath|\mathbf v_1}} are left- and right-singular vectors of {{tmath|\mathbf M}} with corresponding singular value {{tmath|\sigma_1.}}

'''Proof.''' As in the eigenvalue case, by assumption the two vectors satisfy the Lagrange multiplier equation <math display=block> \nabla \mathbf{u}^\operatorname{T} \mathbf{M} \mathbf{v} - \lambda_1 \cdot \nabla \mathbf{u}^\operatorname{T} \mathbf{u} - \lambda_2 \cdot \nabla \mathbf{v}^\operatorname{T} \mathbf{v} = 0. </math> After some algebra, this becomes <math display=block> \begin{align} \mathbf{M} \mathbf{v}_1 &= 2 \lambda_1 \mathbf{u}_1, \\ \mathbf{M}^\operatorname{T} \mathbf{u}_1 &= 2 \lambda_2 \mathbf{v}_1. \end{align}</math> Multiplying the first equation from the left by {{tmath|\mathbf u_1^\textrm{T} }} and the second equation from the left by {{tmath|\mathbf v_1^\textrm{T} }} and taking <math> \| \mathbf u_1 \| = \| \mathbf v_1 \| = 1</math> into account gives <math display=block> \sigma_1 = 2\lambda_1 = 2\lambda_2. </math> Plugging this into the pair of equations above, we have <math display=block>\begin{align} \mathbf{M} \mathbf{v}_1 &= \sigma_1 \mathbf{u}_1, \\ \mathbf{M}^\operatorname{T} \mathbf{u}_1 &= \sigma_1 \mathbf{v}_1. \end{align}</math> This proves the statement.

More singular vectors and singular values can be found by maximizing {{tmath|\sigma(\mathbf u, \mathbf v)}} over normalized {{tmath|\mathbf u}} and {{tmath|\mathbf v}} which are orthogonal to {{tmath|\mathbf u_1}} and {{tmath|\mathbf v_1,}} respectively. The passage from real to complex is similar to the eigenvalue case.
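As a numerical illustration of this characterization (assuming NumPy; the alternating update, iteration count, and example matrix are illustrative choices, not part of the proof): alternately setting <math>\mathbf u \propto \mathbf M \mathbf v</math> and <math>\mathbf v \propto \mathbf M^\operatorname{T} \mathbf u</math>, each step being the exact maximizer of <math>\sigma(\mathbf u, \mathbf v)</math> in one argument, generically converges to <math>\sigma_1,</math> <math>\mathbf u_1,</math> and <math>\mathbf v_1.</math>

<syntaxhighlight lang="python">
import numpy as np

# Illustration only: alternately maximize sigma(u, v) = u^T M v over unit vectors.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3))          # hypothetical real example matrix

v = rng.standard_normal(3)
v /= np.linalg.norm(v)
for _ in range(1000):
    u = M @ v
    u /= np.linalg.norm(u)               # best unit u for fixed v
    v = M.T @ u
    v /= np.linalg.norm(v)               # best unit v for fixed u

sigma1 = u @ M @ v
print(np.isclose(sigma1, np.linalg.svd(M, compute_uv=False)[0]))   # largest singular value
print(np.allclose(M @ v, sigma1 * u), np.allclose(M.T @ u, sigma1 * v))  # singular-vector equations
</syntaxhighlight>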